Large Language Models (LLMs) have taken the world by storm. They can write code, draft emails, and answer complex questions. But they have a fundamental limitation: their knowledge is frozen in time, and they can sometimes “hallucinate” or make up facts.
A popular solution to this is Retrieval-Augmented Generation (RAG). The idea is simple: when you ask the LLM a question, a “retriever” first searches an external database (like Wikipedia or a company’s internal documents) for relevant information. This information is then passed to the LLM as context to help it generate a factual, up-to-date answer.
But RAG has its own challenges. The retrieval step can be slow, and feeding large amounts of context into the LLM for every single query is computationally expensive and increases latency. This is especially true when dealing with massive, complex knowledge sources like Knowledge Graphs (KGs), which store facts as interconnected triples (e.g., (Paris, capital_of, France)).
What if we could bake this knowledge directly into the model’s “brain” efficiently, without constant retrieval or massive context windows?
In our recent work, we introduce AtlasKV, a new framework that does exactly that. AtlasKV augments LLMs with billion-scale knowledge graphs, making them more knowledgeable and factually grounded, all while using a surprisingly small amount of GPU memory (less than 20GB for a KG with one billion facts).
The Secret Sauce: How AtlasKV Works
To understand AtlasKV’s breakthroughs, we first need to look at the new paradigm it builds upon. A promising direction for knowledge integration, pioneered by methods like KBLaM, involves plugging external knowledge directly into the LLM’s attention layers.
The core idea is to use a lightweight “Knowledge Adapter” (also called a projection head). This is a small, trainable set of parameters that acts as a universal translator. It learns to take key-value facts from an external source and project them into the specific representation space that the LLM’s attention mechanism understands. This is done while keeping the main LLM frozen, making it a very efficient way to teach the model new facts.
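To make the idea concrete, here is a minimal PyTorch sketch of what such a knowledge adapter might look like. The class name, dimensions, and the way the projected keys and values would be concatenated into attention are illustrative assumptions on our part, not the exact architecture used by KBLaM or AtlasKV.

```python
import torch
import torch.nn as nn

class KnowledgeAdapter(nn.Module):
    """Hypothetical sketch of a lightweight knowledge adapter (projection head).

    It maps pre-computed embeddings of external fact keys/values into the
    key/value space of a frozen attention layer, so the facts can be attended
    to alongside the ordinary token keys and values.
    """

    def __init__(self, fact_dim: int, head_dim: int, num_kv_heads: int):
        super().__init__()
        # Two small trainable projections; the base LLM itself stays frozen.
        self.key_proj = nn.Linear(fact_dim, num_kv_heads * head_dim)
        self.value_proj = nn.Linear(fact_dim, num_kv_heads * head_dim)
        self.num_kv_heads = num_kv_heads
        self.head_dim = head_dim

    def forward(self, fact_keys: torch.Tensor, fact_values: torch.Tensor):
        # fact_keys / fact_values: (num_facts, fact_dim) embeddings of the
        # textual keys and values produced from the knowledge source.
        k = self.key_proj(fact_keys).view(-1, self.num_kv_heads, self.head_dim)
        v = self.value_proj(fact_values).view(-1, self.num_kv_heads, self.head_dim)
        # These (num_facts, num_kv_heads, head_dim) tensors would be concatenated
        # with the layer's own token keys/values before attention is computed.
        return k, v

if __name__ == "__main__":
    adapter = KnowledgeAdapter(fact_dim=1024, head_dim=128, num_kv_heads=8)
    fact_k = torch.randn(16, 1024)   # embeddings of 16 fact keys
    fact_v = torch.randn(16, 1024)   # embeddings of the matching fact values
    k, v = adapter(fact_k, fact_v)
    print(k.shape, v.shape)          # torch.Size([16, 8, 128]) twice
```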
However, this innovative approach faced two major hurdles that prevented it from working at a massive scale:
- The Data Quality Problem: The method requires high-quality Query-Key-Value (Q-K-V) training data. Previous work relied on synthesizing this data from plain text using rigid templates, which resulted in limited query diversity and poor generalization. The model learned to answer specific question formats but struggled with the variety of questions seen in the real world.
- The Scalability Wall: The computation and memory costs grew linearly with the size of the knowledge base. While more efficient than other methods, this linear scaling still makes it prohibitively expensive to work with knowledge graphs containing hundreds of millions or billions of facts.
Our work, AtlasKV, is designed to solve these exact challenges, making this powerful parametric approach truly scalable and effective. We achieve this with two complementary innovations.
1. KG2KV: Speaking the LLM’s Native Language
The first challenge is getting the knowledge into a format the LLM can easily understand. An LLM’s core component is the self-attention mechanism, which works with Query, Key, and Value (Q-K-V) vectors.
Our KG2KV pipeline cleverly transforms each fact in a knowledge graph into this native Q-K-V format. For a triple like (John founded, cause, StockLemon.com), we can rephrase it as:
- Query: “What is the cause of John founding StockLemon.com?”
- Key: “the cause of John founded StockLemon.com”
- Value: “StockLemon.com”
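As a rough illustration of this triple-to-Q-K-V conversion, here is a small Python sketch. The `triple_to_qkv` helper, its template strings, and the example triple are hypothetical; the actual verbalization templates used in AtlasKV may differ.

```python
from dataclasses import dataclass

@dataclass
class QKVExample:
    query: str
    key: str
    value: str

def triple_to_qkv(head: str, relation: str, tail: str) -> QKVExample:
    # Illustrative templates: the key describes "the <relation> of <head>",
    # the value is the tail entity, and the query is a question form of the
    # key used as training supervision.
    key = f"the {relation} of {head}"
    query = f"What is the {relation} of {head}?"
    return QKVExample(query=query, key=key, value=tail)

if __name__ == "__main__":
    # Hypothetical triple (France, capital, Paris) just for demonstration.
    ex = triple_to_qkv(head="France", relation="capital", tail="Paris")
    print(ex.query)  # What is the capital of France?
    print(ex.key)    # the capital of France
    print(ex.value)  # Paris
```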
This approach is far more powerful than previous methods that used rigid templates to create training data. By leveraging the rich and diverse relationships in a knowledge graph, we create high-quality, varied training data (drawn from the ATLAS-Family KG we constructed in our previous work, AutoSchemaKG) that helps the model generalize to new, unseen questions and topics.
2. HiKVP: Finding a Needle in a Billion-Fact Haystack
The second, and perhaps biggest, challenge is scalability. How do you give a model access to a billion facts without needing a supercomputer? Loading all one billion “keys” into GPU memory is impossible.
This is where our Hierarchical Key-Value Pruning (HiKVP) algorithm comes in. Instead of a flat, disorganized list of facts, HiKVP organizes the knowledge keys into a hierarchy, much like a library’s filing system.
When a query comes in, the model doesn’t search through every single fact. Instead, it performs a highly efficient, multi-step search:
- Root-Layer Search: It first looks at a small set of high-level “root” keys to find the general semantic area of the query.
- Inter-Layer Search: Based on the best matches from the root layer, it narrows the search down to a more specific set of “intermediate” keys.
- Leaf-Layer Search: Finally, it searches within this much smaller, highly relevant set of “leaf” keys to pinpoint the exact facts needed to answer the query.
This hierarchical pruning means that at any given moment, only a tiny fraction of the total knowledge base needs to be in the GPU’s active memory. This dramatically reduces both memory usage and computation time, changing the complexity from linear (O(M)) to sub-linear (O(∛M)), where M is the number of facts.
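For intuition, here is a toy NumPy sketch of this kind of three-layer pruning. The clustering scheme, layer sizes, and dot-product scoring below are illustrative assumptions rather than the exact HiKVP algorithm, but they show why only a small candidate set ever needs to be scored (and held in GPU memory) for any given query.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy knowledge base: M leaf keys, grouped into intermediate and root clusters.
M, D = 10_000, 64
leaf_keys = normalize(rng.normal(size=(M, D)))
leaf_to_mid = rng.integers(0, 100, size=M)     # each leaf belongs to one of 100 mid clusters
mid_keys = normalize(np.stack([leaf_keys[leaf_to_mid == c].mean(0) for c in range(100)]))
mid_to_root = rng.integers(0, 10, size=100)    # each mid cluster belongs to one of 10 roots
root_keys = normalize(np.stack([mid_keys[mid_to_root == r].mean(0) for r in range(10)]))

def hierarchical_search(query, k_root=2, k_mid=5, k_leaf=8):
    q = normalize(query)
    # 1) Root layer: keep only the most similar high-level clusters.
    top_roots = np.argsort(root_keys @ q)[-k_root:]
    # 2) Intermediate layer: score only mid clusters under the surviving roots.
    mid_cand = np.flatnonzero(np.isin(mid_to_root, top_roots))
    top_mids = mid_cand[np.argsort(mid_keys[mid_cand] @ q)[-k_mid:]]
    # 3) Leaf layer: score only the facts under the surviving mid clusters.
    leaf_cand = np.flatnonzero(np.isin(leaf_to_mid, top_mids))
    top_leaves = leaf_cand[np.argsort(leaf_keys[leaf_cand] @ q)[-k_leaf:]]
    return top_leaves   # indices of the few facts whose values get attended to

if __name__ == "__main__":
    query = rng.normal(size=D)
    print(hierarchical_search(query))
```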
The Results: Unmatched Scalability and Accuracy
So, does it work? The results are striking.
Incredible Scalability: As shown in our experiments, while other methods see their memory usage explode as the knowledge graph grows, AtlasKV’s memory cost remains remarkably flat. We successfully augmented an LLM with a 1 billion triple KG using less than 20GB of VRAM—a task that is simply infeasible for other methods.

Superior Generalization and Accuracy: AtlasKV isn’t just more efficient; it’s smarter. Thanks to the high-quality training data from KG2KV, our model significantly outperforms baselines in both knowledge grounding accuracy and the relevance of its generated answers. It performs exceptionally well on out-of-distribution datasets—scenarios with complex questions and topics it has never seen during training.
Why This Matters
AtlasKV represents a major step forward in creating truly knowledgeable AI. By making it feasible to integrate massive, domain-specific knowledge graphs directly into LLMs on commodity hardware, we unlock a new class of applications:
- Enterprise AI: Imagine an assistant with perfect, instant recall of your company’s entire product catalog, support history, and internal processes.
- Scientific Research: A research assistant that has internalized all of PubMed or ArXiv, capable of answering complex questions and connecting disparate concepts.
- Fact-Grounded Chatbots: Customer service bots that provide accurate, reliable answers grounded in a comprehensive knowledge base, drastically reducing hallucinations.
We are excited about the future possibilities that AtlasKV enables and are committed to pushing the boundaries of what’s possible with knowledgeable, scalable, and efficient language models.
Interested in the technical details?
- Read the full paper on arXiv: https://www.arxiv.org/pdf/2510.17934
- Check out the source code: https://github.com/HKUST-KnowComp/AtlasKV
- Check out the models: https://huggingface.co/collections/HaoyuHuang2/atlaskv