I am thrilled to announce that three of our recent papers have been accepted to the International Conference on Learning Representations (ICLR) 2026.
This year, our research focused heavily on bridging the gap between Large Language Models (LLMs) and structured knowledge, as well as pushing the boundaries of scientific discovery agents. From fitting billion-scale knowledge graphs into consumer hardware to rediscovering Newton’s laws, here is a breakdown of the work we will be presenting.
1. AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM
Authors: Haoyu Huang, Hong Ting Tsang, Jiaxin Bai, Xi Peng, Gong Zhang, Yangqiu Song
Retrieval-Augmented Generation (RAG) is powerful, but traditional methods struggle with massive scale and inference latency. In this paper, we propose AtlasKV, a parametric knowledge integration method.
- The Breakthrough: We introduce a way to augment LLMs with billion-scale Knowledge Graphs (e.g., 1B triples) using less than 20GB of VRAM.
- How it works: We use KG2KV to convert KG triples into key-value pairs and HiKVP (Hierarchical Key-Value Pruning) to reduce memory overhead while maintaining high knowledge-grounding accuracy. This eliminates the need for external retrievers or massive context windows (see the sketch after this list).
- https://arxiv.org/abs/2510.17934
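To make the KG2KV idea concrete, here is a minimal, hypothetical sketch of triples being mapped to key-value pairs and then pruned. The toy `embed` function and the single-level, centroid-based pruning are my own stand-ins for illustration; the paper's actual encoders and hierarchy are more sophisticated.

```python
# Illustrative sketch of the KG2KV idea: triples become (key, value) pairs
# that can be attended to like cached KV entries. The hash-seeded embed()
# and the norm-based pruning are placeholders, NOT the paper's method.
import numpy as np

DIM = 64

def embed(text: str, dim: int = DIM) -> np.ndarray:
    """Toy deterministic embedding: hash the string to seed a random vector."""
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

def kg2kv(triples):
    """Map each (head, relation, tail) triple to a key/value vector pair."""
    keys = np.stack([embed(f"{h} {r}") for h, r, _ in triples])
    values = np.stack([embed(t) for _, _, t in triples])
    return keys, values

def hikvp_prune(keys, values, keep_ratio=0.5):
    """Stand-in for hierarchical pruning: keep entries whose keys score
    highest against a coarse centroid (one level of a real hierarchy)."""
    centroid = keys.mean(axis=0)
    scores = keys @ centroid
    keep = np.argsort(scores)[-max(1, int(len(keys) * keep_ratio)):]
    return keys[keep], values[keep]

triples = [("Newton", "discovered", "gravitation"),
           ("Hooke", "proposed", "elasticity"),
           ("Einstein", "developed", "general relativity")]
k, v = kg2kv(triples)
k, v = hikvp_prune(k, v, keep_ratio=0.67)
print(k.shape, v.shape)  # pruned KV cache, ready to be attended over
```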
2. NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
Authors: Tianshi Zheng, Kelvin Kiu-Wai Tam, Newt Hue-Nam K. Nguyen, et al.
Can LLMs truly act as scientists? Existing benchmarks often treat scientific discovery as static function fitting. We introduce NewtonBench to rigorously test LLM agents in interactive environments.
- The Benchmark: We created 324 discovery tasks across 12 physics domains. Crucially, we use “counterfactual law shifts,” altering canonical physical laws (like gravity or Hooke’s law) so that models cannot succeed by simply memorizing textbooks (a toy example follows this list).
- Key Finding: While frontier models like GPT-5 show promise, they are fragile. Interestingly, we found a “paradox of tool assistance,” where giving capable models a code interpreter sometimes hurts performance by inducing premature exploitation over exploration.
- https://arxiv.org/abs/2510.07172
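To give a flavor of what a counterfactual law shift looks like (the concrete shifts in NewtonBench may differ), here is a toy simulator whose gravitational exponent is perturbed. An agent probing this environment must recover the modified law rather than recite the canonical inverse-square form.

```python
# Toy illustration of a "counterfactual law shift": a simulator whose
# gravitational law uses a perturbed exponent. The specific shift and the
# interface are invented for illustration, not taken from NewtonBench.
G = 6.674e-11

def shifted_gravity(m1: float, m2: float, r: float, exponent: float = 2.5) -> float:
    """Counterfactual gravitation: F = G * m1 * m2 / r**exponent.
    With exponent=2.0 this reduces to the canonical inverse-square law."""
    return G * m1 * m2 / r**exponent

# An agent querying the environment observes that doubling r divides the
# force by 2**2.5, not 4 -- memorized textbook physics gives the wrong answer.
f1 = shifted_gravity(5.0, 10.0, 1.0)
f2 = shifted_gravity(5.0, 10.0, 2.0)
print(f1 / f2)  # ~5.66 == 2**2.5, exposing the shifted exponent
```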
3. CtrlHGen: Controllable Logical Hypothesis Generation for Abductive Reasoning
Authors: Yisen Gao, Jiaxin Bai, Tianshi Zheng, Qingyun Sun, Ziwei Zhang, Jianxin Li, Yangqiu Song, Xingcheng Fu
Abductive reasoning (inferring the best explanation for an observation) is vital for AI, but generating relevant hypotheses on large Knowledge Graphs is difficult due to “hypothesis space collapse.”
- The Solution: We propose CtrlHGen, a framework for controllable hypothesis generation that lets users steer both the semantic content and the structural complexity of the reasoning process.
- The Method: We use a two-stage training paradigm (supervised fine-tuning followed by reinforcement learning) and a novel sub-logical decomposition strategy that helps the model generate long, complex logical hypotheses without drifting from the user’s control constraints (a minimal sketch follows this list).
- https://arxiv.org/abs/2505.20948
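For intuition, here is a hypothetical sketch of a logical hypothesis represented as a conjunction of KG atoms, with a naive chunking rule standing in for sub-logical decomposition. The `Atom` class and the decomposition are illustrative assumptions, not CtrlHGen’s actual formulation.

```python
# Minimal sketch of a "logical hypothesis" over a KG and how sub-logical
# decomposition might split it into smaller, checkable parts. The dataclass
# and the chunking rule are illustrative, not CtrlHGen's actual method.
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    relation: str
    subject: str   # a variable (e.g. "?x") or a concrete entity
    obj: str

# A hypothesis explaining an observation is a conjunction of atoms;
# structural complexity here is simply the number of atoms.
hypothesis = [
    Atom("works_at", "?x", "HKUST"),
    Atom("coauthor_of", "?x", "?y"),
    Atom("studies", "?y", "knowledge_graphs"),
]

def decompose(atoms, max_size=2):
    """Break a long conjunction into sub-conjunctions that can be
    generated and verified incrementally."""
    return [atoms[i:i + max_size] for i in range(0, len(atoms), max_size)]

for i, part in enumerate(decompose(hypothesis)):
    body = " AND ".join(f"{a.relation}({a.subject}, {a.obj})" for a in part)
    print(f"sub-hypothesis {i}: {body}")
```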
Acknowledgements

I want to extend a huge thank-you to my co-authors and advisors at HKUST and Beihang University, as well as our collaborators from industry. These projects were massive collaborative efforts. I am particularly grateful to the students who placed their trust in me and shared the vision of tackling the challenging intersection of structured knowledge reasoning, neural graph databases, and their integration with LLMs.