Introducing FedNGDB

Paper Link: https://openreview.net/pdf?id=3K1LRetR6Y

I’m excited to share our recent work on Federated Neural Graph Databases (FedNGDBs), a novel framework that enables privacy-preserving reasoning over distributed knowledge graphs!

The Challenge

As large language models and other AI systems increasingly rely on external knowledge sources, retrieving precise information efficiently becomes critical. Neural graph databases (NGDBs) have emerged as powerful tools for this purpose, but they face a significant limitation: each NGDB can only operate on a single, centralized graph.

In real-world scenarios, valuable knowledge is often distributed across multiple sources that cannot be shared directly because of privacy concerns, regulations such as GDPR, or commercial interests. This creates a fundamental tension: how can we enable complex reasoning across distributed knowledge graphs without compromising data privacy?

Our Solution: FedNGDB

We’ve developed FedNGDB, a framework that enables complex query answering across distributed knowledge graphs while preserving privacy. FedNGDB uses federated learning to train models collaboratively across multiple knowledge sources without sharing raw graph data.
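To make the privacy idea concrete, here is a minimal sketch of pairwise additive masking, a standard secure-aggregation technique in federated learning. Each pair of clients shares a random mask that one client adds and the other subtracts, so the server can recover the average of the clients' embedding updates without ever seeing an individual update. This is an illustrative stand-in, not necessarily the secret aggregation scheme FedNGDB uses. The fixed-point encoding, the hard-coded pair seeds, and all function names are assumptions, and client dropout is ignored.

```python
# Illustrative sketch (assumption): pairwise additive masking for secure aggregation.
# This is NOT presented as FedNGDB's actual secret aggregation scheme.
import random
from typing import Dict, List, Tuple

MODULUS = 2**32  # masked arithmetic is done modulo this value
SCALE = 10**6    # fixed-point scale for encoding real-valued embedding entries


def encode(vec: List[float]) -> List[int]:
    # Real values -> fixed-point integers in [0, MODULUS).
    return [round(x * SCALE) % MODULUS for x in vec]


def decode(vec: List[int]) -> List[float]:
    # Integers in [0, MODULUS) -> signed fixed-point -> real values.
    return [((v + MODULUS // 2) % MODULUS - MODULUS // 2) / SCALE for v in vec]


def pairwise_mask(seed: int, dim: int) -> List[int]:
    # Both members of a pair derive the same pseudo-random mask from their shared seed.
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(dim)]


def mask_update(client: int, vec: List[float], clients: List[int],
                pair_seeds: Dict[Tuple[int, int], int]) -> List[int]:
    masked = encode(vec)
    for other in clients:
        if other == client:
            continue
        pair = (min(client, other), max(client, other))
        mask = pairwise_mask(pair_seeds[pair], len(vec))
        # The lower id adds the mask and the higher id subtracts it, so masks cancel in the sum.
        sign = 1 if client < other else -1
        masked = [(m + sign * r) % MODULUS for m, r in zip(masked, mask)]
    return masked


def server_average(masked_updates: List[List[int]]) -> List[float]:
    # The server only ever sees masked vectors; their modular sum equals the true sum.
    dim = len(masked_updates[0])
    total = [sum(u[i] for u in masked_updates) % MODULUS for i in range(dim)]
    return [x / len(masked_updates) for x in decode(total)]


# Toy usage: three clients, 2-dimensional embedding updates.
clients = [0, 1, 2]
pair_seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}  # hypothetical; a key exchange would set these
updates = {0: [0.5, -1.0], 1: [1.5, 2.0], 2: [-0.5, 0.25]}
masked = [mask_update(c, updates[c], clients, pair_seeds) for c in clients]
print(server_average(masked))
```

The printed result matches the plain average of the three updates, even though each masked vector on its own looks like random noise to the server. A real deployment would derive the pair seeds through a key exchange between clients and would also need to handle clients that drop out mid-round.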

Key innovations in our approach:

Secret Aggregation: We developed a novel technique to protect entity embeddings during the federated learning process, preventing even the central server from accessing sensitive information
Query Decomposition: For cross-graph queries, our system decomposes a complex query into sub-queries, each of which can be answered by an individual knowledge graph
Distributed Retrieval: Answers are scored locally on each graph database and aggregated to provide comprehensive results
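The query decomposition and distributed retrieval steps can be sketched for the simplest case: a chain (multi-hop) query whose relations live on different graphs. This toy version assumes that federated training has placed all graphs' embeddings in a shared space. It uses a translation-style projection and distance-based scoring as stand-ins for a real query encoder such as GQE or Q2P, and it ignores logical operators such as intersection. `LocalGraph`, `decompose`, and `answer_chain` are hypothetical names, not FedNGDB's API.

```python
# Illustrative sketch (assumption): chain-query decomposition plus local scoring and merging.
# Projection and scoring operators are stand-ins, not FedNGDB's actual query encoder.
import math
from typing import Dict, List, Tuple

Vector = List[float]


class LocalGraph:
    """Hypothetical client: holds embeddings for its own entities and relations only."""

    def __init__(self, name: str, entities: Dict[str, Vector], relations: Dict[str, Vector]):
        self.name = name
        self.entities = entities
        self.relations = relations

    def project(self, query: Vector, relation: str) -> Vector:
        # Translation-style projection (an assumption; GQE, Q2P, etc. use their own operators).
        return [q + r for q, r in zip(query, self.relations[relation])]

    def score(self, query: Vector, k: int) -> List[Tuple[float, str]]:
        # Score every local entity by negative Euclidean distance and keep the local top-k.
        scored = [(-math.dist(query, emb), name) for name, emb in self.entities.items()]
        return sorted(scored, reverse=True)[:k]


def decompose(relations: List[str], graphs: List[LocalGraph]) -> List[Tuple[LocalGraph, List[str]]]:
    """Split a chain query into consecutive runs of relations that one graph can answer."""
    plan: List[Tuple[LocalGraph, List[str]]] = []
    for rel in relations:
        owner = next((g for g in graphs if rel in g.relations), None)
        if owner is None:
            raise ValueError(f"no graph owns relation {rel!r}")
        if plan and plan[-1][0] is owner:
            plan[-1][1].append(rel)  # same owner: extend the current sub-query
        else:
            plan.append((owner, [rel]))  # new owner: start a new sub-query
    return plan


def answer_chain(anchor: str, relations: List[str], graphs: List[LocalGraph], k: int = 3):
    # Start from the anchor's embedding on whichever graph stores that entity.
    query = next(g.entities[anchor] for g in graphs if anchor in g.entities)
    # Each sub-query is executed by the graph that owns its relations.
    for graph, sub_relations in decompose(relations, graphs):
        for rel in sub_relations:
            query = graph.project(query, rel)
    # Distributed retrieval: each graph scores its entities locally, then the top-k are merged.
    candidates = [(s, g.name, e) for g in graphs for s, e in g.score(query, k)]
    return sorted(candidates, reverse=True)[:k]


# Toy usage with hand-picked 2-D embeddings.
a = LocalGraph("A", {"alice": [0.0, 0.0], "acme": [1.0, 0.0]}, {"works_at": [1.0, 0.0]})
b = LocalGraph("B", {"berlin": [1.0, 1.0], "paris": [3.0, 3.0]}, {"located_in": [0.0, 1.0]})
print(answer_chain("alice", ["works_at", "located_in"], [a, b], k=2))
```

Here the query `alice → works_at → located_in` splits into one sub-query for graph A and one for graph B, and `berlin` from graph B ranks first in the merged list. Merging only each graph's local top-k keeps the amount of information leaving each client small, although a real system would also need to filter candidates by answer type.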

Results

We evaluated FedNGDB on three widely used knowledge graph datasets (FB15k, FB15k-237, and NELL995) and found that:

FedNGDB significantly outperforms isolated local models for both in-graph and cross-graph queries
Our approach achieves performance comparable to that of centralized models that have access to all data
FedNGDB works effectively with various base query encoding methods (GQE, Q2P, Tree-LSTM)

Why This Matters

FedNGDB enables new collaborative scenarios where organizations can jointly answer complex queries across their knowledge graphs without compromising sensitive data. This has important applications in:

Healthcare (connecting patient data across institutions)
Finance (detecting fraud patterns across banks)
Research collaboration (connecting findings across organizations)
Enterprise knowledge management (reasoning across departmental knowledge silos)

Our work represents an important step toward privacy-preserving distributed reasoning systems that can unlock the collective knowledge of organizations while respecting data privacy boundaries.

The full paper is available at: https://openreview.net/pdf?id=3K1LRetR6Y