Abstract

Resource-constrained environments, such as edge devices and low-tier servers, require database systems that efficiently support both vector similarity search over embeddings and complex graph traversals over interconnected data. Current vector or graph databases and their hybrid integrations are often memory-intensive, precluding their deployment under strict RAM limits, specifically those below one hundred megabytes. This paper presents the architectural design and theoretical framework for garibiDB, a proposed vector-graph database explicitly optimized for operation within such tight memory constraints while maintaining transactional consistency.

The garibiDB design integrates concepts from several domains. It leverages advanced vector quantization techniques (e.g., Product Quantization, Residual Quantization) to minimize embedding storage cost. It incorporates memory-efficient Approximate Nearest Neighbor (ANN) graph structures (like HNSW-PQ) and disk-backed indexing strategies, guided by analysis of memory-accuracy trade-offs from related work, to enable low-RAM indexing. A core innovation is a unified data model that treats embeddings as first-class graph attributes, supporting a declarative query language for seamless vector-graph operations. The architecture includes layered storage, lazy materialization, and MVCC-based consistency mechanisms tailored for limited memory, alongside memory-aware query execution techniques like sharding and bounded buffering.

Drawing upon established theoretical foundations and empirical findings on component-level performance from existing research, garibiDB is engineered to achieve complex hybrid query processing with memory footprints that grow sublinearly with dataset size. The framework’s principles are illustrated through its application to a dialogue memory component using compressed query-answer embeddings. This work provides a detailed blueprint for developing integrated vector-graph capabilities deployable on severely resource-limited devices.

Keywords: Vector-graph database, Resource-constrained environments, Memory efficiency, Vector quantization, Product Quantization, Residual Quantization, Approximate Nearest Neighbor (ANN), HNSW-PQ, Disk-backed indexing

I. Introduction

A. Background and Motivation

Modern data-driven applications—from on-device AI assistants to autonomous IoT systems—require both high-quality semantic retrieval over embeddings and rich navigational queries over entity relationships. Vector databases excel at nearest-neighbor search for high-dimensional representations, underpinning retrieval-augmented generation (RAG) for language models [Pound et al., 2025]. Graph databases, in contrast, offer efficient traversals and pattern matching over interconnected data [Pan et al., 2024]. Hybrid architectures (e.g., GraphRAG) have combined separate vector and graph engines but incur significant overhead for data movement and cross-system coordination [Wang et al., 2025; Memgraph, 2025]. An integrated system that natively supports both workloads without excessive memory use remains an open challenge.

B. Problem Statement: Designing a Vector + Graph Database under Tight Memory Constraints

Resource-constrained environments—such as edge devices or low-tier servers—often provide limited RAM yet must handle complex, interleaved semantic and relational queries. Existing hybrids like TigerVector embed vectors as graph attributes and extend GSQL with VectorSearch(), but assume multi-gigabyte memory and cluster deployment [Wang et al., 2025]. Standalone vector engines (DiskANN, IVF-PQ) achieve memory savings via quantization or disk-backing but lack native graph support [Guo et al., 2024]. Likewise, graph-ANN frameworks optimize for either in-RAM proximity graphs (HNSW, NSG, EFANNA) or disk-resident indexes but do not integrate vector storage [Ganbarov et al., 2024]. No prior system simultaneously delivers unified vector+graph query processing, transactional consistency, and sub-hundred-megabyte memory footprints.

C. Objectives and Theoretical Scope of garibiDB

garibiDB targets this gap with four core innovations:

  1. Advanced Quantization Variants

    • Product Quantization (PQ): Compresses vectors into sub-vector codebooks, reducing storage by 4–8× with minimal recall loss [2504.05478v1, §4].

    • Residual and Neural Quantization: Applies multi-codebook schemes and learned quantizers to further shrink embeddings while preserving accuracy [2503.19314v1, §5].

  2. Memory-Efficient Proximity Graphs

    • Graph-ANN Trade-offs: Evaluates HNSW, NSG, and EFANNA under tight memory budgets, revealing that quantized HNSW-PQ offers the best latency-accuracy balance within approximately 50 MB of RAM [2411.14006v1, §4].

    • Disk-Backed Segmented Indexes: Streams large payloads from flash while maintaining lightweight in-RAM navigation structures, following DiskANN and IVF-PQ paradigms [2408.04948v1, §5].

  3. Unified Data Model & Query Language

    • Embedding as Native Attribute: Defines a first-class EMBEDDING type with built-in metadata and MVCC deltas for ACID updates [2408.04948v1, §7].

    • Vector+Graph Operators: Integrates VectorSearch() and graph traversals into a single declarative language, enabling two-phase execution (vector-first or graph-first) based on selectivity estimates [2411.14006v1, §5].

  4. Persistence & Deployment Strategies

    • Chunked Loading & Lazy Deserialization: Stores graph and vector payloads in segmented files, loading only required chunks on-demand to minimize peak RAM [2309.11322, §6; 2502.12908v1, §3].

    • Optimized C++ Kernels & Parallelism: Leverages multicore execution and batched I/O in optimized C++ routines, achieving sustained throughput on devices such as Raspberry Pi 4 and Jetson Nano with < 256 MB RAM usage [2411.14006v1, §6; 2309.11322, §8].

II. Theoretical Foundations

A. Vector Space Modeling and Compression Theory

  1. Product Quantization (PQ) and Codebook Memory Cost

Product Quantization (PQ) partitions a $D$-dimensional vector into $m$ sub-vectors, each quantized to one of $k$ centroids learned via k-means. A common setting uses $m=16, k=256$, requiring $16$ bytes per vector (one byte per sub-vector index) plus storage for $16$ codebooks of $256 \times (D/16)$ floats each. For $D=128$ and 4-byte floats, each codebook occupies $\approx 8$ KB, so total codebook RAM is $\approx 128$ KB, and the in-RAM code storage for one million vectors is $\approx 16$ MB plus codebooks [2504.05478v1]. Designers must budget both per-vector bytes and codebook storage when targeting strict memory envelopes.
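This budget can be checked with simple arithmetic. The C++ sketch below uses illustrative constants (4-byte floats, $m=16$, $k=256$, $D=128$, one million vectors) and prints the per-vector code size, total codebook RAM, and code-storage RAM; it is a back-of-the-envelope aid, not part of any implementation.

```cpp
#include <cstdio>
#include <cstdint>

int main() {
    const int D = 128;                  // embedding dimension
    const int m = 16;                   // number of sub-vectors
    const int k = 256;                  // centroids per sub-codebook (fits in one byte)
    const std::int64_t n = 1'000'000;   // number of vectors (illustrative)

    const int sub_dim = D / m;                                   // 8 dims per sub-vector
    const double code_bytes = m * 1.0;                           // 1 byte per sub-vector index
    const double codebook_bytes = double(m) * k * sub_dim * sizeof(float);
    const double codes_bytes = double(n) * code_bytes;

    std::printf("PQ code size per vector : %.0f bytes\n", code_bytes);
    std::printf("Codebook RAM            : %.1f KB\n", codebook_bytes / 1024.0);
    std::printf("Codes RAM (n=%lld)      : %.1f MB\n",
                static_cast<long long>(n), codes_bytes / (1024.0 * 1024.0));
    return 0;
}
```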

  2. Residual and Neural Quantization Frameworks

Residual Quantization (RQ) refines PQ by iteratively encoding quantization residuals, reducing reconstruction error at the cost of extra codebooks and index bytes per level [2504.05478v1]. Neural Quantization replaces static k-means with small, end-to-end trainable modules (e.g., vector autoencoders), improving rate–distortion performance but introducing model parameters (often 2–5 MB) and runtime decoding overhead [2503.19314v1]. For on-device use, the combined cost of residual codebooks and neural-network weights must fit alongside in-RAM graph structures.

B. Proximity Graph Theory for ANN

  1. Small-World Graphs: HNSW, NSG, EFANNA

  • HNSW: Builds a hierarchical set of proximity graphs with long-range “shortcut” edges on upper layers and dense local links below; greedy search yields logarithmic hop counts [2504.10499v1].

  • NSG: Constructs a Monotonic Search Network guaranteeing that any greedy neighbor step reduces distance to the query, slightly reducing edges vs. HNSW for similar recall [2411.14006v1].

  • EFANNA: Combines KD-trees for coarse pruning with a proximity graph overlay, reducing memory by ~20% compared to HNSW at equivalent recall [2411.14006v1].

  2. Connectivity vs. Memory Trade-offs and Graph Pruning

Empirical studies show that setting edge degree $M=16$ in HNSW-PQ yields $\ge 0.9$ recall with $\sim 50$ MB index overhead, whereas $M=8$ cuts index size to $\sim 30$ MB at a 5–10% recall drop [2411.14006v1]. Graph pruning techniques—edge removal based on redundant paths or neighbor similarity—can reduce pointer counts by 30–50% with under 2% recall loss [2411.14006v1]. Selecting $M$ and pruning thresholds enables precise control over the RAM–accuracy frontier.
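To see how the degree parameter $M$ moves the RAM side of this frontier, the following first-order estimate assumes one flat neighbor list of $M$ four-byte IDs per node plus 16-byte PQ codes. These constants are assumptions; real layouts (level structure, neighbor-list compression) will shift the absolute numbers, but the scaling with $M$ is the point.

```cpp
#include <cstdio>
#include <cstdint>

// Rough RAM estimate for an HNSW-PQ-style index: neighbor lists dominate the
// graph overhead, PQ codes dominate the vector storage. Constants are
// illustrative assumptions, not measurements.
double index_ram_mb(std::int64_t n, int M, int code_bytes, int id_bytes = 4) {
    double graph_bytes = double(n) * M * id_bytes;  // one flat neighbor list per node
    double code_total  = double(n) * code_bytes;    // PQ codes for all vectors
    return (graph_bytes + code_total) / (1024.0 * 1024.0);
}

int main() {
    const std::int64_t n = 1'000'000;
    const int code_bytes = 16;  // PQ m=16, 1 byte per sub-code
    for (int M : {8, 16, 32}) {
        std::printf("M=%2d -> approx %.1f MB index RAM\n",
                    M, index_ram_mb(n, M, code_bytes));
    }
    return 0;
}
```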

C. Hybrid Memory–Disk Indexing Models

  1. Abstract Off-Heap Navigation Kernel

Hybrid indexes load only a compact navigation graph $G(V,E)$ into RAM—comprising quantized centroids and pointers—while storing full vectors on flash. Each vertex $v$ carries a disk offset pointer; search interleaves in-RAM neighbor expansions with asynchronous reads of candidate payloads [2408.04948v1].

  2. Refined Asymptotic RAM and I/O Bounds

Let $n$ be dataset size, $M$ average degree, $h$ hop count, $k$ the final candidate set size, $s$ the payload size per candidate, and $B$ the I/O block size. RAM overhead is $O(nM)$ pointers plus codebooks. Amortized I/O cost per query is

$$O\!\left(h + \frac{k \cdot s}{B} + r\right),$$

where $r$ accounts for random-access penalties (cache misses, SSD vs. eMMC latency) and can dominate under poor prefetching [2309.11322; 2502.12908v1]. Effective designs use prefetch buffers, sequential block layouts, and larger $B$ to mitigate $r$—trading off potential over-fetch.
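A small sketch makes the role of each term concrete; every constant below (hops, candidate count, payload size, penalty term) is an illustrative assumption rather than a measured value.

```cpp
#include <cstdio>

// Amortized per-query I/O model from the text: O(h + k*s/B + r), where
// h = graph hops, k = candidates re-ranked from disk, s = payload bytes per
// candidate, B = I/O block size, r = random-access penalty term.
double io_ops_per_query(double h, double k, double s, double B, double r) {
    return h + (k * s) / B + r;
}

int main() {
    const double h = 40;    // greedy-search hops touching only the in-RAM graph
    const double k = 50;    // candidates whose full payloads are read from disk
    const double s = 512;   // payload bytes per candidate (vector + metadata)
    const double r = 10;    // penalty term for poorly prefetched random reads

    for (double B : {4096.0, 16384.0, 65536.0}) {
        std::printf("B=%6.0f bytes -> ~%.1f I/O-equivalent ops per query\n",
                    B, io_ops_per_query(h, k, s, B, r));
    }
    return 0;
}
```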

  3. Dynamic Updates and Consistency Mechanisms

On-line insertions/deletions demand mutable graph and vector indexes. MVCC-style delta logs buffer updates in small in-RAM pages; periodic compaction merges deltas into base structures, bounding peak RAM at the sum of live index, delta logs ($\sim 5-10$ MB), and compaction workspace [2504.10499v1]. This strategy supports ACID semantics with predictable overhead.


III. Conceptual Integration of Vector and Graph Models

A. Unified Data Model Specification

  1. Formalization of EMBEDDING as a First-Class Graph Attribute

To unify semantic and relational querying within a single model, we extend the standard property graph abstraction by introducing EMBEDDING as a native attribute type. Formally, a graph $G=(V,E,P)$ associates each vertex $v \in V$ with a property set $P_v$, augmented to include:

$$\texttt{EMBEDDING}_v = (\mathbf{e}_v, \mu_v),$$

where $\mathbf{e}_v \in \mathbb{R}^D$ is the vector representation (compressed via PQ, RQ, or neural codecs), and $\mu_v$ is a metadata tuple containing:

  • Dimension $D$

  • Codec specification (e.g., "PQ-m16-k256")

  • Distance metric ("L2" or "cosine")

  • MVCC version identifier

  • Timestamp for temporal consistency

The typical metadata footprint per embedding is 64–128 bytes, a non-negligible factor in memory-constrained environments where per-node overhead must be tightly controlled [Wang et al., 2025].

  2. Update Semantics and Compact Consistency Mechanisms

Embedding updates follow multiversion concurrency control (MVCC). New versions are appended to a delta log rather than overwritten in place. Delta pages are compacted into base storage when thresholds (e.g., $\ge 10\%$ of live data or $\ge 5$ MB) are exceeded. During compaction, memory usage may temporarily double due to workspace allocation, necessitating coordination with query schedulers to avoid concurrent memory contention [Pound et al., 2025]. This design supports low-overhead transactional updates while preserving a compact memory profile.
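A minimal sketch of the compaction trigger, using the thresholds quoted above; the bookkeeping struct and byte counts are hypothetical.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical bookkeeping for the MVCC delta log; only the trigger rule
// (>= 10% of live data or >= 5 MB of deltas) comes from the design text.
struct DeltaLogStats {
    std::uint64_t live_bytes;    // size of the compacted base storage
    std::uint64_t delta_bytes;   // size of appended, not-yet-compacted deltas
};

bool should_compact(const DeltaLogStats& s,
                    double ratio_threshold = 0.10,
                    std::uint64_t abs_threshold = 5ull * 1024 * 1024) {
    if (s.delta_bytes >= abs_threshold) return true;
    if (s.live_bytes > 0 &&
        double(s.delta_bytes) / double(s.live_bytes) >= ratio_threshold) return true;
    return false;
}

int main() {
    DeltaLogStats s{/*live_bytes=*/40ull * 1024 * 1024, /*delta_bytes=*/3ull * 1024 * 1024};
    std::printf("compact now? %s\n", should_compact(s) ? "yes" : "no");
    return 0;
}
```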

B. Query Algebra for Vector–Graph Operations

  1. Extended Operators for Semantic–Relational Composition

We augment traditional graph query algebra with operators supporting joint vector and graph reasoning. The core primitives are:

  • VECTOR_NEAREST(k, q, pred, codec, buffer): Retrieves the top-$k$ closest vertices (by specified metric and codec) to query vector $q$, filtered by optional predicate $pred$, using the specified memory buffer.

  • TRAVERSE(path, depth, filter, prefetch): Executes pattern-constrained traversal over the graph with prefetch directives to control I/O. Example:

LET candidates = VECTOR_NEAREST(50, q, v.label = 'Paper', codec='IVF-PQ', buffer=32MB);

RETURN TRAVERSE(v)-[:cites]->(w) WHERE w IN candidates;

These operators are executable under tight memory constraints through explicit control over codec, candidate limits, and memory buffers [Ganbarov et al., 2024].

  2. Algebraic Rules with Edge and Operator Optimization

To support cost-based optimization, we define transformation rules:
  • Filter Pushdown:

$$\texttt{VECTOR\_NEAREST}(k, q, pred) \equiv \texttt{FILTER}(pred,\ \texttt{VECTOR\_NEAREST}(k, q))$$

improves vector-query efficiency when $pred$ is highly selective.

  • Graph Pruning Transformation:

$$\texttt{TRAVERSE}_{M}(v) \;\rightarrow\; \texttt{TRAVERSE}_{\mathrm{prune}(M,\theta)}(v)$$

reduces memory use by eliminating redundant edges without significant recall degradation. Empirical studies show pruning can reduce edge sets by 30–50% [2411.14006v1].

These rules allow query optimizers to reshape execution plans in response to selectivity, codec efficiency, and available memory.
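As an illustration of how such rules feed plan selection, the sketch below compares toy cost formulas for vector-first and graph-first execution driven by predicate selectivity. Both formulas and every constant are assumptions for demonstration, not the actual optimizer model.

```cpp
#include <cstdio>
#include <string>

// Toy cost comparison between the two-phase orders mentioned in the text.
struct QueryStats {
    double pred_selectivity;        // fraction of vertices passing the graph predicate
    double n;                       // total vertices
    double k;                       // requested nearest neighbours
    double ann_cost_per_probe;      // relative cost of one ANN candidate expansion
    double traverse_cost_per_vertex;
};

std::string choose_plan(const QueryStats& q) {
    // vector-first: ANN over the whole collection, then filter/traverse the top-k.
    double vector_first = q.ann_cost_per_probe * q.k * 10     // oversampled probes
                        + q.traverse_cost_per_vertex * q.k;
    // graph-first: traverse/filter first, then rank the surviving vertices.
    double survivors = q.pred_selectivity * q.n;
    double graph_first = q.traverse_cost_per_vertex * survivors
                       + q.ann_cost_per_probe * survivors;
    return vector_first <= graph_first ? "vector-first" : "graph-first";
}

int main() {
    QueryStats broad{0.5, 1e6, 50, 1.0, 0.2};     // unselective predicate
    QueryStats narrow{0.0001, 1e6, 50, 1.0, 0.2}; // highly selective predicate
    std::printf("broad predicate  -> %s\n", choose_plan(broad).c_str());
    std::printf("narrow predicate -> %s\n", choose_plan(narrow).c_str());
    return 0;
}
```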


IV. Memory-Aware Query Processing Framework

A. Sharding and Parallel Execution Models

  1. Partitioning Strategies under a Fixed RAM Budget

To operate efficiently under tight memory constraints, garibiDB partitions both vector and graph data into fixed-size shards. Each shard includes a subset of vertices, their compressed embeddings, and local graph neighborhoods. Partitioning is guided by estimated memory cost per shard:

$$\text{RAM}_{\text{shard}} \approx |V_i| \cdot (\text{embedding size} + \text{pointer size}) + \text{index overhead}$$

Shards are sized to fit within a bounded memory budget (e.g., $\le 128$ MB). High-degree or high-activity nodes are detected during preprocessing and may be split or replicated to avoid hotspot memory spikes.
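The per-shard budget can be inverted to obtain a vertex cap per shard. The sketch below does this under assumed per-vertex costs (16-byte PQ codes, $M=16$ four-byte neighbor IDs, a small fixed overhead); all constants are illustrative.

```cpp
#include <cstdio>
#include <cstdint>

// RAM_shard ~ |V_i| * (embedding_size + pointer_size) + index_overhead,
// solved for the largest |V_i| that fits within a per-shard budget.
std::int64_t max_vertices_per_shard(double budget_bytes,
                                    double embedding_bytes,
                                    double pointer_bytes,
                                    double index_overhead_bytes) {
    double per_vertex = embedding_bytes + pointer_bytes;
    double usable = budget_bytes - index_overhead_bytes;
    return usable > 0 ? static_cast<std::int64_t>(usable / per_vertex) : 0;
}

int main() {
    const double budget   = 128.0 * 1024 * 1024;  // 128 MB shard cap from the text
    const double emb      = 16;                    // PQ code bytes per vertex (assumed)
    const double ptrs     = 16 * 4;                // M=16 neighbours, 4-byte IDs (assumed)
    const double overhead = 2.0 * 1024 * 1024;     // codebooks, headers, etc. (assumed)

    std::printf("vertices per 128 MB shard: %lld\n",
                static_cast<long long>(
                    max_vertices_per_shard(budget, emb, ptrs, overhead)));
    return 0;
}
```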

  2. Concurrency Semantics and Load Balancing

Each shard runs independently with no shared state, enabling thread-safe execution and avoiding lock contention. Vector and graph queries are scheduled in parallel across shards. Load balancing is performed by monitoring query frequency per shard and redistributing tasks if imbalance exceeds a configurable threshold. To limit cross-shard traversal cost, edge-cut minimization heuristics are applied during partitioning, reducing remote access and I/O [2411.14006v1].


B. Learned Cost Estimation in Low-Resource Contexts

  1. Compact Graph-Aware Cost Models

garibiDB uses a lightweight graph neural network (GNN) to estimate query cost, using simple plan graphs as input. Operator nodes carry features such as selectivity, degree estimates, and embedding size. The model is small ($\sim 1$ MB), trained offline using representative workloads. At runtime, it predicts which execution plan—vector-first or graph-first—will consume less memory and I/O.

  2. Query History and Adaptive Plan Selection

Query patterns are hashed and stored in a compact trace cache. If a new query matches a previous fingerprint, its plan and resource profile are reused. Plans are invalidated if data changes significantly (e.g., large updates, major graph rewrites), using simple counters tied to index versions. This keeps planning overhead low without risking staleness [2504.10499v1].
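A minimal sketch of such a trace cache, keyed by a hash of the query text and invalidated by an index version counter; the struct layout and the string plan encoding are hypothetical.

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>

// Cached plan entries are invalidated when the index version counter moves.
struct CachedPlan {
    std::string plan;             // e.g. "vector-first" (hypothetical encoding)
    std::uint64_t index_version;  // version of the indexes the plan was built for
};

class PlanTraceCache {
public:
    void put(const std::string& query_text, CachedPlan p) {
        cache_[std::hash<std::string>{}(query_text)] = std::move(p);
    }
    std::optional<std::string> get(const std::string& query_text,
                                   std::uint64_t current_version) const {
        auto it = cache_.find(std::hash<std::string>{}(query_text));
        if (it == cache_.end()) return std::nullopt;
        if (it->second.index_version != current_version) return std::nullopt;  // stale
        return it->second.plan;
    }
private:
    std::unordered_map<std::size_t, CachedPlan> cache_;
};

int main() {
    PlanTraceCache cache;
    cache.put("MATCH papers NEAR q", {"vector-first", /*index_version=*/7});
    auto hit  = cache.get("MATCH papers NEAR q", 7);
    auto miss = cache.get("MATCH papers NEAR q", 8);  // index changed -> replan
    std::printf("v7: %s, v8: %s\n",
                hit ? hit->c_str() : "replan", miss ? miss->c_str() : "replan");
    return 0;
}
```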

C. Buffering, Batching, and Peak-Memory Minimization Techniques

  1. Working Set Bounds and Query Admission

Each query plan is statically analyzed to estimate its peak working set:

$$\text{Memory}_{\text{query}} \leq k \cdot s_{\text{vec}} + n \cdot s_{\text{graph}} + b \cdot B + \text{codec buffers}$$

where $k$ is the number of top-$k$ candidates, $n$ is traversal width, $b$ is the number of prefetch blocks, and $B$ is block size. If the estimate exceeds the session memory cap, the query is queued or rejected. This prevents out-of-memory crashes and maintains predictable performance on small devices [2408.04948v1].
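The admission check itself is a single comparison against this bound. The sketch below fills in assumed values for $k$, $s_{\text{vec}}$, $n$, $s_{\text{graph}}$, $b$, $B$, and the codec scratch space; none of these numbers are measurements.

```cpp
#include <cstdio>

// Peak working-set estimate from the text:
//   k * s_vec + n * s_graph + b * B + codec_buffers
struct PlanEstimate {
    double k;              // top-k candidates held decoded
    double s_vec;          // bytes per decoded vector
    double n;              // traversal frontier width
    double s_graph;        // bytes per frontier vertex (adjacency + state)
    double b;              // prefetch blocks in flight
    double B;              // bytes per prefetch block
    double codec_buffers;  // fixed decode scratch space
};

bool admit(const PlanEstimate& p, double session_cap_bytes) {
    double peak = p.k * p.s_vec + p.n * p.s_graph + p.b * p.B + p.codec_buffers;
    std::printf("estimated peak: %.2f MB (cap %.2f MB)\n",
                peak / (1024 * 1024), session_cap_bytes / (1024 * 1024));
    return peak <= session_cap_bytes;
}

int main() {
    PlanEstimate p{50, 512, 2000, 96, 8, 16 * 1024, 2.0 * 1024 * 1024};
    std::printf("admitted? %s\n", admit(p, 8.0 * 1024 * 1024) ? "yes" : "queue/reject");
    return 0;
}
```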

  2. Operator Pipelining and I/O Scheduling

Operators run in a memory-pipelined fashion. Vector search fetches blocks in batches, overlapping disk I/O and decoding. Traversal stages hold limited node sets in memory and stream neighbors as needed. All buffers have static caps, and backpressure is applied when downstream operators lag. This allows garibiDB to maintain throughput without exceeding physical memory limits, even under concurrent load [2309.11322].

V. Persistence and Storage Architecture


A. Layered Storage Abstractions

  1. In-Memory Index Layer vs. On-Disk Payload Layer

garibiDB separates index structures from raw data to reduce memory usage. The in-memory layer holds lightweight navigation indexes—such as quantized ANN graphs (e.g., HNSW-PQ, IVF-PQ)—and routing pointers. These structures are optimized to remain compact (typically $<100$ MB for $1$M vectors with $M=16$ edges), allowing queries to run without loading full data into RAM [2411.14006v1].

Vector payloads, graph edge lists, and metadata reside on disk. Each item is referenced by offset and fetched only when required. This structure minimizes peak memory usage and aligns with systems like DiskANN and Starling [2408.04948v1].

  2. Lazy Materialization and Chunked Deserialization

garibiDB defers loading data until needed and deserializes it in fixed-size blocks (e.g., $4-16$ KB). Each block contains multiple objects and metadata for fast access. Queries initiate prefetching based on access likelihood (e.g., top-$k$ vector scores), and evicted blocks follow LRU or frequency-based policies [2309.11322].

Payloads are stored in compressed form (e.g., PQ), and decompression is performed in-place using bounded buffers. This avoids overloading memory while keeping decoding latency predictable [2504.05478v1].
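A minimal sketch of the block cache behind lazy materialization, assuming fixed-size blocks, an LRU policy, and a caller-supplied loader callback; all of these details are illustrative, and a real build would decompress PQ payloads inside the loader and bound total resident bytes rather than block counts.

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Fixed-size blocks are fetched on demand and evicted with LRU, so resident
// payload memory never exceeds capacity_blocks * block size.
class BlockCache {
public:
    using Block = std::vector<std::uint8_t>;
    using Loader = std::function<Block(std::uint64_t block_id)>;

    BlockCache(std::size_t capacity_blocks, Loader loader)
        : capacity_(capacity_blocks), loader_(std::move(loader)) {}

    const Block& get(std::uint64_t block_id) {
        auto it = index_.find(block_id);
        if (it != index_.end()) {               // hit: move to front (most recent)
            lru_.splice(lru_.begin(), lru_, it->second);
            return it->second->second;
        }
        if (lru_.size() == capacity_) {         // evict the least recently used block
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
        lru_.emplace_front(block_id, loader_(block_id));
        index_[block_id] = lru_.begin();
        return lru_.front().second;
    }

private:
    std::size_t capacity_;
    Loader loader_;
    std::list<std::pair<std::uint64_t, Block>> lru_;               // most recent at front
    std::unordered_map<std::uint64_t, decltype(lru_)::iterator> index_;
};

int main() {
    // Stand-in loader: a real build would read a 4-16 KB segment from flash.
    BlockCache cache(/*capacity_blocks=*/2, [](std::uint64_t id) {
        std::printf("loading block %llu from disk\n", static_cast<unsigned long long>(id));
        return BlockCache::Block(4096, static_cast<std::uint8_t>(id));
    });
    cache.get(1); cache.get(2); cache.get(1);  // second get(1) is a cache hit
    cache.get(3);                              // evicts block 2
    return 0;
}
```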


B. Durability and Consistency Trade-offs

  1. Compact Transaction Model under RAM Limits

garibiDB uses a simplified MVCC design. Updates are stored as append-only delta pages—covering new embeddings, edge additions, and metadata changes. Readers access the latest visible version; writers do not block them.

To limit memory use, delta logs are capped and compacted in background threads. Old versions are aggressively pruned. This design favors snapshot isolation over full serializability, balancing consistency with system responsiveness [2504.10499v1].

  2. Epoch-Based and Log-Structured Persistence

Two write models are supported:

  • Epoch-based checkpointing: In-memory state is flushed periodically. A lightweight log tracks updates since the last checkpoint. On restart, the log is replayed. This model is simple and minimizes write overhead, suitable for read-heavy workloads.

  • Log-structured persistence: All updates are written to an append-only log and periodically merged. This offers better durability for write-intensive scenarios but increases memory pressure during compaction [2502.12908v1].

Both modes write data in chunk-aligned segments and rely on atomic renames or block-commit barriers for crash consistency. The system selects a mode based on workload type and available RAM.
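A minimal sketch of the write-temp-then-rename step behind epoch-based checkpointing; the file names and serialization are placeholders, and a production path would additionally fsync the file and its directory before relying on durability.

```cpp
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Write the serialized in-memory state to a temporary file, then atomically
// rename it over the previous checkpoint. A crash before the rename keeps the
// old checkpoint; a crash after it keeps the new one.
bool write_checkpoint(const std::vector<char>& serialized_state,
                      const fs::path& checkpoint_path) {
    fs::path tmp = checkpoint_path;
    tmp += ".tmp";
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out.write(serialized_state.data(),
                  static_cast<std::streamsize>(serialized_state.size()));
        out.flush();
        if (!out) return false;
    }  // file closed before the rename
    std::error_code ec;
    fs::rename(tmp, checkpoint_path, ec);  // atomic replace on POSIX filesystems
    return !ec;
}

int main() {
    std::vector<char> state{'g', 'a', 'r', 'i', 'b', 'i'};  // placeholder payload
    bool ok = write_checkpoint(state, "epoch_000042.ckpt");
    std::printf("checkpoint written: %s\n", ok ? "yes" : "no");
    return 0;
}
```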

garibiDB’s storage design minimizes memory use by separating small in-RAM indexes from large on-disk payloads. It loads data lazily in compressed blocks and maintains transactional consistency through delta logs and periodic compaction. Its dual write models—checkpointing and log-structured—support a range of workloads and devices, ensuring robust persistence without exceeding memory limits.

VI. Dialogue Memory Extension


A. Weighted Query–Answer Embedding Theory

  1. Simple and Efficient Vector Combination

garibiDB stores past query–answer (Q–A) pairs as composite memory embeddings to support retrieval in ongoing dialogues. Each memory vector is formed as a weighted combination:

$$\mathbf{m} = \alpha \cdot \mathbf{q} + (1 - \alpha) \cdot \mathbf{a}$$

where $\alpha$ is a task-specific constant chosen from predefined options (e.g., 0.5 for general QA, 0.7 for clarification queries). This design avoids runtime tuning overhead and supports direct use with quantized vectors by performing fusion prior to encoding [2504.05478v1].

To ensure embedding quality, combination is applied at vector encoding time, not at retrieval. Once fused, the memory vector is quantized (e.g., using PQ) and indexed alongside other embeddings, keeping inference costs low.
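A minimal sketch of the fusion step performed before quantization; the trailing L2 normalization is an assumption that suits cosine-metric indexes and can be dropped for raw L2 distance.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// m = alpha * q + (1 - alpha) * a, computed on full-precision vectors before
// PQ encoding. Normalization afterwards is an assumption for cosine indexes.
std::vector<float> fuse_memory(const std::vector<float>& q,
                               const std::vector<float>& a,
                               float alpha) {
    std::vector<float> m(q.size());
    for (std::size_t i = 0; i < q.size(); ++i)
        m[i] = alpha * q[i] + (1.0f - alpha) * a[i];
    double norm = 0.0;
    for (float x : m) norm += double(x) * x;
    norm = std::sqrt(norm);
    if (norm > 0.0)
        for (float& x : m) x = static_cast<float>(x / norm);
    return m;
}

int main() {
    std::vector<float> q{1.0f, 0.0f, 0.0f, 0.0f};
    std::vector<float> a{0.0f, 1.0f, 0.0f, 0.0f};
    auto m = fuse_memory(q, a, /*alpha=*/0.7f);  // clarification-style weighting
    std::printf("fused: %.3f %.3f %.3f %.3f\n", m[0], m[1], m[2], m[3]);
    return 0;
}
```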

  2. Lightweight Importance Heuristics

garibiDB uses simple, interpretable heuristics to assign weights when generating memory entries:

  • Higher weight on query if it contains rare tokens or named entities

  • Higher weight on answer if it includes factual content or specific values

These rules are fast to evaluate and avoid dynamic model loading, which is ideal for constrained environments [2503.19314v1].


B. Memory Retrieval Formalism

  1. Efficient ANN Search and Filtering

Memory entries are stored in the same vector index as other embeddings and retrieved using the same ANN structures (e.g., HNSW-PQ). During lookup, a new query is matched against memory embeddings using approximate distance computations.

Results are filtered with a fixed similarity threshold $\tau$ applied after candidate retrieval. This avoids modifying the index traversal logic and keeps latency predictable [2411.14006v1]. The threshold is configurable but typically set between 0.7 and 0.9 for cosine similarity.

To keep memory fresh, garibiDB applies basic aging and LRU-based eviction. Memory vectors not accessed in recent queries are periodically removed to make room for new entries.
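A minimal sketch of the post-retrieval similarity threshold and the aging rule; the candidate struct, epoch-based ages, and cutoff values are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical ANN candidate: id, cosine similarity to the new query, and the
// last query epoch in which this memory entry was used.
struct MemoryCandidate {
    std::uint64_t id;
    float cosine_sim;
    std::uint64_t last_accessed_epoch;
};

// Keep only candidates above the fixed threshold tau, applied after ANN
// retrieval so the index traversal logic itself is untouched.
std::vector<MemoryCandidate> filter_by_threshold(std::vector<MemoryCandidate> c,
                                                 float tau) {
    c.erase(std::remove_if(c.begin(), c.end(),
                           [tau](const MemoryCandidate& m) { return m.cosine_sim < tau; }),
            c.end());
    return c;
}

// Simple aging rule: entries not touched within max_age epochs are evictable.
bool evictable(const MemoryCandidate& m, std::uint64_t now, std::uint64_t max_age) {
    return now - m.last_accessed_epoch > max_age;
}

int main() {
    std::vector<MemoryCandidate> cands{{1, 0.92f, 100}, {2, 0.65f, 40}, {3, 0.81f, 98}};
    auto kept = filter_by_threshold(cands, /*tau=*/0.8f);
    std::printf("kept %zu of %zu candidates\n", kept.size(), cands.size());
    std::printf("entry 2 evictable? %s\n",
                evictable(cands[1], /*now=*/100, /*max_age=*/50) ? "yes" : "no");
    return 0;
}
```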

  2. Simple Template Enrichment

Returned memory entries are enriched by injecting them into templated structures (e.g., previous_answer: ..., follow_up_reference: ...). Templates are generated with minimal processing—only field substitution or basic token truncation.

Memory enrichment is optional and designed to support downstream use in multi-turn query resolution or context-aware RAG workflows. Memory entries are represented as lightweight key-value blocks attached to result sets, avoiding schema changes or added indexing complexity [2501.11216v3].

garibiDB supports dialogue memory by combining query–answer pairs into compressed memory embeddings and indexing them with standard ANN structures. It uses fixed weighting schemes and simple heuristics to minimize runtime cost, and applies post-retrieval filtering and basic templating for integration. This keeps the memory system efficient, extensible, and well-suited for low-memory environments.


VII. Conclusion

A. Summary of Contributions

This study introduced a unified theoretical framework for designing a vector–graph database tailored to memory-constrained environments. It integrated concepts from approximate nearest neighbor (ANN) indexing, graph traversal theory, and quantization-based vector compression to support efficient semantic and relational queries within limited RAM budgets. While existing systems like TigerVector [2501.11216] extend graph databases with vector capabilities and others like SymphonyQG [2504.05478] and MicroNN [2504.05573] address aspects of vector search or disk residency, garibiDB differentiates itself by focusing specifically on integrating these features under strict, heterogeneous memory constraints typical of embedded systems and by incorporating learned cost estimation in this context. Key contributions include:

  1. A hybrid data model treating embeddings as native graph attributes with support for vector-specific metadata and ACID-compliant updates under memory bounds.
  2. A query algebra supporting combined vector similarity and graph traversal with memory-aware execution planning.
  3. An ANN indexing layer designed for minimal memory footprint using techniques like HNSW-PQ and lazy payload retrieval. Specifically, theoretical analysis and preliminary results from related work [2411.14006v1] suggest that techniques like HNSW-PQ can enable storing indexes for millions of vectors within tens of megabytes of RAM (e.g., $\sim 50$ MB for $1$M vectors), supporting sublinear memory growth relative to dataset size. Furthermore, low-latency enrichment is achieved by retrieving full vector payloads and associated graph data only when strictly necessary during query execution, minimizing initial load times.
  4. A memory-bounded query execution engine featuring pipelined operators, shard-aware scheduling, and lightweight learned cost models, optimized for resource-constrained environments.
  5. A dialogue memory framework using compressed Q–A embedding fusion and ANN-based retrieval optimized for reuse and context integration, demonstrating a practical application of the core database principles.

Together, these components provide a coherent design space for deploying multi-modal databases on edge and embedded systems without sacrificing expressive power or retrieval quality, filling a gap not fully addressed by existing general-purpose or cloud-focused vector/graph databases.

B. Implications for Future Research in Resource-Constrained DBs

The findings establish a solid theoretical foundation for a new class of embedded database systems that bridge semantic search and structured querying. Practical implications include:

  1. The feasibility of executing complex vector–graph queries, such as 2-hop graph traversal filtering ANN-selected candidates, with memory growth that is sublinear to the total dataset size by effectively leveraging quantization and disk-backed payloads.
  2. The significant benefit of treating vector embeddings not as external objects but as first-class, queryable entities within the database schema, enabling richer hybrid query processing.
  3. The potential of simple, model-free planning heuristics combined with pre-trained, lightweight learned cost predictors in low-resource environments to achieve reasonable query performance without extensive runtime profiling.

This work also reveals that ANN retrieval, once thought too compute-heavy or memory-intensive for on-device use, is viable when combined with compact memory-resident graphs and selectively materialized payloads, provided careful memory management and I/O strategies are employed. However, deployment on highly resource-constrained platforms may face challenges related to firmware integration, efficient file system I/O batching on limited storage types (e.g., microSD), and minimizing operating system overhead on non-general-purpose RTOS platforms.

C. Open Questions and Next Steps

Several open questions emerge from this theoretical framework:

  1. How can real-time, ACID-compliant updates to both vector indexes and graph components be made efficient and crash-consistent without violating strict memory bounds?
  2. What is the optimal balance between vector quantization levels, graph connectivity (e.g., HNSW graph density), and lazy materialization strategies to maximize retrieval recall under varying latency constraints on edge hardware?
  3. Can unified cost models be extended to incorporate device-specific constraints such as power consumption, thermal limits for battery-powered devices, and the cost of offloading computation or data?
  4. What are the implications for security and privacy, particularly concerning update isolation (e.g., MVCC for multi-user or multi-application access) and implementing secure memory partitioning or per-session memory isolation under constrained conditions?

Future work will focus on implementing a minimal prototype of garibiDB to validate the theoretical concepts, conducting controlled benchmarks on real edge hardware (e.g., Raspberry Pi, Jetson Nano), and empirically evaluating the effectiveness of its query engine, persistence model, and update strategies. Further investigation into hybrid cloud–edge deployment models may also offer pathways for distributed memory management or leveraging external resources for specific query tasks.

VIII. References

Ganbarov, A., Odintsov, I., Safin, I., Ustyuzhanin, V., & Kireev, I. (2024). Experimental comparison of graph-based approximate nearest neighbor search algorithms on edge devices. arXiv preprint arXiv:2411.14006. Retrieved from https://arxiv.org/abs/2411.14006

Guo, Y., Luo, J., Liu, Y., Xu, X., Yang, J., Zhao, Y., Yang, K., Ma, J., Yu, P. S., & Wang, H. (2024). SymphonyQG: Symphonious integration of quantization and graphs for efficient and accurate vector search. arXiv preprint arXiv:2504.05478. Retrieved from https://arxiv.org/abs/2504.05478

Pan, J. J., Wang, J., & Li, G. (2024). Survey of vector database management systems. The VLDB Journal, 33(5), 1591–1615. https://doi.org/10.1007/s00778-024-00864-x

Pound, J., Zou, D., Crankshaw, D., Kraska, T., & Jeffrey, S. (2025). MicroNN: An on-device disk-resident updatable vector database. arXiv preprint arXiv:2504.05573. Retrieved from https://arxiv.org/abs/2504.05573

Sun, Y., Shao, Y., Qin, Z., Li, G., Quan, L., & Ji, R. (2024). Qinco2: Residual quantization with implicit neural quantizer. arXiv preprint arXiv:2503.19314. Retrieved from https://arxiv.org/abs/2503.19314

TigerGraph. (2025). TigerVector: Supporting vector search in graph databases. arXiv preprint arXiv:2501.11216. Retrieved from https://arxiv.org/abs/2501.11216

Wang, J., Pan, J., Wu, Z., Yang, J., Li, G., Liu, Y., Zhang, Z., & Wang, H. (2025). VectorGraphRAG: Retrieval-augmented generation over graph databases with vector search. arXiv preprint arXiv:2408.04948. Retrieved from https://arxiv.org/abs/2408.04948

Zhang, Z., Wang, H., Hu, R., Wu, Z., Pan, J., Wang, J., Li, G., & Liu, Y. (2025). RAG on Graph: Leveraging vector and graph retrieval in language model reasoning. arXiv preprint arXiv:2502.12908. Retrieved from https://arxiv.org/abs/2502.12908

Zhao, B., Liu, Y., Li, Z., Yang, J., Zhang, Z., Wang, H., Pan, J., Wang, J., Su, Y., Zhang, M., Wang, Y., Liu, D., Wang, H., & Li, G. (2023). LlamaIndex: A framework for building retrieval-augmented applications with LLMs. arXiv preprint arXiv:2309.11322. Retrieved from https://arxiv.org/abs/2309.11322