# Architecture This page describes GO3's internal design and explains why it achieves 8--25x speedups over pure-Python GO semantic similarity libraries. ## Overview GO3 is a **Rust core** compiled into a **CPython extension module** via [PyO3](https://pyo3.rs/) and [Maturin](https://www.maturin.rs/). All performance-critical code (parsing, graph traversal, IC computation, similarity calculations) runs in compiled Rust, while the Python API provides a familiar interface for scripting and integration. ## Data flow ``` OBO file ──parse──▶ GO graph cache (terms, edges, depths) │ ▼ GAF file ──parse──▶ gene-to-GO mapping ──propagate──▶ TermCounter (counts, IC) │ ▼ similarity / batch / matrix APIs ``` 1. **OBO parsing**: the OBO file is parsed into an in-memory directed acyclic graph. Each term stores its parents, children, depth, level, and metadata. Ancestor sets are precomputed and cached. 2. **GAF parsing**: the GAF file is parsed to build a gene-to-GO mapping. Obsolete terms are resolved via `replaced_by` / `consider`. ND and NOT annotations are filtered. 3. **Counter construction**: annotation counts are propagated up the DAG (each annotation increments the term and all its ancestors). IC is computed per term per namespace. 4. **Similarity computation**: pairwise or batch similarity is computed using cached graphs, ancestor sets, and IC values. ## Key design decisions ### Global caches GO3 maintains several global caches within the Python process: - **Ontology graph**: the full term DAG with parent/child edges. - **Ancestor sets**: precomputed `is_a` ancestor sets for every term, enabling O(1) lookups for common-ancestor queries. - **DCA cache**: deepest common ancestor results are cached to avoid redundant DAG traversals. - **Gene-to-GO mapping**: maps gene symbols to their annotated GO terms per namespace. - **IC values**: per-term Information Content, computed once by `build_term_counter`. Caching trades memory for speed. For a typical human analysis, total memory usage is ~100--200 MB. ### Rayon thread pool Batch operations (`batch_similarity`, `compare_gene_pairs_batch`, `gene_distance_matrix`) use [Rayon](https://docs.rs/rayon/) to parallelize pair evaluation across CPU cores. The thread pool is initialized once via `set_num_threads` and reused for all subsequent parallel work. ### FxHashMap GO3 uses [FxHashMap](https://docs.rs/rustc-hash/) (a fast, non-cryptographic hash map) for internal lookups. FxHashMap is significantly faster than the standard `HashMap` for string keys like GO IDs, reducing lookup overhead in hot loops. ## Why it's fast The combination of these design choices eliminates the main bottlenecks in pure-Python implementations: - **No interpreter overhead in hot loops**: similarity computations, ancestor lookups, and IC queries all run in compiled Rust. Python is only involved at the API boundary. - **Parallel pair evaluation**: Rayon distributes independent pair computations across threads. For N pairs on T threads, wall time scales roughly as N/T. - **O(1) cached lookups**: precomputed ancestor sets and cached DCA results avoid redundant graph traversals. In pure-Python tools, these traversals are often the dominant cost. - **Cache-friendly data structures**: Rust's contiguous memory layout and FxHashMap's low overhead reduce cache misses compared to Python's object-heavy, pointer-chasing data structures.