FAQ

Common questions and troubleshooting for GO3.

Gene not found

Error: ValueError: gene "XYZ" not found

Gene names must exactly match the db_object_symbol column in your GAF file (case-sensitive). Common causes:

  • Using an alias or old symbol instead of the current one.

  • The gene has no GO annotations in your GAF file.

  • Typos or extra whitespace in the gene name.

Fix: open your GAF file and search for the gene symbol. GAF files use tab-separated columns; db_object_symbol is column 3 (0-indexed: column 2).
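To check a symbol programmatically, here is a minimal sketch that scans GAF lines and reports whether the exact symbol is present. If it is not, it suggests candidates from three sources: case-insensitive matches, the synonym column (column 11, 0-indexed: 10, pipe-separated in GAF 2.x), and close spellings found with difflib. check_symbol is a hypothetical helper, not part of GO3.

```python
import difflib

def check_symbol(gaf_lines, gene):
    """Return (found, suggestions) for `gene` against GAF column 3 (index 2)."""
    symbols = set()
    synonym_to_symbols = {}
    for line in gaf_lines:
        # Header/comment lines start with "!"
        if line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        symbols.add(fields[2])
        # Column 11 (index 10) holds optional pipe-separated synonyms
        if len(fields) > 10:
            for syn in fields[10].split("|"):
                if syn:
                    synonym_to_symbols.setdefault(syn.lower(), set()).add(fields[2])
    if gene in symbols:
        return True, []
    query = gene.strip().lower()
    # Same symbol after trimming whitespace and ignoring case
    suggestions = {s for s in symbols if s.lower() == query}
    # Current symbols for which `gene` is a listed synonym/alias
    suggestions |= synonym_to_symbols.get(query, set())
    if not suggestions:
        # Fall back to fuzzy matching for likely typos
        suggestions = set(difflib.get_close_matches(gene.strip(), sorted(symbols), n=3))
    return False, sorted(suggestions)
```

Because the helper accepts any iterable of lines, you can pass an open file handle for your GAF file directly as the first argument.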

Similarity is 0.0

A result of 0.0 typically means:

  • The two terms are in different namespaces (e.g., one is BP and the other is MF). Cross-namespace terms have no common ancestor.

  • One or both GO IDs are invalid or not in the loaded ontology.

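  • The terms share no ancestor with IC > 0, for example when their only common ancestor is the namespace root, whose IC is 0 because every annotation in that namespace propagates up to it.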

  • For gene comparisons: one or both genes have no annotations in the selected namespace.

Fix: verify that both terms/genes exist and belong to the same namespace. Use get_term_by_id to inspect individual terms.
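To see why these cases give 0.0, consider a Resnik-style measure (similarity = IC of the most informative common ancestor), with IC(t) = log(N_root / N_t). This is an assumption chosen for illustration; GO3 supports its own set of measures. The toy DAG and term IDs below are made up: terms in different namespaces share no ancestor, and terms whose only shared ancestor is the root get IC 0.

```python
import math

def ancestors(term, parents):
    """Return `term` plus every term reachable through its parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content(direct_counts, parents):
    """IC(t) = log(N_root / N_t), where N_t counts annotations to t or any
    descendant and N_root is the count at the root of t's namespace."""
    counts = {}
    for term, n in direct_counts.items():
        for a in ancestors(term, parents):
            counts[a] = counts.get(a, 0) + n
    ic = {}
    for term, n in counts.items():
        # The root is the ancestor that has no parents of its own
        root = next(a for a in ancestors(term, parents) if not parents.get(a))
        ic[term] = math.log(counts[root] / n)
    return ic

def resnik(t1, t2, parents, ic):
    """IC of the most informative common ancestor; 0.0 if there is none."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic.get(a, 0.0) for a in common), default=0.0)

# Made-up toy DAG: two namespaces with separate roots
PARENTS = {
    "BP:root": [], "BP:a": ["BP:root"], "BP:b": ["BP:root"],
    "BP:a1": ["BP:a"], "BP:a2": ["BP:a"],
    "MF:root": [], "MF:x": ["MF:root"],
}
DIRECT = {"BP:a1": 2, "BP:a2": 1, "BP:b": 1, "MF:x": 3}
IC = information_content(DIRECT, PARENTS)

print(resnik("BP:a1", "MF:x", PARENTS, IC))   # 0.0: different namespaces
print(resnik("BP:a1", "BP:b", PARENTS, IC))   # 0.0: only the root is shared
print(resnik("BP:a1", "BP:a2", PARENTS, IC))  # log(4/3) ≈ 0.288 via BP:a
```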

Which OBO file should I use?

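Use go-basic.obo (recommended). This is the standard filtered, acyclic version of GO: it keeps only the is_a, part_of, regulates, positively_regulates, and negatively_regulates relationships, and excludes any relationship that crosses between the three namespaces.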

The full go.obo includes cross-ontology links and additional relationship types that may produce unexpected similarity scores. Unless you specifically need those relationships, stick with go-basic.obo.

If you call go3.load_go_terms() without a path, GO3 downloads go-basic.obo automatically.
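To check which relationship types a given OBO file actually contains, you can tally the is_a and relationship: tags in its [Term] stanzas. This is a minimal stdlib sketch; relationship_types is a hypothetical helper, not part of GO3, and it ignores other tags such as intersection_of.

```python
from collections import Counter

def relationship_types(obo_lines):
    """Count is_a links and each `relationship:` type inside [Term] stanzas."""
    counts = Counter()
    in_term = False
    for raw in obo_lines:
        line = raw.strip()
        # A line starting with "[" opens a new stanza ([Term], [Typedef], ...)
        if line.startswith("["):
            in_term = line == "[Term]"
            continue
        if not in_term:
            continue
        if line.startswith("is_a:"):
            counts["is_a"] += 1
        elif line.startswith("relationship:"):
            # e.g. "relationship: part_of GO:0005634 ! nucleus"
            parts = line.split()
            if len(parts) > 1:
                counts[parts[1]] += 1
    return counts
```

Running it over go-basic.obo and go.obo side by side shows which relationship types the full file adds.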

Can I use a custom ontology?

Yes. GO3 accepts any OBO-format file. If you have a custom or domain-specific ontology in OBO format, pass its path to load_go_terms:

go3.load_go_terms("my_custom_ontology.obo")

The same parsing and caching logic applies.
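For reference, here is a minimal sketch of what such a file can look like, written from Python. The IDs (MY:...) and the my_process namespace are made up. The sketch assumes GO3 needs only the standard id, name, namespace, and is_a tags; whether it handles namespaces other than GO's three is not covered here, so check that before relying on it.

```python
from pathlib import Path

# Tiny hypothetical ontology in OBO 1.2 format: one root and one child term
MINIMAL_OBO = """\
format-version: 1.2
ontology: my_custom_ontology

[Term]
id: MY:0000001
name: root process
namespace: my_process

[Term]
id: MY:0000002
name: child process
namespace: my_process
is_a: MY:0000001 ! root process
"""

path = Path("my_custom_ontology.obo")
path.write_text(MINIMAL_OBO, encoding="utf-8")

# Then load it the same way as go-basic.obo:
# import go3
# go3.load_go_terms(str(path))
```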

How do I filter by evidence code?

GO3 automatically filters out ND (No biological Data available) annotations and annotations with NOT qualifiers. For stricter filtering (e.g., keeping only experimental evidence codes, which drops IEA along with all other non-experimental codes), pre-filter your GAF file before passing it to load_gaf:

# Example: filter GAF to keep only experimental evidence codes
import go3

experimental_codes = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

with open("goa_human.gaf", encoding="utf-8") as f:
    with open("goa_human_experimental.gaf", "w", encoding="utf-8") as out:
        for line in f:
            # Keep header lines so the output is still a valid GAF file
            if line.startswith("!"):
                out.write(line)
                continue
            fields = line.split("\t")
            # Column 7 (0-indexed: 6) holds the evidence code
            if len(fields) > 6 and fields[6] in experimental_codes:
                out.write(line)

annots = go3.load_gaf("goa_human_experimental.gaf")
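Before choosing codes to keep, it can help to see which evidence codes your GAF actually contains and how often each occurs. Here is a minimal stdlib sketch; evidence_code_counts is a hypothetical helper, not part of GO3.

```python
from collections import Counter

def evidence_code_counts(gaf_lines):
    """Tally GAF column 7 (0-indexed: 6), the evidence code, skipping headers."""
    counts = Counter()
    for line in gaf_lines:
        if line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6:
            counts[fields[6]] += 1
    return counts
```

Pass it an open file handle and call .most_common() on the result. For a compressed .gaf.gz file, use gzip.open(path, "rt", encoding="utf-8") in place of open(path).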

set_num_threads doesn’t seem to help

The Rayon thread pool is initialized once. If you call set_num_threads after the pool has already been created (e.g., after a batch similarity call), the new value is ignored.

Fix: call set_num_threads at the very beginning of your script, before any batch operations:

import go3
go3.set_num_threads(8)  # Must be before any batch calls

go3.load_go_terms("go-basic.obo")
annots = go3.load_gaf("goa_human.gaf")
counter = go3.build_term_counter(annots)
# Now batch calls will use 8 threads

Memory usage is high

GO3 caches the ontology graph, gene-to-GO mapping, ancestor sets, DCA results, and IC values globally in-process. This is by design – caching eliminates redundant computation and is the main reason GO3 is fast.

Typical memory for a human analysis (go-basic.obo + goa_human.gaf): ~100–200 MB.

If memory is a concern:

  • Avoid loading multiple large GAF files in the same process.

  • Reloading the ontology (load_go_terms) replaces the previous ontology cache.

  • For very large all-vs-all analyses, consider working in batches or subsets.
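Here is a minimal sketch of working in batches, assuming you want every unordered gene pair exactly once. pair_batches is a hypothetical helper, not part of GO3. It lazily yields fixed-size chunks from itertools.combinations, so the full list of pairs never has to exist in memory at once. The sketch does not show how each chunk is passed to GO3, because that depends on the batch similarity function you use. Batching limits how many results you hold at a time (write each batch to disk before starting the next); it does not shrink GO3's own caches.

```python
from itertools import combinations, islice

def pair_batches(genes, batch_size):
    """Yield lists of at most `batch_size` unordered gene pairs, covering each
    pair exactly once (the upper triangle of the all-vs-all matrix)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    pairs = combinations(genes, 2)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch
```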

How do I reproduce results across runs?

For reproducible similarity scores:

  • Pin your ontology version (use a specific go-basic.obo file, not auto-download).

  • Pin your GAF version.

  • For embeddings, set random_state in tsne_genes / umap_genes.

Similarity scores are deterministic given the same ontology and annotation files.
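To make the pinned versions easy to audit later, you can record a checksum of each input file and the ontology release. GO's OBO files carry a data-version tag in their header. The helpers below read that tag and compute a SHA-256 digest with hashlib; both are hypothetical helpers, not part of GO3.

```python
import hashlib

def file_fingerprint(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def obo_data_version(path):
    """Return the value of the OBO header's data-version tag, or None."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            # The header ends at the first stanza, e.g. "[Term]"
            if line.startswith("["):
                break
            if line.startswith("data-version:"):
                return line.split(":", 1)[1].strip()
    return None
```

For example, log file_fingerprint("go-basic.obo"), file_fingerprint("goa_human.gaf"), and obo_data_version("go-basic.obo") alongside your results.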