# FAQ

Common questions and troubleshooting for GO3.

## Gene not found

**Error**: `ValueError: gene "XYZ" not found`

Gene names must exactly match the `db_object_symbol` column in your GAF file. The match is case-sensitive. Common causes:

- Using an alias or an outdated symbol instead of the current one.
- The gene has no GO annotations in your GAF file.
- Typos or extra whitespace in the gene name.

**Fix**: open your GAF file and search for the gene symbol. GAF files use tab-separated columns; `db_object_symbol` is column 3 (0-indexed: column 2).

## Similarity is 0.0

A result of 0.0 typically means one of the following:

- The two terms are in **different namespaces** (e.g., one is BP and the other is MF). Cross-namespace terms have no common ancestor.
- One or both GO IDs are **invalid or not in the loaded ontology**.
- The terms have **no common ancestor** with IC > 0 (very rare within the same namespace).
- For gene comparisons: one or both genes have **no annotations** in the selected namespace.

**Fix**: verify that both terms/genes exist and belong to the same namespace. Use `get_term_by_id` to inspect individual terms.

## Which OBO file should I use?

Use **`go-basic.obo`** (recommended). This is the standard filtered release of GO: the graph is guaranteed to be acyclic, contains no relationships that cross between the three namespaces, and supports propagating annotations up the graph. The full `go.obo` includes cross-ontology links and additional relationship types that may produce unexpected similarity scores. Unless you specifically need those relationships, stick with `go-basic.obo`.

If you call `go3.load_go_terms()` without a path, GO3 downloads `go-basic.obo` automatically.

## Can I use a custom ontology?

Yes. GO3 accepts any OBO-format file. If you have a custom or domain-specific ontology in OBO format, pass its path to `load_go_terms`:

```python
go3.load_go_terms("my_custom_ontology.obo")
```

The same parsing and caching logic applies.

## How do I filter by evidence code?

GO3 automatically filters out `ND` (No biological Data available) annotations and annotations with a `NOT` qualifier.
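To estimate how many annotations in a given GAF file this default rule removes, you can apply the same two checks yourself. The sketch below is a standard-library approximation, not GO3's parser. It assumes the GAF 2.x column layout (qualifier in column 4, evidence code in column 7; 0-indexed fields 3 and 6), and the helper names `dropped_by_default` and `count_dropped` are hypothetical.

```python
def dropped_by_default(gaf_line: str) -> bool:
    """Return True if a GAF data line has an ND evidence code or a NOT qualifier.

    Hypothetical helper approximating the default filter; not part of GO3.
    """
    if gaf_line.startswith("!"):
        return False  # header/comment lines are not annotations
    fields = gaf_line.rstrip("\n").split("\t")
    if len(fields) < 7:
        return False  # malformed line: leave the decision to the real parser
    # Column 4 may hold several pipe-separated values, e.g. "NOT|enables".
    qualifiers = fields[3].split("|")
    evidence = fields[6]
    return "NOT" in qualifiers or evidence == "ND"


def count_dropped(path: str) -> int:
    """Count annotation lines in a GAF file that the ND/NOT rule would exclude."""
    with open(path) as f:
        return sum(dropped_by_default(line) for line in f)
```

For example, `count_dropped("goa_human.gaf")` returns the number of annotation lines the rule would exclude from that file.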
For stricter filtering (e.g., dropping electronically inferred `IEA` annotations, or keeping only experimental evidence codes), pre-filter your GAF file before passing it to `load_gaf`:

```python
import go3

# Keep only annotations backed by experimental evidence codes
# (GAF column 7, 0-indexed field 6). Header lines starting with "!" are kept.
# Add "HTP", "HDA", "HMP", "HGI", "HEP" to also keep high-throughput experiments.
experimental_codes = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

with open("goa_human.gaf") as f, open("goa_human_experimental.gaf", "w") as out:
    for line in f:
        if line.startswith("!"):
            out.write(line)
            continue
        fields = line.split("\t")
        if len(fields) > 6 and fields[6] in experimental_codes:
            out.write(line)

annots = go3.load_gaf("goa_human_experimental.gaf")
```

## `set_num_threads` doesn't seem to help

The Rayon thread pool is **initialized once**. If you call `set_num_threads` after the pool has already been created (e.g., after a batch similarity call), the new value is ignored.

**Fix**: call `set_num_threads` at the very beginning of your script, before any batch operations:

```python
import go3

go3.set_num_threads(8)  # Must be called before any batch operation

go3.load_go_terms("go-basic.obo")
annots = go3.load_gaf("goa_human.gaf")
counter = go3.build_term_counter(annots)
# Batch calls from here on use 8 threads
```

## Memory usage is high

GO3 caches the ontology graph, the gene-to-GO mapping, ancestor sets, DCA results, and IC values **globally in-process**. This is by design: caching eliminates redundant computation and is the main reason GO3 is fast.

Typical memory use for a human analysis (`go-basic.obo` + `goa_human.gaf`) is about 100-200 MB.

If memory is a concern:

- Avoid loading multiple large GAF files in the same process.
- Note that reloading the ontology with `load_go_terms` replaces the previous ontology cache rather than adding to it.
- For very large all-vs-all analyses, consider working in batches or on subsets of genes.

## How do I reproduce results across runs?

For reproducible results:

- Pin your ontology version (use a specific local `go-basic.obo` file instead of the automatic download).
- Pin your GAF version.
- For embeddings, set `random_state` in `tsne_genes` / `umap_genes`.
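When you pin a local ontology file, it helps to log which GO release it is. OBO files can declare this in the header's `data-version` tag (GO releases use the form `releases/YYYY-MM-DD`). The sketch below reads only the header, which ends at the first stanza such as `[Term]`. It is plain standard-library code; the helper name `obo_data_version` is hypothetical and not a GO3 function.

```python
from typing import Iterable, Optional


def obo_data_version(lines: Iterable[str]) -> Optional[str]:
    """Return the value of the OBO header's data-version tag, or None if absent.

    Hypothetical helper; accepts any iterable of lines, such as an open file.
    """
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            break  # first stanza reached: the header is over
        if line.startswith("data-version:"):
            return line.split(":", 1)[1].strip()
    return None
```

Usage: `with open("go-basic.obo", encoding="utf-8") as f: print(obo_data_version(f))`, then store the printed value alongside your results.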
Similarity scores are deterministic given the same ontology and annotation files.
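A simple way to confirm that two runs really used the same ontology and annotation files is to record content checksums of the inputs. This is a minimal standard-library sketch; the helper names `file_sha256` and `input_fingerprint` are hypothetical and not part of GO3.

```python
import hashlib
from pathlib import Path


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def input_fingerprint(*paths: str) -> dict:
    """Map each input file name to its SHA-256 digest, for logging with results."""
    return {Path(p).name: file_sha256(p) for p in paths}
```

Log `input_fingerprint("go-basic.obo", "goa_human.gaf")` with each run. Identical digests mean byte-identical inputs, so any difference in scores must come from elsewhere (for example, a different GO3 version or embedding seed).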