Utilities and Advanced APIs
Threading
set_num_threads(n_threads)
Controls the maximum number of internal threads used by GO3 batch operations.
import go3
go3.set_num_threads(8)
IC lookup
term_ic(go_id, counter) returns the Information Content for one term.
ic = go3.term_ic("GO:0006397", counter)
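As a point of reference, IC is commonly defined from annotation frequencies as IC(t) = -log p(t), where p(t) is the share of annotations that fall on t or any of its descendants. The sketch below assumes that textbook definition with a natural logarithm and uses invented counts. information_content is a hypothetical helper, and whether TermCounter normalizes in exactly this way is not stated here.

```python
import math


def information_content(term_count: int, root_count: int) -> float:
    """Hypothetical helper: IC(t) = -ln(count(t) / count(root)).

    Assumes counts already include annotations propagated from descendant terms,
    so a term can never have more annotations than the root.
    """
    if term_count <= 0 or root_count <= 0:
        raise ValueError("counts must be positive")
    if term_count > root_count:
        raise ValueError("a term cannot have more annotations than the root")
    return -math.log(term_count / root_count)


# Invented counts, not real GO data: rarer (more specific) terms get higher IC.
print(round(information_content(500, 1000), 3))  # → 0.693
print(round(information_content(10, 1000), 3))   # → 4.605
```

The log base only rescales IC values; it does not change which terms are more informative.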
Gene distance matrices
gene_distance_matrix(genes=None, ontology="BP", similarity="lin", groupwise="bma", counter=..., distance_transform="auto")
Returns (gene_order, distance_matrix).
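Each row and column of the matrix corresponds to the gene at the same position in gene_order. As a sketch of consuming that structure, the hypothetical helper below finds each gene's closest neighbor. The matrix is a toy stand-in with invented values, assuming the list-of-lists layout given in the API reference.

```python
def nearest_neighbors(gene_order, dist):
    """Hypothetical helper: map each gene to its closest other gene."""
    n = len(gene_order)
    if n < 2 or len(dist) != n or any(len(row) != n for row in dist):
        raise ValueError("need an n x n matrix matching gene_order, with n >= 2")
    result = {}
    for i, gene in enumerate(gene_order):
        # Skip the diagonal (self-distance) when searching for the closest gene.
        j = min((k for k in range(n) if k != i), key=lambda k: dist[i][k])
        result[gene] = gene_order[j]
    return result


# Toy stand-in for the (gene_order, distance_matrix) return value; values are invented.
gene_order = ["BRCA1", "TP53", "EGFR"]
distance_matrix = [
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.7],
    [0.9, 0.7, 0.0],
]
print(nearest_neighbors(gene_order, distance_matrix))
# → {'BRCA1': 'TP53', 'TP53': 'BRCA1', 'EGFR': 'TP53'}
```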
Distance transforms
The distance_transform parameter controls how similarity scores are converted to distances for clustering and embedding algorithms. Available options:
| Transform | Formula | When to use |
|---|---|---|
| auto | depends on method | Recommended default. Automatically selects a suitable transform for the chosen similarity method. |
| one_minus | 1 - s | Use for normalized methods that produce values in [0, 1]. Self-distance is 0 and the maximum distance is 1. |
| max_minus | s_max - s | Use for unbounded methods (e.g., Resnik), whose range is not fixed. Produces distances relative to the highest observed similarity. |
| reciprocal | reciprocal of s | Always valid regardless of method range, but produces a non-linear mapping. Useful as a fallback when the other transforms are not appropriate. |

Here s is a similarity score and s_max is the highest similarity observed in the matrix.
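To make the transforms concrete, here is a sketch of each one applied to a square similarity matrix stored as nested lists. The one_minus and max_minus bodies follow directly from their descriptions. The reciprocal formula 1 / (1 + s) and the rule used by auto (a hypothetical set of unbounded method names) are assumptions for illustration, not go3's documented behavior.

```python
def one_minus(sim):
    # 1 - s: for methods bounded in [0, 1]; a score of 1 maps to distance 0.
    return [[1.0 - s for s in row] for row in sim]


def max_minus(sim):
    # s_max - s: distances relative to the largest observed similarity.
    top = max(max(row) for row in sim)
    return [[top - s for s in row] for row in sim]


def reciprocal(sim):
    # ASSUMPTION: 1 / (1 + s); the exact formula go3 uses is not documented here.
    return [[1.0 / (1.0 + s) for s in row] for row in sim]


# HYPOTHETICAL: methods treated as unbounded by this sketch's version of "auto".
UNBOUNDED_METHODS = {"resnik"}


def auto(sim, method):
    # ASSUMPTION: unbounded methods get max_minus, everything else one_minus.
    if method.lower() in UNBOUNDED_METHODS:
        return max_minus(sim)
    return one_minus(sim)


print(auto([[1.0, 0.25], [0.25, 1.0]], "lin"))  # → [[0.0, 0.75], [0.75, 0.0]]
```

With Resnik-style scores, max_minus does not guarantee a zero diagonal, because a gene's self-similarity can be below the matrix maximum.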
Embedding APIs
These helpers build embeddings from precomputed GO similarity-derived distances.
tsne_genes(genes, ontology, similarity, groupwise, counter, ...)
Computes a t-SNE embedding from gene-level semantic similarity distances.
Key parameters:
- perplexity (float, default 30.0) – Controls the balance between local and global structure; higher values consider more neighbors per point. Must be less than the number of genes, so the default is too large for small gene lists. A reasonable starting point is perplexity ~ sqrt(n_genes).
- n_iter (int, default 1000) – Number of optimization iterations. 500 is often sufficient for exploration.
- n_components (int, default 2) – Dimensionality of the output embedding.
- random_state (int or None, default None) – Seed for reproducibility.
- distance_transform (str, default "auto") – How to convert similarity to distance (see above).
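Because perplexity must stay below the number of genes, the default of 30.0 fails on small gene lists. A hypothetical helper (not part of go3) that turns the sqrt(n_genes) rule of thumb into a safe value might look like this:

```python
import math


def suggest_perplexity(n_genes: int) -> float:
    """Hypothetical helper: perplexity ~ sqrt(n_genes), kept strictly below n_genes."""
    if n_genes < 2:
        raise ValueError("t-SNE needs at least 2 genes")
    guess = math.sqrt(n_genes)
    # Perplexity must be less than the number of points; keep a margin of 1
    # and never go below 1.0.
    return max(1.0, min(guess, n_genes - 1.0))


print(round(suggest_perplexity(5), 2))  # → 2.24
```

Pass the result as perplexity=suggest_perplexity(len(genes)) to tsne_genes and treat it as a first guess to tune, not an optimum.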
umap_genes(genes, ontology, similarity, groupwise, counter, ...)
Computes a UMAP embedding from gene-level semantic similarity distances.
Key parameters:
- n_neighbors (int, default 15) – Number of nearest neighbors considered when constructing the graph. Smaller values emphasize local structure; larger values capture more global patterns. Must be less than the number of genes. The default of 15 is a reasonable starting point for exploratory analysis.
- min_dist (float, default 0.1) – Minimum distance between embedded points. Smaller values produce tighter clusters; larger values spread points more evenly.
- n_components (int, default 2) – Dimensionality of the output embedding.
- random_state (int or None, default None) – Seed for reproducibility.
- distance_transform (str, default "auto") – How to convert similarity to distance (see above).
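The same small-input problem applies to n_neighbors. This hypothetical helper clamps the usual starting value of 15 so that it stays below the number of genes. The lower bound of 2 is an assumption based on the constraint in the common umap-learn implementation, which needs at least one neighbor besides the point itself.

```python
def suggest_n_neighbors(n_genes: int, preferred: int = 15) -> int:
    """Hypothetical helper: clamp n_neighbors into [2, n_genes - 1]."""
    if n_genes < 3:
        raise ValueError("need at least 3 genes for n_neighbors >= 2")
    # Must be less than the number of genes; assumed minimum of 2.
    return max(2, min(preferred, n_genes - 1))


print(suggest_n_neighbors(5))    # → 4
print(suggest_n_neighbors(100))  # → 15
```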
plot_embedding(embedding, ...)
Creates a scatter plot from a 2D embedding array.
Key parameters:
- genes (list[str]) – Gene labels for each point.
- labels (list[str], optional) – Categorical labels for coloring points by group.
- annotate (str or None, default "auto") – Label display mode: "all" labels every point, "auto" labels only non-overlapping points, and None disables labels.
- title (str) – Plot title.
Returns (fig, ax) matplotlib objects.
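The "auto" mode is described only as labeling non-overlapping points. One simple way to approximate that, shown here as a hypothetical sketch rather than go3's actual algorithm, is a greedy pass in embedding coordinates: keep a label only if its point lies at least min_sep from every point whose label was already kept, and stop after max_labels. Treating max_labels from the signature as such a cap is itself an assumption.

```python
import math


def pick_labels(points, min_sep, max_labels=200):
    """Hypothetical greedy label thinning; returns indices of points to label."""
    kept = []
    for i, (x, y) in enumerate(points):
        if len(kept) >= max_labels:
            break
        # Keep this label only if it is far enough from every label kept so far.
        if all(math.dist((x, y), points[j]) >= min_sep for j in kept):
            kept.append(i)
    return kept


points = [(0.0, 0.0), (0.1, 0.0), (3.0, 4.0), (10.0, 0.0)]
print(pick_labels(points, min_sep=1.0))  # → [0, 2, 3]
```

Greedy selection favors points that appear earlier in the list; sorting points by some priority first would change which labels survive.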
plot_tsne_genes(...) / plot_umap_genes(...)
Convenience wrappers that combine embedding computation and plotting in one call. They accept the same parameters as tsne_genes / umap_genes plus the plotting parameters from plot_embedding.
Returns (gene_order, embedding, fig, ax).
Minimal embedding example
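import go3

# `counter` is assumed to be a TermCounter built earlier; it supplies the IC values.
genes = ["BRCA1", "CASP8", "TP53", "EGFR", "AKT1"]
gene_order, emb = go3.tsne_genes(
    genes,
    ontology="BP",
    similarity="lin",
    groupwise="bma",
    counter=counter,
    perplexity=2.0,  # must be less than the number of genes (5 here)
    random_state=42,
)
fig, ax = go3.plot_embedding(emb, genes=gene_order, annotate="auto", title="GO embedding")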
API reference
- compare_gene_pairs_batch(pairs, ontology, similarity, groupwise, counter)
Compute semantic similarity for a batch of gene pairs.
- Parameters:
pairs (list of (str, str)) – List of gene pairs for which to compute semantic similarity.
ontology (str) – Name of the subontology of GO to use: BP, MF or CC.
similarity (str) – Name of the similarity method.
groupwise (str) – Combination method to generate the similarities between genes. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.
counter (TermCounter) – Precomputed IC values.
- Returns:
List of similarity scores.
- Return type:
list of float
- Raises:
ValueError – If similarity or groupwise is not a recognized option.
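Building the pairs argument for every gene combination is a common first step. The sketch below uses itertools.combinations and maps the returned list back onto its pairs, assuming scores come back in the same order as pairs. all_gene_pairs and scores_by_pair are hypothetical helpers; the go3 call is shown commented out and placeholder scores are used so the snippet runs on its own.

```python
from itertools import combinations


def all_gene_pairs(genes):
    """Hypothetical helper: every unordered pair of distinct genes, in stable order."""
    return list(combinations(genes, 2))


def scores_by_pair(pairs, scores):
    """Hypothetical helper: zip batch results back onto their pairs (same order assumed)."""
    if len(pairs) != len(scores):
        raise ValueError("expected exactly one score per pair")
    return dict(zip(pairs, scores))


genes = ["BRCA1", "TP53", "EGFR"]
pairs = all_gene_pairs(genes)
# With go3 installed and a TermCounter available, this would be:
# scores = go3.compare_gene_pairs_batch(pairs, "BP", "lin", "bma", counter)
scores = [0.8, 0.1, 0.2]  # placeholder values for illustration only
print(scores_by_pair(pairs, scores))
# → {('BRCA1', 'TP53'): 0.8, ('BRCA1', 'EGFR'): 0.1, ('TP53', 'EGFR'): 0.2}
```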
- gene_distance_matrix(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto')
Compute a gene-to-gene distance matrix using GO semantic similarity.
- Parameters:
genes (Optional[list[str]]) – List of genes to include. If None, uses all genes with annotations.
ontology (str) – Name of the subontology of GO to use: BP, MF or CC.
similarity (str) – Name of the similarity method.
groupwise (str) – Combination method to generate the similarities between genes. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.
counter (TermCounter) – Precomputed IC values.
distance_transform (str) – How to convert similarity to distance. Options: “auto”, “one_minus”, “reciprocal”, “max_minus”.
- Returns:
Tuple with the gene order and a square distance matrix.
- Return type:
(list[str], list[list[float]])
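Many clustering routines expect a condensed distance vector rather than a square matrix. The hypothetical helper below flattens the upper triangle in row-major order (the condensed layout that scipy.cluster.hierarchy.linkage accepts) and checks symmetry on the way. It deliberately ignores the diagonal, since a transform such as max_minus can leave non-zero self-distances.

```python
def to_condensed(dist, tol=1e-9):
    """Hypothetical helper: upper triangle of a square matrix, row-major, no diagonal."""
    n = len(dist)
    if any(len(row) != n for row in dist):
        raise ValueError("distance matrix must be square")
    out = []
    for i in range(n):
        for j in range(i + 1, n):
            # A valid distance matrix is symmetric; reject asymmetric input.
            if abs(dist[i][j] - dist[j][i]) > tol:
                raise ValueError(f"matrix is not symmetric at ({i}, {j})")
            out.append(dist[i][j])
    return out


matrix = [[0.0, 0.2, 0.9], [0.2, 0.0, 0.7], [0.9, 0.7, 0.0]]
print(to_condensed(matrix))  # → [0.2, 0.9, 0.7]
```

scipy.spatial.distance.squareform(matrix) produces the same vector, but its default checks also require a zero diagonal.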
- plot_embedding(embedding, genes=None, labels=None, title=None, annotate='auto', max_labels=200, figsize=Ellipsis, s=18.0, alpha=0.85, ax=None)
Plot a 2D embedding with matplotlib.
- plot_tsne_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, perplexity=30.0, n_iter=1000, random_state=None, labels=None, title=None, annotate='auto', max_labels=200, figsize=Ellipsis, s=18.0, alpha=0.85, ax=None)
Compute t-SNE embeddings and plot them with matplotlib.
- plot_umap_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, n_neighbors=15, min_dist=0.1, random_state=None, labels=None, title=None, annotate='auto', max_labels=200, figsize=Ellipsis, s=18.0, alpha=0.85, ax=None)
Compute UMAP embeddings and plot them with matplotlib.
- set_num_threads(n_threads)
Configure the maximum number of threads rayon will use.
- Parameters:
n_threads (int) – Number of threads to use. If 0, uses all available cores.
- term_ic(go_id, counter)
Compute the Information Content (IC) of a GO term.
- Parameters:
go_id (str) – GO term identifier.
counter (TermCounter) – Precomputed term counter with IC values.
- Returns:
The IC of the GO term.
- Return type:
float
- termset_similarity(terms1, terms2, term_similarity='lin', groupwise='bma', counter=None)
Compute semantic similarity between two sets of GO terms.
- Parameters:
terms1 (list of str) – First list of GO term IDs.
terms2 (list of str) – Second list of GO term IDs.
term_similarity (str) – Name of the pairwise similarity method.
groupwise (str) – Groupwise combination method. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.
counter (TermCounter) – Precomputed IC values.
- Returns:
Similarity score.
- Return type:
float
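For intuition about the groupwise options, the sketch below combines an m x n term-to-term similarity matrix using "max", "avg" and "bma". The BMA shown is one common formulation (the mean of every best match taken from both directions); some tools instead average the two directional means, which gives a different value when the term sets differ in size. Which variant go3 uses, and how "hausdorff" and "simgic" are computed, is not covered by this hypothetical sketch.

```python
def groupwise(sim, method="bma"):
    """Hypothetical sketch: combine sim[i][j] (term i of set 1 vs term j of set 2)."""
    if not sim or not sim[0]:
        raise ValueError("both term sets must be non-empty")
    row_best = [max(row) for row in sim]        # best match for each term in set 1
    col_best = [max(col) for col in zip(*sim)]  # best match for each term in set 2
    if method == "max":
        return max(row_best)
    if method == "avg":
        values = [s for row in sim for s in row]
        return sum(values) / len(values)
    if method == "bma":
        # Best-match average pooled over both directions.
        return (sum(row_best) + sum(col_best)) / (len(row_best) + len(col_best))
    raise ValueError(f"unsupported method in this sketch: {method}")


sim = [[1.0, 0.5], [0.0, 0.5]]
print(groupwise(sim, "bma"))  # → 0.75
```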
- tsne_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, perplexity=30.0, n_iter=1000, random_state=None)
Compute t-SNE embeddings from a gene list using a precomputed distance matrix.
- umap_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, n_neighbors=15, min_dist=0.1, random_state=None)
Compute UMAP embeddings from a gene list using a precomputed distance matrix.