Utilities and Advanced APIs¶

Threading¶

set_num_threads(n_threads)

Controls the maximum number of internal threads used by GO3 batch operations.

import go3
go3.set_num_threads(8)
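
One way to pick a value is to derive it from the machine's core count rather than hard-coding it. The sketch below is an illustration, not part of GO3: `pick_thread_count` is a hypothetical helper that leaves one core free for the rest of the system, and the call to `go3.set_num_threads` is shown only as a comment.

```python
import os


def pick_thread_count(reserve: int = 1) -> int:
    """Return a thread count that keeps `reserve` cores free (hypothetical helper)."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, available - reserve)


n_threads = pick_thread_count()
# Pass the result to GO3 before running batch operations:
# go3.set_num_threads(n_threads)
```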

IC lookup¶

term_ic(go_id, counter) returns the Information Content for one term.

ic = go3.term_ic("GO:0006397", counter)
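
For context, Information Content is conventionally defined as the negative logarithm of a term's annotation probability, so rarer terms score higher. The sketch below illustrates that definition with made-up counts. The function name, the counts, and the choice of natural logarithm are assumptions for illustration and may not match how GO3 computes or stores IC internally.

```python
import math


def information_content(term_count: int, total_count: int) -> float:
    """IC = -ln(p), where p is the fraction of annotations covered by the term."""
    if term_count <= 0 or total_count <= 0:
        raise ValueError("counts must be positive")
    return -math.log(term_count / total_count)


# Toy numbers: a broad term versus a specific one.
broad = information_content(5000, 10000)
specific = information_content(10, 10000)
```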

Gene distance matrices¶

gene_distance_matrix(genes=None, ontology="BP", similarity="lin", groupwise="bma", counter=..., distance_transform="auto")

Returns (gene_order, distance_matrix).
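
Because the result is a list of gene names plus a square list of lists (see the return type in the API reference), looking up a pair means mapping names to row indices. The sketch below uses a hand-written toy result in place of a real call; `pair_distance` and `nearest_neighbor` are hypothetical helpers, and the distances are invented for illustration.

```python
def pair_distance(gene_order, distance_matrix, gene_a, gene_b):
    """Look up the distance between two genes by name."""
    index = {gene: i for i, gene in enumerate(gene_order)}
    return distance_matrix[index[gene_a]][index[gene_b]]


def nearest_neighbor(gene_order, distance_matrix, gene):
    """Return the other gene with the smallest distance to `gene`."""
    row = distance_matrix[gene_order.index(gene)]
    candidates = [(d, g) for g, d in zip(gene_order, row) if g != gene]
    return min(candidates)[1]


# Toy stand-in for: gene_order, distance_matrix = go3.gene_distance_matrix(...)
gene_order = ["BRCA1", "TP53", "EGFR"]
distance_matrix = [
    [0.0, 0.3, 0.8],
    [0.3, 0.0, 0.6],
    [0.8, 0.6, 0.0],
]
```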

Distance transforms¶

The distance_transform parameter controls how similarity scores are converted to distances for clustering and embedding algorithms. Available options:

| Transform | Formula | When to use |
| --- | --- | --- |
| `auto` | depends on method | Recommended default. Automatically selects the best transform: `one_minus` for normalized methods (`lin`, `wang`, `simrel`, `topoicsim`), `max_minus` for unbounded methods (`resnik`, `jc`). |
| `one_minus` | `1 - sim` | Use for normalized methods that produce values in [0, 1]. Self-distance is 0, maximum distance is 1. |
| `max_minus` | `max(all_sims) - sim` | Use for unbounded methods (e.g., Resnik) where the range is not fixed. Produces distances relative to the highest observed similarity. |
| `reciprocal` | `1 / (1 + sim)` | Always valid regardless of method range, but produces a non-linear mapping. Useful as a fallback when other transforms are not appropriate. |
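
The formulas in the table can be sketched in plain Python. This is an illustration of the arithmetic only, not GO3's implementation: `similarity_to_distance` is a hypothetical function, the similarity matrices are toy values, and treating every method outside the normalized set as unbounded under `auto` is an assumption.

```python
# Methods the table lists as normalized to [0, 1].
NORMALIZED_METHODS = {"lin", "wang", "simrel", "topoicsim"}


def similarity_to_distance(sims, method, transform="auto"):
    """Convert a square similarity matrix (list of lists) into distances."""
    if transform == "auto":
        # Assumption: anything not listed as normalized is treated as unbounded.
        transform = "one_minus" if method in NORMALIZED_METHODS else "max_minus"
    if transform == "one_minus":
        return [[1.0 - s for s in row] for row in sims]
    if transform == "max_minus":
        top = max(max(row) for row in sims)  # highest observed similarity
        return [[top - s for s in row] for row in sims]
    if transform == "reciprocal":
        return [[1.0 / (1.0 + s) for s in row] for row in sims]
    raise ValueError(f"unknown distance transform: {transform!r}")


lin_sims = [[1.0, 0.25], [0.25, 1.0]]
resnik_sims = [[4.0, 1.5], [1.5, 3.0]]

lin_dist = similarity_to_distance(lin_sims, "lin")
resnik_dist = similarity_to_distance(resnik_sims, "resnik")
```

With these toy matrices, `max_minus` leaves a non-zero self-distance for any gene whose self-similarity is below the global maximum, and `reciprocal` never reaches 0. Whether GO3 adjusts the diagonal afterwards is not stated here.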

Embedding APIs¶

These helpers build low-dimensional embeddings from distances derived from GO semantic similarity.

`tsne_genes(genes, ontology, similarity, groupwise, counter, ...)`¶

Computes a t-SNE embedding from gene-level semantic similarity distances.

Key parameters:

perplexity (float) – Controls the balance between local and global structure. Higher values consider more neighbors per point. Must be less than the number of genes. A starting point is perplexity ~ sqrt(n_genes).
n_iter (int) – Number of optimization iterations. Defaults to 1000; 500 is often sufficient for exploration.
n_components (int) – Dimensionality of the output embedding (usually 2).
random_state (int) – Seed for reproducibility.
distance_transform (str) – How to convert similarity to distance (see above).
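
The perplexity guidance above can be turned into a small helper. This is a sketch under stated assumptions: `choose_perplexity` is hypothetical, and capping at `n_genes - 1` only enforces the "less than the number of genes" rule from this page; the underlying t-SNE implementation may impose stricter limits.

```python
import math


def choose_perplexity(n_genes, requested=None):
    """Pick a perplexity: sqrt(n_genes) by default, kept below n_genes."""
    if n_genes < 2:
        raise ValueError("need at least two genes")
    value = math.sqrt(n_genes) if requested is None else float(requested)
    upper = float(n_genes - 1)  # stay strictly below the number of genes
    return max(1.0, min(value, upper))


# The signature default of 30.0 would be too large for a five-gene list.
small_list = choose_perplexity(5, requested=30.0)
large_list = choose_perplexity(100)
```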

`umap_genes(genes, ontology, similarity, groupwise, counter, ...)`¶

Computes a UMAP embedding from gene-level semantic similarity distances.

Key parameters:

n_neighbors (int) – Number of nearest neighbors to consider when constructing the graph. Smaller values emphasize local structure; larger values capture more global patterns. Must be less than the number of genes. The default of 15 is a reasonable starting point for exploratory analysis.
min_dist (float) – Minimum distance between embedded points. Smaller values produce tighter clusters; larger values spread points more evenly.
random_state (int) – Seed for reproducibility.
distance_transform (str) – How to convert similarity to distance (see above).
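
Since `n_neighbors` and `min_dist` interact, it can help to compare a few combinations side by side. The sketch below builds such a grid and drops any `n_neighbors` value that is not below the number of genes. `umap_parameter_grid` and the candidate values are hypothetical choices for illustration; the `go3.umap_genes` call appears only as a comment.

```python
from itertools import product


def umap_parameter_grid(n_genes,
                        neighbor_options=(5, 15, 50),
                        min_dist_options=(0.0, 0.1, 0.5)):
    """Return keyword-argument dicts for every valid parameter combination."""
    valid_neighbors = [k for k in neighbor_options if k < n_genes]
    return [
        {"n_neighbors": k, "min_dist": d}
        for k, d in product(valid_neighbors, min_dist_options)
    ]


grid = umap_parameter_grid(n_genes=20)
# for params in grid:
#     gene_order, emb = go3.umap_genes(genes, counter=counter, random_state=42, **params)
```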

`plot_embedding(embedding, ...)`¶

Creates a scatter plot from a 2D embedding array.

genes (list[str]) – Gene labels for each point.
annotate – Label display mode: "all" labels every point, "auto" labels only non-overlapping points, None disables labels.
title (str) – Plot title.
labels (list[str], optional) – Categorical labels for coloring points by group.

Returns (fig, ax) matplotlib objects.

`plot_tsne_genes(...)` / `plot_umap_genes(...)`¶

Convenience wrappers that combine embedding computation and plotting in one call. They accept the same parameters as tsne_genes / umap_genes plus the plotting parameters from plot_embedding.

Both return (gene_order, embedding, fig, ax).

Minimal embedding example¶

import go3

# `counter` is a TermCounter with precomputed IC values, built beforehand.
genes = ["BRCA1", "CASP8", "TP53", "EGFR", "AKT1"]

genes, emb = go3.tsne_genes(
    genes,
    ontology="BP",
    similarity="lin",
    groupwise="bma",
    counter=counter,
    perplexity=2.0,
    random_state=42,
)

fig, ax = go3.plot_embedding(emb, genes=genes, annotate="auto", title="GO embedding")

API reference¶

compare_gene_pairs_batch(pairs, ontology, similarity, groupwise, counter)

Compute semantic similarity for a batch of gene pairs.

Parameters:

pairs (list of (str, str)) – Gene pairs for which to compute semantic similarity.
ontology (str) – Name of the subontology of GO to use: BP, MF or CC.
similarity (str) – Name of the similarity method.
groupwise (str) – Combination method to generate the similarities between genes. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.
counter (TermCounter) – Precomputed IC values.

Returns:

List of similarity scores.

Return type:

list of float

Raises:

ValueError – If similarity or groupwise is unknown.
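
A common way to build the `pairs` argument is to enumerate every unordered pair from a gene list. The sketch below does that with the standard library and shows how scores could be matched back to pairs. `unique_pairs` and `scores_by_pair` are hypothetical helpers, and pairing by position assumes the returned scores follow the order of the input pairs, which this page does not state explicitly.

```python
from itertools import combinations


def unique_pairs(genes):
    """Return every unordered pair of distinct genes as (str, str) tuples."""
    return list(combinations(genes, 2))


def scores_by_pair(pairs, scores):
    """Map each pair to its score, assuming scores are in input order."""
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    return dict(zip(pairs, scores))


genes = ["BRCA1", "CASP8", "TP53", "EGFR"]
pairs = unique_pairs(genes)
# scores = go3.compare_gene_pairs_batch(pairs, "BP", "lin", "bma", counter)
# lookup = scores_by_pair(pairs, scores)
```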

gene_distance_matrix(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto')

Compute a gene-to-gene distance matrix using GO semantic similarity.

Parameters:

genes (Optional[list[str]]) – List of genes to include. If None, uses all genes with annotations.
ontology (str) – Name of the subontology of GO to use: BP, MF or CC.
similarity (str) – Name of the similarity method.
groupwise (str) – Combination method to generate the similarities between genes. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.
counter (TermCounter) – Precomputed IC values.
distance_transform (str) – How to convert similarity to distance. Options: “auto”, “one_minus”, “reciprocal”, “max_minus”.

Returns:

Tuple with the gene order and a square distance matrix.

Return type:

(list[str], list[list[float]])

plot_embedding(embedding, genes=None, labels=None, title=None, annotate='auto', max_labels=200, figsize=Ellipsis, s=18.0, alpha=0.85, ax=None)

Plot a 2D embedding with matplotlib.

plot_tsne_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, perplexity=30.0, n_iter=1000, random_state=None, labels=None, title=None, annotate='auto', max_labels=200, figsize=Ellipsis, s=18.0, alpha=0.85, ax=None)

Compute t-SNE embeddings and plot them with matplotlib.

plot_umap_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, n_neighbors=15, min_dist=0.1, random_state=None, labels=None, title=None, annotate='auto', max_labels=200, figsize=Ellipsis, s=18.0, alpha=0.85, ax=None)

Compute UMAP embeddings and plot them with matplotlib.

set_num_threads(n_threads)

Configure the maximum number of threads rayon will use.

Parameters:

n_threads (int) – Number of threads to use. If 0, uses all available cores.

term_ic(go_id, counter)

Compute the Information Content (IC) of a GO term.

Parameters:

go_id (str) – GO term identifier.
counter (TermCounter) – Precomputed term counter with IC values.

Returns:

The IC of the GO term.

Return type:

float

termset_similarity(terms1, terms2, term_similarity='lin', groupwise='bma', counter=None)

Compute semantic similarity between two sets of GO terms.

Parameters:

terms1 (list of str) – First list of GO term IDs.
terms2 (list of str) – Second list of GO term IDs.
term_similarity (str) – Name of the pairwise similarity method.
groupwise (str) – Groupwise combination method. Options: “bma”, “max”, “avg”, “hausdorff”, “simgic”.
counter (TermCounter) – Precomputed IC values.

Returns:

Similarity score.

Return type:

float
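
To make the `groupwise` option more concrete, here is a sketch of one common formulation of best-match average ("bma"): each term is matched with its most similar term in the other set, and the best scores are averaged over both directions. This illustrates the general technique and is not GO3's code. `best_match_average`, the toy term names, and the toy similarity values are all hypothetical, and other BMA variants (for example, averaging the two directional means) exist.

```python
def best_match_average(terms1, terms2, term_sim):
    """Average the best-match scores in both directions between two term sets."""
    if not terms1 or not terms2:
        return 0.0
    best_for_1 = [max(term_sim(a, b) for b in terms2) for a in terms1]
    best_for_2 = [max(term_sim(a, b) for a in terms1) for b in terms2]
    total = sum(best_for_1) + sum(best_for_2)
    return total / (len(best_for_1) + len(best_for_2))


# Toy pairwise similarities between placeholder term names.
TOY_SIMS = {
    ("A", "X"): 0.75, ("A", "Y"): 0.25,
    ("B", "X"): 0.5, ("B", "Y"): 0.5,
}


def toy_sim(a, b):
    return TOY_SIMS[(a, b)]


score = best_match_average(["A", "B"], ["X", "Y"], toy_sim)
```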

tsne_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, perplexity=30.0, n_iter=1000, random_state=None)

Compute t-SNE embeddings from a gene list using a precomputed distance matrix.

umap_genes(genes=None, ontology='BP', similarity='lin', groupwise='bma', counter=None, distance_transform='auto', n_components=2, n_neighbors=15, min_dist=0.1, random_state=None)

Compute UMAP embeddings from a gene list using a precomputed distance matrix.
