abtools.compare: Repertoire-level comparison

abtools._compare.aggregate(data)

Counts the number of occurrences of each item in ‘data’.

Input data: a list of values.

Output: a dict of bins and counts.
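
The counting described above can be sketched with collections.Counter from the standard library. The helper name count_items is hypothetical, and this is an illustration rather than the way abtools necessarily implements aggregate.

```python
from collections import Counter


def count_items(data):
    # Map each distinct value in `data` to the number of times it appears.
    return dict(Counter(data))


genes = ['IGHV1-2', 'IGHV3-23', 'IGHV1-2']
counts = count_items(genes)
```

Here each distinct value becomes a bin and its count is the number of times it appears in the input list.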

abtools._compare.mh_similarity(sample1, sample2)

Calculates the Morisita-Horn similarity for two samples.

Parameters:
  • sample1 – list of frequencies for sample 1
  • sample2 – list of frequencies for sample 2
Returns:

Morisita-Horn similarity (between 0 and 1)

Return type:

float
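
For reference, the textbook Morisita-Horn index for two aligned frequency lists x and y is 2·Σ(x_i·y_i) / ((Σx_i²/X² + Σy_i²/Y²)·X·Y), where X and Y are the totals of each list. The sketch below implements that formula directly; the function name morisita_horn is hypothetical and the code is not taken from abtools.

```python
def morisita_horn(x, y):
    # x and y are aligned lists: position i refers to the same bin in both.
    total_x, total_y = sum(x), sum(y)
    numerator = 2 * sum(a * b for a, b in zip(x, y))
    # Simpson-style dominance terms for each sample.
    dominance_x = sum(a * a for a in x) / total_x ** 2
    dominance_y = sum(b * b for b in y) / total_y ** 2
    return numerator / ((dominance_x + dominance_y) * total_x * total_y)


identical = morisita_horn([0.5, 0.5], [0.5, 0.5])
disjoint = morisita_horn([1.0, 0.0], [0.0, 1.0])
```

Identical samples score 1 and samples with no shared bins score 0.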

abtools._compare.kl_divergence(s1, s2)

Calculates the Kullback-Leibler divergence for two samples.

Parameters:
  • s1 – probability distribution for sample 1
  • s2 – probability distribution for sample 2
Returns:

Kullback-Leibler divergence

Return type:

float
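
The Kullback-Leibler divergence of q from p is Σ p_i·log(p_i / q_i). The sketch below assumes the natural logarithm and assumes q is nonzero wherever p is nonzero; the documentation does not state which log base or zero-handling abtools uses, and the function name kl is hypothetical.

```python
import math


def kl(p, q):
    # Terms with p_i == 0 contribute nothing to the sum, so they are skipped.
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


p = [0.5, 0.5]
q = [0.25, 0.75]
forward = kl(p, q)
reverse = kl(q, p)
```

The divergence is zero for identical distributions and is not symmetric, so forward and reverse generally differ.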

abtools._compare.js_similarity(s1, s2)

Calculates the Jensen-Shannon similarity for two samples.

Parameters:
  • s1 – probability distribution for sample 1
  • s2 – probability distribution for sample 2
Returns:

Jensen-Shannon similarity (between 0 and 1)

Return type:

float
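
One common way to obtain a similarity between 0 and 1 is to compute the Jensen-Shannon divergence with base-2 logarithms (which bounds it by 1) and subtract it from 1. The sketch below takes that approach as an assumption; whether abtools defines the similarity this way is not stated here, and the names js_sim and _kl2 are hypothetical.

```python
import math


def _kl2(p, q):
    # Base-2 Kullback-Leibler divergence; zero-probability terms are skipped.
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_sim(p, q):
    # m is the pointwise average of the two distributions.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = 0.5 * _kl2(p, m) + 0.5 * _kl2(q, m)
    return 1 - divergence


same = js_sim([0.5, 0.5], [0.5, 0.5])
apart = js_sim([1.0, 0.0], [0.0, 1.0])
```

Because m is nonzero wherever either input is nonzero, the divergence stays finite even when the two distributions do not overlap.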

abtools._compare.shannon_entropy(prob_dist)

Calculates the Shannon entropy for a single probability distribution.

Parameters:
  • prob_dist – probability distribution, must sum to 1
Returns:

Shannon entropy

Return type:

float

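
Shannon entropy is −Σ p_i·log(p_i). The sketch below assumes base-2 logarithms (entropy in bits); the documentation does not specify the base, and the function name entropy is hypothetical.

```python
import math


def entropy(prob_dist):
    # Zero-probability bins are skipped because p * log(p) tends to 0 as p -> 0.
    return -sum(p * math.log2(p) for p in prob_dist if p > 0)


fair_coin = entropy([0.5, 0.5])
certain = entropy([1.0])
```

A uniform distribution over two outcomes gives 1 bit, and a distribution concentrated on a single outcome gives 0.
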
abtools._compare.jaccard_similarity(s1, s2)

Calculates the Jaccard similarity for two samples.

Parameters:
  • s1 – list of frequencies for sample 1
  • s2 – list of frequencies for sample 2
Returns:

Jaccard similarity (between 0 and 1)

Return type:

float
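
The classical Jaccard index is the size of the intersection divided by the size of the union. For aligned frequency lists, one reading is to treat a bin as present when its frequency is greater than zero. The sketch below assumes that presence/absence reading, which may differ from what abtools computes; the function name jaccard is hypothetical.

```python
def jaccard(x, y):
    # A bin counts as present in a sample when its frequency is above zero.
    both = sum(1 for a, b in zip(x, y) if a > 0 and b > 0)
    either = sum(1 for a, b in zip(x, y) if a > 0 or b > 0)
    return both / either if either else 0.0


partial = jaccard([0.5, 0.5, 0.0], [0.5, 0.0, 0.5])
```

In this example one bin is shared out of three bins observed in either sample.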

abtools._compare.renkonen_similarity(s1, s2)

Calculates the Renkonen similarity (also known as the percentage similarity) for two samples.

Parameters:
  • s1 – probability distribution for sample 1
  • s2 – probability distribution for sample 2
Returns:

Renkonen similarity (between 0 and 1)

Return type:

float
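
The Renkonen (percentage) similarity of two probability distributions is the sum of the bin-wise minima, Σ min(p_i, q_i). A minimal sketch, with the hypothetical function name renkonen:

```python
def renkonen(p, q):
    # Each bin contributes the probability mass the two samples share.
    return sum(min(a, b) for a, b in zip(p, q))


shared = renkonen([0.5, 0.5], [0.25, 0.75])
```

For the two distributions above, the shared mass is 0.25 in the first bin plus 0.5 in the second.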

abtools._compare.bc_similarity(s1, s2)

Calculates the Bray-Curtis similarity for two samples.

Parameters:
  • s1 – probability distribution for sample 1
  • s2 – probability distribution for sample 2
Returns:

Bray-Curtis similarity (between 0 and 1)

Return type:

float
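
The Bray-Curtis similarity is commonly written as 2·Σ min(x_i, y_i) / (Σx_i + Σy_i). The sketch below implements that textbook form under the hypothetical name bray_curtis; it is not taken from abtools.

```python
def bray_curtis(x, y):
    # Twice the shared abundance, divided by the combined total abundance.
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 2 * shared / (sum(x) + sum(y))


from_probs = bray_curtis([0.5, 0.5], [0.25, 0.75])
from_counts = bray_curtis([2, 2], [1, 3])
```

When both inputs are probability distributions, the denominator is 2 and the value reduces to the sum of bin-wise minima.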

abtools._compare.cosine_similarity(s1, s2)

Calculates the cosine (angular) similarity for two samples.

Parameters:
  • s1 – list of frequencies for sample 1
  • s2 – list of frequencies for sample 2
Returns:

Cosine similarity (between 0 and 1)

Return type:

float
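
Cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms. Because frequencies are non-negative, the result lies between 0 and 1. A minimal sketch, with the hypothetical function name cosine:

```python
import math


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


parallel = cosine([3, 4], [3, 4])
orthogonal = cosine([1, 0], [0, 1])
```

Only the direction of each vector matters, so scaling one sample by a constant does not change the score.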

abtools._compare.sd_similarity(s1, s2)

Calculates the Bray-Curtis similarity for two samples.

Parameters:
  • s1 – list of frequencies for sample 1
  • s2 – list of frequencies for sample 2
Returns:

Bray-Curtis similarity (between 0 and 1)

Return type:

float

abtools._compare.run(**kwargs)

Performs repertoire-level comparison of antibody sequencing datasets.

Currently, the only metric for comparison is V-gene usage frequency. Additional measures are in the works (such as comparisons based on clonality).

Parameters:
  • db (str) – MongoDB database name.
  • collection1 (str) – Name of the first MongoDB collection to query for comparison. If both collection1 and collection2 are provided, collection1 will be compared only to collection2. If neither collection1 nor collection2 is provided, all collections in db will be processed iteratively (all pairwise comparisons will be made). If collection1 is provided but collection2 is not, collection1 will be iteratively compared to all other collections in db.
  • collection2 (str) – Name of the second MongoDB collection to query for comparison. If both collection1 and collection2 are provided, collection1 will be compared only to collection2. If neither collection1 nor collection2 is provided, all collections in db will be processed iteratively (all pairwise comparisons will be made).
  • collection_prefix (str) – All collections beginning with collection_prefix will be iteratively compared (all pairwise comparisons will be made).
  • ip (str) – IP address of the MongoDB server. Default is localhost.
  • port (int) – Port of the MongoDB server. Default is 27017.
  • user (str) – Username with which to connect to the MongoDB database. If either of user or password is not provided, the connection to the MongoDB database will be attempted without authentication.
  • password (str) – Password with which to connect to the MongoDB database. If either of user or password is not provided, the connection to the MongoDB database will be attempted without authentication.
  • chunksize (int) – Number of sequences for each iteration. Default is 100,000.
  • iterations (int) – Number of iterations to perform on each pair of samples. Default is 10,000.
  • method (str) –

    Similarity/divergence method to use for comparison. Default is marisita-horn. Options are:

    • marisita-horn
    • kullback-leibler
    • jensen-shannon
    • jaccard
    • bray-curtis
    • renkonen
    • cosine
  • control_similarity (bool) – If True, control similarity/divergence will be calculated, in which each sample is also compared to itself. Default is False.
  • chain (str) – Antibody chain to be used for comparison. Options are heavy, kappa and lambda. Default is heavy.
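
The collection-selection rules described for collection1, collection2 and collection_prefix can be sketched as a small pairing helper. The function comparison_pairs is hypothetical and is not part of abtools. It takes a plain list of collection names instead of querying MongoDB, and the precedence it gives explicit collection names over collection_prefix is an assumption.

```python
from itertools import combinations


def comparison_pairs(collections, collection1=None, collection2=None,
                     collection_prefix=None):
    # Both names given: a single comparison.
    if collection1 and collection2:
        return [(collection1, collection2)]
    # Only collection1 given: compare it to every other collection.
    if collection1:
        return [(collection1, c) for c in collections if c != collection1]
    # A prefix narrows the set of collections before pairing.
    # (collection2 on its own is not described in the documentation,
    # so this sketch ignores it.)
    if collection_prefix:
        collections = [c for c in collections
                       if c.startswith(collection_prefix)]
    # Otherwise: all pairwise comparisons.
    return list(combinations(collections, 2))


names = ['a1', 'a2', 'b1']
all_pairs = comparison_pairs(names)
one_vs_rest = comparison_pairs(names, collection1='a1')
prefixed = comparison_pairs(names, collection_prefix='a')
```

Self-comparisons, as enabled by control_similarity, are left out of this sketch.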