abtools.compare: Repertoire-level comparison
-
abtools._compare.aggregate(data)
Counts the number of occurrences of each item in ‘data’.
Input: data – a list of values.
Output: a dict of bins (distinct values) and their counts.
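A minimal sketch of the counting behaviour described above, using collections.Counter from the Python standard library. It assumes each distinct value in data forms its own bin and is an illustration only, not necessarily how abtools implements aggregate.

```python
from collections import Counter


def aggregate(data):
    """Return a dict mapping each distinct item in `data` to its count."""
    # Counter tallies every distinct value; dict() turns it into a plain dict.
    return dict(Counter(data))


# Example: V-gene calls for four sequences (illustrative values).
print(aggregate(["IGHV1-2", "IGHV3-23", "IGHV1-2", "IGHV4-34"]))
# {'IGHV1-2': 2, 'IGHV3-23': 1, 'IGHV4-34': 1}
```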
-
abtools._compare.mh_similarity(sample1, sample2)
Calculates the Morisita-Horn similarity for two samples.
Parameters: - sample1 – list of frequencies for sample 1
- sample2 – list of frequencies for sample 2
Returns: Morisita-Horn similarity (between 0 and 1)
Return type: float
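The Morisita-Horn index is commonly written as C = 2 * sum(x_i * y_i) / ((D_x + D_y) * X * Y), where X and Y are the sample totals and D_x = sum(x_i^2) / X^2. The sketch below implements that textbook formula. It assumes the two lists are aligned (position i refers to the same bin, for example the same V gene, in both samples) and is not taken from the abtools source.

```python
def mh_similarity(sample1, sample2):
    """Morisita-Horn similarity between two aligned frequency lists."""
    if len(sample1) != len(sample2):
        raise ValueError("samples must have the same number of bins")
    total1, total2 = sum(sample1), sum(sample2)
    # Simpson-style dominance terms for each sample.
    d1 = sum(x * x for x in sample1) / total1 ** 2
    d2 = sum(y * y for y in sample2) / total2 ** 2
    shared = sum(x * y for x, y in zip(sample1, sample2))
    return 2 * shared / ((d1 + d2) * total1 * total2)


print(mh_similarity([10, 5, 0], [10, 5, 0]))  # identical samples: ~1.0
print(mh_similarity([10, 0], [0, 10]))         # no shared bins: 0.0
```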
-
abtools._compare.kl_divergence(s1, s2)
Calculates the Kullback-Leibler divergence for two samples.
Parameters: - s1 – probability distribution for sample 1
- s2 – probability distribution for sample 2
Returns: Kullback-Leibler divergence
Return type: float
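For discrete distributions P (s1) and Q (s2), the Kullback-Leibler divergence is D(P || Q) = sum(p_i * log(p_i / q_i)). The sketch below assumes the direction D(s1 || s2) and the natural logarithm; abtools may use a different base or direction. Unlike the similarity functions, this value is 0 for identical distributions and grows as they diverge; it is not bounded above by 1 and is not symmetric.

```python
import math


def kl_divergence(s1, s2):
    """Kullback-Leibler divergence D(s1 || s2) in nats (natural log)."""
    total = 0.0
    for p, q in zip(s1, s2):
        if p == 0:
            continue  # 0 * log(0 / q) is taken as 0 by convention
        if q == 0:
            return math.inf  # s1 has mass where s2 has none
        total += p * math.log(p / q)
    return total


print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(kl_divergence([1.0, 0.0], [0.5, 0.5]))  # log(2), about 0.693
```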
-
abtools._compare.js_similarity(s1, s2)
Calculates the Jensen-Shannon similarity for two samples.
Parameters: - s1 – probability distribution for sample 1
- s2 – probability distribution for sample 2
Returns: Jensen-Shannon similarity (between 0 and 1)
Return type: float
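One common way to turn the Jensen-Shannon divergence (JSD) into a similarity in [0, 1] is 1 - JSD, with JSD computed from base-2 logarithms, which bounds it by 1. The sketch below follows that convention; whether abtools uses exactly this transform (rather than, say, the square-root Jensen-Shannon distance) is an assumption. The helper _kl_bits is hypothetical.

```python
import math


def _kl_bits(p, q):
    """KL divergence in bits, skipping zero-probability terms of p."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_similarity(s1, s2):
    """1 - Jensen-Shannon divergence (base 2), so the result lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(s1, s2)]  # mixture distribution
    jsd = 0.5 * _kl_bits(s1, m) + 0.5 * _kl_bits(s2, m)
    return 1 - jsd


print(js_similarity([0.5, 0.5], [0.5, 0.5]))  # 1.0
print(js_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```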
-
abtools._compare.shannon_entropy(prob_dist)
Calculates the Shannon entropy for a single probability distribution.
Parameters: prob_dist – probability distribution, must sum to 1
Returns: Shannon entropy
Return type: float
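Shannon entropy is H = -sum(p_i * log(p_i)). A minimal sketch, assuming base-2 logarithms (entropy in bits) and a tolerance-based check of the "must sum to 1" requirement; the logarithm base and the validation behaviour of the abtools function are not documented here.

```python
import math


def shannon_entropy(prob_dist):
    """Shannon entropy in bits of a discrete probability distribution."""
    if not math.isclose(sum(prob_dist), 1.0, rel_tol=1e-9):
        raise ValueError("prob_dist must sum to 1")
    # Zero-probability outcomes contribute nothing (p * log p -> 0 as p -> 0).
    return -sum(p * math.log2(p) for p in prob_dist if p > 0)


print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(shannon_entropy([0.5, 0.5]))                # 1.0
```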
-
abtools._compare.jaccard_similarity(s1, s2)
Calculates the Jaccard similarity for two samples.
Parameters: - s1 – list of frequencies for sample 1
- s2 – list of frequencies for sample 2
Returns: Jaccard similarity (between 0 and 1)
Return type: float
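For lists of frequencies, a common quantitative form of the Jaccard index is the weighted (Ruzicka) version, sum(min(x_i, y_i)) / sum(max(x_i, y_i)); it reduces to the set-based |A ∩ B| / |A ∪ B| when every frequency is 0 or 1. Which variant abtools uses is not stated above, so treat this as an assumption; the lists are also assumed to be aligned by bin.

```python
def jaccard_similarity(s1, s2):
    """Weighted Jaccard (Ruzicka) similarity of two aligned frequency lists."""
    overlap = sum(min(x, y) for x, y in zip(s1, s2))
    union = sum(max(x, y) for x, y in zip(s1, s2))
    # Both samples empty: return 0.0 (an assumption, not documented behaviour).
    return overlap / union if union else 0.0


print(jaccard_similarity([1, 1, 0], [1, 0, 1]))  # set-like input: 1 shared / 3 total
print(jaccard_similarity([2, 1], [1, 1]))        # (1 + 1) / (2 + 1)
```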
-
abtools._compare.renkonen_similarity(s1, s2)
Calculates the Renkonen similarity (also known as the percentage similarity) for two samples.
Parameters: - s1 – probability distribution for sample 1
- s2 – probability distribution for sample 2
Returns: Renkonen similarity (between 0 and 1)
Return type: float
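The Renkonen (percentage) similarity of two probability distributions is their total overlapping probability mass, sum(min(p_i, q_i)). A minimal sketch, assuming both inputs are aligned lists that each sum to 1:

```python
def renkonen_similarity(s1, s2):
    """Renkonen (percentage) similarity: shared probability mass."""
    return sum(min(p, q) for p, q in zip(s1, s2))


print(renkonen_similarity([0.5, 0.5], [1.0, 0.0]))  # 0.5
print(renkonen_similarity([0.2, 0.8], [0.2, 0.8]))  # 1.0
```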
-
abtools._compare.bc_similarity(s1, s2)
Calculates the Bray-Curtis similarity for two samples.
Parameters: - s1 – probability distribution for sample 1
- s2 – probability distribution for sample 2
Returns: Bray-Curtis similarity (between 0 and 1)
Return type: float
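Bray-Curtis similarity is 1 - sum(|x_i - y_i|) / sum(x_i + y_i). When both inputs are probability distributions (each summing to 1), it equals the Renkonen similarity, because sum(|p_i - q_i|) = 2 - 2 * sum(min(p_i, q_i)). A minimal sketch, assuming aligned, non-negative lists:

```python
def bc_similarity(s1, s2):
    """Bray-Curtis similarity of two aligned, non-negative lists."""
    diff = sum(abs(x - y) for x, y in zip(s1, s2))
    total = sum(x + y for x, y in zip(s1, s2))
    return 1 - diff / total


print(bc_similarity([0.5, 0.5], [1.0, 0.0]))  # 0.5, same as Renkonen here
print(bc_similarity([0.5, 0.5], [0.5, 0.5]))  # 1.0
```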
-
abtools._compare.cosine_similarity(s1, s2)
Calculates the cosine (angular) similarity for two samples.
Parameters: - s1 – list of frequencies for sample 1
- s2 – list of frequencies for sample 2
Returns: Cosine similarity (between 0 and 1)
Return type: float
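Cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms; for non-negative frequencies it falls between 0 and 1, matching the range stated above. A minimal sketch, assuming aligned lists that each contain at least one non-zero value:

```python
import math


def cosine_similarity(s1, s2):
    """Cosine similarity: dot(s1, s2) / (|s1| * |s2|)."""
    dot = sum(x * y for x, y in zip(s1, s2))
    norm1 = math.sqrt(sum(x * x for x in s1))
    norm2 = math.sqrt(sum(y * y for y in s2))
    return dot / (norm1 * norm2)


print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (orthogonal)
print(cosine_similarity([1, 2], [2, 4]))  # ~1.0 (same direction, different scale)
```

Because it depends only on direction, cosine similarity ignores differences in sequencing depth between samples that have the same relative frequencies.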
-
abtools._compare.sd_similarity(s1, s2)
Calculates the Bray-Curtis similarity for two samples.
Parameters: - s1 – list of frequencies for sample 1
- s2 – list of frequencies for sample 2
Returns: Bray-Curtis similarity (between 0 and 1)
Return type: float
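For non-negative abundance lists, the quantitative Sørensen-Dice index, 2 * sum(min(x_i, y_i)) / (sum(x_i) + sum(y_i)), gives exactly the same value as Bray-Curtis similarity, which may explain why a function named sd_similarity reports Bray-Curtis. Reading "sd" as Sørensen-Dice is an assumption, not something the source states. The sketch computes both forms (bray_curtis is a hypothetical helper) to show that they agree:

```python
def sd_similarity(s1, s2):
    """Quantitative Sorensen-Dice similarity of two aligned abundance lists."""
    shared = sum(min(x, y) for x, y in zip(s1, s2))
    return 2 * shared / (sum(s1) + sum(s2))


def bray_curtis(s1, s2):
    """Bray-Curtis similarity, for comparison (hypothetical helper)."""
    diff = sum(abs(x - y) for x, y in zip(s1, s2))
    return 1 - diff / (sum(s1) + sum(s2))


a, b = [2, 1, 0], [1, 1, 3]
print(sd_similarity(a, b), bray_curtis(a, b))  # 0.5 0.5
```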
-
abtools._compare.run(**kwargs)
Performs repertoire-level comparison of antibody sequencing datasets.
Currently, the only metric for comparison is V-gene usage frequency. Additional measures are in the works (such as comparisons based on clonality).
Parameters: - db (str) – MongoDB database name.
- collection1 (str) – Name of the first MongoDB collection to query for comparison. If both collection1 and collection2 are provided, collection1 will be compared only to collection2. If neither collection1 nor collection2 is provided, all collections in db will be processed iteratively (all pairwise comparisons will be made). If collection1 is provided but collection2 is not, collection1 will be iteratively compared to all other collections in db.
- collection2 (str) – Name of the second MongoDB collection to query for comparison. If both collection1 and collection2 are provided, collection1 will be compared only to collection2. If neither collection1 nor collection2 is provided, all collections in db will be processed iteratively (all pairwise comparisons will be made).
- collection_prefix (str) – All collections beginning with collection_prefix will be iteratively compared (all pairwise comparisons will be made).
- ip (str) – IP address of the MongoDB server. Default is localhost.
- port (int) – Port of the MongoDB server. Default is 27017.
- user (str) – Username with which to connect to the MongoDB database. If either user or password is not provided, the connection to the MongoDB database will be attempted without authentication.
- password (str) – Password with which to connect to the MongoDB database. If either user or password is not provided, the connection to the MongoDB database will be attempted without authentication.
- chunksize (int) – Number of sequences for each iteration. Default is 100,000.
- iterations (int) – Number of iterations to perform on each pair of samples. Default is 10,000.
- method (str) – Similarity/divergence method to use for comparison. Default is marisita-horn. Options are: marisita-horn, kullback-leibler, jensen-shannon, jaccard, bray-curtis, renkonen, cosine.
- control_similarity (bool) – If True, control similarity/divergence will be calculated, in which each sample is also compared to itself. Default is False.
- chain (str) – Antibody chain to be used for comparison. Options are heavy, kappa and lambda. Default is heavy.
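The rules above for choosing which collections to compare can be summarised as a small pairing function. This is a hypothetical helper (select_pairs is not part of the documented API): it takes a plain list of collection names instead of querying MongoDB, and it assumes collection1/collection2 take precedence over collection_prefix, which the parameter descriptions do not specify.

```python
from itertools import combinations


def select_pairs(all_collections, collection1=None, collection2=None,
                 collection_prefix=None):
    """Hypothetical sketch of the collection-pairing rules described for run()."""
    if collection1 and collection2:
        # Both given: compare collection1 only to collection2.
        return [(collection1, collection2)]
    if collection1:
        # Only collection1: compare it to every other collection in the database.
        return [(collection1, c) for c in all_collections if c != collection1]
    # Neither given (collection2 alone is not covered by the docs; ignored here).
    if collection_prefix:
        names = [c for c in all_collections if c.startswith(collection_prefix)]
    else:
        names = list(all_collections)
    # All pairwise comparisons, each unordered pair once.
    return list(combinations(names, 2))


names = ["donor1_pre", "donor1_post", "donor2_pre"]
print(select_pairs(names, collection1="donor1_pre"))
# [('donor1_pre', 'donor1_post'), ('donor1_pre', 'donor2_pre')]
print(select_pairs(names, collection_prefix="donor1"))
# [('donor1_pre', 'donor1_post')]
```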