API reference
Processing VCF
- class dismal.callset.CallSet(vcf_path: str = None, npz_path: str = None)
Class representing a scikit-allel callset (i.e. the contents of a VCF).
- __init__(vcf_path: str = None, npz_path: str = None) None
Represent a VCF as a scikit-allel callset.
- Parameters
vcf_path (str, optional) – Path to VCF, defaults to None
npz_path (str, optional) – Output path, defaults to None
Making blocks
- dismal.blocking.blocklen_using_dxy(callset: CallSet, sample_map: dict, numerator: int = 3) float
Rule-of-thumb block length estimation using the 3/dxy rule.
- Parameters
callset (CallSet) – Call data.
sample_map (dict) – Sample:population map.
numerator (int, optional) – Desired mean of the between-population distribution, defaults to 3
- Returns
Recommended blocklength.
- Return type
float
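As a rough illustration of the rule of thumb above, the sketch below assumes the recommendation is simply numerator / dxy, so that a block is expected to contain about `numerator` between-population differences. The helper name and the plain-float `dxy` input are hypothetical; this is not the dismal implementation, which takes a CallSet and a sample map.

```python
def blocklen_from_dxy(dxy: float, numerator: float = 3) -> float:
    """Hypothetical helper: block length expected to hold ~`numerator` differences.

    `dxy` is assumed to be the mean per-site divergence between populations.
    """
    if dxy <= 0:
        raise ValueError("dxy must be positive")
    return numerator / dxy


# With 1% between-population divergence, blocks of roughly 300 sites
# would be expected to carry about three differences each.
print(blocklen_from_dxy(0.01))
```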
- dismal.blocking.make_blocks(block_size: int, annotation: pyranges.pyranges_main.PyRanges | str = None, callable_sites: pyranges.pyranges_main.PyRanges | str = None, features: list[str] = None, trim_start: int = 10, trim_end: int = 10) PyRanges
Make blocks with respect to an annotation and/or callable sites.
- Parameters
block_size (int) – Desired length of genomic blocks.
annotation (PyRanges | str, optional) – Genome annotation, defaults to None
callable_sites (PyRanges | str, optional) – Callable sites, defaults to None
features (list[str], optional) – Features to subset for. Will be removed in a future version. Defaults to None
trim_start (int, optional) – Length of sequence to trim from the start, defaults to 10
trim_end (int, optional) – Length of sequence to trim from the end, defaults to 10
- Raises
ValueError – If neither callable_sites nor annotation is provided.
- Returns
Blocks
- Return type
PyRanges
- dismal.blocking.make_random_blocks(callset: CallSet, block_size: int, chrom_sizes: dict = None, blocks_per_pair: int = 1000) PyRanges
Generate random blocks without considering annotation or coverage (e.g. if data is pre-filtered, or simulated).
- Parameters
callset (CallSet) – Call data.
block_size (int) – Size of genomic blocks.
chrom_sizes (dict, optional) – Chromosome sizes, defaults to None, in which case the first and last positions are used.
blocks_per_pair (int, optional) – Number of blocks to make per pair of individuals, defaults to 1000
- Returns
Block coordinates
- Return type
PyRanges
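The idea of placing blocks at random, ignoring annotation and coverage, can be sketched in plain Python. This is only an illustration under stated assumptions: `random_blocks` is a hypothetical name, blocks are returned as half-open (chrom, start, end) tuples rather than a PyRanges object, and every valid placement is treated as equally likely. The dismal function itself may sample differently.

```python
import random


def random_blocks(chrom_sizes, block_size, n_blocks, seed=None):
    """Hypothetical sketch: draw fixed-size blocks uniformly over chromosomes."""
    rng = random.Random(seed)
    # Only chromosomes long enough to hold a whole block are eligible.
    eligible = {c: s for c, s in chrom_sizes.items() if s >= block_size}
    if not eligible:
        raise ValueError("no chromosome is at least block_size long")
    chroms = list(eligible)
    # Weight each chromosome by its number of possible start positions,
    # so that all placements across the genome are equally likely.
    weights = [eligible[c] - block_size + 1 for c in chroms]
    blocks = []
    for _ in range(n_blocks):
        chrom = rng.choices(chroms, weights=weights)[0]
        start = rng.randint(0, eligible[chrom] - block_size)  # inclusive bounds
        blocks.append((chrom, start, start + block_size))
    return blocks


blocks = random_blocks({"chr1": 1000, "chr2": 50}, block_size=100, n_blocks=5, seed=1)
```

Here chr2 is shorter than the block size, so all blocks land on chr1.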
- dismal.blocking.segregating_sites_distribution(blocks: pyranges.pyranges_main.PyRanges | str, sample_map: dict, save_blocks_bed: str = 'blocks_with_state.bed', save_distr_npz: str = 's_distr.npz') tuple[numpy.array]
Compute segregating sites distributions within and between populations.
- Parameters
blocks (PyRanges | str) – Blocks
sample_map (dict) – Map sample:population
save_blocks_bed (str, optional) – Path to save blocks to, defaults to “blocks_with_state.bed”
save_distr_npz (str, optional) – Path to save distributions to, defaults to “s_distr.npz”
- Returns
Distributions
- Return type
tuple[np.array]
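To make the notion of a segregating sites distribution concrete, the sketch below tabulates per-block counts into a list whose entry k is the number of blocks with exactly k segregating sites. That layout, and the helper name `tabulate_counts`, are assumptions made for illustration; the arrays returned by dismal may be organised differently.

```python
from collections import Counter


def tabulate_counts(seg_sites_per_block):
    """Hypothetical helper: entry k = number of blocks with k segregating sites."""
    counts = Counter(seg_sites_per_block)
    if not counts:
        return []
    return [counts.get(k, 0) for k in range(max(counts) + 1)]


# Four blocks carrying 0, 1, 1 and 3 segregating sites respectively.
distribution = tabulate_counts([0, 1, 1, 3])
```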
Using pre-defined models
- dismal.models.gim(sampled_deme_names=None, asymmetric_migration=True)
Create three-epoch GIM model (allow migration post-split) using default names
- dismal.models.iim(sampled_deme_names=None, asymmetric_migration=True)
Create three-epoch isolation-with-initial-migration model (migration only in middle epoch) with default names
- dismal.models.im(sampled_deme_names=None, asymmetric_migration=True)
Create two-epoch isolation-with-migration model with default names.
- dismal.models.iso_three_epoch(sampled_deme_names=None)
Create three-epoch isolation model (no migration) with default names
- dismal.models.iso_two_epoch(sampled_deme_names=None)
Create two-epoch isolation model with default names
- dismal.models.secondary_contact(sampled_deme_names=None, asymmetric_migration=True)
Create three-epoch secondary contact model (migration only in most recent epoch) with default names
Automatically fitting multiple models with MultiModel
- class dismal.multimodel.MultiModel(s1: list[int], s2: list[int], s3: list[int], sampled_deme_names: tuple[str], max_epochs: int = 3, threads: int = 1)
- __init__(s1: list[int], s2: list[int], s3: list[int], sampled_deme_names: tuple[str], max_epochs: int = 3, threads: int = 1) None
Class to fit and represent multiple models fitted on the same data.
- Parameters
s1 (list[int]) – Distribution of segregating sites within population 1.
s2 (list[int]) – Distribution of segregating sites within population 2.
s3 (list[int]) – Distribution of segregating sites between populations.
sampled_deme_names (tuple[str]) – Names of sampled (current) populations.
max_epochs (int, optional) – Maximum number of epochs (including ancestral population) to consider, defaults to 3
threads (int, optional) – Number of threads to use; 0 = all, -1 = all but one, etc.; defaults to 1
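The `threads` convention described above (0 for all, -1 for all but one, and so on) could be resolved to a concrete worker count along these lines. This is a guess at the intended semantics, not code from dismal; `resolve_threads` and its `available` argument are hypothetical, and the floor of one worker is an added assumption.

```python
import os


def resolve_threads(threads, available=None):
    """Hypothetical helper: map the 0 / negative convention to a worker count."""
    if available is None:
        available = os.cpu_count() or 1
    if threads > 0:
        return threads
    # 0 -> all available, -1 -> all but one, -2 -> all but two, ...
    # Assumption: never go below a single worker.
    return max(1, available + threads)
```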
- static likelihood_ratio_test(null_mod: DemographicModel, alt_mod: DemographicModel, alpha: float = 0.05, verbose: bool = True) tuple[float]
Likelihood ratio test between two fitted models. Does not adjust for composite likelihood non-independence.
- Parameters
null_mod (DemographicModel) – Null model - must be nested within alternate model.
alt_mod (DemographicModel) – Model to test against null.
alpha (float, optional) – Alpha parameter for significance, defaults to 0.05
verbose (bool, optional) – Verbosity, defaults to True
- Returns
(likelihood ratio, p-value)
- Return type
tuple[float]
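For orientation, a textbook likelihood ratio test between nested models can be sketched with the standard library alone. The sketch assumes the usual chi-square approximation, with degrees of freedom equal to the difference in the number of free parameters, and takes log-likelihoods as plain floats. The names `chi2_sf` and `lrt_sketch` are hypothetical, and nothing here reflects how the static method above is actually implemented. As with that method, no correction for composite-likelihood non-independence is applied.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form base cases: df = 2 (exponential) and df = 1 (via erfc).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1.0
    return q


def lrt_sketch(lnl_null, lnl_alt, df):
    """Return (likelihood ratio statistic, p-value) for nested models."""
    lr = 2.0 * (lnl_alt - lnl_null)
    return lr, chi2_sf(lr, df)


# Hypothetical log-likelihoods; the alternative has one extra parameter.
lr, p = lrt_sketch(lnl_null=-1000.0, lnl_alt=-998.0, df=1)
reject_null = p < 0.05
```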
Specifying custom models
- class dismal.demographicmodel.DemographicModel(model_ref: str = None)
- __init__(model_ref: str = None) None
Represent single demographic model.
- Parameters
model_ref (str, optional) – Model name, defaults to None
- add_epoch(n_demes: int, migration: bool, deme_ids: list[tuple[str]] = None, asymmetric_migration: bool = True, migration_direction: list[tuple[str]] = None) None
Add an epoch
- Parameters
n_demes (int) – Number of demes in epoch
migration (bool) – Whether to allow migration
deme_ids (list[tuple[str]], optional) – Deme names, defaults to None
asymmetric_migration (bool, optional) – Whether to allow asymmetric migration, defaults to True
migration_direction (list[tuple[str]], optional) – Direction of migration e.g. (“A”, “B”) for backwards-in-time migration A->B, defaults to None
- bootstrap_mle(mutation_rate: float, blocklen: int, recombination_rate: float = 0, n_bootstraps: int = 100) array
Bootstrap the maximum-likelihood estimates of the fitted model.
- Parameters
mutation_rate (float) – Mutation rate
blocklen (int) – Block length
recombination_rate (float, optional) – Recombination rate (note that the original model was fitted assuming no recombination), defaults to 0
n_bootstraps (int, optional) – Number of bootstrap replicates, defaults to 100
- Returns
Bootstrap values
- Return type
np.array
- demes_format(mutation_rate, blocklen, log_time=True)
Represent model in Demes format - to be removed in next refactor
- demesdraw(mutation_rate=None, blocklen=None, log_time=True)
Draw Demes model; convenience function for demes_format().drawing - to be removed in next refactor
- fit_model(s1: array, s2: array, s3: array, initial_values: list = None, bounds: list[tuple] = None, optimisers: list = None) None
Fit model to estimate parameters
- Parameters
s1 (np.array) – Segregating sites distribution within population 1
s2 (np.array) – Segregating sites distribution within population 2
s3 (np.array) – Segregating sites distribution between populations
initial_values (list, optional) – Initial values for optimisation, defaults to None in which case defaults are used
bounds (list[tuple], optional) – Bounds for parameters (low, high), defaults to None
optimisers (list, optional) – Optimisation algorithms to use, defaults to None in which case L-BFGS-B and Nelder-Mead are tried sequentially
- Raises
RuntimeError – If all optimisers fail
- kldiv_fitted_observed()
Evaluate model fit (KL-divergence) against observed data
- kldiv_fitted_true(true_modinst: ModelInstance, s_max: int = 500)
Evaluate model fit (KL-divergence) against a specified parameter set (ModelInstance)
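As background for the two KL-divergence methods above, the quantity itself can be computed for two discrete distributions as follows. This is a generic sketch: `kl_divergence` is a hypothetical helper, inputs are normalised to sum to one, and the direction KL(p || q) is an assumption, since the reference does not say which distribution dismal treats as the reference.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two equal-length vectors of non-negative weights."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    p_total, q_total = sum(p), sum(q)
    kl = 0.0
    for pi, qi in zip(p, q):
        pi, qi = pi / p_total, qi / q_total
        if pi == 0:
            continue  # terms with zero mass in p contribute nothing
        if qi == 0:
            return math.inf  # p puts mass where q has none
        kl += pi * math.log(pi / qi)
    return kl


# Identical distributions diverge by zero; any mismatch gives a positive value.
same = kl_divergence([1, 2, 1], [1, 2, 1])
different = kl_divergence([1, 1], [3, 1])
```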