API reference

Processing VCF

class dismal.callset.CallSet(vcf_path: str = None, npz_path: str = None)

Class to represent a skallel callset (i.e. a VCF)

__init__(vcf_path: str = None, npz_path: str = None) → None

Represent VCF as skallel callset

Parameters

vcf_path (str, optional) – Path to VCF, defaults to None
npz_path (str, optional) – Output path, defaults to None

Making blocks

dismal.blocking.blocklen_using_dxy(callset: CallSet, sample_map: dict, numerator: int = 3) → float

Rule-of-thumb blocklength estimation using 3/dxy rule.

Parameters

callset (CallSet) – Call data.
sample_map (dict) – Sample:population map.
numerator (int, optional) – Desired mean number of segregating sites per block in the between-population distribution, defaults to 3

Returns

Recommended blocklength.

Return type

float
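The arithmetic behind the 3/dxy rule can be shown with a minimal sketch. It assumes dxy is the mean number of pairwise differences per site between the two populations, and that the recommended length is simply the numerator divided by dxy. The helper name blocklen_from_dxy is hypothetical and is not part of dismal; how dismal computes dxy from the callset is not shown here.

```python
def blocklen_from_dxy(dxy: float, numerator: int = 3) -> float:
    """Illustrative only: block length expected to carry about
    `numerator` between-population differences."""
    if dxy <= 0:
        raise ValueError("dxy must be positive")
    return numerator / dxy


# With dxy = 0.01 differences per site, the rule suggests blocks of about 300 bp.
recommended = blocklen_from_dxy(0.01)
```

Under this assumption, a larger numerator gives longer blocks with more differences per block, at the cost of a higher chance of recombination within a block.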

dismal.blocking.make_blocks(block_size: int, annotation: pyranges.pyranges_main.PyRanges | str = None, callable_sites: pyranges.pyranges_main.PyRanges | str = None, features: list[str] = None, trim_start: int = 10, trim_end: int = 10) → PyRanges

Make blocks with respect to an annotation and/or callable sites.

Parameters

block_size (int) – Desired length of genomic blocks.
annotation (PyRanges | str, optional) – Genome annotation, defaults to None
callable_sites (PyRanges | str, optional) – Callable sites, defaults to None
features (list[str], optional) – Features to subset for (will be removed in a future version), defaults to None
trim_start (int, optional) – Length of sequence to trim from the start, defaults to 10
trim_end (int, optional) – Length of sequence to trim from the end, defaults to 10

Raises

ValueError – If neither callable_sites nor annotation is provided.

Returns

Blocks

Return type

PyRanges

dismal.blocking.make_random_blocks(callset: CallSet, block_size: int, chrom_sizes: dict = None, blocks_per_pair: int = 1000) → PyRanges

Generate random blocks without considering annotation or coverage (e.g. if data is pre-filtered, or simulated).

Parameters

callset (CallSet) – Call data.
block_size (int) – Size of genomic blocks.
chrom_sizes (dict, optional) – Chromosome sizes, defaults to None, in which case the first and last positions are used.
blocks_per_pair (int, optional) – Number of blocks to make per pair of individuals, defaults to 1000

Returns

Block coordinates

Return type

PyRanges
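The idea of drawing random blocks without reference to annotation or coverage can be sketched in plain Python. This is not the dismal implementation: the function name sample_random_blocks, the 0-based half-open coordinates, the weighting of chromosomes by length, and the fact that blocks may overlap are all assumptions made for illustration.

```python
import random


def sample_random_blocks(chrom_sizes, block_size, n_blocks, seed=None):
    """Illustrative only: draw (chrom, start, end) blocks uniformly at random."""
    rng = random.Random(seed)
    # Only chromosomes long enough to hold one full block are eligible.
    eligible = {c: n for c, n in chrom_sizes.items() if n >= block_size}
    if not eligible:
        raise ValueError("no chromosome is at least block_size long")
    chroms = list(eligible)
    # Weight each chromosome by its number of valid start positions.
    weights = [eligible[c] - block_size + 1 for c in chroms]
    blocks = []
    for _ in range(n_blocks):
        chrom = rng.choices(chroms, weights=weights)[0]
        start = rng.randint(0, eligible[chrom] - block_size)
        blocks.append((chrom, start, start + block_size))
    return blocks


sizes = {"chr1": 10_000, "chr2": 5_000}
blocks = sample_random_blocks(sizes, block_size=300, n_blocks=1000, seed=1)
```

Every sampled block has exactly block_size positions and lies inside its chromosome.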

dismal.blocking.segregating_sites_distribution(blocks: pyranges.pyranges_main.PyRanges | str, sample_map: dict, save_blocks_bed: str = 'blocks_with_state.bed', save_distr_npz: str = 's_distr.npz') → tuple[numpy.array]

Compute segregating sites distributions within and between populations.

Parameters

blocks (PyRanges | str) – Blocks
sample_map (dict) – Map sample:population
save_blocks_bed (str, optional) – Path to save blocks to, defaults to “blocks_with_state.bed”
save_distr_npz (str, optional) – Path to save distributions to, defaults to “s_distr.npz”

Returns

Distributions

Return type

tuple[np.array]
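As a rough illustration of what a segregating sites distribution looks like, the sketch below tallies per-block counts into a vector whose entry k is the number of blocks with exactly k segregating sites. The layout of the arrays returned by dismal is an assumption here, and both s_distribution and the per_block data are hypothetical.

```python
from collections import Counter


def s_distribution(counts_per_block):
    """Illustrative only: entry k is the number of blocks with exactly
    k segregating sites."""
    tally = Counter(counts_per_block)
    if not tally:
        return []
    return [tally.get(k, 0) for k in range(max(tally) + 1)]


# Hypothetical per-block counts, keyed by the state of the compared pair.
per_block = {
    "within_pop1": [0, 1, 1, 2],
    "within_pop2": [0, 0, 3],
    "between": [2, 4, 4, 5],
}
s1, s2, s3 = (
    s_distribution(per_block[state])
    for state in ("within_pop1", "within_pop2", "between")
)
```

In this toy data the between-population vector s3 is shifted towards higher counts than s1 and s2, which is the pattern expected for diverged populations.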

Using pre-defined models

dismal.models.gim(sampled_deme_names=None, asymmetric_migration=True): Create three-epoch generalised isolation-with-migration (GIM) model (migration allowed post-split) with default names

dismal.models.iim(sampled_deme_names=None, asymmetric_migration=True): Create three-epoch isolation-with-initial-migration model (migration only in middle epoch) with default names

dismal.models.im(sampled_deme_names=None, asymmetric_migration=True): Create two-epoch isolation-with-migration model with default names.

dismal.models.iso_three_epoch(sampled_deme_names=None): Create three-epoch isolation model (no migration) with default names

dismal.models.iso_two_epoch(sampled_deme_names=None): Create two-epoch isolation model with default names

dismal.models.secondary_contact(sampled_deme_names=None, asymmetric_migration=True): Create three-epoch secondary contact model (migration only in most recent epoch) with default names

Automatically fitting multiple models with MultiModel

class dismal.multimodel.MultiModel(s1: list[int], s2: list[int], s3: list[int], sampled_deme_names: tuple[str], max_epochs: int = 3, threads: int = 1)

__init__(s1: list[int], s2: list[int], s3: list[int], sampled_deme_names: tuple[str], max_epochs: int = 3, threads: int = 1) → None

Class to fit and represent multiple models fitted on the same data.

Parameters

s1 (list[int]) – Distribution of segregating sites within population 1.
s2 (list[int]) – Distribution of segregating sites within population 2.
s3 (list[int]) – Distribution of segregating sites between populations.
sampled_deme_names (tuple[str]) – Names of sampled (current) populations.
max_epochs (int, optional) – Maximum number of epochs (including ancestral population) to consider, defaults to 3
threads (int, optional) – Number of threads to use; 0 = all, -1 = all but one, etc, defaults to 1

static likelihood_ratio_test(null_mod: DemographicModel, alt_mod: DemographicModel, alpha: float = 0.05, verbose: bool = True) → tuple[float]

Likelihood ratio test between two fitted models. Does not adjust for composite likelihood non-independence.

Parameters

null_mod (DemographicModel) – Null model - must be nested within alternate model.
alt_mod (DemographicModel) – Model to test against null.
alpha (float, optional) – Alpha parameter for significance, defaults to 0.05
verbose (bool, optional) – Verbosity, defaults to True

Returns

(likelihood ratio, p-value)

Return type

tuple[float]
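For readers unfamiliar with the test, a generic likelihood ratio test between nested models can be sketched as follows. This is a textbook version, not necessarily how dismal computes it: the names lrt_sketch and chi2_sf are hypothetical, the degrees of freedom are assumed to equal the difference in the number of free parameters, and, as noted above, no correction for composite likelihood is applied.

```python
import math


def chi2_sf(x, df):
    """Survival function of a chi-squared variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half_x = x / 2.0
    # Closed forms for df = 2 and df = 1, then the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half_x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half_x))
    while a < df / 2.0:
        q += half_x ** a * math.exp(-half_x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lrt_sketch(lnl_null, lnl_alt, df, alpha=0.05):
    """Illustrative only: return (statistic, p-value, reject_null)."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    p_value = chi2_sf(statistic, df)
    return statistic, p_value, p_value < alpha


# Hypothetical log-likelihoods; the alternative has two extra parameters.
statistic, p_value, reject = lrt_sketch(-1000.0, -997.0, df=2)
```

Because blocks from the same genome are not independent, p-values from an unadjusted test like this one are likely to be too small and should be read with caution.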

Specifying custom models

class dismal.demographicmodel.DemographicModel(model_ref: str = None)

__init__(model_ref: str = None) → None

Represent a single demographic model.

Parameters: model_ref (str, optional) – Model name, defaults to None

add_epoch(n_demes: int, migration: bool, deme_ids: list[tuple[str]] = None, asymmetric_migration: bool = True, migration_direction: list[tuple[str]] = None) → None

Add an epoch

Parameters

n_demes (int) – Number of demes in epoch
migration (bool) – Whether to allow migration
deme_ids (list[tuple[str]], optional) – Deme names, defaults to None
asymmetric_migration (bool, optional) – Whether to allow asymmetric migration, defaults to True
migration_direction (list[tuple[str]], optional) – Direction of migration e.g. (“A”, “B”) for backwards-in-time migration A->B, defaults to None

bootstrap_mle(mutation_rate: float, blocklen: int, recombination_rate: float = 0, n_bootstraps: int = 100) → array

Bootstrap the maximum-likelihood parameter estimates of the fitted model.

Parameters

mutation_rate (float) – Mutation rate
blocklen (int) – Block length
recombination_rate (float, optional) – Recombination rate (note that the original model was fitted assuming no recombination), defaults to 0
n_bootstraps (int, optional) – Number of bootstrap replicates, defaults to 100

Returns

Bootstrap values

Return type

np.array
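One common use of bootstrap values is to summarise them as percentile confidence intervals. The sketch below assumes the replicates are arranged with one row per bootstrap replicate and one column per parameter; the actual shape of the array returned by bootstrap_mle is not documented here, and the helper names are hypothetical.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def bootstrap_intervals(replicates, level=95.0):
    """Illustrative only: percentile interval for each parameter column."""
    tail = (100.0 - level) / 2.0
    intervals = []
    for column in zip(*replicates):
        vals = sorted(column)
        intervals.append((percentile(vals, tail), percentile(vals, 100.0 - tail)))
    return intervals


# Hypothetical replicates: 101 rows, two parameters.
replicates = [[i, 2 * i] for i in range(101)]
intervals = bootstrap_intervals(replicates, level=90.0)
```

With few replicates the tails of the bootstrap distribution are poorly sampled, so the interval endpoints are themselves noisy.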

demes_format(mutation_rate, blocklen, log_time=True): Represent model in Demes format - to be removed in next refactor

demesdraw(mutation_rate=None, blocklen=None, log_time=True): Draw Demes model; convenience function for demes_format().drawing - to be removed in next refactor

fit_model(s1: array, s2: array, s3: array, initial_values: list = None, bounds: list[tuple] = None, optimisers: list = None) → None

Fit model to estimate parameters

Parameters

s1 (np.array) – Segregating sites distribution within population 1
s2 (np.array) – Segregating sites distribution within population 2
s3 (np.array) – Segregating sites distribution between populations
initial_values (list, optional) – Initial values for optimisation, defaults to None, in which case default values are used
bounds (list[tuple], optional) – Bounds for parameters (low, high), defaults to None
optimisers (list, optional) – Optimisation algorithms to use, defaults to None, in which case L-BFGS-B and Nelder-Mead are tried sequentially

Raises

RuntimeError – If all optimisers fail

kldiv_fitted_observed(): Evaluate model fit (KL-divergence) against observed data

kldiv_fitted_true(true_modinst: ModelInstance, s_max: int = 500): Evaluate model fit (KL-divergence) against a specified parameter set, given as a ModelInstance
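For reference, the KL-divergence between two discrete distributions can be computed as in the sketch below. This is the standard definition, not the dismal code: the function name kl_divergence, the direction KL(P || Q), the use of natural logarithms, and the normalisation of counts to probabilities are assumptions.

```python
import math


def kl_divergence(p_counts, q_counts):
    """Illustrative only: KL(P || Q) in nats between two count or
    probability vectors of equal length."""
    if len(p_counts) != len(q_counts):
        raise ValueError("distributions must have the same length")
    p_total, q_total = sum(p_counts), sum(q_counts)
    kl = 0.0
    for p_raw, q_raw in zip(p_counts, q_counts):
        p, q = p_raw / p_total, q_raw / q_total
        if p == 0:
            continue  # terms with p = 0 contribute nothing
        if q == 0:
            return math.inf  # P puts mass where Q has none
        kl += p * math.log(p / q)
    return kl


# Hypothetical observed and fitted distributions of segregating sites.
observed = [10, 30, 40, 20]
fitted = [12, 28, 38, 22]
divergence = kl_divergence(observed, fitted)
```

A value of zero means the two distributions are identical; larger values indicate a worse fit.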