Variance decomposition

LiMMBo

class limmbo.core.vdbootstrap.LiMMBo(datainput, S, timing=False, iterations=10, verbose=False)[source]
combineBootstrap(results)[source]

Combine the [S x S] subset covariance matrices to find the overall [P x P] covariance matrices Cg and Cn.

Parameters:results (list) – results of runBootstrapVarianceDecomposition()
Returns:dictionary containing:
  • Cg_fit (numpy.array): [P x P] genetic covariance matrix via fitting
  • Cn_fit (numpy.array): [P x P] noise covariance matrix via fitting
  • Cg_average (numpy.array): [P x P] genetic covariance matrix via simple average
  • Cn_average (numpy.array): [P x P] noise covariance matrix via simple average
  • Cg_all_bs (numpy.array): [runs x S x S] genetic covariance matrices of runs phenotype subsets of size S
  • Cn_all_bs (numpy.array): [runs x S x S] noise covariance matrices of runs phenotype subsets of size S
  • proc_time_ind_bs (list): individual run times for all variance decomposition runs of [S x S] Cg and Cn
  • proc_time_sum_ind_bs (list): sum of individual run times for all variance decomposition runs of [S x S] Cg and Cn
  • proc_time_combine_bs (list): run time for finding [P x P] trait covariance estimates from fitting [S x S] bootstrap covariance estimates
  • nr_of_bs (int): number of bootstrap runs
  • nr_of_successful_bs (int): total number of successful bootstrapping runs i.e. variance decomposition converged
  • results_fit_Cg (): results parameters of the bfgs-fit of the genetic covariance matrices (via scipy.optimize.fmin_l_bfgs_g)
  • results_fit_Cn (): results parameters of the bfgs-fit of the noise covariance matrices (via scipy.optimize.fmin_l_bfgs_g)
Return type:(dictionary)
runBootstrapCovarianceEstimation(seed, cpus, minCooccurrence=3, n=None)[source]

Distribute variance decomposition of subset matrices via pp

Parameters:
  • seed (int) – seed to initialise random number generator for bootstrapping
  • minCooccurrence (int) – minimum number a trait pair should be sampled; once reached for all trait pairs, sampling is stopped if n is None; default=3
  • n (int) – if not None, sets the total number of permutations, otherwise n determined by minCooccurrence; default: None
  • cpus (int) – number of cpus available for covariance estimation
Returns:

list containing variance components for all bootstrap runs

Return type:

(list)

saveVarianceComponents(resultsCombineBootstrap, output, intermediate=True)[source]

Save variance components as comma-separated files or python objects (via Cpickle).

Parameters:
  • resultsCombineBootstrap (dictionary) –
    • Cg_fit (numpy.array): [P x P] genetic covariance matrix via fitting
    • Cn_fit (numpy.array): [P x P] noise covariance matrix via fitting
    • Cg_average (numpy.array): [P x P] genetic covariance matrix via simple average
    • Cn_average (numpy.array): [P x P] noise covariance matrix via simple average
    • Cg_all_bs (numpy.array): [runs x S x S] genetic covariance matrices of runs phenotype subsets of size S
    • Cn_all_bs (numpy.array): [runs x S x S] noise covariance matrices of runs phenotype subsets of size S
    • proc_time_ind_bs (list): individual run times for all variance decomposition runs of [S x S] Cg and Cn
    • proc_time_sum_ind_bs (list): sum of individual run times for all variance decomposition runs of [S x S] Cg and Cn
    • proc_time_combine_bs (list): run time for finding [P x P] trait covariance estimates from fitting [S x S] bootstrap covariance estimates
    • nr_of_bs (int): number of bootstrap runs
    • nr_of_successful_bs (int): total number of successful bootstrapping runs i.e. variance decomposition converged
    • results_fit_Cg (): results parameters of the bfgs-fit of the genetic covariance matrices (via scipy.optimize.fmin_l_bfgs_g)
    • results_fit_Cn (): results parameters of the bfgs-fit of the noise covariance matrices (via scipy.optimize.fmin_l_bfgs_g)
  • output (string) – path/to/directory where variance components will be saved; needs writing permission
  • intermediate (bool) – if set to True, intermediate variance components (average covariance matrices, bootstrap matrices and results parameter of BFGS fit) are saved
Returns:

None

Standard REML

limmbo.core.vdsimple.vd_reml(datainput, iterations=10, verbose=True)[source]

Compute variance decomposition of phenotypes into genetic and noise covariance via standard REML: approach implemented in LIMIX

Parameters:
  • datainput (InputData) – object with ID-matched [N x P] phenotypes with [N] individuals and [P] phenotypes and [N x N] relatedness estimates
  • output (string, optional) – output directory with user-writing permissions; needed if caching is True
  • cache (bool, optional) – should results be cached
  • verbose (bool, optional) – should messages be printed to stdout
Returns:

tuple containing:

  • Cg (numpy.array): [P x P] genetic variance component
  • Cn (numpy.array): [P x P] noise variance component
  • process_time (double): cpu time of variance decomposition

Return type:

(tuple)

Examples:

>>> import numpy
>>> from numpy.random import RandomState
>>> from numpy.linalg import cholesky as chol
>>> from limmbo.core.vdsimple import vd_reml
>>> from limmbo.io.input import InputData
>>> random = RandomState(10)
>>> N = 100
>>> S = 1000
>>> P = 5
>>> snps = (random.rand(N, S) < 0.2).astype(float)
>>> kinship = numpy.dot(snps, snps.T) / float(10)
>>> y  = random.randn(N, P)
>>> pheno = numpy.dot(chol(kinship), y)
>>> pheno_ID = [ 'PID{}'.format(x+1) for x in range(P)]
>>> samples = [ 'SID{}'.format(x+1) for x in range(N)]
    >>> datainput = InputData()
>>> datainput.addPhenotypes(phenotypes = pheno,
...                         phenotype_ID = pheno_ID,
...                         pheno_samples = samples)
>>> datainput.addRelatedness(relatedness = kinship,
...                          relatedness_samples = samples)
>>> Cg, Cn, ptime = vd_reml(datainput, verbose=False)
>>> Cg.shape
(5, 5)