Association analysis

class limmbo.core.gwas.GWAS(datainput, seed=10, verbose=True, searchDelta=False)[source]
computeEmpiricalP(pvalues, nrpermutations=1000)[source]

Compute empirical p-values: permute the genotypes, do the association test, record if permuted p-value of SNP is smaller than original p-value. Sum these occurrences and divide by total number of permutation.

Parameters:
  • pvalues (array-like) – [P x NrSNP] (single-trait) or [1 x NrSNP] (multi-trait) array of p-values
  • nrpermutations (int) – number of permutations; 1/nrpermutations is the maximum level of significance (alpha)to test for, e.g. nrpermuations=100 -> alpha=0.01
Returns:

[P x NrSNP] (single-trait) or [1 x NrSNP] (multi-trait) array of empirical p-values

Return type:

(numpy array)

computeFDR(fdr)[source]

Create an empirical p-values distribution by permuting the genotypes and running the association test on these (10 random permuations). Recored the observed pvalues, order by rank and find the rank of the desired FDR to determine the empirical FDR.

Parameters:fdr (float) – desired fdr threshold
Returns:tuple containing:
  • empirical fdr (float):
  • empirical_pvalue_distribution (numpy array): array of empirical p-values for 10 permutations tests
Return type:(tuple)
manhattanQQ(results, colourS='DarkBLue', colourNS='Orange', alphaS=0.5, alphaNS=0.1, thr_plotting=None, saveTo=None)[source]

Plot manhattan and quantile-quantile plot of association results.

Parameters:
  • results (dictionary) – dictionary generated via runAssociation analysis
  • colourS (string, optional) – colour of significant points
  • colourNS (string, optional) – colour of non-significant points
  • alphaS (float, optional) – plotting transparency of significant points
  • alphaNS (float, optional) – plotting transparency of non-significant points
  • thr_plotting (float, optional) – y-intercept for horizontal line as a marker for significance
  • saveTo (string, optional) – /path/to/output/directory to automatically save plot as pdf; needs user writing permission.
Returns:

(None)

runAssociationAnalysis(mode, setup='lmm', adjustSingleTrait=None)[source]

Analysing the association between phenotypes, genotypes, optional covariates and random genetic effects.

Parameters:
  • mode (string) – pecifies the type of linear model: either ‘multitrait’ for multivariate analysis or ‘singletrait’ for univariate analysis.
  • setup (string) – specifies the linear model setup: either ‘lmm’ for linear mixed model or ‘lm’ for a simple linear model.
  • adjustSingleTrait (string) – Method to adjust single-trait association p-values for testing multiple traits; If None (default) no adjusting. Options are ‘bonferroni’ (for bonferroni correction’) or ‘effective’ (for correcting for the effective number of tests as described in (Galwey,2009).
Returns:

dictionary containing:

  • lm (limix.qtl.LMM): LIMIX LMM object
  • pvalues (numpy array): [NrSNP x P] (when mode is singletrait) or [1 x`NrSNP`] array of p-values.
  • betas (numpy array): [NrSNP x P] array of effect size estimates per SNP across all traits.
  • pvalues_adjust (numpy array): only returned if mode is ‘singletrait’ and ‘adjustSingleTrait’ is not None; contains single-trait p-values adjusted for the number of single-trait analyses conducted.

Return type:

(dictionary)

Examples

>>> from pkg_resources import resource_filename
>>> from limmbo.io.reader import ReadData
>>> from limmbo.io.input import InputData
>>> from limmbo.core.gwas import GWAS
>>> data = ReadData(verbose=False)
>>> file_pheno = resource_filename('limmbo',
...                                'io/test/data/pheno.csv')
>>> file_geno = resource_filename('limmbo',
...                                'io/test/data/genotypes.csv')
>>> file_relatedness = resource_filename('limmbo',
...                     'io/test/data/relatedness.csv')
>>> file_covs = resource_filename('limmbo',
...                               'io/test/data/covs.csv')
>>> file_Cg = resource_filename('limmbo',
...                     'io/test/data/Cg.csv')
>>> file_Cn = resource_filename('limmbo',
...                     'io/test/data/Cn.csv')
>>> data.getPhenotypes(file_pheno=file_pheno)
>>> data.getCovariates(file_covariates=file_covs)
>>> data.getRelatedness(file_relatedness=file_relatedness)
>>> data.getGenotypes(file_genotypes=file_geno)
>>> data.getVarianceComponents(file_Cg=file_Cg,
...                            file_Cn=file_Cn)
>>> indata = InputData(verbose=False)
>>> indata.addPhenotypes(phenotypes = data.phenotypes)
>>> indata.addRelatedness(relatedness = data.relatedness)
>>> indata.addCovariates(covariates = data.covariates)
>>> indata.addGenotypes(genotypes=data.genotypes,
...                     genotypes_info=data.genotypes_info)
>>> indata.addVarianceComponents(Cg = data.Cg, Cn=data.Cn)
>>> indata.commonSamples()
>>> indata.regress()
>>> indata.transform(transform="scale")
>>> gwas = GWAS(datainput=indata, seed=10, verbose=False)
>>>
>>>
>>> # Example of multi-trait single-variant association testing
>>> # using a linear mixed model.
>>> resultsAssociation = gwas.runAssociationAnalysis(
...     setup="lmm", mode="multitrait")
>>> resultsAssociation.keys()
['lm', 'betas', 'pvalues']
>>> resultsAssociation['pvalues'].shape
(1, 20)
>>> '{:0.3e}'.format(resultsAssociation['pvalues'].min())
'4.584e-09'
>>> resultsAssociation['betas'].shape
(10, 20)
>>>
>>>
>>> # Example of single-trait single-variant association testing
>>> # using a linear mixed model.
>>> resultsAssociation = gwas.runAssociationAnalysis(
...     setup="lmm", mode="singletrait",
...     adjustSingleTrait = "effective")
>>> resultsAssociation.keys()
['pvalues_adjust', 'lm', 'betas', 'pvalues']
>>> resultsAssociation['pvalues'].shape
(10, 20)
>>> resultsAssociation['betas'].shape
(10, 20)
>>> '{:0.3e}'.format(resultsAssociation['pvalues_adjust'].min())
'2.262e-03'
saveAssociationResults(results, outdir, name='', pvalues_empirical=None)[source]

Saves results of association analyses.

Parameters:
  • results (dictionary) –

    dictionary generated via runAssociation analysis, containing:

    • lm (limix.qtl.LMM): LIMIX LMM object
    • pvalues (numpy array): [P x NrSNP] array of p-values
    • betas (numpy array): [P x NrSNP] array of effect size estimates per SNP across all traits
    • pvalues_adjust (numpy array): only returned mode == singletrait and if ‘adjustSingleTrait’ is not None; contains single-trait p-values adjusted for the number of single-trait analyses conducted
  • outdir (string) – ‘/path/to/output/directory’; needs user writing permission
  • name (string) – input name specific name, such as chromosome used in analysis
  • pvalues_empirical (pandas dataframe, optional) – empirical pvalues via adjustSingleTrait
Returns:

(None)