Association analysis¶

class limmbo.core.gwas.GWAS(datainput, seed=10, verbose=True, searchDelta=False)[source]¶

computeEmpiricalP(pvalues, nrpermutations=1000)[source]¶

Compute empirical p-values: permute the genotypes, do the association test, record if permuted p-value of SNP is smaller than original p-value. Sum these occurrences and divide by total number of permutation.

Parameters:	pvalues (array-like) – [P x NrSNP] (single-trait) or [1 x NrSNP] (multi-trait) array of p-values nrpermutations (int) – number of permutations; 1/nrpermutations is the maximum level of significance (alpha)to test for, e.g. nrpermuations=100 -> alpha=0.01
Returns:	[P x NrSNP] (single-trait) or [1 x NrSNP] (multi-trait) array of empirical p-values
Return type:	(numpy array)

computeFDR(fdr)[source]¶

Create an empirical p-values distribution by permuting the genotypes and running the association test on these (10 random permuations). Recored the observed pvalues, order by rank and find the rank of the desired FDR to determine the empirical FDR.

Parameters:	fdr (float) – desired fdr threshold
Returns:	tuple containing: empirical fdr (float): empirical_pvalue_distribution (numpy array): array of empirical p-values for 10 permutations tests
Return type:	(tuple)

manhattanQQ(results, colourS='DarkBLue', colourNS='Orange', alphaS=0.5, alphaNS=0.1, thr_plotting=None, saveTo=None)[source]¶

Plot manhattan and quantile-quantile plot of association results.

Parameters:

results (dictionary) – dictionary generated via runAssociation analysis
colourS (string, optional) – colour of significant points
colourNS (string, optional) – colour of non-significant points
alphaS (float, optional) – plotting transparency of significant points
alphaNS (float, optional) – plotting transparency of non-significant points
thr_plotting (float, optional) – y-intercept for horizontal line as a marker for significance
saveTo (string, optional) – /path/to/output/directory to automatically save plot as pdf; needs user writing permission.

Returns:

(None)

runAssociationAnalysis(mode, setup='lmm', adjustSingleTrait=None)[source]¶

Analysing the association between phenotypes, genotypes, optional covariates and random genetic effects.

Parameters:

mode (string) – pecifies the type of linear model: either ‘multitrait’ for multivariate analysis or ‘singletrait’ for univariate analysis.
setup (string) – specifies the linear model setup: either ‘lmm’ for linear mixed model or ‘lm’ for a simple linear model.
adjustSingleTrait (string) – Method to adjust single-trait association p-values for testing multiple traits; If None (default) no adjusting. Options are ‘bonferroni’ (for bonferroni correction’) or ‘effective’ (for correcting for the effective number of tests as described in (Galwey,2009).

Returns:

dictionary containing:

lm (limix.qtl.LMM): LIMIX LMM object
pvalues (numpy array): [NrSNP x P] (when mode is singletrait) or [1 x`NrSNP`] array of p-values.
betas (numpy array): [NrSNP x P] array of effect size estimates per SNP across all traits.
pvalues_adjust (numpy array): only returned if mode is ‘singletrait’ and ‘adjustSingleTrait’ is not None; contains single-trait p-values adjusted for the number of single-trait analyses conducted.

Return type:

(dictionary)

Examples

>>> from pkg_resources import resource_filename
>>> from limmbo.io.reader import ReadData
>>> from limmbo.io.input import InputData
>>> from limmbo.core.gwas import GWAS
>>> data = ReadData(verbose=False)
>>> file_pheno = resource_filename('limmbo',
...                                'io/test/data/pheno.csv')
>>> file_geno = resource_filename('limmbo',
...                                'io/test/data/genotypes.csv')
>>> file_relatedness = resource_filename('limmbo',
...                     'io/test/data/relatedness.csv')
>>> file_covs = resource_filename('limmbo',
...                               'io/test/data/covs.csv')
>>> file_Cg = resource_filename('limmbo',
...                     'io/test/data/Cg.csv')
>>> file_Cn = resource_filename('limmbo',
...                     'io/test/data/Cn.csv')
>>> data.getPhenotypes(file_pheno=file_pheno)
>>> data.getCovariates(file_covariates=file_covs)
>>> data.getRelatedness(file_relatedness=file_relatedness)
>>> data.getGenotypes(file_genotypes=file_geno)
>>> data.getVarianceComponents(file_Cg=file_Cg,
...                            file_Cn=file_Cn)
>>> indata = InputData(verbose=False)
>>> indata.addPhenotypes(phenotypes = data.phenotypes)
>>> indata.addRelatedness(relatedness = data.relatedness)
>>> indata.addCovariates(covariates = data.covariates)
>>> indata.addGenotypes(genotypes=data.genotypes,
...                     genotypes_info=data.genotypes_info)
>>> indata.addVarianceComponents(Cg = data.Cg, Cn=data.Cn)
>>> indata.commonSamples()
>>> indata.regress()
>>> indata.transform(transform="scale")
>>> gwas = GWAS(datainput=indata, seed=10, verbose=False)
>>>
>>>
>>> # Example of multi-trait single-variant association testing
>>> # using a linear mixed model.
>>> resultsAssociation = gwas.runAssociationAnalysis(
...     setup="lmm", mode="multitrait")
>>> resultsAssociation.keys()
['lm', 'betas', 'pvalues']
>>> resultsAssociation['pvalues'].shape
(1, 20)
>>> '{:0.3e}'.format(resultsAssociation['pvalues'].min())
'4.584e-09'
>>> resultsAssociation['betas'].shape
(10, 20)
>>>
>>>
>>> # Example of single-trait single-variant association testing
>>> # using a linear mixed model.
>>> resultsAssociation = gwas.runAssociationAnalysis(
...     setup="lmm", mode="singletrait",
...     adjustSingleTrait = "effective")
>>> resultsAssociation.keys()
['pvalues_adjust', 'lm', 'betas', 'pvalues']
>>> resultsAssociation['pvalues'].shape
(10, 20)
>>> resultsAssociation['betas'].shape
(10, 20)
>>> '{:0.3e}'.format(resultsAssociation['pvalues_adjust'].min())
'2.262e-03'

saveAssociationResults(results, outdir, name='', pvalues_empirical=None)[source]¶

Saves results of association analyses.

Parameters:

results (dictionary) –
dictionary generated via runAssociation analysis, containing:
- lm (limix.qtl.LMM): LIMIX LMM object
- pvalues (numpy array): [P x NrSNP] array of p-values
- betas (numpy array): [P x NrSNP] array of effect size estimates per SNP across all traits
- pvalues_adjust (numpy array): only returned mode == singletrait and if ‘adjustSingleTrait’ is not None; contains single-trait p-values adjusted for the number of single-trait analyses conducted
outdir (string) – ‘/path/to/output/directory’; needs user writing permission
name (string) – input name specific name, such as chromosome used in analysis
pvalues_empirical (pandas dataframe, optional) – empirical pvalues via adjustSingleTrait

Returns:

(None)