Is There An R Package Or Other Software That Can Help Find Independent Signals From A GWAS? (Similar To SNAP)
10.8 years ago
lillo.sim ▴ 50

Hi,

I have found ~200 significant associations at a given p-value threshold by running a GWAS of SNPs against a phenotype. I can map the significant SNPs to human genes, but I would like to find the independent signals associated with the phenotype, i.e., if the genes the SNPs map to are nearby and there is LD between these genes, I would like to count those associations as a single signal. I think this is a standard step for finding independent signals in post-GWAS processing.

My question is: is there a way to do this in R or any other software, i.e., retrieve the LD between a list of SNPs (with something like biomaRt) and then use this information to find independent signals, maybe by creating LD clusters? I don't know if this is the usual way of finding independent signals in a GWAS; if not, could you tell me how this is normally done?

Thank you for any advice, help, or suggestions!

EDIT by Michael: This boils down to the question of whether there is a tool like SNAP but for local installation.

What do you mean by "consider as a unique signal"? Do you mean considering them jointly, or giving them a compound score? I wouldn't call that independent: because they are in LD, it is quite the opposite.

Hi Michael, thanks for your reply. No, I don't want a score, just the genes/signals found in this GWAS that are independent. For example, if SNP1 maps to gene1 and SNP2 maps to gene2, but gene1 and gene2 are in LD, then there will be only one signal from this region... Isn't this how the number of independent associated loci is usually determined?

OK, I think I understand now. So you wish to find whether significant markers are in LD with each other, given a cut-off for r2? Note that LD is measured only between markers, not between genes.
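
To make "LD is a property of marker pairs" concrete: with individual-level genotypes coded as allele dosages (0/1/2), a common approximation of r2 is the squared Pearson correlation between the two dosage vectors. This is a minimal sketch assuming unphased dosages for the same individuals and no missing calls; haplotype-based r2 from phased reference panels can differ somewhat.

```python
def ld_r2(g1, g2):
    """Squared Pearson correlation of two allele-dosage vectors (0/1/2).

    Assumes equal-length, complete genotype vectors from the same individuals.
    This genotype-correlation r2 only approximates haplotype-based r2.
    """
    if len(g1) != len(g2) or len(g1) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    n = len(g1)
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) * (a - m1) for a in g1)
    var2 = sum((b - m2) * (b - m2) for b in g2)
    if var1 == 0 or var2 == 0:
        raise ValueError("r2 is undefined for a monomorphic marker")
    return cov * cov / (var1 * var2)


print(ld_r2([0, 1, 2], [2, 1, 0]))        # 1.0 (perfect LD, opposite coding)
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 (no LD)
```

Squaring the correlation makes r2 insensitive to which allele is coded as 1, which is why the two perfectly anti-correlated vectors still give r2 = 1.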

10.8 years ago
Michael 54k

You can query pairwise LD for sets of markers using SNAP's pairwise LD query. This will retrieve a list of pairwise r2 values for your list of significant markers, with the option of using different reference panels (e.g., 1000 Genomes, HapMap) and populations. Markers that are absent from all pairs, or whose r2 is below the threshold, would then qualify as independent. Is that what you had in mind?
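
A minimal sketch of turning such a pairwise r2 list into a count of independent signals by greedy clumping (similar in spirit to PLINK's `--clump`): take the most significant remaining marker as a lead, absorb every unassigned marker with r2 >= threshold to that lead, and repeat. The input layout, the 0.8 threshold, and treating pairs missing from the list as r2 = 0 are assumptions for illustration; SNAP only reports pairs above the r2 cutoff you choose.

```python
def clump(pvalues, pairs, r2_threshold=0.8):
    """Greedy LD clumping of significant markers.

    pvalues: dict marker -> p-value (the significant markers only).
    pairs:   iterable of (marker_a, marker_b, r2) from a reference panel.
    Pairs missing from the list are treated as r2 = 0 (not in LD).
    Returns an insertion-ordered dict: lead marker -> list of clumped markers.
    """
    r2 = {}
    for a, b, value in pairs:
        r2[(a, b)] = value
        r2[(b, a)] = value

    order = sorted(pvalues, key=pvalues.get)  # most significant first
    assigned = set()
    clumps = {}
    for lead in order:
        if lead in assigned:
            continue
        assigned.add(lead)
        # Only r2 with the lead marker counts, as in greedy clumping.
        members = [m for m in order
                   if m not in assigned and r2.get((lead, m), 0.0) >= r2_threshold]
        assigned.update(members)
        clumps[lead] = members
    return clumps


pvals = {"rs1": 1e-10, "rs2": 1e-8, "rs3": 1e-6, "rs4": 1e-9}
pairs = [("rs1", "rs2", 0.9), ("rs2", "rs3", 0.85),
         ("rs1", "rs3", 0.3), ("rs4", "rs1", 0.05)]
print(clump(pvals, pairs))
# {'rs1': ['rs2'], 'rs4': [], 'rs3': []}
```

Here rs3 becomes its own lead even though it is in LD with rs2, because rs2 was already absorbed by rs1 and rs3's r2 with rs1 is only 0.3. Building connected LD clusters instead (linking any pair above the threshold) would merge rs1, rs2 and rs3 into one group; which definition to use is a judgment call.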

Hi Michael, yes, this is what I had in mind for finding independent associations, thank you. I guess this is how researchers normally do it when they report, e.g., N independent associations for a specific phenotype? The problem is that SNAP limits the number of SNPs I can query, and I have more than 1,000 in other studies. The other problem is that I can only query by rs ID, and I also have indels in my dataset. I also wanted this to be part of an automated pipeline, which is why I was hoping there was an equivalent R package to query the LD of a large number of variants, including rare variants, from 1000 Genomes and then cluster them, for example by LD. Do you or anybody else know whether something like this exists? Thank you!

I didn't see a note about a limit on the number of SNPs you can upload; also, SNAP has 1kG pilot 1 data, if that is sufficient. If not, we have done something similar (http://services.cbu.uib.no/software/ldsnpr/), but it works sort of 'the other way around'. Still, I could give you an HDF5 file with computed LD values for HapMap and 1kG. That file is in the wrong format for fast searching by rsid alone, though (it is organized by chromosome, with no index on rsids): it would need to be loaded into a (SQL) database and the rsid columns indexed.
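
Loading precomputed pairwise LD into an indexed SQL table could look roughly like this with Python's built-in `sqlite3`. The table layout (`chrom`, `pos1`, `rsid1`, `pos2`, `rsid2`, `r2`) and the helper names are hypothetical, and reading the rows out of the HDF5 file is not shown. Query markers are put into a temporary table and joined, which sidesteps SQLite's limit on bound parameters when querying 1,000+ markers; the positional indexes are there for variants without an rsid.

```python
import sqlite3


def build_ld_db(rows, path=":memory:"):
    """Load (chrom, pos1, rsid1, pos2, rsid2, r2) rows into an indexed table.

    The column layout is a hypothetical example, not the HDF5 file's schema.
    """
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE ld (chrom TEXT, pos1 INTEGER, rsid1 TEXT, "
                "pos2 INTEGER, rsid2 TEXT, r2 REAL)")
    con.executemany("INSERT INTO ld VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Without these indexes every lookup would scan the whole table.
    con.execute("CREATE INDEX idx_rsid1 ON ld (rsid1)")
    con.execute("CREATE INDEX idx_rsid2 ON ld (rsid2)")
    # Positional indexes cover variants (e.g. indels) that have no rsid.
    con.execute("CREATE INDEX idx_pos1 ON ld (chrom, pos1)")
    con.execute("CREATE INDEX idx_pos2 ON ld (chrom, pos2)")
    con.commit()
    return con


def pairs_by_rsid(con, rsids, min_r2=0.0):
    """All stored pairs where both markers are in rsids and r2 >= min_r2."""
    con.execute("CREATE TEMP TABLE IF NOT EXISTS q_rsid (rsid TEXT PRIMARY KEY)")
    con.execute("DELETE FROM q_rsid")
    con.executemany("INSERT OR IGNORE INTO q_rsid VALUES (?)", [(r,) for r in rsids])
    return con.execute(
        "SELECT ld.rsid1, ld.rsid2, ld.r2 FROM ld "
        "JOIN q_rsid a ON ld.rsid1 = a.rsid "
        "JOIN q_rsid b ON ld.rsid2 = b.rsid "
        "WHERE ld.r2 >= ? ORDER BY ld.rsid1, ld.rsid2", (min_r2,)).fetchall()


def pairs_by_position(con, chrom, positions, min_r2=0.0):
    """Same lookup keyed on (chrom, position) instead of rsid."""
    con.execute("CREATE TEMP TABLE IF NOT EXISTS q_pos (pos INTEGER PRIMARY KEY)")
    con.execute("DELETE FROM q_pos")
    con.executemany("INSERT OR IGNORE INTO q_pos VALUES (?)", [(p,) for p in positions])
    return con.execute(
        "SELECT ld.pos1, ld.pos2, ld.r2 FROM ld "
        "JOIN q_pos a ON ld.pos1 = a.pos "
        "JOIN q_pos b ON ld.pos2 = b.pos "
        "WHERE ld.chrom = ? AND ld.r2 >= ? ORDER BY ld.pos1, ld.pos2",
        (chrom, min_r2)).fetchall()


rows = [
    ("12", 100, "rs1", 200, "rs2", 0.9),
    ("12", 100, "rs1", 300, "rs3", 0.2),
    ("12", 200, "rs2", 300, "rs3", 0.85),
    ("12", 400, None, 500, "rs5", 0.95),  # first variant has no rsid
]
con = build_ld_db(rows)
print(pairs_by_rsid(con, ["rs1", "rs2", "rs3"], min_r2=0.8))
# [('rs1', 'rs2', 0.9), ('rs2', 'rs3', 0.85)]
print(pairs_by_position(con, "12", [400, 500], min_r2=0.8))
# [(400, 500, 0.95)]
```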

Thanks, that would be great... I get an error in SNAP when I try to load a file with >1,000 SNPs. My variants are identified by chromosome and position, not by rs ID, since, as I said, many variants have no rsid, so searching by position would be better. Where could I download this list, please?

I currently have an LD file generated for the EUR (meta) population only; it is 1.7 GB. It is not a list, though, but in compressed binary HDF5 format. Before you try to work with it, I would like you to try the chr12 HapMap test data to see if you can handle the format: http://www.ii.uib.no/svn/eSysBio/Rpackages/LDsnpR/inst/extdata/ld_chrom12.h5. If you want a population like CEU, I would have to generate it first using a makefile; this will take about one week. Please let me know if that data format works for you; it also contains positional annotation for each pair.

I could also post the Makefile so you can try to build LD files yourself, that requires some additional software though.

Thank you for your help... I am having trouble viewing the file in HDF5 format; maybe I can do this by following this old thread, if there are no newer ways: 1000 genomes LD calculation

We have used Intersnp (http://www.ncbi.nlm.nih.gov/pubmed/19837719) for LD calculation instead of PLINK. I will post an update to the old question: A: 1000 genomes LD calculation.

10.8 years ago
lillo.sim ▴ 50

This package (http://cran.r-project.org/web/packages/postgwas/postgwas.pdf) can map SNPs to genes, including via LD, but it does not give the LD between the SNPs themselves. So if a SNP maps to (or is in LD with) genes that overlap, it still counts these genes as two different loci, while in effect they are one.
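
If the immediate problem is only that physically overlapping genes get counted as separate loci, a simple positional merge of gene intervals is a minimal workaround. It is a heuristic and does not replace checking LD between the SNPs themselves. The `(gene, chrom, start, end)` input format and the optional `max_gap` parameter are assumptions for illustration.

```python
def merge_gene_loci(genes, max_gap=0):
    """Merge gene intervals that overlap (or lie within max_gap bp) into loci.

    genes: iterable of (gene_name, chrom, start, end), coordinates inclusive.
    Returns a list of (chrom, start, end, [gene names]) sorted by chrom, start.
    """
    loci = []
    for name, chrom, start, end in sorted(genes, key=lambda g: (g[1], g[2], g[3])):
        if loci and loci[-1][0] == chrom and start <= loci[-1][2] + max_gap:
            # Overlaps the current locus: extend it and record the gene.
            c, s, e, names = loci[-1]
            loci[-1] = (c, s, max(e, end), names + [name])
        else:
            loci.append((chrom, start, end, [name]))
    return loci


genes = [("G1", "1", 100, 500), ("G2", "1", 400, 900),
         ("G3", "1", 2000, 2500), ("G4", "2", 100, 200)]
print(merge_gene_loci(genes))
# [('1', 100, 900, ['G1', 'G2']), ('1', 2000, 2500, ['G3']), ('2', 100, 200, ['G4'])]
```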

10.8 years ago
Bioch'Ti ★ 1.1k

Hi, I think the MLMM R/Python package (Segura et al., Nature Genetics, 2012) may be appropriate for your question. The principle is simple: in the first round, a mixed linear model (MLM) screens the genome for associations; in the next round, the model takes the most strongly associated locus as a cofactor and runs a new round of detection, and so on until no genetic variance remains. This way, it avoids detecting all the SNPs in LD with the 'true' associated SNP you are searching for. It is a robust and fast/efficient method. You should give it a try: http://www.nature.com/ng/journal/v44/n7/full/ng.2314.html?WT.ec_id=NG-201207

Best, C.
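
The stepwise idea described above (add the strongest SNP as a cofactor, re-scan, repeat) can be illustrated with a much simpler fixed-effect forward selection. This is not MLMM: it uses ordinary least squares with no kinship or random effect, and the stopping rule (partial r2 below a hypothetical `min_r2`) is an assumption chosen for illustration. Each selected SNP is projected out of the phenotype and of the remaining candidates (Gram-Schmidt), so every step tests a SNP conditional on the SNPs already selected.

```python
def forward_select(genotypes, phenotype, min_r2=0.05, max_steps=10, eps=1e-12):
    """Stepwise forward selection of SNPs as cofactors (OLS, no kinship).

    genotypes: list of per-SNP dosage lists (same individuals, same order).
    phenotype: list of phenotype values.
    Returns indices of selected SNPs in the order they were added.
    """
    def centred(v):
        m = sum(v) / len(v)
        return [x - m for x in v]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def project_out(v, q, qq):
        # Remove from v its component along q (qq = dot(q, q)).
        k = dot(v, q) / qq
        return [a - k * b for a, b in zip(v, q)]

    y = centred(phenotype)
    xs = [centred(g) for g in genotypes]
    selected = []
    for _ in range(max_steps):
        yy = dot(y, y)
        if yy < eps:
            break  # phenotype fully explained by the cofactors
        best, best_r2 = None, 0.0
        for j, x in enumerate(xs):
            xx = dot(x, x)
            if j in selected or xx < eps:
                continue  # already a cofactor, or collinear with cofactors
            r2 = dot(x, y) ** 2 / (xx * yy)  # partial r2 given cofactors
            if r2 > best_r2:
                best, best_r2 = j, r2
        if best is None or best_r2 < min_r2:
            break
        selected.append(best)
        q = xs[best]
        qq = dot(q, q)
        y = project_out(y, q, qq)
        xs = [x if j in selected else project_out(x, q, qq)
              for j, x in enumerate(xs)]
    return selected


A = [0, 2, 0, 2, 0, 2, 0, 2]
C = [0, 0, 2, 2, 0, 0, 2, 2]
y = [2 * a + c for a, c in zip(A, C)]
print(forward_select([A, list(A), C], y))  # [0, 2]: the copy of A is never picked
```

In the usage line, SNP 1 is a perfect-LD copy of SNP 0; once SNP 0 is a cofactor, SNP 1 has nothing left to explain, which is exactly the redundancy the stepwise approach is meant to remove.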

Hi C, this looks like a really nice tool, but I was asking about post-GWAS annotation after I have already run the association analysis.

10.8 years ago
yao.h.1988 • 0

I think Haploview 4.2 can find LD blocks among your significant SNPs. Then you can use another method, for example biomaRt, to find the genes in these LD blocks or near the independent SNPs.

Haploview is great for visualising an LD block, but I think it needs genotype data to compute LD, which I don't have. I only have summary statistics and want to find the significant independent signals...
