Question: Assign Intergenic SNPs To Gene.

This is a trivial question and I am surprised I could not find it anywhere else...

How can you map an intergenic SNP to a unique gene? I mean, is there a "canonical" way to do it? Several options come to mind: the closest gene by distance, the gene whose first polymorphic position has the highest R² with the SNP, etc. But I would like to know whether there is any published precedent for doing this. GWAS usually report both flanking genes (upstream and downstream), but I would rather have a single hit.

Does anyone know a good way to do it?

mapping snp • 2.8k views
modified 4.6 years ago by Biostar ♦♦ 20 • written 5.3 years ago by Peixe530

I don't have time to write a full answer now, but have a look at this paper: Habegger et al, 2012. VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment.

written 5.3 years ago by Giovanni M Dall'Olio26k

In this recent paper, Raj et al, 2013, "Common Risk Alleles for Inflammatory Diseases Are Targets of Recent Positive Selection", they assigned SNPs to the closest gene inside the LD block.
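
To make the "closest gene inside the LD block" idea concrete, here is a minimal Python sketch. It is only an illustration of the rule as described above, not the paper's actual pipeline. It assumes you already have the LD block as a single interval and your gene annotation as a list of intervals; the `Interval` type, the function names, and the coordinates are all hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Interval(NamedTuple):
    """A named genomic interval (hypothetical type; inclusive coordinates)."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def closest_gene_in_ld_block(
    snp_pos: int, block: Interval, genes: Sequence[Interval]
) -> Optional[str]:
    """Return the gene nearest to the SNP among genes overlapping its LD block.

    Returns None when no gene overlaps the block.
    """
    in_block = [g for g in genes if overlaps(g, block)]
    if not in_block:
        return None

    def distance(g: Interval) -> int:
        # 0 if the SNP lies inside the gene, else the gap to the nearer gene edge.
        return max(g.start - snp_pos, snp_pos - g.end, 0)

    return min(in_block, key=distance).name


# Made-up coordinates for illustration only.
genes = [
    Interval("GENE_A", "chr1", 1_000, 5_000),
    Interval("GENE_B", "chr1", 80_000, 120_000),
]
block = Interval("block1", "chr1", 20_000, 90_000)
assigned = closest_gene_in_ld_block(30_000, block, genes)
```

Note that in this toy example GENE_A is physically closer to the SNP, but it lies outside the LD block, so GENE_B is the one assigned. That is the practical difference between this rule and a plain nearest-gene rule.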

written 5.3 years ago by Giovanni M Dall'Olio26k

I know... & you know we are doing a journal club on it soon... ;) hehehe...

written 5.3 years ago by Peixe530
5.3 years ago
Boston, MA USA
wrote:

Intergenic SNPs are likely either to affect transcription (by mapping to a promoter or enhancer of a nearby gene) or to map within a currently undescribed gene (e.g., a novel lncRNA). One approach is to check whether the SNP, or a SNP in LD with it, is an eQTL for a nearby gene. These data (from Genevar and the UChicago eQTL tools) are mostly for protein-coding genes. If you are far from such genes, you may be in an enhancer, where it is tougher to discern function and to assign a relationship to a gene. There are enhancer tools and databases, but demonstrating the link between an enhancer and the gene it controls is more difficult. Lastly, some non-coding RNAs show sequence conservation across species, but not always, and not necessarily at a fixed position upstream or downstream of an anchor such as a protein-coding gene. All of this means it could be difficult to say that your SNP maps to a gene encoding a novel non-coding RNA.

More to the point of your question: the Framingham Heart Study, which has initiated many GWAS, has used a distance of up to 60 kbp to assign a SNP to a gene. A SNP farther away than this is either not assigned to a gene, or is assigned but flagged as "distant." You can always find the nearest gene, or the nearest one with an NM_# RefSeq mRNA (as opposed to a gene-model mRNA), along with the distance to that gene, and carry forward both values.
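
As a rough illustration of the nearest-gene rule with a distance cutoff, here is a small Python sketch. It assumes a gene list already loaded into memory (for example, restricted to NM_ RefSeq transcripts if you want that filter); the `Gene` class, the function names, the "near"/"distant" labels, and the coordinates are hypothetical. Only the 60 kbp default comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Gene:
    """Minimal gene record (hypothetical; inclusive coordinates)."""
    name: str
    chrom: str
    start: int
    end: int


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Distance in bp from a position to the nearer gene edge (0 if inside)."""
    if pos < gene.start:
        return gene.start - pos
    if pos > gene.end:
        return pos - gene.end
    return 0


def assign_nearest_gene(
    chrom: str, pos: int, genes: Sequence[Gene], max_dist: int = 60_000
) -> Tuple[Optional[str], Optional[int], str]:
    """Return (gene name, distance, label) for the nearest gene on the chromosome.

    The label is "near" within max_dist, "distant" beyond it, and
    "unassigned" when the chromosome has no annotated gene at all.
    """
    candidates = [g for g in genes if g.chrom == chrom]
    if not candidates:
        return None, None, "unassigned"
    nearest = min(candidates, key=lambda g: distance_to_gene(pos, g))
    dist = distance_to_gene(pos, nearest)
    label = "near" if dist <= max_dist else "distant"
    return nearest.name, dist, label


# Made-up coordinates for illustration only.
genes = [
    Gene("GENE_A", "chr1", 1_000, 5_000),
    Gene("GENE_B", "chr1", 100_000, 120_000),
]
near_hit = assign_nearest_gene("chr1", 30_000, genes)
far_hit = assign_nearest_gene("chr1", 200_000, genes)
```

Returning the distance alongside the gene name keeps both values available downstream, so a "distant" assignment can still be filtered or down-weighted later rather than being silently dropped.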

modified 5.3 years ago • written 5.3 years ago by Larry_Parnell16k

That's a nice explanation, @Larry_Parnell! My SNPs are mostly derived from GWAS, so I think I'll go for the "nearest gene" criterion. However, I think it is worth taking a look at Genevar. Thanks!

written 5.3 years ago by Peixe530

Thank you. You should use as many eQTL resources as you can find. Estimates of the fraction of GWAS hits that affect transcription range from 30% upwards, so eQTL analysis will be very helpful for you.

written 5.3 years ago by Larry_Parnell16k
