Question

Finding tag SNPs

3.9 years ago

kylec1729 ▴ 10

What is the best way to find tag SNPs associated with particular candidate genes? For example, if I wanted to look at the effects of IFITM3 in influenza patients using their genotype and hospital data, what is the optimal way to choose SNPs? I used the UCSC Genome Browser to pick a few SNPs more or less at random, but is there a better way to do this, i.e. one that gives more accurate results in the end? I'm looking specifically for SNPs that give a clear and accurate picture of the gene of interest.

Tags: SNP, snp, tag

Answer 1 · 2020-05-13

3.9 years ago

Kevin Blighe 87k

"For example, if I wanted to look at the effects of IFITM3 on influenza patients using their genotype + hospital data, what is the optimal way to find SNPs?"

Is this not essentially an eQTL analysis, adjusting for clinical parameters?

I'm not sure. I'm just looking for a group of SNPs that gives a good representation of the gene. An eQTL analysis might be useful for seeing what affects the expression of the gene (and for identifying SNPs that are crucial for its expression), but does that necessarily give a representative picture of the gene itself? What I've done so far is go to SNPedia, search for the gene (for example, IFITM3), and take all SNPs that SNPedia lists as relevant to that gene. Is this not the best way to do this?

The basic thing I'm trying to do is this: I'm comparing three diseases, and I would like to see whether there are differences in genetic variants within candidate genes that I've chosen for these diseases based on their pathophysiology. I don't have enough data for a GWAS, so I'm taking a candidate-gene approach. That's why I want these "representative" SNPs from each gene.

I've already used PLINK to extract those SNPedia SNPs from my genotype data, but I'm still wondering whether there is a better way to optimize my choice of SNPs for this purpose.

Reply by kylec1729 · 3.9 years ago

Oh, I see what you mean; so, this has nothing to do with gene expression and, therefore, nothing to do with eQTL. I am not familiar with SNPedia, so I do not know how it pre-filters its list of SNPs for each gene.

I would pull all SNPs covering my genes of interest (+/- 50 kilobases, or some other range) that are listed in the 1000 Genomes Phase III data, and then export that data from PLINK to HaploView format, where I could define haplotype blocks and haplotypes, and also identify tag SNPs. I touch on these topics:

If you get that far and have a few candidate SNPs, you can do some simple calculations by following this:

A: SNP dataset and Z Score

Different ways of approaching it.

Reply by Kevin Blighe · 3.9 years ago
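
The region-extraction step above (take every variant within ±50 kb of the gene, then hand the region to HaploView) can be scripted. The sketch below only assembles a PLINK 1.9 command line. It assumes the genotypes are in a PLINK binary fileset named `cohort` (hypothetical) on the same genome build as the gene coordinates, and the coordinates used are placeholders, not the real IFITM3 positions; take those from a gene annotation for your build. `--chr`, `--from-bp`, `--to-bp` and `--recode HV-1chr` are PLINK 1.9 options; as I understand the documentation, `HV-1chr` writes a single Haploview-style .ped/.info pair, so it suits a one-chromosome region. Restricting to 1000 Genomes Phase III variants is not shown.

```python
# Sketch: build a PLINK 1.9 command that extracts all variants within
# +/- flank_bp of a gene and recodes them for Haploview.
# Gene coordinates used below are PLACEHOLDERS, not real IFITM3 positions.
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int  # 1-based gene start, same build as the genotype data
    end: int    # 1-based gene end


def flanked_window(gene: Gene, flank_bp: int = 50_000) -> tuple[int, int]:
    """Return (from_bp, to_bp) covering the gene plus flank_bp on each side."""
    if gene.start > gene.end:
        raise ValueError("gene start must not exceed gene end")
    return max(1, gene.start - flank_bp), gene.end + flank_bp


def plink_haploview_cmd(bfile: str, gene: Gene, flank_bp: int = 50_000) -> list[str]:
    """Argument list for PLINK 1.9: subset the region, recode for Haploview."""
    from_bp, to_bp = flanked_window(gene, flank_bp)
    return [
        "plink",
        "--bfile", bfile,
        "--chr", gene.chrom,
        "--from-bp", str(from_bp),
        "--to-bp", str(to_bp),
        "--recode", "HV-1chr",
        "--out", f"{gene.name}_region",
    ]


if __name__ == "__main__":
    # Hypothetical gene and coordinates, for illustration only.
    gene = Gene(name="GENE_X", chrom="11", start=300_000, end=302_000)
    print(" ".join(plink_haploview_cmd("cohort", gene)))
```

The script only prints the command; passing the list to `subprocess.run` would execute it. Changing `flank_bp` is how you would try a range other than 50 kb.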

Hey Kevin, thanks a lot for the reply. I've been working with your suggestions. How do I decide on an appropriate Hardy-Weinberg p-value (HWpval) cutoff in HaploView for the purposes described above? There seem to be some desirable SNPs that were left out of the tagging process (by HaploView's built-in Tagger) because their HWpvals are so small. Is a small HWpval a bad thing for this purpose?

Reply by kylec1729 · 3.9 years ago

I think that HWE p<0.001 is used frequently. If your cohort is a disease cohort, though, then HWE filtering can actually eliminate key disease-associated variants. In HaploView / Tagger, can you not select SNPs to retain?

Reply by Kevin Blighe · 3.9 years ago

It is a disease cohort (infectious diseases). I can select SNPs to retain, but I'm not sure what the important criteria are for selecting them. For example: 1) Do I want a low or a high HWpval? 2) Do I want MAF > 1% (the default is 0.1%, which seems a bit low)? The program sets r² ≥ 0.8, which I'm assuming is optimal. 3) For the LD plot, which haplotype block algorithm should I use? There are three default options (confidence intervals, four gamete rule, and solid spine of LD), and you can also define the blocks yourself, so 4) should I just define them myself using my best judgment, as you mentioned earlier? I apologize for such general questions; I've never used HaploView before (but it seems like a great tool!).

Reply by kylec1729 · 3.9 years ago
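
The tagging choices raised above (a MAF floor, r² ≥ 0.8, and forcing in SNPs you want to retain) can be illustrated with a small greedy pairwise tagger. This is a simplified sketch in the spirit of pairwise tagging, not HaploView's Tagger implementation. It assumes genotypes coded as 0/1/2 allele dosages with no missing calls, and it uses the squared correlation of unphased dosages as a stand-in for the haplotype-based r² that HaploView estimates. The function names and SNP IDs are hypothetical.

```python
# Simplified greedy pairwise tagger (illustrative; not HaploView/Tagger's code).
# Assumes each SNP is a list of 0/1/2 allele dosages with no missing calls.


def maf(dosages: list[int]) -> float:
    """Minor allele frequency from 0/1/2 dosages."""
    p = sum(dosages) / (2 * len(dosages))
    return min(p, 1 - p)


def r2(x: list[int], y: list[int]) -> float:
    """Squared Pearson correlation of dosages (genotype-based proxy for LD r^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as not in LD
    return sxy * sxy / (sxx * syy)


def greedy_tags(genotypes: dict[str, list[int]], r2_min: float = 0.8,
                maf_min: float = 0.01, forced: tuple[str, ...] = ()) -> list[str]:
    """Pick tags so every retained SNP has r^2 >= r2_min with at least one tag."""
    # Keep SNPs at or above the MAF floor, plus any SNP the user forces in.
    snps = [s for s, g in genotypes.items() if maf(g) >= maf_min or s in forced]
    untagged = set(snps)

    def captured(tag: str) -> set[str]:
        # Still-untagged SNPs this candidate would cover (a SNP covers itself).
        return {s for s in untagged
                if s == tag or r2(genotypes[tag], genotypes[s]) >= r2_min}

    tags: list[str] = []
    for f in forced:  # forced SNPs become tags first, whatever they capture
        if f in genotypes and f not in tags:
            tags.append(f)
            untagged -= captured(f)
    while untagged:
        # Greedy step: the SNP covering the most untagged SNPs (first wins ties).
        best = max(snps, key=lambda c: len(captured(c)))
        tags.append(best)
        untagged -= captured(best)
    return tags
```

With `forced`, a SNP you want to keep (for example one with prior clinical evidence) becomes a tag first, and the greedy step only has to cover what it misses. Raising `maf_min` from 0.001 to 0.01 simply drops rarer SNPs before any tagging happens.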

The HWE filtering in HaploView seems a bit odd. For example, there's a SNP that I think would be useful as a tag: rs233575, which I got from SNPedia and which has been clinically linked to the gene, has an HWpval of around 10^-14, i.e. very small. The HWpval cutoff is set to 0.001 by default, so any SNP with a p-value below that is excluded, which seems odd. I'd have thought it should be the opposite, i.e. exclude any SNP with a p-value above the cutoff. Don't we want small HW p-values, so that the probability that a SNP's deviation from HWE could be explained by chance is very small?

Reply by kylec1729 · 3.9 years ago
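
On the direction of the HWE filter: the test's null hypothesis is that genotype frequencies match Hardy-Weinberg proportions, so a very small p-value (such as ~10^-14) means the observed genotype counts are very unlikely under HWE. In QC that is treated as a warning sign, most often of genotyping error, which is why the filter keeps SNPs with p at or above the cutoff and drops those below it. This is the opposite of an association test, where small p-values are what you look for. The caveat raised earlier still applies: in a disease cohort, a true association can also push a SNP out of HWE, so forcing a well-supported SNP back in can be justified. The sketch below implements the exact HWE test of Wigginton, Cutler and Abecasis (2005), which I believe is what PLINK uses for its HWE p-values; the function names are hypothetical, and the counts are assumed to come from one biallelic SNP.

```python
# Sketch of the exact Hardy-Weinberg test (Wigginton et al. 2005 algorithm)
# plus the QC filter direction used by HWE cutoffs such as p < 0.001.


def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Exact HWE p-value for one biallelic SNP from its three genotype counts."""
    if min(n_het, n_hom1, n_hom2) < 0:
        raise ValueError("genotype counts must be non-negative")
    n = n_het + n_hom1 + n_hom2
    if n == 0:
        raise ValueError("no genotypes")
    n_hom_rare = min(n_hom1, n_hom2)
    rare = 2 * n_hom_rare + n_het        # copies of the rarer allele
    probs = [0.0] * (rare + 1)           # probs[h] ~ P(h hets | allele counts)

    # Start near the expected heterozygote count, matching the parity of `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if rare % 2 != mid % 2:
        mid += 1
    probs[mid] = 1.0

    # Walk down from mid: fewer heterozygotes, more homozygotes.
    hets, hom_r = mid, (rare - mid) // 2
    hom_c = n - hets - hom_r
    while hets >= 2:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        hets, hom_r, hom_c = hets - 2, hom_r + 1, hom_c + 1

    # Walk up from mid: more heterozygotes, fewer homozygotes.
    hets, hom_r = mid, (rare - mid) // 2
    hom_c = n - hets - hom_r
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * hom_r * hom_c / ((hets + 2) * (hets + 1))
        hets, hom_r, hom_c = hets + 2, hom_r - 1, hom_c - 1

    total = sum(probs)
    observed = probs[n_het] / total
    # p-value: total probability of every outcome no more likely than the observed one.
    p = sum(x / total for x in probs if x / total <= observed)
    return min(1.0, p)


def passes_hwe(p_value: float, cutoff: float = 1e-3) -> bool:
    """QC direction: keep SNPs consistent with HWE, drop those with p below cutoff."""
    return p_value >= cutoff
```

For instance, counts of 25/50/25 are exactly in HWE and give p ≈ 1, so the SNP is kept, while 50 and 50 opposite homozygotes with no heterozygotes give a tiny p and the SNP is dropped; a SNP with p ≈ 10^-14 fails any of the usual cutoffs.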