Tag SNPs - how to easily and effectively select them
5.9 years ago
F_cm_C ▴ 30

I am working with genomic data aligned to hg19 (GRCh37). I need to select tag SNPs for a number of genes I am studying. I have annotated my data and extracted the available SNPs mapping within my gene of interest. Once I have my list of variants, I can get tag SNPs in plink using:

plink --bfile mydata --show-tags mygene_variants.txt --list-all --tag-kb 5 --out mytags

I appreciate there's a certain degree of overlap: some SNPs are tagged by more than one SNP, and some tagged SNPs are themselves tag SNPs. How can I easily select the best tag SNPs? Is there software that works with hg19 (GRCh37), or an algorithm I can use in the R programming language?

tag SNPs genome gene SNP • 3.7k views
5.9 years ago

Tag SNPs are a group of SNPs whose genotypes are predictive of other SNPs in their surrounding haploblocks. However, in some tagging experiments, one does not necessarily have to refer to 'haploblocks', and can instead just do a genome-wide scan for highly informative SNPs that define a particular group.

During my PhD, as a side project, I developed a method for identifying haplotype-tagging CNVs to distinguish the 4 populations among the 270 International HapMap Project samples, but this was before the 1000 Genomes data had even been released and before R packages became very popular. That said, in my tutorial here on Biostars, I am technically defining tag SNPs on the 1000 Genomes Phase III data, and these tag SNPs are highly informative of each respective population group: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old)

In the tutorial, the tagging SNP method that I use is based on linkage disequilibrium and the calculation of the variance inflation factor (see the section entitled 'Prune variants from each chromosome'), whereby tagging SNPs are identified in SNP bins across the entire genome. In fact, you'll find that most tagging SNP methods are based on linkage disequilibrium metrics in some shape or form.
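To make the LD-based idea concrete, here is a minimal Python sketch of pruning by pairwise r². It is not the variance inflation factor calculation from the tutorial; it swaps in the simpler pairwise rule (keep a SNP only if its r² with every SNP already kept is below a threshold). The function names, the 0.5 threshold, and the toy genotypes are assumptions for illustration, and the sketch ignores missing calls and does not restrict comparisons to a sliding window.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic SNP: correlation is undefined, treat as unlinked
    return cov * cov / (var_x * var_y)


def prune_by_ld(genotypes, r2_threshold=0.5):
    """Walk SNPs in input order; keep one only if it is not in high LD with any kept SNP."""
    kept = []
    for snp, calls in genotypes.items():
        if all(r_squared(calls, genotypes[k]) < r2_threshold for k in kept):
            kept.append(snp)
    return kept


# Made-up genotypes for six samples (hypothetical SNP names).
example = {
    "snpA": [0, 1, 2, 0, 1, 2],
    "snpB": [0, 1, 2, 0, 1, 2],  # identical to snpA, so r^2 = 1
    "snpC": [2, 0, 1, 1, 2, 0],  # r^2 with snpA is 0.25
}
print(prune_by_ld(example))  # ['snpA', 'snpC']
```

In practice plink does this far more efficiently; the sketch is only meant to show what "pruning on LD" amounts to.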

I am not aware of many implementations in R for tag SNPs. As mentioned in this previous answer, HaploView would be a good standalone choice: A: Measure Tag Snps, R Package, Tools

You could easily both apply the method that I used and export your data into HaploView for further interrogation. Hopefully you are familiar with how to load data into these programs (be aware that plink has an export function for HaploView format).
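On the overlap raised in the question (SNPs tagged by several SNPs, and tags that are themselves tagged), one common way to reduce a list like the one from --show-tags to a small set is a greedy set cover: repeatedly pick the SNP that captures the most SNPs not yet covered. The Python sketch below assumes the plink output is whitespace-delimited with a header containing SNP and TAGS columns, where TAGS is a '|'-separated list or NONE; check this against your own file before relying on it. The function names and the rs IDs are made up for illustration.

```python
def read_tag_list(lines):
    """Parse rows with a header holding SNP and TAGS columns into {snp: set it captures}."""
    rows = [ln.split() for ln in lines if ln.strip()]
    header = rows[0]
    snp_col, tags_col = header.index("SNP"), header.index("TAGS")
    tagged_by = {}
    for row in rows[1:]:
        tags = row[tags_col]
        partners = set() if tags == "NONE" else set(tags.split("|"))
        tagged_by[row[snp_col]] = partners | {row[snp_col]}  # every SNP captures itself
    return tagged_by


def greedy_tags(tagged_by):
    """Greedy set cover: pick the SNP capturing the most still-uncovered SNPs, repeat."""
    uncovered = set(tagged_by)
    chosen = {}
    while uncovered:
        # sorted() makes ties resolve alphabetically, so the result is reproducible
        best = max(sorted(tagged_by), key=lambda s: len(tagged_by[s] & uncovered))
        captured = tagged_by[best] & uncovered
        chosen[best] = sorted(captured)
        uncovered -= captured
    return chosen


# Toy input in the assumed layout (hypothetical IDs and positions).
demo = """SNP CHR BP NTAG LEFT RIGHT KBSPAN TAGS
rs1 1 100 2 100 300 0.2 rs2|rs3
rs2 1 200 1 100 200 0.1 rs1
rs3 1 300 1 100 300 0.2 rs1
rs4 1 900 0 900 900 0 NONE
""".splitlines()
print(greedy_tags(read_tag_list(demo)))
# {'rs1': ['rs1', 'rs2', 'rs3'], 'rs4': ['rs4']}
```

Greedy selection is not guaranteed to give the smallest possible tag set, but it is simple and usually close; HaploView's Tagger remains the more complete option.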

Kevin

Thanks for your reply! I'm quite new to bioinformatics, actually, and am trying to familiarize myself with Haploview first. I have converted my binary (.bed, .bim, .fam) files to pedigree format (.ped, .map) via --recode in plink. I am trying to upload the .ped file to Haploview Tagger. Any idea why I get this error?

Linux env: /bsub: No such file or directory

Job could not be submitted to the LSF queue!!!

Thanks again!

You'll need to post your full command.

Yes, are you running this on a cluster environment?

I am running it from the Tagger service available online at this link:

http://archive.broadinstitute.org/mpg/tagger/server.html

I have not downloaded Haploview and am trying to carry out the procedure online.

Once I select 'I want to upload my own genotype data as a PED file', I upload my .ped file by clicking the 'choose file' button under the heading 'linkage format ("ped" file)'.

Thanks

If using the Broad Institute's online service, you should contact them.
