Question: Tag SNPs - how to easily and effectively select them
15 months ago by
filippo.corponi20 wrote:

I am working with genomic data on the hg19/GRCh37 build. I need to select tag SNPs for a number of genes that I am studying. I've annotated my data and extracted the available SNPs mapping within my gene of interest. Once I have my list of variants, I can get tag SNPs in plink using:

plink --bfile mydata --show-tags mygene_variants.txt --list-all --tag-kb 5 --out mytags

I appreciate that there's a certain degree of overlap: some SNPs are tagged by more than one SNP, and some tagged SNPs are themselves tag SNPs. How can I easily select the best tag SNPs? Is there software that works with hg19/GRCh37, or an algorithm I can use in the R programming language?

gene snp tag snps genome • 1.2k views
modified 5 months ago by Biostar ♦♦ 20 • written 15 months ago by filippo.corponi20
15 months ago by
Kevin Blighe48k wrote:

Tag SNPs are a group of SNPs whose genotypes are predictive of those of other SNPs in their surrounding haploblocks. However, in some tagging experiments, one does not necessarily have to refer to 'haploblocks', and can instead just scan genome-wide for highly informative SNPs that define a particular group.
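To make "predictive of other SNPs" concrete, here is a minimal sketch of one common pairwise measure: the squared Pearson correlation (r²) between two SNPs coded as 0/1/2 allele counts. The function name `genotype_r2` and the use of `None` for missing calls are illustrative assumptions, not part of plink or HaploView, and phased-haplotype r² can differ from this genotype-based approximation.

```python
def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNPs coded as 0/1/2 allele counts.

    Samples with a missing call (None) in either SNP are skipped.
    Returns 0.0 if either SNP is monomorphic in the remaining samples.
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return 0.0
    return (cov * cov) / (var_a * var_b)


# Hypothetical genotypes for four samples at two SNPs.
print(genotype_r2([0, 1, 2, 1], [0, 1, 2, 1]))
```

An r² close to 1 means that one SNP's genotypes almost fully predict the other's, so only one of the pair needs to be genotyped or tested.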

During my PhD, I developed a method for identifying haplotype tagging CNVs for the purposes of distinguishing the 4 populations among the 270 International HapMap Project samples, but this was before 1000 Genomes data was even released and before R packages became very popular. That said, in my tutorial here on Biostars, I am technically defining tag SNPs on the 1000 Genomes Phase III data, and these tag SNPs are highly informative of each respective population group: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format

In the tutorial, the tagging SNP method that I use is based on linkage disequilibrium and the calculation of the variance inflation factor (see the section entitled 'Prune variants from each chromosome'), whereby tagging SNPs are identified in SNP bins across the entire genome. In fact, you'll find that most tagging SNP methods are based on linkage disequilibrium metrics in some shape or form.
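As a rough illustration of how an LD-based method can resolve overlapping tags, here is a minimal greedy sketch: repeatedly pick the SNP that captures the most still-untagged SNPs at a chosen r² threshold. This is not the pruning procedure from the tutorial and not necessarily what HaploView's Tagger does internally; the function name `greedy_tag_snps`, the 0.8 threshold, and the input format (a dict of pairwise r² values) are all assumptions made for the example.

```python
def greedy_tag_snps(snps, pairwise_r2, threshold=0.8):
    """Greedy tag selection from pairwise r2 values.

    snps        : list of SNP IDs
    pairwise_r2 : dict mapping (snp_a, snp_b) -> r2; pairs not listed
                  are treated as not in LD
    Returns a dict mapping each chosen tag SNP to the SNPs it captures
    (every SNP captures itself).
    """
    # For each SNP, the set of SNPs it would capture at this threshold.
    captures = {s: {s} for s in snps}
    for (a, b), r2 in pairwise_r2.items():
        if r2 >= threshold and a in captures and b in captures:
            captures[a].add(b)
            captures[b].add(a)

    untagged = set(snps)
    tags = {}
    while untagged:
        # Pick the SNP capturing the most untagged SNPs;
        # ties are broken alphabetically by SNP ID.
        best = max(sorted(snps), key=lambda s: len(captures[s] & untagged))
        tags[best] = sorted(captures[best] & untagged)
        untagged -= captures[best]
    return tags


# Hypothetical r2 values for four SNPs in one gene.
example_r2 = {("rs1", "rs2"): 0.90, ("rs2", "rs3"): 0.85, ("rs3", "rs4"): 0.20}
print(greedy_tag_snps(["rs1", "rs2", "rs3", "rs4"], example_r2))
```

Each SNP ends up assigned to exactly one tag, which removes the "tagged by more than one SNP" ambiguity. The greedy choice is a heuristic and is not guaranteed to give the smallest possible tag set.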

I am not aware of many implementations in R for tag SNPs. As mentioned in this previous answer, HaploView would be a good standalone choice: A: Measure Tag Snps, R Package, Tools

You could easily both apply the method that I used and export your data into HaploView for further interrogation. Hopefully you are familiar with how to load data into these programs (be aware that plink has an export function for HaploView format).


modified 15 months ago • written 15 months ago by Kevin Blighe48k

Thanks for your reply! I'm quite new to bioinformatics and am trying to familiarize myself with Haploview first. I have converted my binary (.bed, .bim, .fam) files to pedigree format (.ped, .map) via --recode in plink. I am trying to upload the .ped file to Haploview Tagger. Any idea why I get this error?

Linux env: /bsub: No such file or directory

Job could not be submitted to the LSF queue!!!

Thanks again!

written 15 months ago by filippo.corponi20

You'll need to post your full command.

written 15 months ago by Devon Ryan91k

Yes, are you running this in a cluster environment?

written 15 months ago by Kevin Blighe48k

I am running it from the tagger service available online at this link:

I have not downloaded haploview and I am trying to carry out the procedure online.

Once I select 'I want to upload my own genotype data as a PED file' I proceed to upload my .ped file clicking the button 'choose file' under the heading 'linkage format ("ped" file)'.


written 15 months ago by filippo.corponi20

If using the Broad Institute's online service, you should contact them.

written 15 months ago by Kevin Blighe48k

