Question: Converting genotype data to SNP calls
novice990 (United States) wrote, 4.2 years ago:

Hi all,

I have genotype data obtained from an Infinium array. The genotype is either AA, AB, or BB. I would like to find the SNP (e.g. A to C, T to G) at each site for each sample. I have annotation data for each SNP site giving the A and B alleles, so I'm thinking of converting this genotype data into SNP calls using these simple rules:

  • If genotype is AA, don't report anything
  • If genotype is AB or BB, report a SNP from A to B

My question is: does this algorithm make sense biologically?
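The proposed rules can be sketched in a few lines of Python for one sample. The input shapes (a dict of genotypes keyed by SNP name and a dict of A/B alleles from the annotation) and the "--" no-call code are assumptions for illustration; adapt them to however your export is laid out.

```python
def genotypes_to_snp_calls(genotypes, annotation):
    """Apply the proposed rule to one sample.

    genotypes:  SNP name -> "AA", "AB" or "BB" (hypothetical input shape)
    annotation: SNP name -> (allele_a, allele_b) from the array annotation
    Returns SNP name -> "A>B"-style change, e.g. "T>C".
    """
    calls = {}
    for snp_name, genotype in genotypes.items():
        if genotype in ("AB", "BB"):
            allele_a, allele_b = annotation[snp_name]
            calls[snp_name] = f"{allele_a}>{allele_b}"
        # "AA" is not reported; anything else (e.g. a no-call) is skipped too
    return calls


# Usage with made-up data; "--" stands in for a no-call.
genotypes = {"snp1": "AA", "snp2": "AB", "snp3": "BB", "snp4": "--"}
annotation = {"snp1": ("A", "G"), "snp2": ("T", "C"),
              "snp3": ("A", "C"), "snp4": ("G", "T")}
print(genotypes_to_snp_calls(genotypes, annotation))
# {'snp2': 'T>C', 'snp3': 'A>C'}
```

Note that AB and BB produce the same output, so the heterozygous/homozygous distinction is dropped.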


Tags: snp, genotype, illumina
Jorge Amigo (Santiago de Compostela, Spain) wrote, 4.2 years ago:

I don't exactly see why you would want to do so, but "biologically speaking" your algorithm "makes sense" if all you want is a catalog of variant sites from your original data.

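  • AA is a homozygous reference genotype, so you could consider it a non-variant site. You could therefore avoid writing A to A or A>A, since the reference allele is the only one present on both copies of the chromosome in the diploid organism you are surely working with. (This assumes the A allele is the reference allele; Illumina assigns A and B from its TOP/BOT strand designation rather than from the reference genome, so check this against your annotation.)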
  • Both AB and BB imply a change from the reference allele (heterozygous and homozygous, respectively), and they could both be written as A to B or A>B, but you would lose the zygosity information. You'll have to make sure that the downstream analyses you perform on this new data don't need or benefit from that information.
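If the zygosity does matter downstream, one option (a sketch, not something proposed in the thread) is to emit each call as VCF-style REF, ALT and GT values instead of a bare A>B. Because the A allele is not guaranteed to be the reference allele, this sketch takes the reference base at each site as an extra, assumed input and assigns REF/ALT from it; sites where neither A nor B equals the reference (for instance alleles reported on the opposite strand) are left unresolved rather than guessed.

```python
def genotype_to_vcf_call(genotype, allele_a, allele_b, ref):
    """Convert an Infinium AA/AB/BB genotype into (REF, ALT, GT), or None.

    ref is the reference-genome base at the site, an extra input assumed
    to come from your annotation or the reference FASTA. None is returned
    for homozygous-reference sites, no-calls, and sites where neither
    array allele equals ref (e.g. alleles reported on the opposite strand).
    """
    if ref == allele_a:
        alt = allele_b
        alt_copies = {"AA": 0, "AB": 1, "BB": 2}
    elif ref == allele_b:
        alt = allele_a
        alt_copies = {"AA": 2, "AB": 1, "BB": 0}
    else:
        return None
    n_alt = alt_copies.get(genotype)
    if not n_alt:  # 0 copies (homozygous reference) or unknown genotype code
        return None
    return (ref, alt, "0/1" if n_alt == 1 else "1/1")


print(genotype_to_vcf_call("AB", "A", "G", ref="A"))  # ('A', 'G', '0/1')
print(genotype_to_vcf_call("BB", "A", "G", ref="A"))  # ('A', 'G', '1/1')
print(genotype_to_vcf_call("AA", "A", "G", ref="G"))  # ('G', 'A', '1/1')
```

The last call shows why the reference base is needed: when B is the reference allele, an AA genotype is the homozygous variant, not a non-variant site.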

novice990 replied, 4.2 years ago:

The reason I want to do this is to produce a list of SNPs that I can compare with SNPs obtained from WES and WGS (in VCF format).
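
Jorge Amigo replied, 4.2 years ago:

I see. You'll probably find it easy to select the rows of your Infinium data whose genotype contains a B (AB or BB), cut the chromosome and position columns, sort those positions, and then filter your VCF files using that chr-pos list. Assuming chr, pos and genotype are columns 1, 2 and 3 of your tab-delimited Infinium data (the genotype column number is a guess; adjust $3 to match your file):

awk -F'\t' 'BEGIN {OFS="\t"} $3 ~ /B/ {print $1, $2}' infiniumdata.txt | sort -k1,1 -k2,2n > variant_positions.txt
bcftools view -R variant_positions.txt WESorWGSdata.vcf.gz

Matching on the genotype column, rather than grepping for B anywhere in the line, avoids picking up header or other lines that happen to contain a B. Note that -R needs a bgzip-compressed, indexed VCF (e.g. indexed with bcftools index); for an unindexed file, -T variant_positions.txt streams through the whole file instead. Grepping the VCF file instead of region-filtering it through bcftools should also work, since you're only interested in SNP positions.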
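The bcftools filter only checks positions, and a shared position does not guarantee the same variant. A minimal Python sketch that also compares alleles might look like the following; it is an illustration, not part of the original reply. It assumes the Infinium positions and the VCF use the same genome build and chromosome naming (e.g. "1" vs "chr1"), reads an uncompressed VCF as plain text lines, and accepts the complemented A/B alleles as a match because Infinium alleles may be reported on the opposite strand to the reference. A/T and C/G SNPs cannot be strand-resolved this way, and a site absent from a regular (non-gVCF) VCF may be homozygous reference or simply not covered.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def vcf_sites(vcf_lines):
    """Yield ((CHROM, POS), {REF, ALT alleles}) for each plain-text VCF record."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and empty lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        yield (chrom, pos), {ref, *alt.split(",")}


def compare_sites(infinium_variants, vcf_lines):
    """Classify Infinium variant sites against a VCF by position and alleles.

    infinium_variants: (chrom, pos) -> (allele_a, allele_b) for AB/BB sites.
    A site counts as an allele match if the A/B alleles, as given or
    complemented (opposite strand), are all among the VCF record's alleles.
    """
    vcf = dict(vcf_sites(vcf_lines))  # later records at the same position win
    result = {"allele_match": [], "allele_mismatch": [], "not_in_vcf": []}
    for site, (a, b) in sorted(infinium_variants.items()):
        if site not in vcf:
            result["not_in_vcf"].append(site)
            continue
        as_given = {a, b}
        complemented = {a.translate(COMPLEMENT), b.translate(COMPLEMENT)}
        if as_given <= vcf[site] or complemented <= vcf[site]:
            result["allele_match"].append(site)
        else:
            result["allele_mismatch"].append(site)
    return result


# Usage with made-up sites and a tiny in-memory VCF.
infinium = {("1", 1000): ("A", "G"), ("1", 2000): ("A", "C"),
            ("2", 500): ("T", "C"), ("3", 10): ("A", "C")}
vcf_text = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t1000\t.\tG\tA\t50\tPASS\t.\n",
    "2\t500\t.\tA\tG\t50\tPASS\t.\n",
    "3\t10\t.\tA\tG\t50\tPASS\t.\n",
]
print(compare_sites(infinium, vcf_text))
# {'allele_match': [('1', 1000), ('2', 500)], 'allele_mismatch': [('3', 10)], 'not_in_vcf': [('1', 2000)]}
```

Here chr2:500 only matches after complementing T/C to A/G, which is the strand issue the lead-in describes.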
