Converting genotype data to SNP calls
7.5 years ago
novice ★ 1.1k

Hi all,

I have genotype data obtained from an Infinium array. Each genotype is AA, AB, or BB. I would like to find the SNP (e.g. A to C, T to G) at each site for each sample. I have annotation data for each SNP site giving the A and B alleles, so I'm thinking of converting this genotype data into SNP calls using these simple rules:

  • If genotype is AA, don't report anything
  • If genotype is AB or BB, report a SNP from A to B
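As a rough illustration, the two rules above could be applied with a small shell function. This is only a sketch under assumptions: it expects a hypothetical tab-delimited file with the array annotation already merged in, with columns chr, pos, sample, genotype (AA/AB/BB), A allele and B allele, and no header line. The function name `ab_to_snp_calls` is made up for this example, and real Infinium/GenomeStudio exports would need to be reshaped into this layout first.

```shell
# Sketch only. ASSUMED (hypothetical) input: tab-delimited, no header, columns
#   1=chr  2=pos  3=sample  4=genotype (AA/AB/BB)  5=A allele  6=B allele
ab_to_snp_calls() {
  # AA -> nothing reported; AB or BB -> report "A>B" using the annotated alleles.
  # Any other genotype value (e.g. a no-call) is skipped as well.
  awk 'BEGIN { FS = OFS = "\t" }
       $4 == "AB" || $4 == "BB" { print $1, $2, $3, $5 ">" $6 }' "$@"
}

# Example:
#   printf 'chr1\t1000\tS1\tAA\tA\tG\nchr1\t1000\tS2\tAB\tA\tG\n' | ab_to_snp_calls
#   prints one line: chr1<TAB>1000<TAB>S2<TAB>A>G
```

Note that the AB and BB rows produce identical output here, which is exactly the loss of zygosity information raised in the answer.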

My question is: does this algorithm make sense biologically?

Thanks.

SNP illumina genotype • 2.9k views
7.5 years ago

I don't quite see why you would want to do this, but "biologically speaking" your algorithm "makes sense" if you just want to build a catalog of variant sites from your original data.

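  • AA is a homozygous reference genotype, so you could consider it a non-variant site (this assumes the A allele is the reference allele; Illumina's A/B labels follow its TOP/BOT strand convention rather than the reference genome, so it is worth checking against your annotation). You could therefore skip writing A to A or A>A, since the reference allele is the only one present on both chromosome copies of the diploid organism you are presumably working with.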
  • Both AB and BB imply a change from the reference allele (heterozygous and homozygous, respectively), and both could be written as A to B or A>B, but you would lose the zygosity information. You'll have to make sure that the downstream analyses you perform on this new data don't need, or benefit from, that information.
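If zygosity does matter downstream, a small variation can keep it as an extra column instead of discarding it. This is a sketch under assumptions: a hypothetical tab-delimited layout (chr, pos, sample, genotype, A allele, B allele, no header), a made-up function name, and the answer's premise that A is the reference allele, so AB is labelled het and BB hom.

```shell
# Sketch only. ASSUMED (hypothetical) input: tab-delimited, no header, columns
#   1=chr  2=pos  3=sample  4=genotype (AA/AB/BB)  5=A allele  6=B allele
# Also assumes the A allele is the reference allele.
ab_to_snp_calls_with_zygosity() {
  # AA rows and any other genotype values are not printed.
  awk 'BEGIN { FS = OFS = "\t" }
       $4 == "AB" { print $1, $2, $3, $5 ">" $6, "het" }   # one copy of B
       $4 == "BB" { print $1, $2, $3, $5 ">" $6, "hom" }   # two copies of B
      ' "$@"
}
```

An AB call at chr1:1000 for sample S2 with alleles A/G would come out as `chr1 1000 S2 A>G het` (tab-separated), which still allows a genotype-aware comparison against the GT field of a VCF later.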

Awesome.

The reason I want to do this is to produce a list of SNPs that I can compare with SNPs obtained from WES and WGS (in VCF format).


I see. You'll probably find it easiest to grep your Infinium array data for "B", cut the chromosome and position columns, sort those positions, and then filter your VCF files using that chr-pos list, along the lines of

    grep B infiniumdata.txt | cut -f1,2 | sort -k1,1 -k2,2n | bcftools view -R - WESorWGSdata.vcf

assuming the chr and pos columns in your Infinium data are columns 1 and 2, respectively. Keep in mind that grep B matches a "B" anywhere on the line, not only in the genotype column, and that bcftools view -R expects a bgzip-compressed, indexed VCF (-T streams through an unindexed file instead). Grepping the VCF file instead of region-filtering it through bcftools should also work, since you're only interested in SNP positions.
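For the grep-the-VCF route, one option is to let awk do an exact chr/pos lookup, which works on an uncompressed, unindexed VCF and avoids a plain grep matching the position number somewhere else on a line. This is a sketch with assumptions: the sites file is tab-delimited chr and pos (for example the output of the grep | cut step saved to a file; sorting is not needed here), the VCF is plain text (pipe it through zcat first if it is gzipped), and both use the same chromosome naming ("chr1" vs "1") and genome build. The function and file names are hypothetical.

```shell
# Sketch only. ASSUMED inputs (hypothetical):
#   $1 = sites file, tab-delimited "chr<TAB>pos", one site per line
#   $2 = uncompressed VCF (use "-" to read it from stdin, e.g. after zcat)
filter_vcf_by_sites() {
  awk 'BEGIN { FS = "\t" }
       FILENAME == ARGV[1] { keep[$1 FS $2] = 1; next }  # load chr/pos keys from the sites file
       /^#/ { print; next }                              # keep all VCF header lines
       ($1 FS $2) in keep                                # print records at listed sites
      ' "$1" "$2"
}

# Example (hypothetical file names):
#   grep B infiniumdata.txt | cut -f1,2 > infinium_sites.txt
#   filter_vcf_by_sites infinium_sites.txt WESorWGSdata.vcf > overlap.vcf
```

Matching on `FILENAME == ARGV[1]` rather than the common `NR == FNR` idiom keeps the function correct even if the sites file happens to be empty.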
