Hi all,
I have genotype data obtained through an Infinium array. Each genotype is AA, AB, or BB. I would like to find the SNP (i.e. A to C, T to G, etc.) at each site for each sample. I have annotation data for each SNP site giving the A and B alleles. Therefore, I'm thinking of converting this genotype data into SNP calls following these simple rules:
- If genotype is AA, don't report anything
- If genotype is AB or BB, report a SNP from A to B
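For concreteness, the two rules above could be sketched roughly as follows. This is only an illustration under assumptions: the function name report_snps and the input layout (tab-separated columns snp_id, sample, genotype, allele_A, allele_B, i.e. the genotype table already joined with the annotation) are hypothetical. It mirrors the rules as stated and does not check whether the A allele matches the reference allele.

```shell
# Hypothetical input (tab-separated, no header):
#   snp_id  sample  genotype  allele_A  allele_B
# Assumes the genotype table has already been joined with the annotation.
report_snps() {
  awk -F'\t' -v OFS='\t' '
    $3 == "AB" || $3 == "BB" {          # AA rows are skipped: nothing to report
      print $1, $2, ($4 ">" $5), $3     # snp_id, sample, A>B change, genotype
    }
  ' "$@"
}
```

Called as report_snps joined.tsv (or fed from a pipe), it prints one line per non-AA genotype, for example rs2, S1, T>G, AB.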
My question is: does this algorithm make sense biologically?
Thanks.
Awesome.
The reason I want to do this is to produce a list of SNPs that I can compare to other SNPs obtained from WES and WGS (in vcf format).
I see. You'll probably find it easiest to grep your Infinium array data for "B", cut the chromosome and position columns, sort those positions, and then filter your VCF files with that chr-pos list, along the lines of
grep B infiniumdata.txt | cut -f1,2 | sort -k1,1 -k2,2n | bcftools view -R - WESorWGSdata.vcf
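assuming the chr and pos columns in your Infinium data are columns 1 and 2 respectively. Note that -R relies on the index, so the VCF has to be bgzip-compressed and indexed first. Grepping the VCF file instead of region-filtering it through bcftools should also work, since you're only interested in SNP positions.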
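The grep-style alternative could look something like the sketch below, which needs neither bcftools nor an index. Everything specific here is an assumption: the function name filter_vcf_by_array is made up, the genotype is assumed to sit in column 3 of a tab-separated Infinium export, the VCF is assumed to be uncompressed, and both files are assumed to use the same chromosome naming (1 vs chr1).

```shell
# Hypothetical layout: chr in column 1, pos in column 2, genotype in column 3
# of a tab-separated Infinium export. File names are placeholders.
filter_vcf_by_array() {
  # $1 = Infinium genotype table, $2 = uncompressed VCF
  awk -F'\t' '
    NR == FNR {                         # first file: the Infinium table
      if ($3 == "AB" || $3 == "BB") keep[$1 FS $2] = 1
      next
    }
    /^#/ { print; next }                # second file: pass VCF header lines through
    ($1 FS $2) in keep                  # print records at a kept chr-pos pair
  ' "$1" "$2"
}
```

For example, filter_vcf_by_array infiniumdata.txt WESorWGSdata.vcf > shared_positions.vcf. Matching on the genotype column rather than grepping for any "B" avoids picking up lines where "B" only appears in another field. The sketch expects a non-empty Infinium table.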