How does one harmonize WGS data with Microarray data?
2.6 years ago
Moni • 0

Hello all,

I've been working with microarray data, which, of course, explicitly reports the genotype at each SNP on the microarray chip.

Now I am trying to incorporate WGS datasets into my project, and I would like to extract the genotypes at positions that match those in my microarray datasets. My WGS data is in the form of VCF files, which only explicitly report positions at which the sample differs from the reference sequence (e.g. GRCh37/hg19).

My question is more about the positions that are NOT represented in the VCF file. Can I presume these are homozygous for the reference allele? How do I know that these SNPs are not absent from the VCF simply because they reside in low-quality or unsequenced regions of the sample OR the reference? How does one reliably extract genotypes at SNP positions that match the reference?

Thanks for any advice.

WGS Extract from SNPs data • 1.4k views

Hello,

You cannot presume that these positions are homozygous for the reference allele; they may instead lie in low-quality or unsequenced regions of the sample. Ideally, you should generate a gVCF file, which records reference-confidence blocks as well as variant sites, so that you can reliably extract genotypes at SNP positions that match the reference. Otherwise, if you know how the data were produced (coverage, uniformity, the percentage of your target covered at a threshold such as 10x, and the filters that were applied), you could assume these SNPs are homozygous reference and list that assumption as a limitation of your study.
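To illustrate the gVCF idea, here is a minimal sketch of a per-position lookup in plain Python. It assumes a single-sample, uncompressed gVCF in the GATK-style layout, where reference blocks carry an `END=` INFO tag and a `MIN_DP` (or `DP`) sample field. The function name `call_at`, the `min_dp` threshold of 10, and the three return labels are hypothetical choices for this example, and it only handles single-base positions (no multi-base deletions).

```python
def parse_info(info):
    """Turn an INFO string such as 'END=199;DB' into a dict."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def call_at(gvcf_lines, chrom, pos, min_dp=10):
    """Return the genotype string at chrom:pos, 'no-call', or 'not-covered'.

    Assumption: GATK-style gVCF, one sample, reference blocks marked by END=.
    """
    for line in gvcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if f[0] != chrom:
            continue
        start = int(f[1])
        # A reference block spans POS..END; a plain record spans POS only.
        end = int(parse_info(f[7]).get("END", start))
        if not (start <= pos <= end):
            continue
        sample = dict(zip(f[8].split(":"), f[9].split(":")))
        gt = sample.get("GT", "./.")
        # Prefer MIN_DP (lowest depth inside a block) over DP when present.
        depth = sample.get("MIN_DP", sample.get("DP", "0"))
        dp = int(depth) if depth.isdigit() else 0
        if "." in gt or dp < min_dp:
            return "no-call"      # present in the file, but not trustworthy
        return gt                 # e.g. '0/0' inside a reference block
    return "not-covered"          # no record spans this position


# Hypothetical records, for illustration only.
EXAMPLE = [
    "chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=199\tGT:DP:GQ:MIN_DP:PL\t0/0:30:90:25:0,60,900",
    "chr1\t200\t.\tG\tT,<NON_REF>\t500\t.\t.\tGT:AD:DP:GQ:PL\t0/1:15,14,0:29:99:400,0,420,0,0,0",
    "chr1\t201\t.\tC\t<NON_REF>\t.\t.\tEND=300\tGT:DP:GQ:MIN_DP:PL\t0/0:4:9:2:0,3,40",
]
```

With these records, a position inside the first block is reported as `0/0`, the variant site as `0/1`, a position inside the low-depth block as `no-call`, and a position outside all records as `not-covered`. The distinction between the last two and a true `0/0` is exactly what a plain VCF cannot provide.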


Dear desouzareis.r, Thank you so much for your reply. This is extremely helpful, and definitely confirms my concerns about how best to interpret the WGS VCF files. I will look into the process of generating gVCF files.

2.6 years ago
Randy H ▴ 110

Which microarray chip are you using? If it is one of the commercial, consumer sources, you can use a tool called WGS Extract to generate the microarray-style result file from the BAM file itself. If it is a more generic lab chip from Illumina or similar, you can request that it be added as a target, provided you have a sample file.

But note: the tool does not correctly include the indels that the microarray chips call. This is because the vendors have customized their methods, and these have not been studied enough to replicate them for each vendor. Indels represent only about 5,000 of the 600,000+ entries often encountered.
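Given that caveat, one option is to set indel markers aside before comparing microarray and WGS genotypes. The sketch below assumes a consumer-style raw data file with four tab-separated columns (rsid, chromosome, position, genotype) in which indel genotypes are coded with the letters `I` and `D`. That layout and the function name `split_snps_and_indels` are assumptions for illustration; check your vendor's format before relying on it.

```python
def split_snps_and_indels(lines):
    """Separate SNP rows from indel rows in a consumer-style raw data file.

    Assumption: columns are rsid, chromosome, position, genotype, and
    indel genotypes use 'I' (insertion) / 'D' (deletion) letter codes.
    """
    snps, indels = [], []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment header and blank lines
        rsid, chrom, pos, genotype = line.rstrip("\n").split("\t")[:4]
        row = (rsid, chrom, int(pos), genotype)
        if set(genotype) & {"I", "D"}:
            indels.append(row)
        else:
            snps.append(row)
    return snps, indels


# Hypothetical rows, for illustration only.
RAW = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tAG",
    "rs0000002\t1\t2000\tDI",
    "rs0000003\t2\t3000\tCC",
    "rs0000004\t2\t4000\tII",
]
snps, indels = split_snps_and_indels(RAW)
```

Only the `snps` list would then be matched against WGS-derived genotypes, with the indel rows reported separately or excluded.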


Dear Randy H., Thanks so much for your reply. The WGS Extract program looks great. I don't have the BAM files at present, but I'm going to go back and try to obtain them for analysis. Thank you!
