Question

Variant association in Whole Genome Sequencing data

9.7 years ago

Sheila ▴ 420

I have a large dataset of whole genome sequencing (WGS) data. Recently, from a large GWAS, I learned of a number of promising significant hits. I would like to check whether these SNPs are associated with the specific phenotypic trait I'm interested in, using the WGS data that I have. The data was genotyped by Illumina, and I have the .bam and .vcf files that they provided. What is the general workflow for this type of analysis? Because I have the LD blocks for these SNPs, my thought was to extract those regions from the WGS data first using samtools (or R). Do I need to convert them into VCF files afterwards, and then run an association analysis based on my phenotype of interest? Thanks in advance for your help.

sequencing SNP next-gen • 2.7k views

updated 16 months ago by Ram 43k • written 9.7 years ago by Sheila ▴ 420

Ram · Answer 1 · 2014-07-28

9.7 years ago

Katie D'Aco ★ 1.1k

I would run association tests on the whole-genome VCFs (both common- and rare-variant tests, using something like PLINK/SEQ). From there, I would check whether any hits fall in the regions of interest from your chip-based study.
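To make "association test" concrete, here is a minimal sketch of a single-variant allelic test for a case/control trait, written in plain Python. It is only an illustration of the idea, not what PLINK/SEQ does internally; the function names `count_alleles` and `allelic_chi2_test` are made up for this example, and it assumes biallelic sites, diploid genotypes, and a binary phenotype. Real analyses would also adjust for covariates and population structure.

```python
import math


def count_alleles(genotypes):
    """Count (alt, ref) alleles from diploid GT strings such as '0/1' or '1|1'."""
    alt = ref = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == "0":
                ref += 1
            elif allele == "1":
                alt += 1
            # Missing calls ('.') and other ALT indices are skipped in this sketch.
    return alt, ref


def allelic_chi2_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele table.

    Returns (chi2, p_value).
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case = case_alt + case_ref
    row_ctrl = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    if 0 in (row_case, row_ctrl, col_alt, col_ref):
        # Monomorphic site or an empty group: nothing to test.
        return 0.0, 1.0
    diff = case_alt * ctrl_ref - case_ref * ctrl_alt
    chi2 = n * diff ** 2 / (row_case * row_ctrl * col_alt * col_ref)
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotypes at one site, split by phenotype.
    cases = ["0/1", "1/1", "0/1", "0/0"]
    controls = ["0/0", "0/0", "0/1", "0/0"]
    case_alt, case_ref = count_alleles(cases)
    ctrl_alt, ctrl_ref = count_alleles(controls)
    print(allelic_chi2_test(case_alt, case_ref, ctrl_alt, ctrl_ref))
```

Running this per variant across the whole genome gives one p-value per site, which is why the number of tests (and the multiple-testing correction) grows with the amount of data analysed.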

Doing it the way you described above has advantages (shorter run time, less storage needed, smaller corrections for multiple tests), but what a shame it would be to ignore so much data! If you decide to go this route, you can just filter your existing VCFs for your regions of interest. There is no need to go back to the BAM files and convert to VCF unless you want to do your own variant calling.
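As a rough sketch of what "filter your existing VCFs for your regions of interest" can look like, the Python below keeps header lines plus any record whose position falls inside a list of regions, and shows how the Bonferroni threshold changes with the number of tests. The helper names (`parse_region`, `filter_vcf_lines`, `bonferroni_threshold`) and the example coordinates are invented for illustration; regions are assumed to be 1-based, inclusive `chrom:start-end` strings, and the VCF is assumed to be uncompressed text.

```python
def parse_region(region):
    """Parse 'chrom:start-end' (1-based, inclusive) into (chrom, start, end)."""
    chrom, span = region.rsplit(":", 1)
    start, end = span.replace(",", "").split("-")
    return chrom, int(start), int(end)


def filter_vcf_lines(lines, regions):
    """Yield VCF header lines and the records that fall inside any region."""
    parsed = [parse_region(r) for r in regions]
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            yield line  # keep the full header so the output is still a VCF
            continue
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        if any(c == chrom and s <= pos <= e for c, s, e in parsed):
            yield line


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Tiny made-up VCF; real input would be read from a file.
    vcf = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.",
        "chr1\t5000\t.\tC\tT\t50\tPASS\t.",
        "chr2\t150\t.\tG\tA\t50\tPASS\t.",
    ]
    kept = list(filter_vcf_lines(vcf, ["chr1:50-200"]))
    n_variants = sum(1 for line in kept if not line.startswith("#"))
    print(n_variants, bonferroni_threshold(0.05, n_variants))
```

For bgzipped, indexed VCFs, `tabix` or `bcftools view -r` do the same region extraction far more efficiently; the point here is only that the filter works on the VCF directly, and that fewer tested variants means a less stringent per-test threshold.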

Hi Katie,

Thanks for your reply! Could you explain more about what you mean by "Doing it the way you described above has advantages (shorter run time, less storage needed, smaller corrections for multiple tests), but what a shame it would be to ignore so much data!"? My thought was that yes, it would be much faster, but I would also be removing any unrelated information. Is this a naive approach? What would you suggest as a better method?

updated 16 months ago by Ram 43k • written 8.8 years ago by Sheila ▴ 420