Question: Variant association in Whole Genome Sequencing data
Sheila (United States) wrote, 4.6 years ago:

I have a large dataset of whole genome sequencing (WGS) data. In a recent large GWAS, I found a number of promising significant hits. I would like to check whether these SNPs are associated with the specific phenotypic trait I'm interested in, using the WGS data that I have. The data were genotyped by Illumina, and I have the .bam and .vcf files that they provided. What is the general workflow for this type of analysis? Because I have the LD blocks for these SNPs, my thought was to first extract those regions from the WGS data using SAMtools (or R). Do I then need to convert the extracted regions into VCF files and run an association analysis against my phenotype of interest? Thanks in advance for your help.

Tags: sequencing, snp, next-gen
Katie D'Aco (Massachusetts) wrote, 4.6 years ago:

I would run association tests on the whole-genome VCFs (both common- and rare-variant tests, using something like PLINK/SEQ). From there, I would see whether any hits fall in the regions of interest from your chip-based study.
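To make the idea of a per-variant association test concrete, here is a minimal sketch of an allelic chi-square test for a single variant in a case/control design. It is an illustration only, not what PLINK/SEQ does internally: the function name `allelic_chi_square`, the 0/1/2 genotype coding, and the toy genotypes are all assumptions made for this example. A real analysis would normally also handle covariates, population structure, and grouped (burden-style) tests for rare variants.

```python
import math


def allelic_chi_square(case_genotypes, control_genotypes):
    """Allelic 2x2 chi-square test (1 degree of freedom) for one variant.

    Genotypes are coded as the number of alternate alleles (0, 1 or 2);
    None marks a missing call. Returns (chi2, p_value).
    """

    def allele_counts(genotypes):
        called = [g for g in genotypes if g is not None]
        alt = sum(called)
        ref = 2 * len(called) - alt  # diploid: two alleles per called sample
        return ref, alt

    a, b = allele_counts(case_genotypes)     # case ref, case alt
    c, d = allele_counts(control_genotypes)  # control ref, control alt

    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        # Monomorphic variant or an empty group: the test is uninformative.
        return 0.0, 1.0

    chi2 = n * (a * d - b * c) ** 2 / math.prod(margins)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (hypothetical): alt allele is enriched in cases.
cases = [2, 2, 1, 1, 2, 1]
controls = [0, 0, 1, 0, 0, 1]
chi2, p_value = allelic_chi_square(cases, controls)
print(f"chi2={chi2:.3f}, p={p_value:.4f}")
```

Running one such test per variant across the whole genome is what drives the multiple-testing burden, which is why restricting to candidate regions lowers the correction needed.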

Doing it the way you described above has advantages (shorter run time, less storage needed, a smaller multiple-testing correction), but what a shame it would be to ignore so much data! If you decide to go this route, you can just filter your existing VCFs for your regions of interest. There is no need to go back to the BAM files and convert them to VCF unless you want to do your own variant calling.
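As a rough sketch of what "filtering a VCF for regions of interest" means, the snippet below keeps header lines plus any record whose position falls inside a list of regions. The helper names (`parse_regions`, `filter_vcf`), the region strings, and the toy records are hypothetical and chosen only for illustration; it reads plain-text VCF lines and does not use an index.

```python
def parse_regions(region_strings):
    """Parse strings like 'chr1:100-200' into (chrom, start, end) tuples.

    Coordinates are treated as 1-based and inclusive, matching VCF POS.
    """
    regions = []
    for text in region_strings:
        chrom, span = text.split(":")
        start, end = span.split("-")
        regions.append((chrom, int(start), int(end)))
    return regions


def filter_vcf(lines, regions):
    """Yield header lines and records whose POS lies inside any region."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the column header
            continue
        fields = line.split("\t", 2)
        chrom, pos = fields[0], int(fields[1])
        if any(c == chrom and s <= pos <= e for c, s, e in regions):
            yield line


# Toy VCF content (hypothetical records, no genotype columns).
toy_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t150\trs_a\tA\tG\t.\tPASS\t.\n",
    "chr1\t5000\trs_b\tC\tT\t.\tPASS\t.\n",
    "chr2\t300\trs_c\tG\tA\t.\tPASS\t.\n",
]
regions = parse_regions(["chr1:100-200", "chr2:250-350"])
kept = list(filter_vcf(toy_vcf, regions))
print("".join(kept), end="")
```

For real whole-genome files, which are usually bgzip-compressed and indexed, a dedicated tool such as `bcftools view -r` or `tabix` is likely a better fit than scanning every line, since it can jump straight to the requested regions.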


Hi Katie,

Thanks for your reply! Could you explain more about what you mean by "Doing it the way you described above has advantages (shorter run time, less storage needed, smaller corrections for multiple tests), but what a shame it would be to ignore so much data!"? My thought was that yes, it would be much faster, but I would also be removing any unrelated information. Is this a naive approach? What would you suggest as a better method?

Reply written 3.7 years ago by Sheila