Question: Making a VCF file from a subset of regions in gVCF files
Floris Brenk (USA) wrote, 2.5 years ago:

Dear all,

We have a large exome sequencing cohort (>8,000 samples) that we recently processed. At the end of the pipeline we are seeing some odd results for a few genes, so I would like to regenerate the final multi-sample VCF for those genes. Redoing this for the full dataset would take a very long time and a lot of compute. Would extracting only the genes of interest (around 50) from each gVCF, and then continuing the pipeline with those subsetted gVCFs, introduce any problems or biases? Or do the GATK steps require the whole gVCF to be present? Any other recommendations for the intermediate steps are welcome :)

Tags: gatk, vcf
aays (Canada) answered, 2.5 years ago:

I think the best approach is to feed in the entire gVCFs as input, but restrict your GenotypeGVCFs command to the intervals you want to include (or exclude). My understanding is that GATK then works out the requested intervals up front and processes only those. I may be wrong about the internals, but I've noticed substantial speed differences when doing this myself.

If you'd like to specify certain regions, create a flat file with the file extension .intervals and feed it to the -L argument in your GATK command. An .intervals file (let's say this is called myregions.intervals) looks like this:

chromosome_1
chromosome_2:1-100

The above would make GATK process all of chromosome_1 and only positions 1-100 of chromosome_2. The GATK documentation on intervals has more details. There is also an -XL argument for excluding a set of intervals, if that is easier for your case.
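If your genes of interest are already in a BED file, a short awk command can convert them into the .intervals format shown above. This is a minimal sketch: the file name genes_of_interest.bed and the example coordinates are hypothetical, and the chromosome names must match those in your reference (e.g. chr1 vs 1). BED is 0-based and half-open while GATK interval strings are 1-based and closed, so only the start coordinate shifts by one.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical example input: a BED file of genes of interest
# (columns: chrom, start, end, name; 0-based, half-open coordinates).
printf 'chr1\t999\t2000\tGENE_A\nchr2\t49\t150\tGENE_B\n' > genes_of_interest.bed

# Convert each BED record to GATK interval syntax (1-based, closed): chrom:start-end.
# The start is shifted by +1; the end coordinate stays the same.
awk 'BEGIN { OFS = "" } NF >= 3 { print $1, ":", $2 + 1, "-", $3 }' \
    genes_of_interest.bed > myregions.intervals

cat myregions.intervals
```

GATK can also take a .bed file directly with -L, and GATK 3 has an --interval_padding (-ip) option if you want flanking bases around each gene; either may be simpler than converting by hand.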

A sample command (GATK 3 syntax) would look like this:

java -jar GenomeAnalysisTK.jar \
   -T GenotypeGVCFs \
   -R reference.fasta \
   -L myregions.intervals \
   --variant sample1.g.vcf \
   --variant sample2.g.vcf \
   -o output.vcf
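With thousands of samples, writing one --variant per gVCF by hand is impractical. The bash sketch below assembles the same GenotypeGVCFs command from a list of gVCF paths. The file gvcf_paths.txt and the sample names are hypothetical, and the script only prints the command (a dry run); replace the final echo with "${cmd[@]}" to actually run it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical list of gVCF paths, one per line.
printf '%s\n' sample1.g.vcf sample2.g.vcf sample3.g.vcf > gvcf_paths.txt

# Collect one "--variant <file>" pair per non-empty line.
variant_args=()
while IFS= read -r gvcf; do
  if [ -n "$gvcf" ]; then
    variant_args+=(--variant "$gvcf")
  fi
done < gvcf_paths.txt

# Same GATK 3 GenotypeGVCFs call as above, restricted to myregions.intervals.
cmd=(java -jar GenomeAnalysisTK.jar
     -T GenotypeGVCFs
     -R reference.fasta
     -L myregions.intervals
     "${variant_args[@]}"
     -o output.vcf)

# Dry run: print the assembled command instead of executing it.
echo "${cmd[*]}"
```

Using a bash array keeps paths with spaces intact when the command is executed. For a cohort of this size, GATK 3 workflows typically merged gVCFs in batches with CombineGVCFs first; the same list-building approach works for those combined files.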