Question: Which of the 4 SureSelect Agilent BED files to use with GATK haplotype caller?
9 months ago, curious wrote:

I have some BAMs from whole exome sequencing.

I want to run GATK HaplotypeCaller, which requires one BED file as input.

The SureSelect kit used for these BAMs comes with 4 different .bed files:





Googling shows this question has been asked multiple times, e.g. "What Agilent Interval Files (.Bed) Should I Use For Exome Variant Calling With Gatk?"

I still don't know, but my gut instinct is to use the *_Padded.bed file because, according to Agilent, it contains:

"the genomic regions that you can expect to sequence when using the design for target enrichment. To determine these regions, the program extends the regions in the Covered BED file by 100 bp on each side."

Has anyone done this before and knows the way?
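For reference, the padding described in the Agilent quote can be sketched in a few lines of Python. This is only an illustration of "extend each Covered region by 100 bp on each side", not Agilent's actual program; the function name and the example coordinates are made up.

```python
def pad_bed_intervals(intervals, padding=100):
    """Extend each (chrom, start, end) BED interval by `padding` bp on both sides."""
    padded = []
    for chrom, start, end in intervals:
        # BED starts are 0-based, so clamp at 0 to avoid negative coordinates.
        # Assumption: chromosome lengths are unknown here, so the end is not clamped.
        padded.append((chrom, max(0, start - padding), end + padding))
    return padded


# Hypothetical "Covered" regions, for illustration only.
covered = [("chr1", 50, 300), ("chr1", 1000, 1200)]
print(pad_bed_intervals(covered))
```

Padding each region independently like this is one way neighbouring regions can end up overlapping.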

written 9 months ago by curious

Just my 2 cents: I'm using the *_Padded file to subset my VCF file. Be aware that the regions can overlap.
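If overlapping regions are a concern, one option is to merge them before use. The following is a minimal sketch, assuming the intervals have already been parsed into (chrom, start, end) tuples with BED-style half-open coordinates; the function name and sample data are hypothetical.

```python
def merge_overlapping(intervals):
    """Merge overlapping or directly adjacent (chrom, start, end) intervals."""
    merged = []
    # Sorting groups intervals by chromosome name, then by start position.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same chromosome and the new start falls inside (or touches)
            # the previous interval: extend the previous interval.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Made-up padded regions, for illustration only.
regions = [("chr1", 900, 1300), ("chr1", 0, 400), ("chr1", 350, 950), ("chr2", 10, 20)]
print(merge_overlapping(regions))
```

Because BED ends are exclusive, the `start <= previous end` test also joins regions that merely touch; use `<` instead if those should stay separate.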

written 9 months ago by _r_am

I wonder if that even matters for my application. As far as I can tell, the BED file is supplied as an argument to GATK HaplotypeCaller just to cut down on search time by pointing it at specific intervals. I hate making assumptions though, so I'll be on the lookout.
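To make the "supplied as an argument" part concrete, here is a rough sketch that assembles a GATK4-style HaplotypeCaller command line with the intervals passed through `-L`. The file names are placeholders and the helper function is hypothetical; check the flags against the documentation for the GATK version in use.

```python
import shlex


def haplotypecaller_command(reference, bam, intervals, output, interval_padding=0):
    """Build (but do not run) a HaplotypeCaller command restricted to `intervals`."""
    cmd = [
        "gatk", "HaplotypeCaller",
        "-R", reference,   # reference FASTA
        "-I", bam,         # input BAM
        "-L", intervals,   # intervals to process, e.g. one of the kit's BED files
        "-O", output,      # output VCF
    ]
    if interval_padding > 0:
        # Optional: let GATK pad each interval instead of using a pre-padded file.
        cmd += ["--interval-padding", str(interval_padding)]
    return cmd


# Placeholder file names, for illustration only.
cmd = haplotypecaller_command("ref.fasta", "sample.bam", "targets.bed", "sample.vcf.gz")
print(shlex.join(cmd))
```

Since the tool only processes the listed intervals, the choice of file affects more than run time: it also determines where variants can be reported at all.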

written 9 months ago by curious

As an aside, are you sure a BED file would even work? I recall running into an issue a few years ago where GATK needed an interval_list file, which was similar but not identical to the BED format.
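The main difference between the two formats is the coordinate system: BED starts are 0-based with an exclusive end, while Picard-style interval_list records are 1-based and inclusive, with columns for sequence, start, end, strand and name. A minimal sketch of converting one record is below; the function name is made up, and a real interval_list file also needs a SAM-style header (`@HD` and `@SQ` lines) matching the reference, which this does not produce. Picard's BedToIntervalList tool handles the full conversion given a sequence dictionary.

```python
def bed_line_to_interval_list(line):
    """Convert one tab-separated BED record to an interval_list body line."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Optional BED columns: name is column 4, strand is column 6.
    name = fields[3] if len(fields) > 3 and fields[3] else "."
    strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "+"
    # 0-based half-open -> 1-based closed: only the start shifts by one.
    return "\t".join([chrom, str(start + 1), str(end), strand, name])


# Made-up record, for illustration only.
print(bed_line_to_interval_list("chr1\t99\t200\ttarget_1"))
```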

written 9 months ago by _r_am

Well, not anymore! I'll take a look, thanks again. GATK is an amazing resource, kind of complicated though.

written 9 months ago by curious