https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists
Intervals and interval lists:
The article linked above says the following about targeted sequencing:
Targeted sequencing (exomes, gene panels etc.)
For exomes and similarly targeted data types, the interval list should correspond to the capture targets used for the library prep, and is typically provided by the prep kit manufacturer (with versions for each ref genome build of course).
In my case, target enrichment was performed with the Agilent SureSelect Target Enrichment System, using a SureSelectXT Custom 3-5.9Mb design (Agilent).
I received two files, Region.bed and Covered.bed. Which of them should I use as the interval list? The two files are described as follows:
[design ID]_Regions.bed - This BED file contains a single track of the target regions of interest that SureDesign used to select the probes. You can use this track to see the exact regions that the program was attempting to cover when selecting the probes.
head -n 3 Region.bed
chr13 48069202 48084157 chr13:48069203-48084157
chr13 48110220 48120755 chr13:48110221-48120755
chr13 48123958 48166976 chr13:48123959-48166976
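As far as I can tell, the fourth column is just the same region written in 1-based, closed coordinates, whereas columns 2 and 3 follow the usual BED convention (0-based start, end not included). Below is a minimal sketch of how I checked this; the function name `bed_to_one_based` is my own and not part of any tool.

```python
def bed_to_one_based(line: str) -> str:
    """Turn one BED record (0-based start, exclusive end) into a
    1-based, closed 'chrom:start-end' string."""
    chrom, start, end = line.split()[:3]
    # Only the start shifts by one; the exclusive BED end equals the
    # inclusive 1-based end.
    return f"{chrom}:{int(start) + 1}-{end}"


record = "chr13 48069202 48084157 chr13:48069203-48084157"
print(bed_to_one_based(record))  # chr13:48069203-48084157
```

The result matches the label in the fourth column of the first Region.bed line, so the two notations seem to describe the same interval.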
[design ID]_Covered.bed - This BED file contains a single track of the genomic regions that are covered by one or more probes in the design. The fourth column of the file contains annotation information. You can use this file for assessing coverage metrics.
head -n 4 Covered.bed
chr13 48069307 48069427 chr13:48069203-48084157
chr13 48069475 48069595 chr13:48069203-48084157
chr13 48070408 48070528 chr13:48069203-48084157
chr13 48070800 48070920 chr13:48069203-48084157
I have target enrichment sequencing data and would like to run GATK HaplotypeCaller restricted to an interval list.
--intervals / -L: One or more genomic intervals over which to operate. Is it possible to pass a BED file (Covered.bed or Region.bed) directly to -L?
Command:
gatk --java-options -Xmx50g HaplotypeCaller -R genome.fa -I SetNm.bam -O raw.g.vcf.gz -ERC GVCF --minimum-mapping-quality 20 --min-base-quality-score 20 -L Covered.bed (Region.bed) -ip
Should I use one of these BED files as is, or should I create a new three-column BED file (chr, start, end, separated by tabs) to use as the interval list? In other words, should I keep columns 1 to 3, or only the 4th column?
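If a three-column file turns out to be the right input, this is roughly how I would produce it. It is only a sketch under my own assumptions: the helper name `to_bed3` is hypothetical, and I am assuming that any `browser`, `track`, or `#` header lines in the Agilent files should be skipped.

```python
def to_bed3(lines):
    """Keep only chrom, start, end from each BED record.

    Blank lines and header lines starting with 'browser', 'track' or '#'
    are skipped (an assumption about how the files are laid out).
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("browser", "track", "#")):
            continue
        chrom, start, end = line.split()[:3]
        records.append(f"{chrom}\t{start}\t{end}")
    return records


sample = [
    "chr13 48069307 48069427 chr13:48069203-48084157",
    "chr13 48069475 48069595 chr13:48069203-48084157",
]
for rec in to_bed3(sample):
    print(rec)
```

For a real file, the same function could be applied to `open("Covered.bed")` and the result written to a new file with one record per line.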
I would be grateful for any help with this question.
Thank you so much in advance.