Query regarding Intervals and interval lists (-L) for Target Enrichment Sequencing GATK HaplotypeCaller
2.1 years ago
ramshahaya ▴ 10

https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists

Intervals and interval lists:

According to the article linked above, the interval list should correspond to the capture targets used for the library prep. The relevant passage reads:

Targeted sequencing (exomes, gene panels etc.)

For exomes and similarly targeted data types, the interval list should correspond to the capture targets used for the library prep, and is typically provided by the prep kit manufacturer (with versions for each ref genome build of course).

In my case, target enrichment was performed with the Agilent SureSelect Target Enrichment System, using a SureSelectXT Custom 3-5.9Mb design (Agilent).

I received a Region.bed file and a Covered.bed file. Which of the two should I use as the interval list? Agilent describes them as follows:

[design ID]_Regions.bed - This BED file contains a single track of the target regions of interest that SureDesign used to select the probes. You can use this track to see the exact regions that the program was attempting to cover when selecting the probes.

head -n 3 Region.bed

chr13    48069202    48084157    chr13:48069203-48084157
chr13    48110220    48120755    chr13:48110221-48120755
chr13    48123958    48166976    chr13:48123959-48166976
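
A note on why column 4 differs by one from column 2: BED coordinates are 0-based and half-open, whereas interval strings of the form chr:start-end are 1-based and inclusive, so both describe the same region. The following is only a minimal sketch of that conversion, assuming tab-separated input; the helper name `bed_to_interval_strings` is hypothetical and the sample rows are the three Region.bed lines shown above.

```shell
# Convert BED rows (0-based, half-open) into 1-based inclusive "chr:start-end" strings.
# bed_to_interval_strings is a hypothetical helper name, not part of GATK or Agilent tooling.
bed_to_interval_strings() {
    awk 'BEGIN { FS = "\t" } NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }' "$1"
}

# Sample data: the three Region.bed rows from the post, written with real tabs.
region_bed=$(mktemp)
printf 'chr13\t48069202\t48084157\tchr13:48069203-48084157\n' >  "$region_bed"
printf 'chr13\t48110220\t48120755\tchr13:48110221-48120755\n' >> "$region_bed"
printf 'chr13\t48123958\t48166976\tchr13:48123959-48166976\n' >> "$region_bed"

# Each printed string should match column 4 of the corresponding row.
bed_to_interval_strings "$region_bed"
```

If the printed strings match column 4, the fourth column carries no extra coordinate information beyond columns 1 to 3.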

[design ID]_Covered.bed - This BED file contains a single track of the genomic regions that are covered by one or more probes in the design. The fourth column of the file contains annotation information. You can use this file for assessing coverage metrics.

head -n 3 Covered.bed

chr13    48069307    48069427    chr13:48069203-48084157
chr13    48069475    48069595    chr13:48069203-48084157
chr13    48070408    48070528    chr13:48069203-48084157
chr13    48070800    48070920    chr13:48069203-48084157
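
Since Covered.bed is described as the file for coverage metrics, one quick way to compare it with Region.bed is to total the bases each file spans. This is a rough sketch under the assumption of tab-separated, non-overlapping rows (overlapping rows would be counted twice); `bed_total_bases` is a hypothetical helper name and the sample rows are the four Covered.bed lines shown above.

```shell
# Sum the bases spanned by BED rows; with half-open coordinates the length is end - start.
# bed_total_bases is a hypothetical helper name; overlapping rows are double counted.
bed_total_bases() {
    awk 'BEGIN { FS = "\t" } NF >= 3 { total += $3 - $2 } END { printf "%d\n", total }' "$1"
}

# Sample data: the four Covered.bed rows from the post, written with real tabs.
covered_bed=$(mktemp)
printf 'chr13\t48069307\t48069427\tchr13:48069203-48084157\n' >  "$covered_bed"
printf 'chr13\t48069475\t48069595\tchr13:48069203-48084157\n' >> "$covered_bed"
printf 'chr13\t48070408\t48070528\tchr13:48069203-48084157\n' >> "$covered_bed"
printf 'chr13\t48070800\t48070920\tchr13:48069203-48084157\n' >> "$covered_bed"

bed_total_bases "$covered_bed"   # prints 480 (four probe-covered segments of 120 bp)
```

Running the same function on the full Region.bed and Covered.bed would show how much of the intended target region is actually covered by probes.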

I would like to run GATK HaplotypeCaller with an interval list, since my data come from target enrichment sequencing.

Command:

--intervals / -L: One or more genomic intervals over which to operate. Is it possible to pass a BED file (Covered.bed or Region.bed) as the interval (-L)?

gatk --java-options -Xmx50g HaplotypeCaller -R genome.fa -I SetNm.bam -O raw.g.vcf.gz -ERC GVCF --minimum-mapping-quality 20 --min-base-quality-score 20 -L Covered.bed (Region.bed) -ip

Should I use one of these BED files as it is, or should I create another BED file (chr"\t"start"\t"end) to use as the interval list? In other words, should I keep columns 1 to 3, or the 4th column?
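
For reference, the linked GATK article lists BED files (with a .bed extension) among the interval formats accepted by -L, so a separate file may not be strictly necessary. If a plain three-column file is wanted anyway, it could be produced roughly as follows. This is a sketch, not a recommended procedure: `bed_first_three_columns` is a hypothetical helper name, and the `track` header line in the sample input is an assumption, since the excerpts above do not show whether the Agilent files carry one.

```shell
# Keep only chrom, start, end from a BED file, skipping "browser"/"track" header lines.
# bed_first_three_columns is a hypothetical helper name.
bed_first_three_columns() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^(browser|track)/ { next }
         NF >= 3 { print $1, $2, $3 }' "$1"
}

# Sample input: an assumed track header plus two Covered.bed rows from the post.
src_bed=$(mktemp)
printf 'track name="Covered" description="assumed sample header"\n' >  "$src_bed"
printf 'chr13\t48069307\t48069427\tchr13:48069203-48084157\n'       >> "$src_bed"
printf 'chr13\t48069475\t48069595\tchr13:48069203-48084157\n'       >> "$src_bed"

# Write the three-column version next to the temporary input and show it.
bed_first_three_columns "$src_bed" > "${src_bed}.3col.bed"
cat "${src_bed}.3col.bed"
```

The output keeps the 0-based BED coordinates unchanged, so the file should keep its .bed extension if it is passed to -L.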

I would be grateful for any help with this query.

Thank you so much in advance.

Agilent Interval_list HaplotypeCaller target_enrichment_list GATK • 1.6k views