Question: Appropriate bed files from library capture kit for computing on target coverage of WES bam files with Picard and CollectHsMetrics
23 months ago by
svlachavas (680) wrote:

Dear Community,

As part of a WES project on cancer BAM files for variant calling, I'm trying to collect some specific quality metrics, namely the on-target coverage for each BAM file. After a brief search, I found that Picard can do this analysis through its CollectHsMetrics tool.

Specifically, since the library kit used in the experimental design was [SureSelect Clinical Research Exome V2, Agilent Technologies], I used the following link:

and selected the same kit to download the necessary BED files: SureSelect Clinical Research Exome V2 (Design ID-S30409818). My main issue is the following:

The function takes the following arguments:

java -jar picard.jar CollectHsMetrics \
      I=input_reads.bam \
      O=output_hs_metrics.txt \
      R=reference.fasta \
      BAIT_INTERVALS=bait.interval_list \
      TARGET_INTERVALS=target.interval_list

However, the download from Agilent contained the following files (each preceded by the same name prefix):

_AllTracks.bed, _Covered.bed, _Padded.bed, _Regions.bed and a file named Targets.txt


1) Which of the above files should I use for the BAIT_INTERVALS and TARGET_INTERVALS arguments, respectively?

2) Or, alternatively, have I downloaded the wrong files, and should I look for them in other repositories?

Thank you in advance, and excuse me for any naive questions, but this is the first time I'm trying to compute on-target coverage!



modified 23 months ago by finswimmer (14k) • written 23 months ago by svlachavas (680)

I'm not sure how typical this situation is, where the Covered and Regions files contain exactly the same intervals. In case it is not the norm, I am supplementing the response from finswimmer with a couple of old posts for reference.

I use this prior post for reference on what the different Agilent bed files contain: Question: Human Exome Capture Library Coordinates Download

I use this prior post for reference on what the different companies call their bed files: Question: Difference between primary and capture targets

In the case at hand, where the Covered and Regions files are the same, the Bait and Target interval files could be set to either one. If the files were different, you would use Covered for Bait and Regions for Target.
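For anyone who wants to check whether two of the downloaded files really contain the same intervals, here is a minimal Python sketch. The function names and the in-memory example lines are made up for illustration, and it assumes that header lines start with "browser", "track" or "#" and that only the first three columns matter for the comparison.

```python
def read_bed_intervals(lines):
    """Return the set of (chrom, start, end) tuples from BED-formatted lines.

    Header lines ("browser", "track", "#") and blank lines are skipped, and
    any columns after the first three (e.g. annotations) are ignored.
    """
    intervals = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("browser", "track", "#")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.add((chrom, int(start), int(end)))
    return intervals


def same_intervals(bed_a_lines, bed_b_lines):
    """True if both BED inputs describe exactly the same set of intervals."""
    return read_bed_intervals(bed_a_lines) == read_bed_intervals(bed_b_lines)


# Tiny made-up example; for real files, pass open file handles instead.
covered = ["track name=Covered", "chr1\t100\t200\tgeneA", "chr1\t300\t400\tgeneB"]
regions = ["track name=Regions", "chr1\t300\t400", "chr1\t100\t200"]
print(same_intervals(covered, regions))  # prints True
```

If this returns True for the Covered and Regions files, either one can serve as both bait and target; if not, the Covered-for-bait, Regions-for-target assignment above applies.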

modified 14 months ago • written 14 months ago by c_dampier (60)

It would help to see the contents of some of these files.

written 23 months ago by Kevin Blighe (67k)

Sure Kevin, I just made a Dropbox link to the compressed file from the Agilent link above:

written 23 months ago by svlachavas (680)
23 months ago by
finswimmer (14k) wrote:


The differences between the files are described in the header of each file. If I remember correctly, _Padded.bed is the same as _Regions.bed but with additional bases (20?) to the left and right of each interval. Decide for yourself whether you need this.

You should use the same BED file for both the BAIT_INTERVALS and TARGET_INTERVALS parameters.
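One practical note: CollectHsMetrics expects interval_list files rather than BED, and Picard ships a BedToIntervalList tool (which takes the reference sequence dictionary via SD=) for that conversion. The Python sketch below only illustrates what changes in the coordinates; it is not a replacement for the Picard tool, it does not write the SAM-style header that a real interval_list needs, and the function name, the "+" strand and the "." placeholder name are my own assumptions.

```python
def bed_to_interval_rows(bed_lines):
    """Convert BED lines to interval_list-style rows (without the header).

    BED is 0-based and half-open; interval_list is 1-based and closed,
    so the start is shifted by +1 and the end is kept unchanged.
    """
    rows = []
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("browser", "track", "#")):
            continue  # skip header and blank lines
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # "." is an arbitrary placeholder when the BED line has no name column.
        name = fields[3] if len(fields) > 3 else "."
        # Columns: sequence, start, end, strand, name (strand assumed "+").
        rows.append(f"{chrom}\t{start + 1}\t{end}\t+\t{name}")
    return rows


# Made-up example line: BED 99-200 becomes 100-200 in interval_list terms.
for row in bed_to_interval_rows(["track name=Covered", "chr1\t99\t200\tgeneA"]):
    print(row)
```

Since the same BED file is used for both parameters here, it only needs to be converted once and the resulting interval_list passed to both.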

fin swimmer

written 23 months ago by finswimmer (14k)

Dear Fin,

thank you for your comments. I have checked each file and its description. However, if all_Tracks.bed is the same as covered.bed, and regions.bed is the same as covered.bed, why are they provided as separate files? And in the end, which specific file, in your opinion, should I use for both the bait and target intervals? Would that be the covered.bed file (= genomic regions covered by probes)?

Thank you in advance,


written 23 months ago by svlachavas (680)

To be honest, I've never understood what Agilent is doing here. I took another look at the files you linked to.

  • covered.bed and regions.bed are exactly the same
  • padded.bed extends the regions by 100 bases on each side
  • For all_Track.bed I can only guess. I think it contains the exon regions for all genes covered by this panel, while the panel itself only covers the known coding regions.

You have to decide whether your analyses are only concerned with exonic regions or also with the neighboring intronic regions. For the former use covered.bed, for the latter padded.bed.

I'd prefer using covered.bed as the basis and adjusting the padding to 20 :)
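A minimal Python sketch of that padding step, in case it helps. It assumes the intervals have already been read as (chrom, start, end) tuples in BED coordinates; the function name is hypothetical, and the sketch only clamps at position 0, since clamping at chromosome ends would need the chromosome lengths.

```python
def pad_and_merge(intervals, padding=20):
    """Pad each (chrom, start, end) interval on both sides and merge overlaps.

    Starts are clamped at 0. Ends are NOT clamped, because that would
    require chromosome lengths, which this sketch does not have.
    """
    padded = sorted(
        (chrom, max(0, start - padding), end + padding)
        for chrom, start, end in intervals
    )
    merged = []
    for chrom, start, end in padded:
        # Padding can make neighboring intervals overlap or touch; join them.
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Made-up intervals: the two chr1 intervals overlap once padded by 20.
example = [("chr1", 100, 200), ("chr1", 230, 300), ("chr2", 10, 50)]
print(pad_and_merge(example))  # [('chr1', 80, 320), ('chr2', 0, 70)]
```

Merging after padding avoids counting the same bases twice when neighboring exons sit closer together than twice the padding.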

fin swimmer

written 23 months ago by finswimmer (14k)

Hello Sir, I was facing the same issue with the selection of files for target coverage analysis, and your explanation really helped. Thanks a lot.

modified 5 months ago • written 5 months ago by supriya.awasthy (10)

Thanks a lot, Fin, for the explanations! Really appreciate it! I will go for the exonic regions, as they are of main interest.


written 23 months ago by svlachavas (680)