Question: Appropriate bed files from library capture kit for computing on target coverage of WES bam files with Picard and CollectHsMetrics
svlachavas (Greece) wrote, 4 months ago:

Dear Community,

As part of a WES project on cancer BAM files for variant calling, I'm trying to collect some specific quality metrics, namely the on-target coverage for each BAM file. After a short search, I found that Picard can do this analysis with the following tool:

https://broadinstitute.github.io/picard/command-line-overview.html#CollectHsMetrics

In detail, based on the library kit used in the experimental design [SureSelect Clinical Research Exome V2, Agilent Technologies], I used the following link:

https://earray.chem.agilent.com/suredesign/home.htm

and selected the same kit to download the necessary BED files: SureSelect Clinical Research Exome V2 (Design ID S30409818). My main issue is the following:

The tool takes the following arguments:

java -jar picard.jar CollectHsMetrics \
      I=input_reads.bam \
      O=output_hs_metrics.txt \
      R=reference.fasta \
      BAIT_INTERVALS=bait.interval_list \
      TARGET_INTERVALS=target.interval_list

However, the download from Agilent contained the following files (shown here without the design ID prefix):

_AllTracks.bed, _Covered.bed, _Padded.bed, _Regions.bed and a file named Targets.txt

Thus:

1) Which of the above files should I use for BAIT_INTERVALS and TARGET_INTERVALS, respectively?

2) Or have I downloaded the wrong files, and should I look for them in another repository?

Thank you in advance, and excuse any naive questions; this is the first time I am trying to compute on-target coverage!

Best,

Efstathios


It would help to see the contents of some of these files.

written 4 months ago by Kevin Blighe

Sure Kevin, I just made a Dropbox link with the compressed file from the Agilent link above:

https://www.dropbox.com/s/8ulh1o8hcib0mms/S30409818_hs_hg38.zip?dl=0

written 4 months ago by svlachavas

finswimmer (Germany) wrote, 4 months ago:

Hello,

The differences between the files are described in the header of each file. If I remember correctly, _Padded.bed is the same as _Regions.bed but has additional bases (20?) to the left and right of each interval. Decide for yourself whether you need this.

You should use the same BED file for both the BAIT_INTERVALS and TARGET_INTERVALS parameters.
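
One practical note: CollectHsMetrics expects Picard interval_list files rather than plain BED, so the Agilent file has to be converted first. Here is a minimal sketch, assuming Picard's BedToIntervalList tool and a sequence dictionary built from the same reference the BAM files were aligned to. All file names (covered.bed, reference.dict and so on) are placeholders, and the helper function is hypothetical; it only prints the two commands so they can be checked before running.

```shell
#!/bin/sh
# Print (do not run) the Picard commands for BED -> interval_list -> CollectHsMetrics.
# All file names passed in are placeholders; replace them with your own files.
print_hs_metrics_commands() {
    bed=$1     # e.g. the Agilent *_Covered.bed
    dict=$2    # sequence dictionary of the reference used for alignment
    bam=$3     # input BAM file
    ref=$4     # reference FASTA
    intervals="${bed%.bed}.interval_list"

    # Step 1: convert the BED file to Picard's interval_list format.
    echo "java -jar picard.jar BedToIntervalList I=$bed O=$intervals SD=$dict"

    # Step 2: pass the same interval_list as both baits and targets.
    echo "java -jar picard.jar CollectHsMetrics I=$bam O=${bam%.bam}_hs_metrics.txt R=$ref BAIT_INTERVALS=$intervals TARGET_INTERVALS=$intervals"
}

print_hs_metrics_commands covered.bed reference.dict input_reads.bam reference.fasta
```

If no sequence dictionary exists yet, Picard's CreateSequenceDictionary can build one from the reference FASTA. The chromosome names in the BED file have to match those in the dictionary, otherwise the conversion will fail.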

fin swimmer


Dear Fin,

thank you for your comments. I have checked each file and its description. However, if _AllTracks.bed is the same as _Covered.bed, and _Regions.bed is the same as _Covered.bed, why are they provided as separate files? And in the end, which specific file would you use for both the bait and target intervals? Would that be the _Covered.bed file (= genomic regions covered by probes)?

Thank you in advance,

Efstathios

written 4 months ago by svlachavas

To be honest, I've never understood what Agilent is doing here. I took another look at the files you linked to.

  • covered.bed and regions.bed are exactly the same
  • padded.bed extends the regions by 100 bases on each side
  • For all_Track.bed I can only guess: I assume it contains the exon regions for all genes covered by this panel, whereas the panel itself only covers the known coding regions.
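
A comparison like this can be repeated on any pair of the BED files. The following is a rough sketch, assuming tab-separated BED files whose header lines start with "browser", "track" or "#". The function names are made up for illustration.

```shell
#!/bin/sh
# Strip header lines and keep only chrom, start, end, in sorted order.
bed_intervals() {
    grep -v -e '^browser' -e '^track' -e '^#' "$1" | cut -f1-3 | sort -k1,1 -k2,2n -k3,3n
}

# Report whether two BED files describe exactly the same intervals,
# ignoring headers, extra annotation columns and line order.
same_intervals() {
    a=$(bed_intervals "$1")
    b=$(bed_intervals "$2")
    if [ "$a" = "$b" ]; then echo "identical"; else echo "different"; fi
}

# Sum of interval lengths. BED is 0-based and half-open, so length = end - start.
# Overlapping intervals are counted twice; merge them first if that matters.
total_bases() {
    bed_intervals "$1" | awk -F'\t' '{ n += $3 - $2 } END { print n + 0 }'
}
```

For example, `same_intervals covered.bed regions.bed` prints "identical" when both files contain the same intervals, and comparing `total_bases covered.bed` with `total_bases padded.bed` gives a feel for how much the padding adds.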

You have to decide whether your analysis is only interested in exonic regions or also in the neighboring intronic regions. For the former use covered.bed, for the latter padded.bed.

I'd prefer to use covered.bed as the basis and adjust the padding to 20 :)
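
A custom padding can be applied with a few lines of awk. This is a minimal sketch rather than a definitive implementation: it assumes a tab-separated BED file with "browser", "track" or "#" header lines, the function name pad_bed is made up, and it neither checks chromosome ends nor merges intervals that overlap after padding (a dedicated tool such as bedtools would cover both).

```shell
#!/bin/sh
# Pad every BED interval by PAD bases on both sides (default 20).
# Header lines (browser/track/#) are dropped; starts are clamped at 0.
# Assumption: chromosome ends are NOT checked and overlapping padded
# intervals are NOT merged.
pad_bed() {
    awk -v pad="${2:-20}" 'BEGIN { FS = OFS = "\t" }
        /^(browser|track|#)/ { next }
        NF >= 3 {
            start = $2 - pad
            if (start < 0) start = 0
            $2 = start
            $3 = $3 + pad
            print
        }' "$1"
}
```

Usage would be something like `pad_bed covered.bed 20 > covered_pad20.bed`. The padded BED file still has to be converted to Picard's interval_list format before CollectHsMetrics will accept it.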

fin swimmer

written 4 months ago by finswimmer

Thanks a lot, Fin, for the explanations! I really appreciate it. I will go for the exonic regions, as they are of main interest.

Efstathios

written 4 months ago by svlachavas