Dear Community,
I am working on a WES project with cancer BAM files for variant calling, and I am currently trying to collect some specific quality metrics, namely the on-target coverage for each BAM file. After a short search, I found that Picard is capable of this analysis through the following tool:
https://broadinstitute.github.io/picard/command-line-overview.html#CollectHsMetrics
In detail, based on the library preparation kit used for the experimental design [SureSelect Clinical Research Exome V2, Agilent Technologies], I used the following link:
https://earray.chem.agilent.com/suredesign/home.htm
and selected the same kit to download the necessary BED files: SureSelect Clinical Research Exome V2 (Design ID S30409818). My main issue is the following:
The tool takes the following arguments:
java -jar picard.jar CollectHsMetrics \
I=input_reads.bam \
O=output_hs_metrics.txt \
R=reference.fasta \
BAIT_INTERVALS=bait.interval_list \
TARGET_INTERVALS=target.interval_list
However, the files downloaded from Agilent, which share a common name prefix, were the following:
_AllTracks.bed, _Covered.bed, _Padded.bed, _Regions.bed and a file named Targets.txt
Thus:
- Which of the above files should I use with the arguments BAIT_INTERVALS and TARGET_INTERVALS, respectively?
- Or, alternatively, have I downloaded the wrong files, and should I search for them in other repositories?
Thank you in advance, and please excuse any naive questions; this is the first time I am trying to compute on-target coverage!
Best,
Efstathios
I'm not sure how typical this situation is, where the Covered and Regions files contain exactly the same intervals. In case it is not the norm, I am supplementing the response from finswimmer with a couple of older posts for reference.
I use this prior post for reference on what the different Agilent bed files contain: Question: Human Exome Capture Library Coordinates Download
I use this prior post for reference on what the different companies call their bed files: Question: Difference between primary and capture targets
In the case at hand, where the Covered and Regions files are the same, the Bait and Target interval files could be set to either one. If the files were different, you would use Covered for Bait and Regions for Target.
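To check whether the two files really hold the same intervals, and to get them into the interval_list format that CollectHsMetrics expects, something like the sketch below could be used. It is only a rough outline under a few assumptions: the helper names (bed_core, same_intervals, bed_to_interval_list) are made up for illustration, the BED files are assumed to start with "browser"/"track" header lines, picard.jar is assumed to sit in the current directory, and the sequence dictionary is assumed to come from the same reference the BAM files were aligned to.

```shell
#!/bin/sh
# Print the sorted, de-duplicated chrom/start/end columns of a BED file,
# skipping "browser"/"track" header lines and comment lines.
bed_core() {
    tr -d '\r' < "$1" \
        | grep -v -e '^browser' -e '^track' -e '^#' \
        | awk 'NF >= 3 { print $1 "\t" $2 "\t" $3 }' \
        | LC_ALL=C sort -u
}

# Exit status 0 when both BED files describe the same set of intervals,
# 1 when they differ. Only the first three columns are compared.
same_intervals() {
    si_a=$(mktemp) && si_b=$(mktemp) || return 2
    bed_core "$1" > "$si_a"
    bed_core "$2" > "$si_b"
    cmp -s "$si_a" "$si_b"
    si_status=$?
    rm -f "$si_a" "$si_b"
    return "$si_status"
}

# Convert a BED file to a Picard interval_list with BedToIntervalList.
# $1 = input BED, $2 = output interval_list, $3 = sequence dictionary
# (for example made with: java -jar picard.jar CreateSequenceDictionary
#  R=reference.fasta O=reference.dict).
# Assumption: picard.jar is in the current working directory.
bed_to_interval_list() {
    java -jar picard.jar BedToIntervalList I="$1" O="$2" SD="$3"
}

# Example usage (file names are placeholders, adjust to your download):
#   if same_intervals prefix_Covered.bed prefix_Regions.bed; then
#       echo "Covered and Regions are identical; either can serve as bait and target"
#   fi
#   bed_to_interval_list prefix_Covered.bed bait.interval_list reference.dict
#   bed_to_interval_list prefix_Regions.bed target.interval_list reference.dict
```

The comparison deliberately ignores any name or annotation columns, since only the coordinates matter for the bait and target intervals. The resulting bait.interval_list and target.interval_list can then be passed to BAIT_INTERVALS and TARGET_INTERVALS in the CollectHsMetrics command from the question.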
It would help to see the contents of some of these files. Could you share them?
Sure Kevin, I just made a Dropbox link with the compressed file from the above Agilent link:
https://www.dropbox.com/s/8ulh1o8hcib0mms/S30409818_hs_hg38.zip?dl=0