Question: How to find the on-target and off-target percentage of reads?
2
2.0 years ago by
United States
bioinforesearchquestions200 wrote:

Hi friends,

Recently, we performed exome sequencing for 3 samples using a Nextera sequencing machine. We used a new kit for exome sequencing, so I am interested in finding out the on-target and off-target percentages of reads from my exome sequencing run.

This is my understanding: on-target reads are reads aligned to the targeted regions (the exome regions as per the manifest file), and off-target reads are reads aligned to regions that are not targeted.

What percentage of reads should cover the on-target regions? How can we calculate the on-target reads from a BAM file? Which tool is useful for getting the percentage of reads covering on-target and off-target regions?

ADD COMMENTlink modified 2.0 years ago by dyollluap300 • written 2.0 years ago by bioinforesearchquestions200
2

Calculating Exome Coverage
Picard: https://broadinstitute.github.io/picard/picard-metric-definitions.html#HsMetrics

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by genomax63k

Thanks, genomax2. I am currently working on the output files from picard.

ADD REPLYlink written 2.0 years ago by bioinforesearchquestions200

Hi Genomax2,

This is the output I got from Picard HsMetrics. I am getting the on-target bases and on-bait bases (baits being the probes used for exon capture). I used two kinds of BAM file: a) a sorted BAM only and b) a sorted, deduped, recalibrated BAM. I believe that I should stick with the clean.dedup.recal.bam statistics.

1) What is vendor's filter?

2) I could see off_bait_bases, but not off-target_bases.

3) Will I be able to get the reads percentage for on-target and off-target?

The reason I want reads on target is this:

A base within a read is considered on target if it is aligned with a targeted region. A read is considered on target if a single base within a read aligns to a targeted region. Measuring reads on target might be more accurate in representing the target fragments.
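The difference between the two definitions above can be sketched in a few lines of Python. This is only a toy illustration, assuming a single chromosome, half-open (start, end) coordinates, and hand-written intervals instead of a real BAM and BED file; the function names are hypothetical and do not come from Picard or GATK.

```python
from bisect import bisect_right

def merge_intervals(intervals):
    """Sort and merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]

def overlap_bases(read, targets, starts):
    """Count the bases of one read that fall inside the merged targets."""
    r_start, r_end = read
    # Begin at the last target starting at or before the read start.
    i = max(bisect_right(starts, r_start) - 1, 0)
    total = 0
    while i < len(targets) and targets[i][0] < r_end:
        t_start, t_end = targets[i]
        total += max(0, min(r_end, t_end) - max(r_start, t_start))
        i += 1
    return total

def on_target_summary(reads, targets):
    """Return read-level and base-level on-target percentages."""
    targets = merge_intervals(targets)
    starts = [s for s, _ in targets]
    on_reads = on_bases = all_bases = 0
    for read in reads:
        n = overlap_bases(read, targets, starts)
        on_reads += n > 0          # read counts as on target if any base overlaps
        on_bases += n              # only the overlapping bases count here
        all_bases += read[1] - read[0]
    return {
        "pct_reads_on_target": 100.0 * on_reads / len(reads),
        "pct_bases_on_target": 100.0 * on_bases / all_bases,
    }

# Hypothetical targets and aligned reads on one chromosome.
targets = [(100, 200), (300, 400)]
reads = [(90, 140), (150, 200), (190, 240), (500, 550)]
summary = on_target_summary(reads, targets)
print(summary)
```

In this toy case three of the four reads touch a target, but only half of the aligned bases lie inside one, so the read-level figure comes out higher than the base-level figure.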

[Image: Picard HsMetrics output]

ADD REPLYlink written 24 months ago by bioinforesearchquestions200

Dear friend, I have a sample whose read count is 51,578,482. After duplicate removal, I have about 46,393,168 reads. The mapped read count is 46,676,207 (99.36%) and the on-target read count is 38,221,722 (81.36%). Do you think this is a good result? What should the ideal on-target rate be, as a percentage? I used Agilent V6+UTR, 150 PE.

Please give your valuable input!
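An on-target percentage depends on which read count is used as the denominator. A minimal sketch, using only the counts quoted in this comment (the variable names and the choice of denominators are illustrative assumptions):

```python
# Counts quoted in the comment above.
total_reads = 51_578_482
deduplicated_reads = 46_393_168
mapped_reads = 46_676_207
on_target_reads = 38_221_722

def pct(numerator, denominator):
    """Return numerator as a percentage of denominator."""
    return 100.0 * numerator / denominator

# The same on-target count expressed against three possible denominators.
for label, denominator in [
    ("of mapped reads", mapped_reads),
    ("of deduplicated reads", deduplicated_reads),
    ("of all reads", total_reads),
]:
    print(f"on-target {label}: {pct(on_target_reads, denominator):.2f}%")
```

When comparing runs or kits, it helps to state which denominator was used alongside the percentage.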

ADD REPLYlink written 23 months ago by alok.helix80
1
2.0 years ago by
aham40
aham40 wrote:

You can determine coverage using GATK's DepthOfCoverage walker. For 'on target' coverage, you can specify an interval list (bed file), for which GATK will calculate coverage.

java -Xms12g -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R hg19.fa -o out_depth_file -I input_duplicate_removed.bam -pt readgroup -ct 4 -ct 6 -ct 10 -L exome_capture_kit.bed

For detailed information on the parameters: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_coverage_DepthOfCoverage.php
For the percentage of reads covering on-target regions, see: Exome sequencing generates high quality data in non-target regions
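As I understand the -ct option, each threshold asks what percentage of the bases in the intervals are covered at or above that depth. A rough Python sketch of that summary, using made-up per-base depths rather than real DepthOfCoverage output (the function name is hypothetical):

```python
def pct_bases_at_or_above(depths, thresholds):
    """Percentage of positions whose depth is >= each threshold."""
    n = len(depths)
    return {t: 100.0 * sum(d >= t for d in depths) / n for t in thresholds}

# Hypothetical per-base depths over a 10 bp target region.
depths = [0, 3, 4, 5, 6, 8, 10, 12, 15, 2]
summary = pct_bases_at_or_above(depths, thresholds=(4, 6, 10))
print(summary)
```

Note that this describes how deeply the targets are covered. It is not the same thing as the percentage of reads that are on target, which needs the reads outside the intervals in the denominator as well.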

ADD COMMENTlink modified 2.0 years ago • written 2.0 years ago by aham40

Thanks, Mshakeel. I will read the links you provided.

ADD REPLYlink written 2.0 years ago by bioinforesearchquestions200
0
2.0 years ago by
dyollluap300
USA, California, Bay Area
dyollluap300 wrote:

Picard tools can give those alignment stats as a standalone tool; you just need the associated BED file for the exome capture kit specific to your protocol (Picard expects intervals in interval_list format, and its BedToIntervalList tool can convert a BED file). It will generate a few output txt files. You want the HsMetrics one, and within that you will find the targeted coverage percentage.
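If you want to pull numbers out of the HsMetrics text file programmatically, something like the following may help. It is a sketch that assumes the usual Picard metrics layout (a "## METRICS CLASS" line followed by a tab-separated header row and one row of values) and that on-bait, near-bait, and off-bait bases together make up the aligned bases. The sample text and its numbers are invented for illustration, and the function name is hypothetical.

```python
def parse_picard_metrics(text):
    """Return the first metrics row of a Picard metrics file as a dict."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no METRICS CLASS section found")

# Invented example with only a few of the real HsMetrics columns.
columns = ["ON_BAIT_BASES", "NEAR_BAIT_BASES", "OFF_BAIT_BASES", "ON_TARGET_BASES"]
values = ["700", "200", "100", "650"]
sample_text = "\n".join([
    "## METRICS CLASS\tpicard.analysis.directed.HsMetrics",
    "\t".join(columns),
    "\t".join(values),
    "",
])

metrics = parse_picard_metrics(sample_text)
on_bait = int(metrics["ON_BAIT_BASES"])
near_bait = int(metrics["NEAR_BAIT_BASES"])
off_bait = int(metrics["OFF_BAIT_BASES"])
aligned = on_bait + near_bait + off_bait  # assumption: these three sum to aligned bases

pct_on_bait = 100.0 * on_bait / aligned
pct_off_bait = 100.0 * off_bait / aligned
print(f"on-bait: {pct_on_bait:.1f}%  off-bait: {pct_off_bait:.1f}%")
```

Keep in mind that these columns count bases, not reads, so a read-level on-target percentage has to be computed separately.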

ADD COMMENTlink written 2.0 years ago by dyollluap300

Thanks, Dyolluap. I used Picard tools HSmetrics.

ADD REPLYlink written 2.0 years ago by bioinforesearchquestions200

We used HsMetrics too, but some large WXS files have extreme memory requirements. Sometimes we have to double the memory allocation from 16G to 32G, or even 64G, to get this done. We don't see similar problems with WgsMetrics no matter how big the file is. Does anyone know of alternatives to HsMetrics?

ADD REPLYlink written 8 months ago by Zhenyu Zhang240