Coverage analysis in a targeted amplicon-based next-generation sequencing panel
3
0
Entering edit mode
4.5 years ago
armarkes ▴ 10

Hi folks,

Can anyone help me with an issue please?

I performed NGS sequencing of a custom panel (that I designed) from illumina (Truseq). Could you tell me which tools can I use to calculate the coverage in my desired regions? I tried GATK and bcftools but I can not do what I wanted.

Imagine I have 100 amplicons that I amplified for 95 samples, I would like to know how many reads there are per sample and per amplicon. Something like this:

                  sample1    sample2   sample3
amplicon1         100           45       20
amplicon2           0           10       5
amplicon2          80           57       90


I need to understand how the sequencing worked in my panel, if I have coverage in all amplicons or not.

Ana

NGS illumina Truseq coverage • 3.5k views
0
Entering edit mode

Have you fix the problem? I have a suggestion. You can turn the bam file to bed file . The pair read's start and end will match the amplicon's start and end. So you can just count the reads whose start and end match the amplicon

2
Entering edit mode
4.5 years ago
Jokhe ▴ 120

Bedtools is a good option indeed. I can also recommend Picard Tool's CalculateHsMetrics tool which is very useful when analyzing the success of targeted NGS sequencing. It doesn't give the result you are exactly asking for but it reports the percentage of bases covered less than nX times in targeted sequencing. In other words it gives you an answer for question "How many percent of my bases is covered at least 100 times?" and so on. With an appropriate BED file it should be possible to evaluate the success of individual amplicons.

2
Entering edit mode
4.5 years ago
Zaag ▴ 800

GATK's depth of coverage does exactly what you want. Give it all the BAM files and a bed file with the amplicon regions and it will give the coverage per region per patient, per base per patient and some more.

java -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R ref.fa \
-I file1.bam \
-I file2.bam \
--countType COUNT_FRAGMENTS \
-o output \
-L target.bed


And the per target output will look like this (in table form):

| Target | total_coverage | average_coverage | file1_total_cvg | file1_mean_cvg | file1_granular_Q1 | file1_granular_median | file1_granular_Q3 | file1_%_above_15 | file2_total_cvg | file2_mean_cvg | file2_granular_Q1 | file2_granular_median | file2_granular_Q3 | file2_%_above_15 | |:--------------:|----------------|------------------|-----------------|----------------|-------------------|-----------------------|-------------------|------------------|-----------------|----------------|-------------------|-----------------------|-------------------|------------------| | chr1:1000-2000 | 1286996 | 10054.66 | 40746 | 318.33 | 328 | 328 | 328 | 100 | 43366 | 338.8 | 342 | 342 | 357 | 100 | | chr1:4000-4500 | 1907320 | 10422.51 | 56206 | 307.14 | 314 | 314 | 328 | 100 | 60442 | 330.28 | 314 | 342 | 357 | 100 |

0
Entering edit mode

Thanks a lot for your help. I already tried this option but I got the coverage per each base instead of the coverage per amplicon. I created a bed file with coordinates for each amplicon, but the tool calculates coverage per site, instead of coverage per region. Maybe I am missing some option, but I do not know what it is. Do you know how to handle it?

0
Entering edit mode

I get a per base file and:

• output.sample_interval_statistics
• output.sample_interval_summary
• output.sample_statistics
• output.sample_summary

and the per target coverage information is in the file with the sample_interval_summary extension.

Could you show the command you used?

1
Entering edit mode
4.5 years ago

I think bedtools coverage should be a (partial) solution: http://bedtools.readthedocs.io/en/latest/content/tools/coverage.html

Traffic: 2346 users visited in the last hour
FAQ
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.