How To Determine The Low Read Coverage Regions In A Sam File
2
0
Entering edit mode
9.3 years ago
amirbenjab ▴ 10

Hi,

There are multiple FASTA files contain contigs. And also multiple SAM files contain the alignment of reads to the contigs.

Assuming the sam was converted to bam, I want to determine the low region coverage in the bam files.

For each bam file, should I compute the coverage of multiple regions or just one coverage region (i.e. mean(coverage(ranges)))?

thanks

contigs sequence read • 6.3k views
0
Entering edit mode

How are you defining low coverage? Is this transcriptomic or genomic coverage?

0
Entering edit mode

It's genomic coverage. All results are generated by BWA.

2
Entering edit mode
9.3 years ago

we (and I guess everyone that has been doing NGS data analysis) have faced this issue since the very NGS beginning. I find out our way of solving it very logical, so I guess most of you may have already implemented some of these solutions, but in case describing them could help let me tell you the 3 different ways in which we've addressed the issue:

• one would be a very descriptive one, which helps a lot to see how the experiment has performed on your genes of interest. it uses GATK's DepthOfCoverage walker, and it provides you with very convenient summary metrics for the regions of interest (genes in our particular case), including the percentage of which a particular region (or gene) has been sequenced above a certain coverage threshold (which of course can be customized).
• the next one would be also a coverage summary for all the samples sequenced, position per position. it goes through generating a bedgraph for each of your sam/bam files using bedtools' genomeCoverageBed, and it then uses bedtools' unionBedGraphs to join them all into a single massive table in which, for each position sequenced, you may check at once the coverage of all samples.
• finally, we also sometimes do sample by sample inspection, and it's very useful for us to highlight regions of coverage below a threshold. we have build a simple script that parses a previously generated bedgraph using again bedtools' genomeCoverageBed for a particular sam/bam file and it outputs a bed file with all the positions under the specified threshold grouped by regions into a IGV readable bed file. this allow you to browse you sam/bam file with a parallel track of low coverage regions that help your eye to quickly locate those kind of regions.
0
Entering edit mode
9.3 years ago

First you would need to come up with the definition of what you mean by low coverage (with respect of your problem set). Then you can use samtools mpileup to compute the coverage of each base or bedtools to compute genomic coverages

Some posts that cover this are:

You may also want to search for genomic coverage, Tip: you should also make use of the Similar posts link on each of these questions.