Truncate unevenly covered BAM file at certain coverage
2
0
Entering edit mode
5 months ago
predeus ★ 1.7k

Hello all,

This is basic enough question, but I can't seem to find the answer online.

I have a BAM from RNA-seq alignment which is very unevenly covered. However the mean coverage is huge (~100,000x - it's a mitochondrion). I want to use it for polishing, so I would like to retain about 100x "sliding window" (or simply at any given nucleotide) coverage across the whole chromosome, and lose the rest.

How can I achieve this?

Thank you in advance, as always

UPD: Apparently, there's a discussion here that directly answers my question.

samtools bam • 462 views
1
Entering edit mode
5 months ago

use the option -s of samtools view

1
Entering edit mode

Nice, my current version of samtools (1.7) doesn't even have that yet! Thank you very much.

0
Entering edit mode

All right, I've read about the option and still can't figure out how to use the "filter expressions" for coverage-based filtering. I might be missing something obvious...

0
Entering edit mode

my bad. I meant option -s not option -e. (fixed)

  -s FLOAT subsample reads (given INT.FRAC option value, 0.FRAC is the
fraction of templates/read pairs to keep; INT part sets seed)

0
Entering edit mode

I see. I knew about -s but this is not what I'm after - I don't want to proportionally downsample the reads, I only want to remove reads in the areas where coverage is > 100, for example.

1
Entering edit mode
0
Entering edit mode

Brilliant, I didn't find that thread somehow (it's tricky to select the right keywords). Thank you!

Brian's explanation about normalization via k-mers also makes a lot of sense - it has long puzzled me what Trinity was doing for example.

1
Entering edit mode
5 months ago
QLFblaireau ▴ 30

You could use samtools depth

for example

samtools depth -a my.bam |awk '\$3 >=100' > file.coverage


Then make a bedfile out of the file.coverage and use it in combination with samtools view to only extract position where there is the expected coverage.

Or easier, you could use mosdepth that will directly output a bed file that you can filter.

mosdepth

0
Entering edit mode

Thank you! However, I don't want to find regions with some type of coverage - I want to remove the excessive coverage, retaining reads up to a certain maximum, e.g. 100.

This approach also can work (together with samtools view -s), but it becomes dreadfully convoluted - you need to

• split bam into windows;
• downsample each file according to mean coverage in each window;
• put them back together.

Seems like there should be a tool that's doing it...