bamCoverage fails in bam files with large number of small contigs in headers
1
0
Entering edit mode
8 months ago
sarahmanderni ▴ 100

Hi,

I plan to use bamCoverage from Deeptools to get bw files, but it looks like the thread is dead as it never finishes (no errors). I have hundred of thousands of short unlocalized/random contigs in bam files and looks like this can be the reason bamCoverage fails (https://github.com/deeptools/deepTools/issues/662). Using samtools, I removed all the reads except those aligned to chromosomes 1:22 (samtools view -o out.bam in.bamseq 1 22 | sed 's/^/chr/') but of course the tags related to the removed contigs are still available at bam headers and so the problem with bamCoverage persists. Is it reasonable to just naively modify the bam headers (eg samtools reheader) so that only headers related to main chromosomes are retained? I tried samtools reheader newheader.txt out.bam > newbam.bam but the resulting files looks corrupted.

bamCoverage • 771 views
ADD COMMENT
0
Entering edit mode
8 months ago

Are you sure the newheader.txt is in sam format ? I re-headered bams successfully using this. Maybe you messed up the sam header somehow with sed etc ?

samtools view -H old_bam.bam > header.sam
samtools reheader header.sam old_bam.bam > new.bam
ADD COMMENT
0
Entering edit mode

Probably thats the case. What x.bam refers to in your example? Here is what I did: Retained only reads aligned to chromosomes 1:22:

samtools view -o out.bam in.bam seq 1 22 | sed 's/^/chr/'

Took the header:

samtools view -H out.bam > header.sam

The header.sam still has the '@'SQ line for all the removed contigs. So, manually copy pasted the '@'SQ lines related to chr 1:22 and '@'PQ lines (newheader.sam) and used it for samtools reheader:

samtools reheader newheader.sam outbam.bam >  newbam.bam

Based on your response, the manual modification of header file could be reason. In your example, how x.bam doesnt include the headers from the removed contigs?

ADD REPLY
0
Entering edit mode

Your approach looks fine to me. Maybe just paste the before and after headers (without all the @SQ lines to see if someone sees a mistake. The x.bam is the old_bam.bam, I renamed it.

How do you know the new bam is corrupt ? Just run samtools index on it to see.

ADD REPLY
0
Entering edit mode

Exactly! I know its corrupted as I cant run samtools index:

samtools index newbam.bam
samtools index: failed to create index for "newbam.bam": Numerical result out of range
ADD REPLY
0
Entering edit mode

This link might be helpful ...

https://github.com/samtools/samtools/issues/1118

ADD REPLY

Login before adding your answer.

Traffic: 2390 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6