Why depth has discrepancy after bam file is split to forward and reverse ones with samtools,
0
0
Entering edit mode
9.7 years ago
xxq.t.xu • 0

I have a bam file, say x.bam, with it I split into forward and reverse halves:

samtools view -F 16 x.bam -b -o x.for.bam #(command a)
samtools view -f 16 x.bam -b -o x.rev.bam #(command b)

I then check depth at certain positions, for example:

samtools depth -b my.bed x.bam | grep 12345678 #(command 1)
samtools depth -b my.bed x.for.bam | grep 12345678 #(command 2)
samtools depth -b my.bed x.rev.bam | grep 12345678 #(command 3)

For some/most positions, the number from command 2 and command 3 add up to the number of command 1; however in some positions, I got results like this:

  • command 1 gets 62
  • command 2 gets 2
  • command 3 gets 8010

apparently 8010+2 != 62, and how come reverse only bam has more depth (8010 reads) at that position than the overall bam file (62 reads)? This does not make sense to me. Did I miss anything here?

TIA

ps:

I also checked map qual for those 8010 reads from reverse bam:

samtools depth -b my.bed -q 30 x.rev.bam | grep 12345678, which yields 7876 reads, indicating most of the 8010 reads have good map quality.

ps2:

I opened IGV and looked the region. Funny thing is that x.bam file does not have reads in that position, while x.rev.bam has thousands of reads at that position. Have no idea why x.rev.bam gets those reads, while x.bam does not have it.

samtools direction bam split depth • 3.0k views
ADD COMMENT
0
Entering edit mode

You can query on depth directly like so:

samtools depth data.bam -r chr:1-100 

once you find a region that seems wrong investigate more with samtools view over that range. The two counts should add up.

I vaguely recall a situation where samtools count and samtools depth gave radically different answers over a region on the account of including (or not) the unmapped reads.

ADD REPLY
0
Entering edit mode

Thank you Istvan Albert

depth with region command you gave generated the same discrepancy I saw.

I opened IGV and looked the region. Funny thing is that x.bam file does not have reads in that position, while x.rev.bam has thousands of reads at that position. Have no idea why x.rev.bam gets those reads, while x.bam does not have it.

ADD REPLY
1
Entering edit mode

Well you can evaluate it directly on the original file

samtools view -c data.bam chr1:1-100
samtools view -c -F 16 data.bam  chr1:1-100
samtools view -c -f 16 data.bam chr1:1-100

this will return the number of sequences that have coordinates in a region but will include the unmapped reads (Some aligners may fill in the chromosomal coordinate yet still consider the read unmapped!).

Edit: These latter two should match with depth:

# mapped reads on forward strand
samtools view -c -F 20 data.bam  chr1:1-100
# mapped reads on reverse strand
samtools view -c -F 4 -f 16 data.bam chr1:1-100
ADD REPLY

Login before adding your answer.

Traffic: 1957 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6