Question: Using Samtools depth to get number of forward and reverse reads per base from a .bam file
1
gravatar for jspainhour3
18 months ago by
jspainhour310
jspainhour310 wrote:

I am trying to build a tab delimited table of count and read data from a deduplicated realigned bam file. I am looking at mitochondrial DNA (ver NC_012920.1) and want to know the read depth or count per base and the break up of that depth by read direction and the read base makeup. Something along the lines of:

Chr location Depth/Coverage Forward Reverse A T G C

MT position 1 300 165 135 298 1 0 1

...

...

...

Mt pos 16569 ...

I am also trying to do the same with a whole human genome and a bed file for specific ranges of targets. I am using

samtools depth -a mybam.bam > MTScan.txt

for the mitochondrial analysis and

samtools depth -b mybed.bed mybam.bam > HGScan.txt

for the human genome with bed file analysis and all i get are three columns for reference, position, and reads.

I believe samtools depth or bedcov are what I should use but i cannot seem to get all of the desired information out. Does anyone have any other suggested approaches I might be able to use? Thank you.

sequencing next-gen • 1.4k views
ADD COMMENTlink written 18 months ago by jspainhour310

use the output of

 bcftools mpileup --annotate 'FORMAT/DP4' --regions-file mybed.bed --no-reference in.bam'
ADD REPLYlink modified 18 months ago • written 18 months ago by Pierre Lindenbaum125k

I am getting an error that mpileup does not recognize --annotate. I will keep looking at mpileup.

Thank you!

ADD REPLYlink written 18 months ago by jspainhour310

sorry I was talking about the latest bcftools command

ADD REPLYlink written 18 months ago by Pierre Lindenbaum125k

Thank you! Appreciate the clarification.

ADD REPLYlink written 18 months ago by jspainhour310

I don't have a full answer but I have a comment that might help. Samtools depth does not do what you are asking in a one and done. It reports chr pos depth/coverage.

I think the first thing you could do would be to split your bams into the forward bam, and reverse bam:

Forward bam:

samtools view -F input.bam > forward.bam

Reverse bam:

samtools view -f input.bam > reverse.bam

Then running samtools depth on the forward and reverse, I'll add the -a flag to make sure positions align:

samtools depth -a forward.bam > forwardDepth.txt
samtools depth -a reverse.bam > reverseDepth.txt
paste forwardDepth.txt reverseDepth.txt > forwardAndReverse.txt

This will give you a format of:

chr    pos1    cov    chr    pos1    cov

Since the positions are aligned you probably only want the chr, pos, cov forward, and cov reverse ( I know this can be piped with the previous step):

awk '{print $1, $2, $3, $6}' forwardAndReverse.txt > forRev_cut.txt
sed -i '1s/^/Chr    Position    ForCov    RevCov' forRev_cut.txt

The sed bit adds a header if you're interested. I do not know a tool for getting the base count at each position off of the top of my head, so I cannot add that. This is why I did not post as an answer, but I do believe this is a good start! Once you find a way to do the next step, you could run it on both the forward and reverse reads individually and get an output from that to see if numbers match up.

Dennis

ADD REPLYlink written 18 months ago by drkennetz470

Thank you for the help!

ADD REPLYlink written 18 months ago by jspainhour310
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1373 users visited in the last hour