Hi everybody,
we have a couple of libraries of supposedly directional sequencing for mitochondrial (MT) genome. We would like to see the coverage for each of the strands as we expect the forward strand to have a much higher coverage (due to a higher number of genes on the plus strand).
This is how I tried to separated the strand using the FLAGs from samtools
separate the two strand according to the 0x10 FLAG
samtools view -Sb -t MT.fa.fai -F0x10 A_bowtie.sam -o A_F0x10_forward.bam
samtools view -Sb -t MT.fa.fai -f0x10 A_bowtie.sam -o A_f0x10_reverse.bam
samtools sort A_F0x10_forward.bam A_F0x10_forward_sorted;
samtools index A_F0x10_forward_sorted.bam;
samtools sort A_f0x10_reverse.bam A_f0x10_reverse_sorted;
samtools index A_f0x10_reverse_sorted.bam;
use samtools mpileup to create a tab-del table of read counts.
samtools mpileup -d 10000000 -f MT.fa A_F0x10_forward_sorted.bam > A_F0x10_forward_sorted.pileup;
cut -f 2,3,4 A_F0x10_forward_sorted.mpileup > A_cropped_F0x10_forward_sorted.mpileup;
samtools mpileup -f MT.fa A_f0x10_reverse_sorted.bam > A_f0x10_reverse_sorted.pileup;
cut -f 2,3,4 A_f0x10_reverse_sorted.pileup > A_cropped_f0x10_reverse_sorted.pileup;
Followed that I used R to plot the tables to create images of the coverage of the two strands. Unfortunately this is what I am getting when plotting the two strands :
It looks as if the separation was done somewhat by chance. Does pileup/mpileup do a good job in separating the strands?
What I would basically like to know is, if I am doing everything correctly. Is there a better way to split the bam files to two strand specific bam files?
I would appreciate any help or ideas I can get.
Thanks Assa
Are your data paired-end? And is this RNA or DNA?
it is total RNA, but we are interested only in the MT part of it. Do I need to map it to the genome or the transcriptome?