Question

How To Separate Strands In Sequencing Data Obtained Via Directional Rna-Seq Library Preparation

1

Entering edit mode

11.1 years ago

Assa Yeroslaviz ★ 1.8k

Hi everybody,

we have a couple of libraries of supposedly directional sequencing for mitochondrial (MT) genome. We would like to see the coverage for each of the strands as we expect the forward strand to have a much higher coverage (due to a higher number of genes on the plus strand).

This is how I tried to separated the strand using the FLAGs from samtools

separate the two strand according to the 0x10 FLAG

samtools view -Sb -t MT.fa.fai -F0x10 A_bowtie.sam -o A_F0x10_forward.bam
samtools view -Sb -t MT.fa.fai -f0x10  A_bowtie.sam -o A_f0x10_reverse.bam
samtools sort A_F0x10_forward.bam A_F0x10_forward_sorted;
samtools index A_F0x10_forward_sorted.bam;
samtools sort A_f0x10_reverse.bam A_f0x10_reverse_sorted;
samtools index A_f0x10_reverse_sorted.bam;

use samtools mpileup to create a tab-del table of read counts.

samtools mpileup -d 10000000 -f MT.fa  A_F0x10_forward_sorted.bam > A_F0x10_forward_sorted.pileup;
cut -f 2,3,4  A_F0x10_forward_sorted.mpileup >  A_cropped_F0x10_forward_sorted.mpileup;
samtools mpileup -f MT.fa  A_f0x10_reverse_sorted.bam > A_f0x10_reverse_sorted.pileup;
cut -f 2,3,4 A_f0x10_reverse_sorted.pileup > A_cropped_f0x10_reverse_sorted.pileup;

Followed that I used R to plot the tables to create images of the coverage of the two strands. Unfortunately this is what I am getting when plotting the two strands : plotA

It looks as if the separation was done somewhat by chance. Does pileup/mpileup do a good job in separating the strands?

What I would basically like to know is, if I am doing everything correctly. Is there a better way to split the bam files to two strand specific bam files?

I would appreciate any help or ideas I can get.

Thanks Assa

bam strand split samtools picard • 8.3k views

ADD COMMENT • link updated 11.1 years ago by Istvan Albert 100k • written 11.1 years ago by Assa Yeroslaviz ★ 1.8k

0

Entering edit mode

Are your data paired-end? And is this RNA or DNA?

ADD REPLY • link 11.1 years ago by Sean Davis 26k

0

Entering edit mode

it is total RNA, but we are interested only in the MT part of it. Do I need to map it to the genome or the transcriptome?

ADD REPLY • link 11.1 years ago by Assa Yeroslaviz ★ 1.8k

score 2 · Answer 1 · 2013-03-12

2

Entering edit mode

11.1 years ago

Istvan Albert 100k

I would refer you to this: the difference between directional and nondirectional sequencing

Basically you will need to separate by strand before the mapping not after that. Otherwise you are processing the data as a normal two-stranded fragments, and those indeed will produce traces that are shifted by fragment size.

ADD COMMENT • link 11.1 years ago by Istvan Albert 100k

0

Entering edit mode

Thanks. I have written the post for this answer a few months ago. How do I separate the file to different strands before I amp them? The fastq files don't have any information about it.

ADD REPLY • link 11.1 years ago by Assa Yeroslaviz ★ 1.8k

0

Entering edit mode

according to our sequencing center we d have here sets of paired-end directional sequencing data. I am not really asking about the method, as I have to believe, that they did they job according to protocol. My problem is that I can't see any difference in the mapping of the data. I am questioning my way of analyzing the data.

ADD REPLY • link 11.1 years ago by Assa Yeroslaviz ★ 1.8k

0

Entering edit mode

In that case I believe that you'll need to keep only the forward strand data for the reads that come from the forward strand and vice versa. That is to say that the information of which reads comes from which strand is already present before mapping.

ADD REPLY • link 11.1 years ago by Istvan Albert 100k

0

Entering edit mode

Yes I get that, but how? Where do I get that information from the fastq file? I don't have the adapter and I don't have the mapping information.

ADD REPLY • link 11.1 years ago by Assa Yeroslaviz ★ 1.8k

0

Entering edit mode

After mapping you can only get the information for the known annotations. It will be a somewhat complicated process, off the top of my head I think it will look like this:

Separate your exons by strand, then take your reads that were already separated by strand and using say bedtools intersect the reads on the given strand with the exons on that same strand. This will produce a subset of reads that match the expected orientation.

ADD REPLY • link 11.1 years ago by Istvan Albert 100k

score 1 · Answer 2 · 2013-03-12

1

Entering edit mode

11.1 years ago

Sean Davis 26k

You might try a dedicated RNA-seq aligner such as gsnap or STAR in stranded mode, aligning to the genome and using the Gencode annotations. Using those results, try RNA-SeQC to quantify the details.

ADD COMMENT • link 11.1 years ago by Sean Davis 26k

0

Entering edit mode

First of all thanks for the new hints. I will try them and see if it is better. Secondly, what is wrong with exotic? and why is it a non-standard analysis? What is about using bowtie -> samtools -> R so non-standard? Weren't the FLAG from samtools invented exactly for this kind of problem

ADD REPLY • link 11.1 years ago by Assa Yeroslaviz ★ 1.8k

0

Entering edit mode

I have edited my post to be less subjective.

ADD REPLY • link 11.1 years ago by Sean Davis 26k

0

Entering edit mode

Sorry. I didn't mean that in an offending way. I Just would like to know why you think, the way I did it is not as good as your proposal. no offense taken :-)

ADD REPLY • link 11.1 years ago by Assa Yeroslaviz ★ 1.8k

0

Entering edit mode

I know. However, I did make some subjective points that were not particularly helpful. All I meant is that there are tools that have been developed to address the problem you seem to be trying to solve and it might be worth trying those as well as your own approaches.

And I didn't take your comment in any negative way : ).

ADD REPLY • link 11.1 years ago by Sean Davis 26k

score 0 · Answer 3 · 2013-03-11

0

Entering edit mode

11.1 years ago

swbarnes2 14k

we have a couple of libraries of supposedly directional sequencing for mitochondrial (MT) genome.

That makes no sense. Do you mean directional mitochondrial RNA?

ADD COMMENT • link 11.1 years ago by swbarnes2 14k

0

Entering edit mode

not exactly. It was total RNA, but we are interested only in the MT reads, so I needed to map it to the MT. Why doesn't it make sense?

ADD REPLY • link 11.1 years ago by Assa Yeroslaviz ★ 1.8k

0

Entering edit mode

Step one to getting good answers is to correctly describe what you are doing.

When you say you sequenced a "genome", and never mention RNA, that usually means a DNA library, which, isn't directional, and wouldn't be enriched in any way for sequence in the same direction that the genes run. So when you say that things aren't working, and your terminology is suggests that you might be doing something very silly that would never work, that has to be squared away before anyone can give you help.

The flags in the .bam will tell you whether a read is read 1 or read 2 (Read 1 has a 64 flag, read 2 has a 128). Try splitting your reads by that, and you might see the directionality you want.

ADD REPLY • link 11.1 years ago by swbarnes2 14k