Question

Does the order of input files matter when mapping paired-end strand specific RNA-seq reads?

9.0 years ago

jon.brate ▴ 310

Hi,

We are mapping stranded paired-end RNA-seq reads, sequenced with the Illumina TruSeq protocol, for the first time. We are using TopHat2 with the --library-type fr-firststrand option, and we notice that the orientation of reads on the genome seems to depend on the order in which the read-pair files are given for mapping. This also affects the gene counts later on. It looks like putting the R2 reads before the R1 reads gives the correct results, but I can't find this in the TopHat manual.

Are there any standard ways of mapping paired-end stranded RNA-seq reads?

mapping RNA-Seq tophat2 bowtie2 illumina • 2.7k views

score 1 · Answer 1 · 2015-05-14

9.0 years ago

Devon Ryan 104k

The files for read1 should precede those for read2. Note the description for <[reads1_2,...readsN_2]> in the manual.
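As a rough illustration of that argument order, here is a minimal Python sketch that assembles a TopHat2 command line with the read-1 files ahead of the read-2 files. The helper name `tophat2_command`, the index base, the FASTQ file names and the output directory are all made up for the example; only `--library-type fr-firststrand`, `-o` and the positional order (index, read-1 list, read-2 list) come from the TopHat usage line.

```python
def tophat2_command(index_base, r1_files, r2_files, out_dir="tophat_out"):
    """Build a TopHat2 argument list with read-1 files before read-2 files.

    Hypothetical helper for illustration; it only assembles the arguments
    and does not run TopHat2.
    """
    if len(r1_files) != len(r2_files):
        raise ValueError("each R1 file needs a matching R2 file")
    return [
        "tophat2",
        "--library-type", "fr-firststrand",  # dUTP-style stranded library
        "-o", out_dir,
        index_base,              # Bowtie2 index base name (assumed)
        ",".join(r1_files),      # reads1_1,...,readsN_1 come first
        ",".join(r2_files),      # reads1_2,...,readsN_2 come second
    ]


# Example file names are placeholders, not from the original post.
cmd = tophat2_command("genome_index", ["sample_R1.fastq.gz"], ["sample_R2.fastq.gz"])
print(" ".join(cmd))
```

The list could be passed to `subprocess.run(cmd, check=True)` if TopHat2 is installed; multiple lanes would simply be extra entries in both lists, kept in the same order.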

Thanks for the reply. But I am still very puzzled about this. When mapping with the R1 reads first and counting with HTSeq with the stranded option set to yes, I get zero counts for features on the + strand. With the stranded option set to reverse, I get the counts. When mapping with the R2 reads first, we get exactly the opposite results. With Cufflinks we get almost identical FPKM values no matter the order of the input files.

I have been trying to find more information about how stranded reads are handled by TopHat/cufflinks, but without luck.

ADD REPLY • link 8.9 years ago by jon.brate ▴ 310

--stranded=reverse is the typical setting for stranded datasets with htseq-count, since you presumably have a dUTP-based library. That "reverse" rather than "yes" is the setting to use has more to do with which library types were common years ago than anything else.
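To make the yes/reverse behaviour concrete, here is a small Python sketch of the strand rule as htseq-count documents it for paired-end data: with yes, read 1 must be on the same strand as the feature and read 2 on the opposite strand; with reverse, those rules are flipped. The function `feature_strand` is a toy model written for this thread, not HTSeq's own code, and the example assumes a dUTP library in which read 1 of a fragment from a + strand gene aligns to the - strand.

```python
def feature_strand(read_strand, is_read1, stranded):
    """Strand of the feature a read would be counted towards.

    Toy model of the documented htseq-count rule, not HTSeq's implementation.
    Returns None for stranded="no", where strand is ignored.
    """
    if stranded not in ("yes", "no", "reverse"):
        raise ValueError("stranded must be 'yes', 'no' or 'reverse'")
    if stranded == "no":
        return None
    flip = {"+": "-", "-": "+"}
    # stranded=yes: read 1 is on the feature's strand, read 2 on the opposite one
    strand = read_strand if is_read1 else flip[read_strand]
    if stranded == "reverse":
        strand = flip[strand]
    return strand


# Fragment from a + strand gene, dUTP library (assumed): R1 aligns to '-', R2 to '+'.
# Files in the right order: the aligner flags the true R1 as read 1.
print(feature_strand("-", True, "yes"))      # '-': no count for the + strand gene
print(feature_strand("-", True, "reverse"))  # '+': counted

# Files swapped: the true R2 (aligned to '+') is now flagged as read 1.
print(feature_strand("+", True, "yes"))      # '+': counted
print(feature_strand("+", True, "reverse"))  # '-': no count
```

Under these assumptions, swapping the input files just swaps which read is flagged as read 1, which would explain why the yes and reverse results trade places.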