Question

Inconsistent strand-specific information between TopHat/RSeQC/Picard


10.6 years ago

pengchy ▴ 460

The strand-specific RNA-seq library was constructed with Illumina's TruSeq Stranded sample prep kit, which is based on the dUTP method. According to Illumina's documentation (http://www.illumina.com/documents/products/technotes/RNASeqAnalysisTopHat.pdf), the library type should be "fr-firststrand". I mapped the data with TopHat using library type "fr-firststrand", then checked the output BAM file with Picard's "CollectRnaSeqMetrics.jar" and "RSeQC-2.6.1/scripts/infer_experiment.py", which gave the results listed below:

First, from Picard. The first column was run with STRAND=SECOND_READ_TRANSCRIPTION_STRAND and the second column with STRAND=FIRST_READ_TRANSCRIPTION_STRAND. I was surprised that SECOND_READ_TRANSCRIPTION_STRAND gives more CORRECT_STRAND_READS, contrary to my expectation.

                                SECOND_READ_TRANSCRIPTION_STRAND    FIRST_READ_TRANSCRIPTION_STRAND
CORRECT_STRAND_READS            14040054                            138566
INCORRECT_STRAND_READS          138566                              14040054

Second from RSeQC:

This is PairEnd Data
Fraction of reads failed to determine: 0.0000
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0446
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9554

It seems that Picard uses a different meaning of "first" and "second" than TopHat and Cufflinks, doesn't it?
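As a quick sanity check on the numbers above, the strand fractions can be computed directly from the Picard counts. The sketch below is only illustrative arithmetic on the values quoted in this question; the function name `strand_fraction` is hypothetical and is not part of Picard or RSeQC.

```python
def strand_fraction(correct: int, incorrect: int) -> float:
    """Return the fraction of strand-assignable reads counted as correct."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("no strand-assignable reads")
    return correct / total


# Counts copied from the Picard table in the question.
# Column 1: STRAND=SECOND_READ_TRANSCRIPTION_STRAND
second_read = strand_fraction(14040054, 138566)
# Column 2: STRAND=FIRST_READ_TRANSCRIPTION_STRAND
first_read = strand_fraction(138566, 14040054)

print(f"SECOND_READ_TRANSCRIPTION_STRAND: {second_read:.4f}")
print(f"FIRST_READ_TRANSCRIPTION_STRAND:  {first_read:.4f}")
```

With these counts, about 99% of the strand-assignable reads are "correct" under SECOND_READ_TRANSCRIPTION_STRAND. RSeQC reports 0.9554 for the "1+-,1-+,2++,2--" pattern, so the exact fractions differ between the two tools, but both point to the same orientation.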

RNA-Seq strand-specific • 4.7k views

ADD COMMENT • link updated 3.4 years ago by Ram 45k • written 10.6 years ago by pengchy ▴ 460

score 5 · Accepted Answer · 2015-04-24


10.6 years ago

Devon Ryan 105k

Yes, the strand being referred to in TopHat/Cufflinks is different from the one meant by most other tools. For TopHat/Cufflinks, it's the strand from cDNA construction that's being sequenced, i.e., either the first strand that's synthesized (fr-firststrand) or its reverse complement (fr-secondstrand).

Almost everything else is talking about the DNA strand to which a read/pair aligns. I personally tend to think in terms of this rather than in the steps of library construction, but to each their own.
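To keep the naming schemes straight, the correspondence can be written down as a small lookup table. This is only a sketch: the `StrandConvention` type and the `lookup` helper are hypothetical names made up for illustration. The fr-firststrand row is taken from the results in the question; the fr-secondstrand row is assumed to be its mirror image.

```python
from typing import NamedTuple


class StrandConvention(NamedTuple):
    tophat_library_type: str   # TopHat/Cufflinks: named after the cDNA strand sequenced
    picard_strand: str         # Picard: which read lies on the transcription strand
    rseqc_pattern: str         # RSeQC infer_experiment.py: read/strand vs. gene strand


CONVENTIONS = (
    # Observed in the question (dUTP library mapped as fr-firststrand).
    StrandConvention("fr-firststrand",
                     "SECOND_READ_TRANSCRIPTION_STRAND",
                     "1+-,1-+,2++,2--"),
    # Assumed mirror image of the row above.
    StrandConvention("fr-secondstrand",
                     "FIRST_READ_TRANSCRIPTION_STRAND",
                     "1++,1--,2+-,2-+"),
)


def lookup(tophat_library_type: str) -> StrandConvention:
    """Return the matching row for a TopHat library type."""
    for convention in CONVENTIONS:
        if convention.tophat_library_type == tophat_library_type:
            return convention
    raise KeyError(f"no stranded convention for {tophat_library_type!r}")


print(lookup("fr-firststrand").picard_strand)
```

The "first" in fr-firststrand refers to first-strand cDNA synthesis, while the "SECOND_READ" in the Picard setting refers to read 2 of the pair, which is why the two names look contradictory while describing the same data.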



Hi Devon Ryan, according to your explanation, Picard's strand information refers to the DNA strand to which the reads align, so the two naming conventions are opposite to each other. That makes sense. Thank you.

ADD REPLY • link 10.6 years ago by pengchy ▴ 460