Question: consistent problem of strand-specific information between TopHat/RSeQC/picard
4.4 years ago by
pengchy410
China/Beijing
pengchy410 wrote:

The strand-specific RNA-seq library was constructed with Illumina's TruSeq stranded sample prep kit, which is based on the dUTP method. According to Illumina's documentation (http://www.illumina.com/documents/products/technotes/RNASeqAnalysisTopHat.pdf), the library type should be "fr-firststrand". I mapped the data using TopHat with library type "fr-firststrand", then checked the output BAM file with picard's "CollectRnaSeqMetrics.jar" and "RSeQC-2.6.1/scripts/infer_experiment.py", which gave the results listed below:

First, from picard: the first column is from STRAND=SECOND_READ_TRANSCRIPTION_STRAND, and the second column is from STRAND=FIRST_READ_TRANSCRIPTION_STRAND. It surprised me that SECOND_READ_TRANSCRIPTION_STRAND gives more CORRECT_STRAND_READS, which is the opposite of what I expected.

                        SECOND_READ_TRANSCRIPTION_STRAND  FIRST_READ_TRANSCRIPTION_STRAND
CORRECT_STRAND_READS    14040054                          138566
INCORRECT_STRAND_READS  138566                            14040054

Second, from RSeQC:

This is PairEnd Data
Fraction of reads failed to determine: 0.0000
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0446
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9554

It seems that picard uses a different meaning of "first" and "second" than TopHat and Cufflinks do. Is that right?

rna-seq strand-specific • 2.7k views
4.4 years ago by
Devon Ryan91k
Freiburg, Germany
Devon Ryan91k wrote:

Yes, the strand being referred to is different in TopHat/Cufflinks than in most other tools. For TopHat/Cufflinks, it's the strand from cDNA construction that's being sequenced (i.e., either the first strand that's synthesized (fr-firststrand) or its reverse complement (fr-secondstrand)).

Almost everything else is talking about the DNA strand to which a read/pair aligns. I personally tend to think in terms of this rather than in the steps of library construction, but to each their own.
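
To make the correspondence concrete, here is a minimal Python sketch that takes the two fractions reported by infer_experiment.py and looks up the matching TopHat library type and picard STRAND value. The table and the function `equivalent_settings` are hypothetical names for illustration, not part of any of these tools, and the 0.8 cutoff for calling a library stranded is an assumption rather than a recommended value.

```python
# Hypothetical lookup table (not part of TopHat, picard or RSeQC):
# maps each RSeQC read-pattern string to the equivalent setting elsewhere.
CONVENTIONS = {
    # read 1 aligns to the same strand as the gene
    "1++,1--,2+-,2-+": {
        "tophat": "fr-secondstrand",
        "picard": "FIRST_READ_TRANSCRIPTION_STRAND",
    },
    # read 1 aligns to the strand opposite the gene (dUTP libraries)
    "1+-,1-+,2++,2--": {
        "tophat": "fr-firststrand",
        "picard": "SECOND_READ_TRANSCRIPTION_STRAND",
    },
}


def equivalent_settings(fractions, threshold=0.8):
    """Return the settings for the dominant RSeQC pattern.

    `fractions` maps RSeQC pattern strings to the reported fractions.
    Returns None when no pattern reaches `threshold` (an assumed cutoff),
    i.e. the library looks unstranded or ambiguous.
    """
    pattern, fraction = max(fractions.items(), key=lambda item: item[1])
    if fraction < threshold:
        return None
    return CONVENTIONS[pattern]


# Fractions reported in the question above
observed = {
    "1++,1--,2+-,2-+": 0.0446,
    "1+-,1-+,2++,2--": 0.9554,
}
print(equivalent_settings(observed))
```

With the numbers from the question, the dominant pattern is "1+-,1-+,2++,2--", which pairs fr-firststrand in TopHat with SECOND_READ_TRANSCRIPTION_STRAND in picard, matching what was observed.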


Hi Devon Ryan, according to your explanation, picard's strand information refers to the DNA strand, so the two conventions are opposite to each other. That makes sense. Thank you.

written 4.4 years ago by pengchy410