The strand-specific RNA-seq library was constructed with Illumina's TruSeq Stranded sample prep kit, which is based on the dUTP method. According to Illumina's documentation (http://www.illumina.com/documents/products/technotes/RNASeqAnalysisTopHat.pdf), the library type should be "fr-firststrand", so I mapped the data with TopHat using library type "fr-firststrand". I then checked the output BAM file with Picard "CollectRnaSeqMetrics.jar" and "RSeQC-2.6.1/scripts/infer_experiment.py", which gave the results listed below:
First, from Picard. The first column is from STRAND=SECOND_READ_TRANSCRIPTION_STRAND and the second column is from STRAND=FIRST_READ_TRANSCRIPTION_STRAND. It surprised me that SECOND_READ_TRANSCRIPTION_STRAND gives more CORRECT_STRAND_READS, just contrary to my expectation.
Metric                  SECOND_READ_TRANSCRIPTION_STRAND  FIRST_READ_TRANSCRIPTION_STRAND
CORRECT_STRAND_READS    14040054                          138566
INCORRECT_STRAND_READS  138566                            14040054
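As a quick sanity check on the Picard numbers above, the fraction of strand-assigned reads that land on the expected strand can be computed for each STRAND setting. This is a minimal Python sketch using the counts reported above; the helper name `strand_fraction` is hypothetical and not part of Picard. Because the two settings are mirror images of each other, the CORRECT count under one setting should equal the INCORRECT count under the other.

```python
# Counts copied from the CollectRnaSeqMetrics output above.
picard = {
    "SECOND_READ_TRANSCRIPTION_STRAND": {"correct": 14040054, "incorrect": 138566},
    "FIRST_READ_TRANSCRIPTION_STRAND": {"correct": 138566, "incorrect": 14040054},
}


def strand_fraction(correct: int, incorrect: int) -> float:
    """Fraction of strand-assigned reads that match the chosen STRAND setting (hypothetical helper)."""
    total = correct + incorrect
    return correct / total if total else 0.0


for setting, counts in picard.items():
    print(f"{setting}: {strand_fraction(counts['correct'], counts['incorrect']):.4f}")

# The two settings are mirror images: swapping which read is expected on the
# transcription strand swaps the reads counted as correct and incorrect.
second = picard["SECOND_READ_TRANSCRIPTION_STRAND"]
first = picard["FIRST_READ_TRANSCRIPTION_STRAND"]
assert second["correct"] == first["incorrect"]
assert second["incorrect"] == first["correct"]
```

Run as-is, it prints 0.9902 for SECOND_READ_TRANSCRIPTION_STRAND and 0.0098 for FIRST_READ_TRANSCRIPTION_STRAND, i.e. about 99% of strand-assigned reads agree with read 2 lying on the transcription strand, which is the layout a dUTP library is expected to produce.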
Second, from RSeQC:
This is PairEnd Data
Fraction of reads failed to determine: 0.0000
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0446
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9554
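The two RSeQC patterns can be decoded as follows: each code is the read number, then the strand the read maps to, then the strand of the gene it overlaps. "1+-" therefore means read 1 mapped to + while its gene is on -, i.e. read 1 is antisense to the transcript and read 2 is sense. The sketch below, assuming a simplified input of (read number, read strand, gene strand) tuples, shows how a read falls into one of the two groups and how the dominant group maps to a TopHat/Cufflinks library type. The function names, the toy reads and the 0.8 cutoff are hypothetical choices for illustration, not RSeQC's own code.

```python
from collections import Counter

# RSeQC's two paired-end patterns; each code is <read number><read strand><gene strand>.
SENSE_R1 = {"1++", "1--", "2+-", "2-+"}  # read 1 lies on the transcript's strand
SENSE_R2 = {"1+-", "1-+", "2++", "2--"}  # read 2 lies on the transcript's strand


def classify(read_num: int, read_strand: str, gene_strand: str) -> str:
    """Return which RSeQC pattern a single mapped read belongs to."""
    code = f"{read_num}{read_strand}{gene_strand}"
    if code in SENSE_R1:
        return "read1_sense"
    if code in SENSE_R2:
        return "read2_sense"
    raise ValueError(f"unexpected read/gene strand code: {code!r}")


def infer_library_type(frac_r1_sense: float, frac_r2_sense: float, cutoff: float = 0.8) -> str:
    """Map RSeQC fractions to a TopHat --library-type value (the cutoff is an assumption)."""
    if frac_r2_sense >= cutoff:
        return "fr-firststrand"   # dUTP / TruSeq stranded: read 2 matches the transcript
    if frac_r1_sense >= cutoff:
        return "fr-secondstrand"  # read 1 matches the transcript
    return "fr-unstranded"        # neither pattern dominates


# Hypothetical toy reads: (read number, read strand, gene strand)
reads = [(1, "+", "-"), (2, "+", "+"), (1, "-", "+"), (2, "-", "-"), (1, "+", "+")]
tally = Counter(classify(*r) for r in reads)
print(tally)

# Fractions reported by infer_experiment.py above
print(infer_library_type(0.0446, 0.9554))
```

With the fractions above, the "1+-,1-+,2++,2--" group dominates, which points to "fr-firststrand", the same library type used for the TopHat mapping.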
It seems that Picard uses a different meaning of "first" and "second" than TopHat and Cufflinks, doesn't it?
Hi Devon Ryan, according to your explanation, Picard's strand labels refer to the DNA strand rather than the first-strand cDNA that TopHat's naming refers to, so the two conventions are opposite to each other. That makes sense. Thank you.
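To keep the conventions straight, the names used for the same paired-end layout in the three tools from this thread can be collected in one lookup. This is a small sketch based on my reading of the TopHat/Cufflinks, Picard and RSeQC documentation; the dictionary layout and the helper `picard_strand_for` are hypothetical, and it only covers the three tools discussed here.

```python
# Equivalent names for the same stranded paired-end layouts in each tool.
# TopHat/Cufflinks name the cDNA strand that is sequenced; Picard names which
# read lies on the transcription strand; RSeQC lists read/gene strand codes.
CONVENTIONS = {
    "dUTP (e.g. TruSeq stranded)": {
        "tophat_library_type": "fr-firststrand",
        "picard_strand": "SECOND_READ_TRANSCRIPTION_STRAND",
        "rseqc_dominant_pattern": "1+-,1-+,2++,2--",
    },
    "read 1 on transcript strand (e.g. ligation-based)": {
        "tophat_library_type": "fr-secondstrand",
        "picard_strand": "FIRST_READ_TRANSCRIPTION_STRAND",
        "rseqc_dominant_pattern": "1++,1--,2+-,2-+",
    },
}


def picard_strand_for(tophat_library_type: str) -> str:
    """Translate a TopHat --library-type into a Picard STRAND value (hypothetical helper)."""
    for names in CONVENTIONS.values():
        if names["tophat_library_type"] == tophat_library_type:
            return names["picard_strand"]
    if tophat_library_type == "fr-unstranded":
        return "NONE"  # unstranded data: Picard should not check strand
    raise ValueError(f"unknown TopHat library type: {tophat_library_type!r}")


print(picard_strand_for("fr-firststrand"))
```

The point of writing it down this way is that "first" means the first cDNA strand in TopHat's "fr-firststrand" but the first read in Picard's FIRST_READ_TRANSCRIPTION_STRAND, so the same dUTP library gets a "first" name in one tool and a "second" name in the other.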