Question

How to interpret the difference among the three strandedness options in HTSeq-count

9.8 years ago

nalandaatmi ▴ 110

Dear All,

I am interested in calculating the percentage of reads assigned to globin genes and rRNA genes. Right now, I am not sure whether my paired-end RNA-seq data were generated with a strand-specific protocol. I have asked the person in charge to let me know.

Meanwhile, I ran htseq-count with all three strandedness options (no, yes, reverse). How do I get the strand (sense, antisense) information? How should I interpret the Stranded:Reverse counts?

Globin genes  Stranded:No  Stranded:Yes  Stranded:Reverse
HBB           40204        40197         7
HBA1          38811        38795         16
HBA2          129847       129770        77
HBG1          1566         1566          0
HBG2          2750         2750          0
HBD           3            3             0
HBE1          1            0             1
HBZ           0            0             0
HBQ1          9            3             6
MB            4            0             4
CYGB          294          2             354
NGB           289          2             319

How to interpret the difference among these three options?

Stats from special counters

Special counters         Stranded:No   Stranded:Yes   Stranded:Reverse
__no_feature             56289350      94180089       56914563
__ambiguous              625347        18161          343824
__too_low_aQual          0             0              0
__not_aligned            0             0              0
__alignment_not_unique   30631662      30631662       30631662

RNA-Seq next-gen sequence alignment • 4.5k views

updated 2.8 years ago by Ram 45k • written 9.8 years ago by nalandaatmi ▴ 110

Ram · Answer 1 · 2015-10-01

9.8 years ago

Alternative ▴ 290

Check the following explanation:

http://onetipperday.blogspot.de/2012/07/how-to-tell-which-library-type-to-use.html

Also, RSeQC has a script that counts the different cases and tells you what type of stranded library you have. Check their infer_experiment.py script (I think I tried it a long time ago and don't remember how accurate it is, but it should be OK).

http://rseqc.sourceforge.net/#infer-experiment-py
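
As a rough illustration of the idea behind that kind of check (this is not RSeQC's actual implementation), one can compare, per gene, how many reads htseq-count assigns under `yes` versus `reverse`. The sketch below uses three rows copied from the table in the question; the function name `sense_fraction` and the `counts` dictionary are hypothetical, and a handful of genes is far too few to decide the library type on its own.

```python
# Counts copied from the table in the question:
# gene -> (count with stranded=yes, count with stranded=reverse)
counts = {
    "HBB": (40197, 7),
    "HBA1": (38795, 16),
    "CYGB": (2, 354),
}


def sense_fraction(yes_count, reverse_count):
    """Fraction of strand-assigned reads that were counted under stranded=yes.

    Returns None when neither mode assigned any read to the gene.
    """
    total = yes_count + reverse_count
    if total == 0:
        return None
    return yes_count / total


fractions = {gene: sense_fraction(y, r) for gene, (y, r) in counts.items()}

for gene, frac in fractions.items():
    print(f"{gene}: {frac:.4f} of strand-assigned reads fall under stranded=yes")
```

A fraction near 1 or near 0 for most genes points to a strand-specific library, while values around 0.5 point to an unstranded one. Genes that disagree with the majority (CYGB here) may reflect overlapping features on the opposite strand, so a genome-wide check is more reliable than any single gene.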

updated 2.8 years ago by Ram 45k • written 9.8 years ago by Alternative ▴ 290

I always like to see how things look in IGV. I recommend loading the tracks and checking as Noolean proposed; additionally, I would take a couple of small transcripts (with few reads) and check whether the counts in IGV match those from HTSeq.

Also, as Antonio said, the person who generated the library has to provide this information.

updated 2.8 years ago by Ram 45k • written 9.8 years ago by Alternative ▴ 290

Ram · Answer 2 · 2015-09-29

Take your file with mapped reads (BAM or SAM) and open it in IGV. Then right-click the track and color the alignments by first-of-pair strand. If the reads within each gene all share one color (either pink or blue), your protocol is strand-specific; which color a gene gets depends on whether it is transcribed from the forward or the reverse strand of the reference. If your protocol is not strand-specific, you will see a mix of both colors within each gene.
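
The coloring rule can also be written down explicitly. This is a minimal sketch, assuming the standard SAM flag bits (0x10 = read aligned to the reverse strand, 0x80 = second mate of the pair) and a properly oriented pair; the function name `first_of_pair_strand` is made up for illustration and this is not IGV's code.

```python
REVERSE = 0x10         # read aligned to the reverse strand
SECOND_IN_PAIR = 0x80  # read is the second mate of the pair


def first_of_pair_strand(flag):
    """Return '+' or '-': the strand of the first mate of the pair this read belongs to.

    Assumes a properly oriented pair, where the two mates align to opposite strands.
    """
    read_is_reverse = bool(flag & REVERSE)
    if flag & SECOND_IN_PAIR:
        # The second mate lies on the opposite strand to the first mate.
        read_is_reverse = not read_is_reverse
    return "-" if read_is_reverse else "+"


# Both mates of one fragment get the same label.
print(first_of_pair_strand(99), first_of_pair_strand(147))  # prints "+ +"
print(first_of_pair_strand(83), first_of_pair_strand(163))  # prints "- -"
```

In a strand-specific library, nearly all reads over a given gene get the same label; in an unstranded library the two labels are mixed.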

Ram · Answer 3 · 2015-09-29

9.8 years ago

Antonio R. Franco ★ 5.2k

Whoever prepared your library will know whether it is stranded or not: a stranded library requires specific reagents and a specific protocol.

Don't you have that information?

If you know that your library is stranded, you should give that information to your mapping program and to HTSeq-count as well.
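
For example, if the mapper is TopHat, the commonly cited correspondence between its `--library-type` values and htseq-count's `--stranded` values is: fr-unstranded with `no`, fr-secondstrand with `yes`, and fr-firststrand with `reverse`. The sketch below only builds the two command lines so that both tools get a consistent setting; the helper `build_commands`, the file names, and the annotation path are placeholders, not taken from this thread.

```python
# Assumed correspondence between TopHat --library-type and htseq-count --stranded.
TOPHAT_TO_HTSEQ = {
    "fr-unstranded": "no",
    "fr-secondstrand": "yes",
    "fr-firststrand": "reverse",
}


def build_commands(library_type, bowtie_index, fastq_r1, fastq_r2, outdir, gtf):
    """Return (tophat_cmd, htseq_cmd) as argument lists sharing one strand setting."""
    stranded = TOPHAT_TO_HTSEQ[library_type]
    tophat_cmd = [
        "tophat", "-p", "6", "--library-type", library_type,
        "-o", outdir, bowtie_index, fastq_r1, fastq_r2,
    ]
    htseq_cmd = [
        "htseq-count", "-f", "bam",
        "-r", "pos",  # TopHat's accepted_hits.bam is sorted by position
        "-s", stranded,
        f"{outdir}/accepted_hits.bam", gtf,
    ]
    return tophat_cmd, htseq_cmd


# Placeholder inputs; a dUTP-style library is assumed here purely for illustration.
tophat_cmd, htseq_cmd = build_commands(
    "fr-firststrand", "bowtie_index", "reads_1.fastq", "reads_2.fastq",
    "tophat_out", "genes.gtf",
)
print(" ".join(tophat_cmd))
print(" ".join(htseq_cmd))
```

Which library type actually applies depends on the kit used, so it is worth confirming it with the person who prepared the library or with a check such as infer_experiment.py before rerunning anything.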

updated 2.8 years ago by Ram 45k • written 9.8 years ago by Antonio R. Franco ★ 5.2k

Dear Noolean/Pierre/Antonio, I have now received the information: the RNA-seq library was constructed with a strand-specific protocol.

Sure, I will check the BAM file in IGV and validate the counts provided by htseq-count.

But to get the BAM files, I used TopHat for the alignment step, which produced the accepted_hits.bam file. I used the following TopHat command:

tophat -p 6 -o $outdir $bowtie_index $fastq_r1 $fastq_r2

I didn't specify any library type, but I now know that my RNA-seq library is strand-specific.

Which of the options below should I select?

I believe I should now rerun TopHat with the strand information, as Antonio mentioned.

The content below is from the TopHat manual (https://ccb.jhu.edu/software/tophat/manual.shtml#toph):

--library-type The default is unstranded (fr-unstranded). If either fr-firststrand or fr-secondstrand is specified, every read alignment will have an XS attribute tag as explained below. Consider supplying library type options below to select the correct RNA-seq protocol.

Library Type     Examples                  Description
fr-unstranded    Standard Illumina         Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand.
fr-firststrand   dUTP, NSR, NNSR           Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced.
fr-secondstrand  Ligation, Standard SOLiD  Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.

updated 2.8 years ago by Ram 45k • written 9.8 years ago by nalandaatmi ▴ 110