Question: HTSeq-count paired-end --stranded=yes gives less counts than --stranded=no
0
gravatar for pablominguez
3.0 years ago by
pablominguez10
pablominguez10 wrote:

Hi all,

I'm confused on how should I call htseq-count for my dataset concerning option --stranded [yes or no].

I'm analysing paired-end RNASeq data, at the alignment step with tophat I produced one single bam file from the two reads (R1, R2). So I have one bam file with both strands, this file is sorted by name with samtools and when calling htseq-count I firstly assumed I had to say stranded=yes, I got my counts and there were many zeros, so I run again with stranded=no and got many more read counts, less genes with zeros and genes generally with more counts. These are the numbers for both performances:

stranded=yes no feature = 49 258 937 ambiguous = 8 393

stranded=no no feature = 4 654 795 ambiguous = 1781 755

Since the number of "no feature" is considerably lower in stranded=no, should I use this tag? Is this a proof that this is right call?

So I don't know if my logic is right, I had strand specific data but since I merged them (R1 and R2) in the alignment step, I have everything together in the bam and I have to tell this to htseq-count by setting the -stranded=yes even if this is not completely right, any sense??

As another clue I run infer_experiment.py with the merged-sorted bam file from the RSeQC package and the output was:

This is PairEnd Data Fraction of reads failed to determine: 0.0012 Fraction of reads explained by "1++,1--,2+-,2-+": 0.0827 Fraction of reads explained by "1+-,1-+,2++,2--": 0.9161

This tells me I have strand-specific again ... so maybe I should come back to htseq-count --stranded=yes??

Any help is very welcome!! Thanks a lot!

rna-seq • 2.5k views
ADD COMMENTlink modified 3.0 years ago by Devon Ryan90k • written 3.0 years ago by pablominguez10

If your data is strand specific you should tell it yes for strandness, well, htseq-count default already is for strand=yes. So you dont need to specify. The issue with your too many no_feature reads maybe is related with your alignment or annotation file. Maybe is a good idea try to quantify using RSEM or Kallisto.

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by tiago2112871.1k

Thanks a lot for the suggestion, I have run Kallisto and the numbers are similar to --stranded=no.

ADD REPLYlink written 3.0 years ago by pablominguez10
1

Try out Salmon, it'll infer your library prep from the raw reads in a handy text file.

ADD REPLYlink written 3.0 years ago by andrew.j.skelton735.7k
3
gravatar for Devon Ryan
3.0 years ago by
Devon Ryan90k
Freiburg, Germany
Devon Ryan90k wrote:

You want -s reverse in htseq-count (or -s 2 in featureCounts). That's also what the output from infer_experiment.py is indicating.

ADD COMMENTlink written 3.0 years ago by Devon Ryan90k
0
gravatar for wud
3.0 years ago by
wud0
wud0 wrote:

I have met the same question. both the sequence report and the RSeQC package indicate that I had strand specific data. but when i set the htseq-count --stranded=yes or featurecount -s 1 (means strand speciafic), the number of reads mapping to the ref is lower than setting the parameter with no stranded special.

ADD COMMENTlink written 3.0 years ago by wud0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1423 users visited in the last hour