Question: Does better results in HTSeq-Count mean that the script was run correctly?
1
gravatar for cr.balladares.m
4.2 years ago by
cr.balladares.m10 wrote:

I tested two different options while running HTSeq-Count, -s no and -s reverse. This are the results:

For -s no:

__no_feature 435592

__ambiguous 953159

__too_low_aQual 0

__not_aligned 0

__alignment_not_unique 8164048

For -s reverse:

__no_feature 573728

__ambiguous 410510

__too_low_aQual 0

__not_aligned 0

__alignment_not_unique 8164048

For the option -s reverse there are lower ambiguous values but higher no_feature than for -s no. As far as I know this option depends on the construction of the library, but when they gave me this sequences they didn't mention it. All I know is that it's was constructed under a Illumina protocol and that it was a Paired-End experiment of RNA-Seq from peach (Prunus persica). I'm inclined to think that less ambiguous values are just better, even than with more no_feature values.

So, which one it's right?

===============

Edit: This are the results from -s yes:

__no_feature 41467373

__ambiguous 506

__too_low_aQual 0

__not_aligned 0

__alignment_not_unique 8164048

rna-seq • 1.8k views
ADD COMMENTlink modified 4.2 years ago by Devon Ryan96k • written 4.2 years ago by cr.balladares.m10

Did you map the reads to the genome or transcriptome?

ADD REPLYlink written 4.2 years ago by Asaf8.2k

it was mapped to the genome using tophat2

ADD REPLYlink written 4.2 years ago by cr.balladares.m10
1

So you should use -s yes. The differences you show are expected even with random reads

ADD REPLYlink written 4.2 years ago by Asaf8.2k
1

That's even not something he tested and will for sure depend on the protocol. You can have stranded or unstranded RNA seq.

ADD REPLYlink written 4.2 years ago by WouterDeCoster44k

This are the results from -s yes:

__no_feature 41467373

__ambiguous 506

__too_low_aQual 0

__not_aligned 0

__alignment_not_unique 8164048

ADD REPLYlink written 4.2 years ago by cr.balladares.m10

It looks like it's stranded but as written below, you have to make sure which protocol was used.

ADD REPLYlink written 4.2 years ago by Asaf8.2k
1

A huge number now ends up in no_feature which argues against -s yes. My guess is non-stranded, in which it makes sense that there are more __ambiguous reads (which can not be assigned since htseq-count has no strand information available). ambiguous means that the read can be from either geneA or geneB, without strand information impossible to say in the case of antisense transcripts.

ADD REPLYlink written 4.2 years ago by WouterDeCoster44k

I missed a digit :) In that case it's probably reverse since most of the reads do map in reverse mode

ADD REPLYlink written 4.2 years ago by Asaf8.2k
6
gravatar for Devon Ryan
4.2 years ago by
Devon Ryan96k
Freiburg, Germany
Devon Ryan96k wrote:

This is a pretty common method to determine the strandedness of a library and is essentially what RSeQC is doing. In this case, -s reverse is the correct setting (it's also the majority of what's produced these days). The general reasoning is:

  1. If it's an unstranded library, then each of the stranded methods will have ~2x more _no_feature counts than the -s no setting.
  2. If it's a stranded library, one of the stranded setting will have "slightly" higher _no_feature counts (because reality is annoying like that) and the other will have vastly higher _no_feature counts.
ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by Devon Ryan96k

replicate1 __no_feature 39795601 __ambiguous 73269 __too_low_aQual 2553578 __not_aligned 822830 __alignment_not_unique 6895933 Is this result normal? I used HIAST to do alignment. Because my sequencing data is Sanger/Illumina1.9, so I only set the quality value as Phred 33 and run other parameters as default. My alignment rate is about 90%. However, the HTseq result is quite unexpected. I have no clue where went wrong.

ADD REPLYlink written 2.9 years ago by sophialovechan50

It is a stranded sequencing data, when I run HTseq, I didn't set -s argument since the default setting is yes.

ADD REPLYlink written 2.9 years ago by sophialovechan50
0
gravatar for WouterDeCoster
4.2 years ago by
Belgium
WouterDeCoster44k wrote:

This seems a dangerous assumption and in my opinion not valid. Try to find out which kit was used exactly, unless conclusions could be very very wrong.

ADD COMMENTlink written 4.2 years ago by WouterDeCoster44k
2

I would use RSeQC's infer_experiment.py in order to get an estimate of the library's strandedness. In case that you have no information about the kit.

ADD REPLYlink written 4.2 years ago by michael.ante3.6k

I tried that but I dont know where to find a gene model for prunus persica

ADD REPLYlink written 4.2 years ago by cr.balladares.m10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1037 users visited in the last hour