Question: HT-seq Counts Very High Number of No Feature
0
gravatar for Krish Srinivasan
2.4 years ago by
Canada
Krish Srinivasan10 wrote:

Hi All,

I am running a test run on HT-seq counts on a data set previously used by a member of my lab, using the galaxy tool. I am getting an unusually high amount of No Feature results. In the raw data, there are approximately 26 million forward reads and 26 million reverse. The model is Zebrafish, reference genome GRCz10.

Parameters:

Mode = Union,
Stranded = Yes (The experiment was strand specific) ,
Minimum Alignment quality = 10,
Feature type = Exon,
ID Attribute = gene_id,
Additional Bam = No,
Force Sorting of BAM by name = Yes

The results of HT-seq Counts are:

no_feature          22001714,
ambiguous   13105,
too_low_aQual   0,
not_aligned 0,
alignment_not_unique    19024306,

Can anyone explain why the majority of the reads are reading as No Feature? Is this normal and can be explained by the use of the zebrafish or is something going wrong?

Thank you,

Krishna

rna-seq ht-seq counts • 1.6k views
ADD COMMENTlink modified 2.4 years ago by geek_y9.7k • written 2.4 years ago by Krish Srinivasan10
3
gravatar for geek_y
2.4 years ago by
geek_y9.7k
Barcelona/CRG/London/Imperial
geek_y9.7k wrote:

Are there any reads that are assigned to genes ? One of the problem could be chromosome names ( chr1 vs 1 ). The other problem would be strandedness. Did you try with 'reverse' ? You should get the information about the strandedness from the sequencing people.

ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by geek_y9.7k

HI Goutham,

There are reads that are assigned to genes in the output, so I don't think there is a problem with chromosome names. Also I have not tried the reverse parameter for strandedness, I will give that a try and see what happens. Thank you.

ADD REPLYlink written 2.4 years ago by Krish Srinivasan10

So I tried using reverse and here are my results:

no_feature          3153429
ambiguous   166782
too_low_aQual   0
not_aligned 0
alignment_not_unique    19024306

This seems to be better than before? Also there are reads that are being assigned to genes (many more actually).

ADD REPLYlink modified 2.4 years ago by geek_y9.7k • written 2.4 years ago by Krish Srinivasan10
1

It looks better but the use of reverse or yes should not depend on better results. You should ask the people who did the sequencing and use appropriately.

ADD REPLYlink written 2.4 years ago by geek_y9.7k
1

It looks better but the use of reverse or yes should not depend on better results. You should ask the people who did the sequencing and use appropriately.

If you have a stranded protocol, it will make a world of difference. That's what stranded protocols are for, to separate the sense "signal" from antisense "noise". Changing the stranded value in htseq-count will cause it to quantitate signal, or quantitate noise.

ADD REPLYlink written 2.4 years ago by apa@stowers420

Okay, Thanks for all the help. I will contact the person.

ADD REPLYlink written 2.4 years ago by Krish Srinivasan10

Hey Goutham,

So I found out more information about the library. It is a pair end, strand specific assay using NEBNext First Strand Synthesis Module and NEBNext Second Strand Synthesis Module (with dUTP). My biggest question is now, in HT-seq counts do I use the stranded= Yes or Stranded = Reverse option.

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Krish Srinivasan10
1

If you are using the Illumina TruSeq stranded protocol, or any other first-strand-based protocol, then your alignments will always be annotated to the wrong strand. Unless you aligned the reads with Tophat using "--library-type fr-firststrand", which corrects this. Otherwise, you could edit the strand flag in the BAM, which is the best bet.

If your BAM is not corrected, then all strandwise operations will have to be reversed. HTseq-count will need "-s reverse" to get sense counts, bedtools will require "-S" instead of "-s", etc.

ADD REPLYlink written 2.4 years ago by apa@stowers420

Thank you!! This is what i was looking for. I am rerunning tophat with fr- first strand and then using HT-seq counts and stranded=yes.

ADD REPLYlink written 2.4 years ago by Krish Srinivasan10
1

Ah hold on, I was only partially right. Even with "--library-type fr-firststrand" I think the strand will still be reversed in the bit flags. Tophat will just add the "XS" tag for strand -- which will contradict the bit flags -- but only Cufflinks reads this tag. Other software will still use the bit flags, and still get things backwards.

So you could edit the flags, or, just remember to run htseq / bedtools / etc with the "wrong" strand argument.

ADD REPLYlink written 2.4 years ago by apa@stowers420

thanks for the update. I ended up running Ht-seq count as reverse.

ADD REPLYlink written 2.4 years ago by Krish Srinivasan10

Accept the answer if it helped. So that it will be a "closed" thread.

ADD REPLYlink written 2.4 years ago by geek_y9.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1392 users visited in the last hour