Question

HT-seq Counts Very High Number of No Feature

0

Entering edit mode

7.8 years ago

Krish Srinivasan ▴ 50

Hi All,

I am running a test run on HT-seq counts on a data set previously used by a member of my lab, using the galaxy tool. I am getting an unusually high amount of No Feature results. In the raw data, there are approximately 26 million forward reads and 26 million reverse. The model is Zebrafish, reference genome GRCz10.

Parameters:

Mode = Union,
Stranded = Yes (The experiment was strand specific) ,
Minimum Alignment quality = 10,
Feature type = Exon,
ID Attribute = gene_id,
Additional Bam = No,
Force Sorting of BAM by name = Yes

The results of HT-seq Counts are:

no_feature          22001714,
ambiguous   13105,
too_low_aQual   0,
not_aligned 0,
alignment_not_unique    19024306,

Can anyone explain why the majority of the reads are reading as No Feature? Is this normal and can be explained by the use of the zebrafish or is something going wrong?

Thank you,

Krishna

RNA-Seq HT-seq Counts • 5.2k views

ADD COMMENT • link updated 7.8 years ago by GouthamAtla 12k • written 7.8 years ago by Krish Srinivasan ▴ 50

GouthamAtla · Accepted Answer · 2017-01-03

4

Entering edit mode

7.8 years ago

GouthamAtla 12k

Are there any reads that are assigned to genes ? One of the problem could be chromosome names ( chr1 vs 1 ). The other problem would be strandedness. Did you try with 'reverse' ? You should get the information about the strandedness from the sequencing people.

ADD COMMENT • link 7.8 years ago by GouthamAtla 12k

0

Entering edit mode

HI Goutham,

There are reads that are assigned to genes in the output, so I don't think there is a problem with chromosome names. Also I have not tried the reverse parameter for strandedness, I will give that a try and see what happens. Thank you.

ADD REPLY • link 7.8 years ago by Krish Srinivasan ▴ 50

0

Entering edit mode

So I tried using reverse and here are my results:

no_feature          3153429
ambiguous   166782
too_low_aQual   0
not_aligned 0
alignment_not_unique    19024306

This seems to be better than before? Also there are reads that are being assigned to genes (many more actually).

ADD REPLY • link updated 7.8 years ago by GouthamAtla 12k • written 7.8 years ago by Krish Srinivasan ▴ 50

1

Entering edit mode

It looks better but the use of reverse or yes should not depend on better results. You should ask the people who did the sequencing and use appropriately.

ADD REPLY • link 7.8 years ago by GouthamAtla 12k

1

Entering edit mode

It looks better but the use of reverse or yes should not depend on better results. You should ask the people who did the sequencing and use appropriately.

If you have a stranded protocol, it will make a world of difference. That's what stranded protocols are for, to separate the sense "signal" from antisense "noise". Changing the stranded value in htseq-count will cause it to quantitate signal, or quantitate noise.

ADD REPLY • link 7.7 years ago by apa@stowers ▴ 600

0

Entering edit mode

Okay, Thanks for all the help. I will contact the person.

ADD REPLY • link 7.8 years ago by Krish Srinivasan ▴ 50

0

Entering edit mode

Hey Goutham,

So I found out more information about the library. It is a pair end, strand specific assay using NEBNext First Strand Synthesis Module and NEBNext Second Strand Synthesis Module (with dUTP). My biggest question is now, in HT-seq counts do I use the stranded= Yes or Stranded = Reverse option.

ADD REPLY • link 7.7 years ago by Krish Srinivasan ▴ 50

2

Entering edit mode

If you are using the Illumina TruSeq stranded protocol, or any other first-strand-based protocol, then your alignments will always be annotated to the wrong strand. Unless you aligned the reads with Tophat using "--library-type fr-firststrand", which corrects this. Otherwise, you could edit the strand flag in the BAM, which is the best bet.

If your BAM is not corrected, then all strandwise operations will have to be reversed. HTseq-count will need "-s reverse" to get sense counts, bedtools will require "-S" instead of "-s", etc.

ADD REPLY • link 7.7 years ago by apa@stowers ▴ 600

0

Entering edit mode

Thank you!! This is what i was looking for. I am rerunning tophat with fr- first strand and then using HT-seq counts and stranded=yes.

ADD REPLY • link 7.7 years ago by Krish Srinivasan ▴ 50

2

Entering edit mode

Ah hold on, I was only partially right. Even with "--library-type fr-firststrand" I think the strand will still be reversed in the bit flags. Tophat will just add the "XS" tag for strand -- which will contradict the bit flags -- but only Cufflinks reads this tag. Other software will still use the bit flags, and still get things backwards.

So you could edit the flags, or, just remember to run htseq / bedtools / etc with the "wrong" strand argument.

ADD REPLY • link 7.7 years ago by apa@stowers ▴ 600

0

Entering edit mode

thanks for the update. I ended up running Ht-seq count as reverse.

ADD REPLY • link 7.7 years ago by Krish Srinivasan ▴ 50

0

Entering edit mode

Accept the answer if it helped. So that it will be a "closed" thread.

ADD REPLY • link 7.7 years ago by GouthamAtla 12k