Question: HTSeq-count no feature is too high
0
gravatar for sandeepmarla
3.5 years ago by
United States
sandeepmarla0 wrote:

Hello,

I'm using HTSeq-count to determine the number of transcripts for each sample. I have bam files,samtools flagstat analysis showed ~80% of the transcripts are aligned to the reference genome. Then I sorted those bam files by name and position, used the exon annotation gff3 file from sorghum phytozome v3.1 to determine the transcript counts using HTSeq-count. After analyzing my data, I realized no_feature was too high, I have a total of ~30 -36 million paired end reads and 15 million shows no_feature.

code: htseq-count --stranded=no --type=exon --idattr=Parent --order=pos --format=bam --minaqual=30 --mode=union $bam $gff3 >$out

 __no_feature 15928275
__ambiguous 3020434
__too_low_aQual 768292
__not_aligned 5910873
__alignment_not_unique 1756250

Further, I looked at the data and found that all the featured were on 6 chromosomes and 4 (chr3,4,6 and 7) were completely absent in the HTSeq counts data. I looked at the gff3 file and found that exon data was present for all the chromosomes.

Can any please help me in fixing this issue?

Thanks in advance,

Sandeep

 

 

 

myposts • 2.7k views
ADD COMMENTlink modified 3.5 years ago • written 3.5 years ago by sandeepmarla0

That's certainly the weirdest htseq-count issue I've heard of. Perhaps featureCounts will produce better results? It's much faster anyway.

ADD REPLYlink written 3.5 years ago by Devon Ryan90k

Are the chromosome names exactly the same between the gff file and the bam file ? It's the only reason I can think of to explain this issue...

ADD REPLYlink written 3.5 years ago by Carlo Yague4.5k
0
gravatar for sandeepmarla
3.5 years ago by
United States
sandeepmarla0 wrote:

Thanks for your responses. The problem is with the chromosome assembly in the reference genome sequence file, the newest version of the genome missed the chromosomes labeling. I'm re-doing the complete analysis with an updated assembly reference genome.

Lesson's learnt from this experience, we need to make sure the analysis is correct at each step: 1) Check assembled genome if chromosome annotation is missing. 2) If sam and bam files generated contained all the chromosomes in the analysis. 3) Compare the gff3/gtf annotation file to validate if assembled chromosome labeling matches the annotation file. 4). If HTSeq-counts are too high, read the HTSeq manual to ascertain the used feature, mode and idattr match the data utilized for the analysis.

 

 

ADD COMMENTlink written 3.5 years ago by sandeepmarla0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1033 users visited in the last hour