Question: htseq-count produces no features amid alignment
4.4 years ago by
United States
bojingjia10 wrote:

I recently aligned some sequencing data with STAR against a prebuilt mm10 genome. Afterwards, I sorted and indexed the BAM files with samtools and generated read counts with htseq-count, using an mm10 Ensembl GTF file. But every gene has a count of 0, and all of the reads are classified as no feature.

A number of other users have reported the same problem here on BioStars, but their concerns weren't resolved. I have to wonder if my reads failed to align, but a quick look at the bam files in IGV shows many aligned reads. Am I using a faulty mm10 annotation file? Would anyone have suggestions/comments?






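A few things to check:

  1. htseq-count expects a name-sorted BAM by default, while most aligners (and samtools sort, unless told otherwise) produce coordinate-sorted BAMs. If yours is coordinate-sorted, either re-sort it by name or pass --order=pos to htseq-count.
  2. Make sure you aren't using a GTF file downloaded from UCSC. Last time I checked, its gene_id values conflicted with its transcript_id values. Read more in the FAQs at the end of this page.
  3. Make sure the chromosome naming style is the same in your BAM and your GTF.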
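
The first check can be scripted. Below is a minimal sketch, assuming you have the BAM header as text (for example from samtools view -H). It reads the SO: tag on the @HD line and picks the matching value for htseq-count's -r/--order option. The file names sample.sorted.bam and annotation.gtf and the helper names are hypothetical, and the example header is made up for illustration.

```python
# Map the SO: tag of the @HD header line to htseq-count's --order value.
SORT_ORDER_TO_HTSEQ = {"queryname": "name", "coordinate": "pos"}


def htseq_order_from_header(header_text):
    """Return 'name' or 'pos' based on the @HD SO: tag, or None if it is
    missing or set to something else (e.g. 'unsorted')."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return SORT_ORDER_TO_HTSEQ.get(field[3:])
    return None


def build_htseq_command(bam_path, gtf_path, order):
    """Assemble an htseq-count command line as a list of arguments."""
    return ["htseq-count", "-f", "bam", "-r", order, bam_path, gtf_path]


# Illustrative header text for a coordinate-sorted BAM (not real output).
example_header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:195471971\n"

order = htseq_order_from_header(example_header)
# Hypothetical file names; substitute your own.
command = build_htseq_command("sample.sorted.bam", "annotation.gtf", order)
print(" ".join(command))
```

If the function returns None, the header does not say how the file is sorted, and it is safer to re-sort than to guess.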
modified 5 months ago by RamRS • written 4.4 years ago by Amitm
4.4 years ago by
ablanchetcohen1.2k wrote:

Do the sequence names in the BAM file and the GTF file match?

UCSC and Ensembl use different chromosome nomenclatures.
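
A quick way to test this is to compare the two sets of names directly. The sketch below is one possible approach, assuming you have the BAM header as text (for example from samtools view -H) and the GTF contents as text. The example header, the GTF lines, and the gene IDs GENE_A and GENE_B are made up for illustration, and the helper names are hypothetical.

```python
def bam_reference_names(header_text):
    """Collect the SN: values from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_sequence_names(gtf_text):
    """Collect the first column (seqname) of every non-comment GTF line."""
    names = set()
    for line in gtf_text.splitlines():
        if line and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


# Illustrative inputs: UCSC-style names in the BAM, Ensembl-style in the GTF.
example_header = "@SQ\tSN:chr1\tLN:195471971\n@SQ\tSN:chr2\tLN:182113224\n"
example_gtf = (
    "#!genome-build GRCm38\n"
    '1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "GENE_A";\n'
    '2\tensembl\tgene\t300\t400\t.\t-\t.\tgene_id "GENE_B";\n'
)

bam_names = bam_reference_names(example_header)
gtf_names = gtf_sequence_names(example_gtf)
shared = bam_names & gtf_names

print("shared:", sorted(shared))
print("BAM only:", sorted(bam_names - gtf_names))
print("GTF only:", sorted(gtf_names - bam_names))
```

An empty shared set means no read can overlap any annotated feature, which would be consistent with every read being counted as no feature.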

I will never understand why we can sequence the human genome, and put men on the moon, but not agree whether chromosome 1 should be referred to as chr1 or 1.
