Question: Htseq-Count Not Giving Counts For Mirna Alignment To Mirbase
gravatar for Preethy Nair
5.2 years ago by
University of Helsinki, Finland
Preethy Nair0 wrote:


I was trying to get the counts of miRNAs using htseq-count. The sam file is generated from adapter trimmed miRNA-seq alignment (bowtie) against miRBase mature miRNAs.

The sam file look like this:

HISEQ2000:383:C2452ACXX:1:2316:21334:99296    0    hsa-miR-30c-5p    1    255    23M    *    0    0    TGTAAACATCCTACACTCTCAGC    @@@DDDD>FFHFBH<C:FB9EE:    XA:i:0    MD:Z:23    NM:i:0
HISEQ2000:383:C2452ACXX:1:2316:21266:99322    0    hsa-miR-486-5p    1    255    22M    *    0    0    TCCTGTACTGAGCTGCCCCGAG    @C@FFFDFFHAHHIIIIIIEED    XA:i:0    MD:Z:22    NM:i:0

The reference fasta file used to create bowtie index:


Gtf file used :

hsa-mir-6859-1    .    miRNA_primary_transcript    17369    17436    .    -    .    ID "MI0022705"; Alias "MI0022705"; Name "hsa-mir-6859-1"
hsa-miR-6859-5p    .    miRNA    17409    17431    .    -    .    ID "MIMAT0027618"; Alias "MIMAT0027618"; Name "hsa-miR-6859-5p"; Derives_from "MI0022705"

I am not getting any counts for miRNAs based on these files with htseq-count. At the same time, I can see from the sam file that there are mappings to miRNAs.
htseq command used:

htseq-count --mode=intersection-strict --stranded=no --type=miRNA --idattr=Name  eg.sam  hsa.gff3

Could anyone advise me whether there is anything wrong in this approach or in the files I have used to get the counts ?


ADD COMMENTlink modified 5.2 years ago by Jeremy Leipzig18k • written 5.2 years ago by Preethy Nair0

As sanders.muc noted, it looks like your annotation won't match the alignments. How long is the "hsa-miR-6859-5p" transcript in the fasta file? If it's less than ~17.4kb then this will never work. Why don't you just look at how many reads map to each contig in the resulting BAM (look at samtools idxstats)? Presuming you first filtered out multimappers, that would seem to work.

ADD REPLYlink written 5.2 years ago by Devon Ryan89k
gravatar for Jeremy Leipzig
5.2 years ago by
Philadelphia, PA
Jeremy Leipzig18k wrote:

You don't really need HT-Seq to do this since there are no ranges involved

hairpins <- read.DNAStringSet(paste(hiseqDataDir,"refs/hairpin.fa",sep=""),use.names=TRUE)
ADD COMMENTlink written 5.2 years ago by Jeremy Leipzig18k

Thanks a lot. 

ADD REPLYlink written 5.0 years ago by Preethy Nair0

Hi, I have the same problem and I tried every step that you write. I also have an index of my bam sequence. However in the last one I have this error:

> bamcounts <- countBam(bamView)
Error in .BamViews_delegate("countBam", file, fun, ..., param = param) :
  'countBam' failed on ''
In addition: Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
  space 'hsa-let-7a-1' not in BAM header
2: RemoteWarning
  elements: 1
  UnspecifiedError: value[[3L]](cond)
'countBam' failed
  file: /home2/result/ann/mapped/SRR433866
  index: /home2/result/ann/mapped/SRR433866.bam 
ADD REPLYlink written 4.5 years ago by Kato50

usually the index would be called SRR433866.bam.bai. Show us

samtools view -H /home2/result/ann/mapped/SRR433866.bam

ADD REPLYlink modified 4.5 years ago • written 4.5 years ago by Jeremy Leipzig18k

Thanks for the script


ADD REPLYlink modified 3.4 years ago • written 3.4 years ago by vijay.bhaaskarla0
gravatar for sanders.muc
5.2 years ago by
sanders.muc10 wrote:

I don't understand your GFF file: What to the coordinate intervals mean? Is this the position of the miRNA on the chromosome?

Also, note that htseq-count is meant to count reads mapped to the genome. You have mapped, however, to a reference which lists only the miRNA transcriptome and each miRNA as a separate sequence. Hence, you could just look at the 3rd column of your SAM file and count based on this.

ADD COMMENTlink written 5.2 years ago by sanders.muc10
gravatar for liran0921
5.2 years ago by
liran0921100 wrote:

I suppose your gtf file is downloaded from Mirbase. Actually the coordinates in this file is based on the alignment with bovine genome. So you should align your reads file with the same bovine genome if you want to use this gtf file for read counting.

ADD COMMENTlink written 5.2 years ago by liran0921100
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1615 users visited in the last hour