Htseq-Count Not Giving Counts For Mirna Alignment To Mirbase
3
0
Entering edit mode
7.1 years ago

Hi,

I was trying to get the counts of miRNAs using htseq-count. The sam file is generated from adapter trimmed miRNA-seq alignment (bowtie) against miRBase mature miRNAs.

The sam file look like this:

HISEQ2000:383:C2452ACXX:1:2316:21334:99296    0    hsa-miR-30c-5p    1    255    23M    *    0    0    TGTAAACATCCTACACTCTCAGC    @@@DDDD>FFHFBH<C:FB9EE:    XA:i:0    MD:Z:23    NM:i:0
HISEQ2000:383:C2452ACXX:1:2316:21266:99322    0    hsa-miR-486-5p    1    255    22M    *    0    0    TCCTGTACTGAGCTGCCCCGAG    @C@FFFDFFHAHHIIIIIIEED    XA:i:0    MD:Z:22    NM:i:0


The reference fasta file used to create bowtie index:

>hsa-let-7a-5p
TGAGGTAGTAGGTTGTATAGTT
>hsa-let-7a-3p
CTATACAATCTACTGTCTTTC


Gtf file used :

hsa-mir-6859-1    .    miRNA_primary_transcript    17369    17436    .    -    .    ID "MI0022705"; Alias "MI0022705"; Name "hsa-mir-6859-1"
hsa-miR-6859-5p    .    miRNA    17409    17431    .    -    .    ID "MIMAT0027618"; Alias "MIMAT0027618"; Name "hsa-miR-6859-5p"; Derives_from "MI0022705"


I am not getting any counts for miRNAs based on these files with htseq-count. At the same time, I can see from the sam file that there are mappings to miRNAs.
htseq command used:

htseq-count --mode=intersection-strict --stranded=no --type=miRNA --idattr=Name  eg.sam  hsa.gff3


Could anyone advise me whether there is anything wrong in this approach or in the files I have used to get the counts ?

Thanks

• 5.7k views
1
Entering edit mode

As sanders.muc noted, it looks like your annotation won't match the alignments. How long is the "hsa-miR-6859-5p" transcript in the fasta file? If it's less than ~17.4kb then this will never work. Why don't you just look at how many reads map to each contig in the resulting BAM (look at samtools idxstats)? Presuming you first filtered out multimappers, that would seem to work.

4
Entering edit mode
7.1 years ago

You don't really need HT-Seq to do this since there are no ranges involved

hairpins <- read.DNAStringSet(paste(hiseqDataDir,"refs/hairpin.fa",sep=""),use.names=TRUE)
names(hairpins)<-str_match(names(hairpins),"^\\S+")
hairpinGR<-GRanges(names(hairpins),IRanges(1,width(hairpins)),strand="*")
bamView<-BamViews(bamPaths=bamPaths,bamSamples=bamSamples,bamRanges=hairpinGR)
bamcounts<-countBam(bamView)

0
Entering edit mode

Thanks a lot.

0
Entering edit mode

Hi, I have the same problem and I tried every step that you write. I also have an index of my bam sequence. However in the last one I have this error:

> bamcounts <- countBam(bamView)
Error in .BamViews_delegate("countBam", file, fun, ..., param = param) :
'countBam' failed on ''
1: In doTryCatch(return(expr), name, parentenv, handler) :
space 'hsa-let-7a-1' not in BAM header
2: RemoteWarning
elements: 1
UnspecifiedError: value[[3L]](cond)
'countBam' failed
file: /home2/result/ann/mapped/SRR433866
index: /home2/result/ann/mapped/SRR433866.bam

0
Entering edit mode

usually the index would be called SRR433866.bam.bai. Show us

samtools view -H /home2/result/ann/mapped/SRR433866.bam

0
Entering edit mode

Thanks for the script

Vijay

1
Entering edit mode
7.1 years ago
Simon Anders ▴ 30

I don't understand your GFF file: What to the coordinate intervals mean? Is this the position of the miRNA on the chromosome?

Also, note that htseq-count is meant to count reads mapped to the genome. You have mapped, however, to a reference which lists only the miRNA transcriptome and each miRNA as a separate sequence. Hence, you could just look at the 3rd column of your SAM file and count based on this.

0
Entering edit mode
7.1 years ago
liran0921 ▴ 110

I suppose your gtf file is downloaded from Mirbase. Actually the coordinates in this file is based on the alignment with bovine genome. So you should align your reads file with the same bovine genome if you want to use this gtf file for read counting.