Question: Htseq-Count Not Giving Counts For Mirna Alignment To Mirbase
0
gravatar for Preethy Nair
5.6 years ago by
University of Helsinki, Finland
Preethy Nair0 wrote:

Hi,

I was trying to get the counts of miRNAs using htseq-count. The sam file is generated from adapter trimmed miRNA-seq alignment (bowtie) against miRBase mature miRNAs.

The sam file look like this:

HISEQ2000:383:C2452ACXX:1:2316:21334:99296    0    hsa-miR-30c-5p    1    255    23M    *    0    0    TGTAAACATCCTACACTCTCAGC    @@@DDDD>FFHFBH<C:FB9EE:    XA:i:0    MD:Z:23    NM:i:0
HISEQ2000:383:C2452ACXX:1:2316:21266:99322    0    hsa-miR-486-5p    1    255    22M    *    0    0    TCCTGTACTGAGCTGCCCCGAG    @C@FFFDFFHAHHIIIIIIEED    XA:i:0    MD:Z:22    NM:i:0

The reference fasta file used to create bowtie index:

>hsa-let-7a-5p
TGAGGTAGTAGGTTGTATAGTT
>hsa-let-7a-3p
CTATACAATCTACTGTCTTTC

Gtf file used :

hsa-mir-6859-1    .    miRNA_primary_transcript    17369    17436    .    -    .    ID "MI0022705"; Alias "MI0022705"; Name "hsa-mir-6859-1"
hsa-miR-6859-5p    .    miRNA    17409    17431    .    -    .    ID "MIMAT0027618"; Alias "MIMAT0027618"; Name "hsa-miR-6859-5p"; Derives_from "MI0022705"

I am not getting any counts for miRNAs based on these files with htseq-count. At the same time, I can see from the sam file that there are mappings to miRNAs.
htseq command used:

htseq-count --mode=intersection-strict --stranded=no --type=miRNA --idattr=Name  eg.sam  hsa.gff3

Could anyone advise me whether there is anything wrong in this approach or in the files I have used to get the counts ?

Thanks

• 4.8k views
ADD COMMENTlink modified 5.6 years ago by Jeremy Leipzig18k • written 5.6 years ago by Preethy Nair0
1

As sanders.muc noted, it looks like your annotation won't match the alignments. How long is the "hsa-miR-6859-5p" transcript in the fasta file? If it's less than ~17.4kb then this will never work. Why don't you just look at how many reads map to each contig in the resulting BAM (look at samtools idxstats)? Presuming you first filtered out multimappers, that would seem to work.

ADD REPLYlink written 5.6 years ago by Devon Ryan91k
4
gravatar for Jeremy Leipzig
5.6 years ago by
Philadelphia, PA
Jeremy Leipzig18k wrote:

You don't really need HT-Seq to do this since there are no ranges involved

hairpins <- read.DNAStringSet(paste(hiseqDataDir,"refs/hairpin.fa",sep=""),use.names=TRUE)
names(hairpins)<-str_match(names(hairpins),"^\\S+") 
hairpinGR<-GRanges(names(hairpins),IRanges(1,width(hairpins)),strand="*")
bamView<-BamViews(bamPaths=bamPaths,bamSamples=bamSamples,bamRanges=hairpinGR)
bamcounts<-countBam(bamView)
ADD COMMENTlink written 5.6 years ago by Jeremy Leipzig18k

Thanks a lot. 

ADD REPLYlink written 5.4 years ago by Preethy Nair0

Hi, I have the same problem and I tried every step that you write. I also have an index of my bam sequence. However in the last one I have this error:

> bamcounts <- countBam(bamView)
Error in .BamViews_delegate("countBam", file, fun, ..., param = param) :
  'countBam' failed on ''
In addition: Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
  space 'hsa-let-7a-1' not in BAM header
2: RemoteWarning
  elements: 1
  UnspecifiedError: value[[3L]](cond)
'countBam' failed
  file: /home2/result/ann/mapped/SRR433866
  index: /home2/result/ann/mapped/SRR433866.bam 
ADD REPLYlink written 4.9 years ago by Kato50

usually the index would be called SRR433866.bam.bai. Show us

samtools view -H /home2/result/ann/mapped/SRR433866.bam

ADD REPLYlink modified 4.9 years ago • written 4.9 years ago by Jeremy Leipzig18k

Thanks for the script

Vijay

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by vijay.bhaaskarla0
1
gravatar for Simon Anders
5.6 years ago by
Simon Anders30
Heidelberg, Germany
Simon Anders30 wrote:

I don't understand your GFF file: What to the coordinate intervals mean? Is this the position of the miRNA on the chromosome?

Also, note that htseq-count is meant to count reads mapped to the genome. You have mapped, however, to a reference which lists only the miRNA transcriptome and each miRNA as a separate sequence. Hence, you could just look at the 3rd column of your SAM file and count based on this.

ADD COMMENTlink written 5.6 years ago by Simon Anders30
0
gravatar for liran0921
5.6 years ago by
liran0921110
liran0921110 wrote:

I suppose your gtf file is downloaded from Mirbase. Actually the coordinates in this file is based on the alignment with bovine genome. So you should align your reads file with the same bovine genome if you want to use this gtf file for read counting.

ADD COMMENTlink written 5.6 years ago by liran0921110
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1450 users visited in the last hour