Question: Mapping miRNA to the Human Genome with Bowtie2
gravatar for gkuffel22
3.0 years ago by
United States
gkuffel2270 wrote:

Hopefully someone can help. I have read conflicting info about aligners for miRNA data but I have finally decided on Bowtie2. I have 2 questions:

  1. I am aligning the miRNA reads to the human genome and then using HTSeq-count with the gff file from miRBase to count the miRNAs aligned in the BAM file, is this a reasonable approach?

  2. Does anyone know if HTSeq will accurately count miRNAs that map to multiple locations in the genome?

Thanks everyone!

ADD COMMENTlink modified 2.8 years ago by apa@stowers450 • written 3.0 years ago by gkuffel2270

You should use bowtie v.1 instead since miRNA are going to be short and you want to keep your alignments ungapped.

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by genomax74k

I`m doing almost the same! Does It works for you? I use Bowtie2 and I get a 72% of overall alignment rate, I think that is fine. I use;

bowtie2 -p 8 -x $index --np 0 --n-ceil L,0,0.02 --rdg 0,6 --rfg 0,6 --mp 6,2 --score-min L,0,-0.18 -q $reads.fastq -S $result.sam
ADD REPLYlink written 2.9 years ago by Buffo1.7k

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER should only be used to provide new answers for the original question.

ADD REPLYlink written 2.9 years ago by genomax74k
gravatar for apa@stowers
2.8 years ago by
Kansas City
apa@stowers450 wrote:

Bowtie2 should work as well as Bowtie1, since bowtie2 should not return a gapped alignments if a contiguous one exists. Generally speaking. However, some people prefer Bowtie1 since you can guarantee "--best --strata", which Bowtie2 does not do.

There are better aligners for miRNA data, though, which also include analysis of the alignment context to give a more accurate picture of expression: see MirDeep2 and ShortStack (GitHub link).

HTSeq-count will quantitate multireads if you force it to. It depends on two things: one, mapping quality of multireads (which is usually very low or zero), and two, any SAM tags indicating a multiread (which is only the NH tag, according to the documentation).

HOWEVER: only Tophat2 uses the NH tag -- Bowtie1/2 do not -- Bowtie2 uses the XS tag; not sure about Bowtie1.

But in any case it is only a SAM tag, which can be stripped. For instance, to blind htseq-count to multireads from a Tophat bam file, set "-a 0" to stop filtering on mapping quality, and strip the NH tag in-line:

samtools view in.bam | perl -pe 's/\s+NH:i:\d+//' | htseq-count -s no -a 0 -m intersection-nonempty - genes.gtf > counts.txt

If htseq-count ever uses other tags, then just strip those too.

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by apa@stowers450
gravatar for WouterDeCoster
3.0 years ago by
WouterDeCoster42k wrote:

With regard to your second question, HTSeq-count will be default ignore/discard multimapping reads, see for example the FAQ for an explanation about that.

ADD COMMENTlink written 3.0 years ago by WouterDeCoster42k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 873 users visited in the last hour