Mapping miRNA to the Human Genome with Bowtie2
2
1
Entering edit mode
6.2 years ago
gkuffel22 ▴ 100

Hopefully someone can help. I have read conflicting info about aligners for miRNA data but I have finally decided on Bowtie2. I have 2 questions:

1. I am aligning the miRNA reads to the human genome and then using HTSeq-count with the gff file from miRBase to count the miRNAs aligned in the BAM file, is this a reasonable approach?

2. Does anyone know if HTSeq will accurately count miRNAs that map to multiple locations in the genome?

Thanks everyone!

alignment miRNA Bowtie2 Python HtSeq-count • 5.4k views
0
Entering edit mode

You should use bowtie v.1 instead since miRNA are going to be short and you want to keep your alignments ungapped.

0
Entering edit mode

Im doing almost the same! Does It works for you? I use Bowtie2 and I get a 72% of overall alignment rate, I think that is fine. I use;

bowtie2 -p 8 -x $index --np 0 --n-ceil L,0,0.02 --rdg 0,6 --rfg 0,6 --mp 6,2 --score-min L,0,-0.18 -q$reads.fastq -S \$result.sam

0
Entering edit mode

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER should only be used to provide new answers for the original question.

1
Entering edit mode
6.1 years ago
apa@stowers ▴ 580

Bowtie2 should work as well as Bowtie1, since bowtie2 should not return a gapped alignments if a contiguous one exists. Generally speaking. However, some people prefer Bowtie1 since you can guarantee "--best --strata", which Bowtie2 does not do.

There are better aligners for miRNA data, though, which also include analysis of the alignment context to give a more accurate picture of expression: see MirDeep2 and ShortStack (GitHub link).

HTSeq-count will quantitate multireads if you force it to. It depends on two things: one, mapping quality of multireads (which is usually very low or zero), and two, any SAM tags indicating a multiread (which is only the NH tag, according to the documentation).

HOWEVER: only Tophat2 uses the NH tag -- Bowtie1/2 do not -- Bowtie2 uses the XS tag; not sure about Bowtie1.

But in any case it is only a SAM tag, which can be stripped. For instance, to blind htseq-count to multireads from a Tophat bam file, set "-a 0" to stop filtering on mapping quality, and strip the NH tag in-line:

samtools view in.bam | perl -pe 's/\s+NH:i:\d+//' | htseq-count -s no -a 0 -m intersection-nonempty - genes.gtf > counts.txt
`

If htseq-count ever uses other tags, then just strip those too.

0
Entering edit mode
6.2 years ago

With regard to your second question, HTSeq-count will be default ignore/discard multimapping reads, see for example the FAQ for an explanation about that.