Question: RNA-seq mapper that allows variability and handles multi-mapping?
4.4 years ago by
Ekarl2100 wrote:

I have a species with 4-5% variation between alleles, and each RNA-seq sample contains several hundred individuals. I am mapping against a template from another, single individual that has a much more complete transcriptome assembly.

Currently, I am using bowtie + RSEM, but the way bowtie works seems sub-optimal for this dataset: many reads have a few mismatches in the seed relative to the reference, so the mapping rate stays low (65% or so) even with more lenient bowtie parameters, such as a shorter seed length, more allowed mismatches, and more backtracking. Mapping with e.g. BWA at default parameters recovers a sizable share of the lost mappings, so the issue seems to be with bowtie.
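To give a rough sense of why seed mismatches matter at this level of divergence, here is a minimal back-of-the-envelope sketch. It assumes that differences are independent per base and occur at a flat 5% rate, and it ignores sequencing errors and indels, so the numbers are only indicative. The seed lengths and mismatch limits are illustrative values, not a statement about any particular aligner's settings, and the function name is made up for this example.

```python
from math import comb


def prob_seed_within_mismatches(seed_len: int, max_mismatches: int, divergence: float) -> float:
    """Probability that a seed of seed_len bases has at most max_mismatches
    differences from the reference, assuming independent per-base differences
    at rate `divergence` (a simplifying assumption)."""
    return sum(
        comb(seed_len, k) * divergence**k * (1 - divergence) ** (seed_len - k)
        for k in range(max_mismatches + 1)
    )


if __name__ == "__main__":
    # Illustrative grid of seed lengths and allowed seed mismatches.
    for seed_len in (20, 28, 36):
        for max_mm in (0, 1, 2, 3):
            p = prob_seed_within_mismatches(seed_len, max_mm, 0.05)
            print(f"seed={seed_len:2d}  mismatches<={max_mm}:  P={p:.3f}")
```

Under these assumptions, a longer seed or a stricter mismatch limit lowers the share of reads whose seed can be placed at all, which is consistent with the low mapping rate described above.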

Are there any alternatives to bowtie that can be combined with either RSEM (e.g. by removing gapped alignments?) or another similar program that handles multi-mapping reads? If not, are there any alternatives that might work better for this kind of dataset? If we give up the ability to accurately handle multi-mapping reads, what other options might be suitable for datasets like this one?

rna-seq mapping • 2.2k views
written 4.4 years ago by Ekarl2100

I'd recommend trying one of our tools (either sailfish or salmon). Both are very fast and accurate tools for transcript-level quantification. They both use a custom algorithm for mapping reads to the transcriptome that is accurate and tolerant to errors / variation. Optionally, salmon can be paired with an aligner, and it doesn't require removing gaps in the alignments prior to quantification.

written 4.4 years ago by Rob4.1k

Have you looked at BBMap? GMAP is also supposed to be more SNP tolerant. Changing aligners may not necessarily give you the desired outcome, but since you are looking for options, both would be worth a try.

written 4.4 years ago by genomax87k

What do you mean by 'handles multi-mapping reads'? What is it you want to do with multi-mapping reads? Ignore them or count them? Neither option is ideal.

Multi-maps are a function of paralogy and sequence accuracy. The higher the paralogy or the lower the accuracy, the more multi-mapping reads you'll get. By increasing the leniency, you increase the chance of multi-maps, many of them probably artifactual.
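As a toy illustration of that last point, the sketch below counts how many reference sequences a read matches as the mismatch limit is relaxed. The sequences, names, and helper functions are invented for the example, and the brute-force Hamming scan is not how any real aligner works.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def hits(read: str, references: dict[str, str], max_mismatches: int) -> list[str]:
    """Names of references that contain a window within max_mismatches of the read."""
    n = len(read)
    matched = []
    for name, seq in references.items():
        windows = (seq[i:i + n] for i in range(len(seq) - n + 1))
        if any(hamming(read, w) <= max_mismatches for w in windows):
            matched.append(name)
    return matched


# Hypothetical gene and a paralog that differs from it at two positions.
references = {
    "geneA": "ACGTACGTTAGCCGATAGGCTA",
    "geneA_paralog": "ACGTACGATAGCCGTTAGGCTA",
}
read = "ACGTACGTTAGCCGAT"  # taken from the start of geneA

if __name__ == "__main__":
    for max_mm in (0, 1, 2):
        print(max_mm, hits(read, references, max_mm))
```

With zero or one allowed mismatch the read maps only to geneA; at two allowed mismatches it also maps to the paralog and becomes a multi-mapper.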

You mention BWA is better, so why not just use that?

written 4.4 years ago by Chris Cole760

Given that @Ekarl2 is using Bowtie in conjunction with RSEM, I assume that "handles multi-mapping reads" is used to mean "resolves multi-mapping reads" (i.e. fractionally allocating multi-mapping reads in the manner that maximizes the likelihood of the observed reads — at least locally).
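For readers unfamiliar with the idea, fractional allocation can be sketched with a toy expectation-maximization loop. This is only an illustration under strong simplifying assumptions: it ignores transcript length, fragment-length distributions, and alignment quality, it is not RSEM's actual model or code, and the function and variable names are hypothetical.

```python
def em_abundances(read_compat: list[list[int]], n_transcripts: int, n_iter: int = 100):
    """Toy EM for multi-mapping reads.

    read_compat[i] lists the transcript indices that read i maps to.
    Returns (theta, counts): relative abundances and the fractional read
    counts from the final E-step.
    """
    theta = [1.0 / n_transcripts] * n_transcripts  # start from uniform abundances
    counts = [0.0] * n_transcripts
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        for compat in read_compat:
            if not compat:
                continue  # unmapped reads contribute nothing
            total = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the fractional counts.
        n = sum(counts)
        theta = [c / n for c in counts]
    return theta, counts


if __name__ == "__main__":
    # 6 reads unique to transcript 0, 2 unique to transcript 1,
    # and 4 reads that map equally well to both.
    reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
    theta, counts = em_abundances(reads, n_transcripts=2)
    print([round(x, 3) for x in theta])
```

In this toy case the estimates converge to abundances of 0.75 and 0.25, so the four shared reads end up split roughly three to one in favor of the transcript with more unique support, rather than being discarded or counted in full for both.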

written 4.4 years ago by Rob4.1k

You might find to be useful. It is focused precisely on the issue of how to accurately assign multi-mapping reads for RNA-Seq abundance estimation.

written 4.3 years ago by Lior Pachter530