Question: Why is eXpress producing strangely low counts?
9 days ago, bagi.m wrote:

I aligned our RNA-seq reads to a reference transcriptome (there is no genome) using bwa mem. The reference transcriptome is from the same species but from a different population, so some mismatches and indels are expected.

Then I used eXpress to count reads.

The results are strangely low. No reference contig has more than 60 tot_counts. However, when I look at the alignment in Tablet or IGV, some contigs have up to 500000 reads aligned to them. Moreover, uniq_counts is equal to tot_counts for all contigs, even though about 0.1% of the reads in my BAM file are multimapped.

Any suggestions what could be wrong?

rna-seq • 98 views
8 days ago, bagi.m wrote:

Ok, this one is embarrassing. I completely misunderstood the eXpress documentation regarding sorting the input .bam file and the fact that you are _not_ supposed to sort it.
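
For anyone hitting the same problem: a quick sanity check is to look at the sort order recorded in the BAM header before running eXpress. As far as I understand the eXpress documentation, paired-end alignments should be in read order or sorted by read name, so a coordinate-sorted file is the usual culprit. Below is a minimal sketch, assuming the header text has already been dumped (for example with `samtools view -H`). The function names are hypothetical, and the `SO` tag only reflects what the sorting tool wrote into the header, so treat the result as a hint rather than proof.

```python
from typing import Optional


def sort_order_from_header(header_text: str) -> Optional[str]:
    """Return the SO value from the @HD line of SAM header text, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return None
    return None


def is_coordinate_sorted(header_text: str) -> bool:
    """True only if the header explicitly declares coordinate sorting."""
    return sort_order_from_header(header_text) == "coordinate"


# Made-up header for illustration; real text would come from the BAM file.
example_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:contig_1\tLN:1500\n"
if is_coordinate_sorted(example_header):
    print("Header says coordinate-sorted: re-sort by read name or use the unsorted aligner output.")
```

If the header has no `@HD` line or no `SO` tag, the function returns `None`, which says nothing either way about the actual record order.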

9 days ago, Istvan Albert (University Park, USA) wrote:

Use kallisto instead. The creators of eXpress recommend it as a better alternative.

As for what is going on: it is difficult to track down without knowing what the data look like, and since there are much better alternatives, it is not quite worth doing so.

For organisms with significant splicing, many more of your reads should be multi-mapping; if they are not, something is off. If your organism does not have splicing, then there is no reason to use eXpress.


Thank you for your answer. I am indeed planning to try Salmon with this data and see if I reach a similar conclusion. I found some articles and posts claiming that pseudoalignment tools are better, but they were based on human data. So I don't know if this claim holds for organisms with much higher intraspecies sequence variability (where tweaking alignment parameters is sometimes necessary).

Reply written 8 days ago by bagi.m

Pay attention, because Salmon, like eXpress, doesn't want sorted input.

Reply written 8 days ago by h.mon
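
Since both tools care about record order rather than about what the header claims, it can help to check directly whether all alignments of a read sit next to each other. The sketch below assumes the alignments are available as SAM text lines (for instance piped from `samtools view`). The function name is hypothetical, and it keeps every read name in memory, so it is meant for a sample of the file rather than a full run.

```python
from typing import Iterable


def reads_are_grouped(sam_lines: Iterable[str]) -> bool:
    """Return True if all records sharing a read name (QNAME) are adjacent."""
    seen = set()       # read names whose block of records has already started
    previous = None    # read name of the previous alignment record
    for line in sam_lines:
        # Skip blank lines and header lines (QNAME itself can never start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        qname = line.split("\t", 1)[0]
        if qname != previous:
            if qname in seen:
                # This read name reappears after another read: not grouped.
                return False
            seen.add(qname)
            previous = qname
    return True


# Made-up records: mates adjacent (grouped) versus mates far apart (scattered).
grouped = ["r1\t99\tcontig_1\t1", "r1\t147\tcontig_1\t50", "r2\t99\tcontig_1\t10"]
scattered = ["r1\t99\tcontig_1\t1", "r2\t99\tcontig_1\t10", "r1\t147\tcontig_1\t50"]
print(reads_are_grouped(grouped), reads_are_grouped(scattered))
```

A coordinate-sorted paired-end file will typically fail this check, because mates map to different positions and end up separated by other reads.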