Question: Why is eXpress producing strangely low counts?
bagi.m wrote (4 months ago):

I have aligned our RNA-seq reads to a reference transcriptome (there is no genome) using bwa mem. The reference transcriptome is from the same species but from a different population, so some mismatches and indels are expected.

Then I used eXpress to count reads.

The results are strangely low. No reference contig has more than 60 tot_counts. However, when I look at the alignment in Tablet or IGV, some contigs have up to 500000 reads aligned to them. Moreover, uniq_counts is equal to tot_counts for all contigs, even though about 0.1% of the reads in my bam file are multimapped.

Any suggestions what could be wrong?
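
The two symptoms described above (a very low maximum tot_counts, and uniq_counts equal to tot_counts everywhere) can be quantified with a short script. This is only a minimal sketch: it assumes the eXpress results are available as tab-separated text with a header row. `tot_counts` and `uniq_counts` are the column names mentioned in the question; the `target_id` column, the function name and the example rows are made up for illustration.

```python
import csv
import io


def summarize_counts(table_text):
    """Summarize tot_counts and uniq_counts from a tab-separated table.

    Assumption: the first row is a header that contains columns named
    tot_counts and uniq_counts.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    targets = 0
    max_tot = 0.0
    uniq_equals_tot = 0
    for row in reader:
        tot = float(row["tot_counts"])
        uniq = float(row["uniq_counts"])
        targets += 1
        max_tot = max(max_tot, tot)
        if tot == uniq:
            uniq_equals_tot += 1
    return {
        "targets": targets,
        "max_tot_counts": max_tot,
        "uniq_equals_tot": uniq_equals_tot,
    }


# Hypothetical rows, not real data from this thread.
example = (
    "target_id\ttot_counts\tuniq_counts\n"
    "contig_1\t60\t60\n"
    "contig_2\t12\t12\n"
    "contig_3\t7\t5\n"
)
print(summarize_counts(example))
```

If `uniq_equals_tot` matches `targets` while the bam file is known to contain multimapped reads, that suggests the multi-mappings are not being recognized as such.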

bagi.m wrote (4 months ago):

Ok, this one is embarrassing. I completely misunderstood the eXpress documentation regarding sorting the input .bam file and the fact that you are _not_ supposed to sort it.
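
For anyone hitting the same problem: as far as I understand the eXpress documentation, it expects all alignments of a read (both mates and any multi-mappings) to sit next to each other in the input. That is the order an aligner writes natively, or what sorting by read name gives you; sorting by coordinate scatters them. The sketch below checks both the sort order declared in the header and whether records are actually grouped by read name. It works on SAM text lines (for example the output of `samtools view -h`), and the function names are my own, not part of any tool.

```python
def declared_sort_order(sam_lines):
    """Return the SO: value from the @HD header line, or None if absent."""
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header lines come first; stop at the first record
        if line.startswith("@HD"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def reads_are_grouped(sam_lines):
    """Return True if all records sharing a read name (QNAME) are adjacent."""
    finished = set()  # read names whose block of records has already ended
    current = None
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        qname = line.split("\t", 1)[0]
        if qname != current:
            if qname in finished:
                return False  # this read name reappears after another read
            if current is not None:
                finished.add(current)
            current = qname
    return True


# Tiny made-up records; only the first column (QNAME) matters here.
header = ["@HD\tVN:1.6\tSO:coordinate\n", "@SQ\tSN:contig_1\tLN:1000\n"]
scattered = header + ["r1\t99\n", "r2\t99\n", "r1\t147\n"]
print(declared_sort_order(scattered), reads_are_grouped(scattered))
```

Note that `reads_are_grouped` keeps every read name it has seen in memory, so it is better run on a sample of the file than on a full bam.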

Istvan Albert wrote (4 months ago):

Use kallisto instead. The creators of eXpress recommend it as a better alternative.

As for what is going on: it is difficult to track down without knowing what the data look like, and since there are much better alternatives, it is not quite worth doing so.

For organisms with significant splicing, many more of your reads should be multi-mapping; if that applies to yours, something is off. If your organism does not have splicing, then there is no reason to use eXpress.


Thank you for your answer. I am indeed planning to try Salmon with this data and see whether I reach a similar conclusion. I found some articles / posts claiming pseudoalignment tools are better, but both were based on human data. So I don't know if this claim holds for organisms with much higher intraspecies sequence variability (where tweaking alignment parameters is sometimes necessary).

(reply written 4 months ago by bagi.m)

Be careful: Salmon, like eXpress, doesn't want sorted input.

(reply written 4 months ago by h.mon)