Question

Alignment-free tools (like Kallisto or Salmon): useful for bacterial transcriptomes?

0

5.5 years ago

arctan • 0

I have a question about the use of alignment-free tools like Kallisto or Salmon for bacterial RNA-seq transcript quantification. Bacterial genomes are gene-dense and have a significant proportion of overlapping genes, but no splicing/introns.

What advantages / disadvantages (in terms of accuracy / biases) would alignment-free tools have for bacterial transcriptomes compared to more traditional alignment-based tools like Bowtie + featureCounts? Do they have problems with overlapping (convergent or divergent) genes?

Computation speed is not a major consideration because the bacterial genomes are small and don't have splicing.

Any insights would be appreciated.

RNA-Seq bacteria Salmon • 4.0k views

ADD COMMENT • link updated 5.5 years ago by Istvan Albert 100k • written 5.5 years ago by arctan • 0

0

I added Salmon as a tag to attract its developer, @Rob.

ADD REPLY • link 5.5 years ago by ATpoint 82k

score 2 · Answer 1 · 2018-10-25

2

5.5 years ago

Istvan Albert 100k

I will say that any tool that uses fewer steps is advantageous.

Note that with the alignment-based approach (Bowtie + featureCounts) you need both an alignment and a feature count to quantify, and each step needs to be run correctly.

For example, are you accounting for the so-called size effect when using featureCounts? Shorter transcripts are affected more by the loss of coverage at their ends than longer transcripts are. Both kallisto and salmon account for that. Now, what if you wanted to quantify against different strains where the transcripts are somewhat different? You could rest easy knowing that kallisto can handle that, whereas the other methods will have trouble.

Using the right tool for the job is essential. Use alignments when they provide information that you could not get otherwise.

ADD COMMENT • link 5.5 years ago by Istvan Albert 100k

0

Thanks for the reply! Regarding the size effect you mentioned, is that what the "effective length" measurements from kallisto or salmon are for? How bad can the accuracy get if I use featureCounts and don't account for that?

ADD REPLY • link 5.5 years ago by arctan • 0

0

Effective length is typically defined as transcript length minus read length, but it is applied by subtracting half the read length from each end.
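A minimal sketch of that correction, assuming single-end reads so that the read length stands in for the fragment length. kallisto and salmon derive effective length from the fragment length distribution rather than from a fixed read length, so this is an illustrative simplification and not either tool's implementation. The function names and the floor at 1 are assumptions made for the example.

```python
def effective_length(transcript_len: int, read_len: int) -> int:
    """Transcript length minus read length, floored at 1 (assumed floor)."""
    return max(transcript_len - read_len, 1)


def length_normalized(count: float, length: int) -> float:
    """Reads per base of (effective) length."""
    return count / length


read_len = 100  # assumed read length, for illustration only
for name, tx_len in [("short_gene", 300), ("long_gene", 3000)]:
    eff = effective_length(tx_len, read_len)
    # Fraction of the annotated length that remains after the correction
    print(name, eff, round(eff / tx_len, 3))
```

With these made-up lengths the 300 bp gene keeps two thirds of its length while the 3000 bp gene keeps about 97%, which is why ignoring the correction affects short genes the most.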

Now, as to the effect of this correction: as always, it depends on what exactly is computed and compared. It may have no effect, or it could completely change an outcome.

But it goes to the main point that I was making: when better techniques are available one should use them, especially when doing so is simpler, faster and more efficient.

ADD REPLY • link 5.5 years ago by Istvan Albert 100k

0

Thanks. Since my goal is analyzing differential gene expression, I would pass the resulting read counts to downstream tools like DESeq2 or edgeR (or perhaps sleuth, which I need to look into).

Accuracy, not speed (since bacterial genomes are so small), is my main concern.

So the take-home message is that for bacterial RNA-seq (no splicing, but quite a number of overlapping genes) the alignment-free tools (e.g. Kallisto, Salmon) should be _as accurate_ or perhaps even more accurate (due to bias corrections and size effect corrections) than traditional alignment-based tools?
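As a rough illustration of what "passing the read counts downstream" involves, here is a sketch that reads the `NumReads` column of a salmon `quant.sf` table and builds a gene-by-sample matrix of rounded counts. The helper names, the sample name and the toy numbers are all made up for the example, and rounding estimated counts is an assumption about what the downstream tool expects.

```python
import csv
import io


def read_num_reads(handle):
    """Map transcript name -> estimated read count from a salmon quant.sf table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


def count_matrix(samples):
    """Combine per-sample dicts into rows of rounded counts, one row per gene."""
    genes = sorted(set().union(*(s.keys() for s in samples.values())))
    names = list(samples)
    rows = [[g] + [round(samples[n].get(g, 0.0)) for n in names] for g in genes]
    return ["gene"] + names, rows


# Toy quant.sf content with made-up numbers, for illustration only
toy = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "geneA\t900\t750.0\t10.0\t120.4\n"
    "geneB\t300\t150.0\t5.0\t12.6\n"
)
samples = {"sample1": read_num_reads(io.StringIO(toy))}
header, rows = count_matrix(samples)
```

In an R workflow the tximport package is the usual route for this step; the sketch only shows the shape of the data being handed over.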

ADD REPLY • link 5.5 years ago by arctan • 0

1

I would expect them to be more accurate - but they also have the limitation that they cannot detect a new, unannotated feature.

I like to align to view the coverages, then evaluate and compare what I see to what I get from kallisto/salmon.
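One way to make that comparison systematic is to flag genes whose counts disagree between the two approaches and then inspect their coverage. The sketch below assumes per-gene counts from each method are already loaded as dictionaries; the function name, the two-fold threshold and the pseudocount are arbitrary choices for illustration, and the counts are made up.

```python
import math


def discordant_genes(aligned, pseudo, min_log2_diff=1.0, pseudocount=1.0):
    """Return genes whose counts differ by at least min_log2_diff between methods."""
    flagged = {}
    for gene in sorted(set(aligned) | set(pseudo)):
        a = aligned.get(gene, 0.0) + pseudocount
        p = pseudo.get(gene, 0.0) + pseudocount
        diff = math.log2(p / a)  # positive: higher in the alignment-free counts
        if abs(diff) >= min_log2_diff:
            flagged[gene] = diff
    return flagged


# Made-up counts, for illustration only
aligned_counts = {"geneA": 99, "geneB": 49, "geneC": 9}
pseudo_counts = {"geneA": 99, "geneB": 199, "geneC": 9}
flagged = discordant_genes(aligned_counts, pseudo_counts)
```

Genes flagged this way (here only `geneB`) are the ones worth looking at in a coverage view, for instance to check whether they sit in an overlapping region.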

ADD REPLY • link 5.5 years ago by Istvan Albert 100k

0

Yes, I understand that it is dependent on the quality of the reference transcriptome annotation.

Thanks again for your answers and follow-ups. I will look into using these tools.

ADD REPLY • link 5.5 years ago by arctan • 0