Question

Moving from arrays to RNA-seq - should I run multiple pipelines?

5.7 years ago

Nonacon

I am moving from microarrays to RNA-seq for a gene expression study in mice. I am a molecular biologist by training and don't have much in the way of programming skills. I know there are several tools, platforms, and pipelines available, but there doesn't seem to be any standardization, so it's a bit confusing for a newbie like me. I also read somewhere that it's a good idea to run multiple pipelines to get accurate results. Is that recommended? Here are some of the pipelines I'm considering:

assembly-based - Trinity
k-mer-based - Salmon/kallisto
alignment-based - STAR and Bowtie2, followed by DESeq2

Any guidance or thoughts are greatly appreciated. Thanks!

RNA-Seq

score 4 · Answer 1 · 2018-08-31

5.7 years ago

h.mon

If you run multiple pipelines, you will get multiple partially overlapping, partially different results. I think it is better to decide on one pipeline a priori, study and understand it as thoroughly as possible, and stick to it. Testing many pipelines and comparing them can (even involuntarily) lead to fishing expeditions, where one chooses the pipeline with the "most interesting" results.
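
One way to see what "partially overlapping, partially different" looks like is to compare the gene lists that two pipelines call as differentially expressed. The Python below is a minimal sketch, assuming each pipeline's output has already been reduced to a set of gene IDs passing its significance cutoff; the pipeline names and gene names are hypothetical placeholders, not real results. Comparing lists like this is a diagnostic; picking whichever list looks most interesting afterwards is the fishing expedition described above.

```python
# Hypothetical DE gene calls from two pipelines; gene names are placeholders,
# not results from any real analysis.


def jaccard(a: set, b: set) -> float:
    """Shared calls divided by all calls (1.0 = identical lists, 0.0 = disjoint)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def compare_calls(a: set, b: set) -> dict:
    """Split two sets of DE calls into shared and pipeline-specific genes."""
    return {
        "shared": sorted(a & b),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
        "jaccard": jaccard(a, b),
    }


salmon_deseq2 = {"GeneA", "GeneB", "GeneC", "GeneD"}
star_deseq2 = {"GeneB", "GeneC", "GeneD", "GeneE", "GeneF"}

for key, value in compare_calls(salmon_deseq2, star_deseq2).items():
    print(f"{key}: {value}")
```

Because each pipeline applies its own cutoff, genes close to the threshold can land in "shared" for one choice of tools and in a pipeline-specific bin for another, which is one likely reason the lists only partially agree.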

For mouse, there is no point in running Trinity or any other de novo transcriptome assembler, as the genome assembly and annotation are very good. Transcriptome-based quantification with Salmon or kallisto is really fast, so it would be my preferred choice, but you still have to decide which software to use for differential expression, and whether to perform differential gene expression or differential transcript expression (differential transcript expression requires more reads per sample than differential gene expression).
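
Differential transcript expression needs deeper sequencing because a gene's reads are split among its isoforms, so each transcript carries only part of the gene's count. The Python below is a minimal sketch of collapsing transcript-level counts to gene level, assuming a tab-separated table in the column layout of Salmon's quant.sf (Name, Length, EffectiveLength, TPM, NumReads) and a transcript-to-gene map; all IDs and numbers are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical quant.sf-style table (Salmon's column layout); IDs and numbers are made up.
QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "Tx1\t1500\t1300.0\t497237.6\t120.0\n"
    "Tx2\t1200\t1000.0\t323204.4\t60.0\n"
    "Tx3\t800\t600.0\t179558.0\t20.0\n"
)

# Hypothetical transcript-to-gene map; in practice it is derived from the annotation (GTF).
TX2GENE = {"Tx1": "GeneA", "Tx2": "GeneA", "Tx3": "GeneB"}


def read_counts(handle) -> dict[str, float]:
    """Estimated read count (NumReads) per transcript."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


def sum_to_genes(tx_counts: dict[str, float], tx2gene: dict[str, str]) -> dict[str, float]:
    """Add up transcript counts per gene (raises KeyError for unmapped transcripts)."""
    gene_counts: dict[str, float] = defaultdict(float)
    for tx, count in tx_counts.items():
        gene_counts[tx2gene[tx]] += count
    return dict(gene_counts)


tx_counts = read_counts(io.StringIO(QUANT_SF))
gene_counts = sum_to_genes(tx_counts, TX2GENE)
print("transcript level:", tx_counts)
print("gene level:", gene_counts)
```

GeneA ends up with 180 reads, while its two isoforms have 120 and 60 each, so a transcript-level test works with smaller counts than a gene-level one. A plain sum is a simplification: in an R workflow, the Bioconductor package tximport is the usual route from Salmon or kallisto output into DESeq2, and it can also account for differences in effective transcript length between samples.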

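You should not use Bowtie2 to map RNA-seq reads to the reference genome, because it is not splice-aware and cannot split reads that span exon-exon junctions across introns. It can be used to map to the reference transcriptome, though, and then one can use RSEM or Salmon for quantification.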
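
The transcriptome route can be sketched as two command lines: Bowtie2 against an index built from transcript sequences, then Salmon in alignment-based mode. The Python below only builds and prints the commands (nothing is executed); the index, FASTQ, and output names are hypothetical, and the `-k` value and the "IU" library type are illustrative assumptions to check against the Bowtie2 and Salmon manuals for your own data.

```python
import shlex

# Hypothetical file names; the commands are only built and printed, never run.


def bowtie2_to_transcriptome(index: str, r1: str, r2: str, sam_out: str,
                             threads: int = 4, max_alns: int = 100) -> list[str]:
    """Paired-end Bowtie2 alignment against a transcriptome index."""
    # -k keeps up to max_alns alignments per read: a read is often compatible
    # with several isoforms, and the quantifier needs all of them.
    return ["bowtie2", "-p", str(threads), "-k", str(max_alns), "--no-unal",
            "-x", index, "-1", r1, "-2", r2, "-S", sam_out]


def salmon_from_alignments(transcripts: str, alignments: str, libtype: str,
                           out_dir: str, threads: int = 4) -> list[str]:
    """Salmon alignment-based quantification of the Bowtie2 output."""
    # libtype must match the library prep (e.g. "IU" = inward, unstranded paired-end).
    return ["salmon", "quant", "-p", str(threads), "-t", transcripts,
            "-l", libtype, "-a", alignments, "-o", out_dir]


steps = [
    bowtie2_to_transcriptome("mm_tx_index", "s1_R1.fastq.gz", "s1_R2.fastq.gz", "s1.sam"),
    salmon_from_alignments("mm_transcripts.fa", "s1.sam", "IU", "s1_salmon"),
]
for cmd in steps:
    print(shlex.join(cmd))
```

RSEM, the other quantifier mentioned, can also drive Bowtie2 itself; that route is not sketched here.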

"If you run multiple pipelines, you will get multiple partially overlapping, partially different results. [...]"

Great point. I am planning to combine the results from multiple pipelines rather than choosing the "interesting" results from a specific pipeline.

"For mouse, there is no point in running Trinity [...]"

I understand the mouse genome is well annotated, but some papers have highlighted the importance of assembly approaches even when a reference genome is available. This would definitely give a diverse set of results, and I am happy to explore the differences between pipelines. I am under the impression, though, that a multi-pipeline approach would give me a wider net to start with.
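
"Combining" results from several pipelines can mean different things. A minimal sketch, assuming each pipeline has produced a set of DE gene IDs (the three sets below are hypothetical), is to keep genes called by at least k of the n pipelines: k = 1 gives the union (the "wider net"), k = n gives the intersection.

```python
from collections import Counter

# Hypothetical DE calls from three pipelines; gene names are placeholders.
calls = {
    "salmon_deseq2": {"GeneA", "GeneB", "GeneC"},
    "kallisto_deseq2": {"GeneB", "GeneC", "GeneD"},
    "star_deseq2": {"GeneC", "GeneD", "GeneE"},
}


def consensus(calls: dict[str, set[str]], min_support: int) -> set[str]:
    """Genes reported as DE by at least `min_support` pipelines."""
    support = Counter(gene for genes in calls.values() for gene in genes)
    return {gene for gene, n in support.items() if n >= min_support}


for k in range(1, len(calls) + 1):
    print(k, sorted(consensus(calls, k)))
```

In general the union is no longer controlled at each pipeline's nominal false discovery rate, since every additional list contributes its own false positives, while the intersection trades sensitivity for agreement.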

"You should not use Bowtie2 to map RNAseq to the reference genome [...]"

Agree. I would prefer using STAR with a reference genome.

Reply by Nonacon, 5.6 years ago