Question

Forum: Designing RNA-seq data analysis pipeline

0

6.5 years ago

Arindam Ghosh ▴ 510

I am trying to design a pipeline to study differential gene expression in human embryonic stem cells across their various stages of development, and in comparison with adult cells. The primary source of data will be NCBI SRA.

The basic workflow I have in mind is based on the RNA-seq tutorial from the Griffith Lab (https://github.com/griffithlab/rnaseq_tutorial/wiki).

I need some help finalising the workflow.

1. What are the primary things to consider while downloading my data?
2. Do all experiments use ERCC spike-ins for quality control?
3. For aligning reads to the reference genome, is a simple HP desktop with 8 GB RAM sufficient, or do I need to upgrade?

RNA-seq sra ngs • 1.6k views

updated 11 months ago by Ram 43k • written 6.5 years ago by Arindam Ghosh ▴ 510

score 1 · Answer 1 · 2017-11-14

1

6.5 years ago

Devon Ryan 104k

1. For your purposes, I would think potential batch effects are the most important thing to consider.
2. No, most don't; they're rarely useful.
3. Given your limited computational resources, you're going to want to stick to salmon or kallisto. This won't allow you to find new transcripts, but I doubt you care about that anyway. If you can get access to a computer with more cores, you'll be better off.
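
To make the salmon suggestion concrete, here is a minimal sketch of how a per-sample quantification command could be assembled. It only builds and prints the command; it does not run salmon. The sample name and all paths are hypothetical placeholders, paired-end reads are assumed, and a salmon index is assumed to have been built beforehand.

```python
import shlex
from pathlib import Path


def salmon_quant_cmd(index_dir, reads_1, reads_2, out_dir, threads=4):
    """Build (but do not run) a salmon quant command for one paired-end sample."""
    return [
        "salmon", "quant",
        "-i", str(index_dir),  # prebuilt transcriptome index (assumed to exist)
        "-l", "A",             # let salmon infer the library type
        "-1", str(reads_1),    # first mates
        "-2", str(reads_2),    # second mates
        "-p", str(threads),    # number of threads
        "-o", str(out_dir),    # output directory, one per sample
    ]


# Hypothetical sample name and directory layout, for illustration only.
sample = "SRR_EXAMPLE"
cmd = salmon_quant_cmd(
    index_dir=Path("salmon_index"),
    reads_1=Path("fastq") / f"{sample}_1.fastq.gz",
    reads_2=Path("fastq") / f"{sample}_2.fastq.gz",
    out_dir=Path("quant") / sample,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the placeholder paths point at real files. Keeping `threads` low is a reasonable starting point on a small desktop.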

0

What is a batch effect? I am planning to include data from different stages of cell growth.
Kallisto seems good. The tutorial uses HISAT2 for alignment.

4. Does the entire protocol require high computational power, or only the alignment step?

6.5 years ago by Arindam Ghosh ▴ 510

2

Google the term.
You might have enough memory for hisat2, but I can't guarantee it.

4. It's mostly alignment that needs a lot of memory and cores.
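
On the batch-effect question: roughly, a batch effect is systematic variation that comes from how or where samples were processed (lab, library prep, sequencing run) rather than from the biology. When pooling public SRA runs, each source study is a likely batch. The sketch below is one simple way to check whether developmental stage is confounded with study. The accessions, study names, and stage labels are made up for illustration, and treating "study" as the batch variable is an assumption.

```python
from collections import defaultdict

# Hypothetical run metadata: (run accession, study/batch, developmental stage).
runs = [
    ("RUN_A1", "study_A", "ESC"),
    ("RUN_A2", "study_A", "ESC"),
    ("RUN_B1", "study_B", "adult"),
    ("RUN_B2", "study_B", "adult"),
    ("RUN_C1", "study_C", "ESC"),
    ("RUN_C2", "study_C", "adult"),
]


def stages_per_batch(records):
    """Map each batch to the set of stages it contains."""
    batches = defaultdict(set)
    for _run, batch, stage in records:
        batches[batch].add(stage)
    return dict(batches)


def confounded_batches(records):
    """Return batches that contain only a single stage."""
    per_batch = stages_per_batch(records)
    return sorted(b for b, stages in per_batch.items() if len(stages) == 1)


print(confounded_batches(runs))  # → ['study_A', 'study_B']
```

If every stage comes from its own study, differences between stages cannot be separated from differences between studies, so datasets like the hypothetical `study_C`, which cover more than one stage, are the most useful for comparison.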

6.5 years ago by Devon Ryan 104k