Forum: Designing an RNA-seq data analysis pipeline
6.5 years ago
Arindam Ghosh ▴ 510

I am trying to design a pipeline to study differential gene expression in human embryonic stem cells across their various stages of development, and in comparison with adult cells. The primary source of data will be NCBI SRA.

The basic workflow I have in mind is based on the RNA-seq tutorial from the Griffith Lab (https://github.com/griffithlab/rnaseq_tutorial/wiki).

I need some help finalising the workflow.

  • What are the primary things to consider when downloading my data?
  • Do all experiments use ERCC spike-ins for quality control?
  • For aligning reads to the reference genome, is a simple HP desktop with 8 GB RAM sufficient, or do I need to upgrade?
RNA-seq sra ngs • 1.6k views
6.5 years ago
  1. For your purposes, I would think potential batch effects are the most important thing to consider.
  2. No, most don't; they're rarely useful.
  3. Given your limited computational resources, you're going to want to stick to salmon or kallisto. This won't allow you to find new transcripts, but I doubt you care about that anyway. If you can get access to a computer with more cores then you'll be better off.
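For anyone following the salmon suggestion, here is a minimal sketch of how the per-sample quantification commands could be assembled. The index directory, output layout, and run accessions are placeholders rather than anything from this thread, and it assumes paired-end FASTQ files have already been extracted from the SRA runs and that a salmon index has already been built. The script only builds and prints the commands; it does not run salmon.

```python
from pathlib import Path


def salmon_quant_cmd(index_dir, fastq_1, fastq_2, out_dir, threads=4):
    """Build (but do not run) a `salmon quant` command for one paired-end sample."""
    return [
        "salmon", "quant",
        "-i", str(index_dir),   # pre-built transcriptome index
        "-l", "A",              # let salmon infer the library type
        "-1", str(fastq_1),     # first mates
        "-2", str(fastq_2),     # second mates
        "-p", str(threads),     # worker threads
        "-o", str(out_dir),     # per-sample output directory
    ]


# Hypothetical sample sheet: run accession -> (read 1 file, read 2 file)
samples = {
    "SRR0000001": ("fastq/SRR0000001_1.fastq.gz", "fastq/SRR0000001_2.fastq.gz"),
    "SRR0000002": ("fastq/SRR0000002_1.fastq.gz", "fastq/SRR0000002_2.fastq.gz"),
}

commands = [
    salmon_quant_cmd("salmon_index", r1, r2, Path("quant") / run)
    for run, (r1, r2) in samples.items()
]

for cmd in commands:
    print(" ".join(cmd))
```

To actually execute a command, each list could be passed to `subprocess.run(cmd, check=True)`. Keeping the thread count low is probably sensible on an 8 GB desktop.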
  1. What is a batch effect? I am planning to include data from different stages of cell growth.
  2. Kallisto seems good. The tutorial uses HISAT2 for alignment.

4. Does the entire protocol require high computational power, or only the alignment step?

  1. Google the term.
  2. You might have enough memory for hisat2, but I can't guarantee it.

4. It's mostly alignment that needs a lot of memory and cores.
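On the batch-effect point raised earlier in the thread: when samples are pooled from several SRA studies, one simple sanity check is whether the conditions you want to compare ever occur in the same study. The sketch below treats the source study as a rough proxy for batch, which is an assumption, and the sample metadata is made up for illustration. It lists pairs of conditions that never share a batch, since for those pairs a biological difference cannot be directly separated from a study-to-study difference.

```python
from collections import defaultdict
from itertools import combinations


def pairs_without_shared_batch(samples):
    """Return pairs of conditions that never occur together in any batch.

    `samples` is an iterable of (condition, batch) tuples.
    """
    batches = defaultdict(set)
    for condition, batch in samples:
        batches[condition].add(batch)
    return [
        (a, b)
        for a, b in combinations(sorted(batches), 2)
        if not batches[a] & batches[b]  # no batch contains both conditions
    ]


# Hypothetical metadata: (condition, SRA study used as a proxy for batch)
samples = [
    ("hESC", "study_A"),
    ("hESC", "study_A"),
    ("differentiating", "study_A"),
    ("differentiating", "study_B"),
    ("adult", "study_B"),
    ("adult", "study_B"),
]

flagged = pairs_without_shared_batch(samples)
print(flagged)  # [('adult', 'hESC')]
```

In this made-up design, the hESC versus adult comparison is the one to be cautious about, because every hESC sample comes from one study and every adult sample from another.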
