Question

Best Workflow For Differential Expression Analysis Using De Novo Transcriptomes And Illumina Reads In 2014

10.1 years ago

Birdman ▴ 20

What is, in your opinion/experience, the best workflow for differential gene expression analysis with Illumina reads and a de novo transcriptome (e.g. generated by Trinity), without a reference genome?

Please suggest tools that are compatible with each other:

(pre-processing of reads) - Alignment tool - Read summarization - DE analysis

differential-expression workflow alignment • 5.1k views

Updated 10.1 years ago by seidel 11k • written 10.1 years ago by Birdman ▴ 20

Ram · Answer 1 · 2014-03-25

A fairly straightforward approach is to use your de novo transcriptome to build an alignment index (e.g. a bowtie index), then align your reads to that index with bowtie. The SAM or BAM output can then be parsed for summarization: count the number of reads mapping to each transcript in each sample to generate a count table. This can be done in Perl, Python, R, or whatever your favorite language is (I don't know of an off-the-shelf solution, but the SAM format is easily parsed). Normally we think of alignment results as having chromosomal coordinates, but if the alignment index consists of your transcripts, each read maps to a transcript name rather than a chromosome, so you just have to count these names.

Once you have a count table, edgeR and DESeq are great options for identifying differentially expressed transcripts, as mentioned by User000.

This is fairly generic, and if you're simply looking to discover differentially expressed transcripts under a given set of conditions, this approach is fine (it won't be your limiting step). But depending on what you're trying to achieve, some things could be tricky, such as whether to allow multimapping reads in case your transcripts share a lot of redundant sequence. You also won't be taking advantage of reads that cross splice junctions, but you may not need them simply to find genes that change under condition X.
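To make the "just count the transcript names" step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a vetted tool: the function name `count_reads_per_transcript` and the `min_mapq` parameter are made up for this example, the input is assumed to be uncompressed SAM text, and each aligned record is counted once (so the two mates of a pair count separately). Unmapped, secondary, and supplementary records are skipped using the standard SAM FLAG bits.

```python
from collections import Counter


def count_reads_per_transcript(sam_lines, min_mapq=0):
    """Count primary alignments per reference (transcript) name.

    sam_lines: any iterable of SAM text lines, e.g. an open file handle.
    min_mapq:  hypothetical filter; records with MAPQ below it are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # SAM has 11 mandatory columns
            continue
        flag = int(fields[1])             # column 2: bitwise FLAG
        rname = fields[2]                 # column 3: reference name (transcript)
        mapq = int(fields[4])             # column 5: mapping quality
        if flag & 0x4 or rname == "*":    # read is unmapped
            continue
        if flag & 0x100 or flag & 0x800:  # secondary or supplementary record
            continue
        if mapq < min_mapq:
            continue
        counts[rname] += 1
    return counts


# Tiny made-up example; the transcript and read names are placeholders.
example_sam = [
    "@SQ\tSN:tx1\tLN:500",
    "r1\t0\ttx1\t1\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\ttx1\t10\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
for transcript, n in sorted(count_reads_per_transcript(example_sam).items()):
    print(transcript, n)
```

For a real file you would pass an open handle, e.g. `with open("sample1.sam") as fh: counts = count_reads_per_transcript(fh)`, where `sample1.sam` is a placeholder name. How multimapping reads should be treated is a separate decision; this sketch simply counts whatever primary alignments the aligner reported.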

Ram · Answer 2 · 2014-03-25

10.1 years ago

User000 ▴ 690

Use DESeq, an R package for testing differential expression (original paper: http://genomebiology.com/2010/11/10/R106). baySeq and edgeR are alternatives.

I asked for a workflow. Which alignment tool and read summarization tool do you use before DESeq?

Reply written 10.1 years ago by Birdman ▴ 20

I'm not sure about the alignment tool; I guess you can use BWA. Then extract count data from the alignments, e.g. .sam files (you will need to write a script for that), and finally use e.g. DESeq to identify differentially expressed genes. You may also want to use ErmineJ to look at functional enrichment. I agree with cwarden45 that it is quite tricky, and I think it is also important to choose the right assembly method.

Reply updated 4.5 years ago by Ram 43k • written 10.1 years ago by User000 ▴ 690
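Once per-sample counts have been extracted, they still need to be combined into a single table before DESeq can use them. The sketch below is one possible way to do that in Python, assuming the counts for each sample are already held in a dictionary; the function names `build_count_table` and `write_count_table` and the sample labels are invented for illustration. It lays the data out with one row per transcript and one column per sample, filling in 0 where a transcript has no reads in a sample, which is the raw-count layout DESeq and edgeR work from.

```python
import csv
import io


def build_count_table(sample_counts):
    """Merge per-sample counts into (header, rows).

    sample_counts: dict mapping sample name -> {transcript name: read count}.
    Transcripts missing from a sample get a count of 0.
    """
    samples = sorted(sample_counts)
    transcripts = sorted({t for counts in sample_counts.values() for t in counts})
    header = ["transcript"] + samples
    rows = [
        [t] + [sample_counts[s].get(t, 0) for s in samples]
        for t in transcripts
    ]
    return header, rows


def write_count_table(header, rows, handle):
    """Write the table as tab-separated text to an open text handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Made-up counts for two hypothetical samples.
header, rows = build_count_table({
    "control": {"tx1": 3},
    "treated": {"tx1": 1, "tx2": 5},
})
buffer = io.StringIO()
write_count_table(header, rows, buffer)
print(buffer.getvalue(), end="")
```

Writing to a real file instead of the in-memory buffer works the same way, e.g. `with open("counts.tsv", "w", newline="") as fh: write_count_table(header, rows, fh)`, where `counts.tsv` is a placeholder name. The resulting file can then be read into R as a matrix of integer counts.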

score 0 · Answer 3 · 2014-03-25

10.1 years ago

Charles Warden 8.2k

I think differential expression using the assembled contigs is a bit tricky. For example, see this related response:

A: Trinity/RSEM/edgeR pipeline...now what?
