Question

Differential Expression Analysis clarifications

2

10.0 years ago

Robyn Edgar ▴ 20

Hi!

I am working on a comparative transcriptomics project and have a few logistics questions.

I have 4 transcriptomes (paired end Illumina) that I have been working with since last summer. I completed a combined reference assembly of these four transcriptomes (using ABySS, CD-Hit, Scaffolding and a few other steps). I then generated read counts using RSEM and then completed differential expression analysis using Bioconductor's DESeq package.

A few weeks ago, my lab received the 4kb draft genome for the species that I am working with. It's not the finalized version, and there is still work being done to obtain a better quality genome. My main question is, should I be re-running my DE analysis using the draft genome instead of the combined transcriptomes? I know that this is typically how DE analysis is done when you have a reference genome available. My concern is that since the genome is still only at the draft stage, the combined reference transcriptomes might still be the more accurate choice.

My second question is: Is there a way for me to determine which method is more accurate in this case?

My initial thought was to align the combined transcriptomes to the draft genome to get an idea of the coverage. If the coverage was relatively high, then I would continue to use the DE analysis that I have already completed. Does this make sense to do? I have the output from GMAP and have been viewing it in IGV, but I'm not sure how to quantify the results of the alignment.
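One rough way to put numbers on such an alignment is to compute, for each transcript, how much of its length aligned to the genome and at what identity. The sketch below is only an illustration under stated assumptions: it assumes the GMAP output was written in PSL format (for example with `-f psl`) and reads the standard 21 PSL columns. The function names and the 0.9 coverage / 0.95 identity cutoffs are hypothetical choices, not established thresholds.

```python
from collections import namedtuple

# One alignment of a transcript (the PSL "query") to the draft genome.
Hit = namedtuple("Hit", "name coverage identity")


def parse_psl(lines):
    """Yield one Hit per PSL alignment line, skipping headers and blank lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Data lines have 21 columns and start with the integer match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches = int(fields[0])
        mismatches = int(fields[1])
        rep_matches = int(fields[2])
        q_name, q_size = fields[9], int(fields[10])
        q_start, q_end = int(fields[11]), int(fields[12])
        aligned = matches + mismatches + rep_matches
        # Fraction of the transcript spanned by the alignment.
        coverage = (q_end - q_start) / q_size
        # Fraction of aligned bases that match the genome.
        identity = (matches + rep_matches) / aligned if aligned else 0.0
        yield Hit(q_name, coverage, identity)


def best_hits(hits):
    """Keep the highest-coverage (then highest-identity) hit per transcript."""
    best = {}
    for hit in hits:
        current = best.get(hit.name)
        if current is None or (hit.coverage, hit.identity) > (
            current.coverage,
            current.identity,
        ):
            best[hit.name] = hit
    return best


def summarize(best, n_transcripts, min_cov=0.9, min_ident=0.95):
    """Report the fraction of transcripts aligned at all, and aligned well."""
    well = sum(
        1
        for h in best.values()
        if h.coverage >= min_cov and h.identity >= min_ident
    )
    return {
        "aligned_fraction": len(best) / n_transcripts,
        "well_aligned_fraction": well / n_transcripts,
    }
```

To use it, pass an open PSL file to `parse_psl`, feed the result to `best_hits`, and call `summarize` with the total number of transcripts in the assembly. Transcripts with no alignment at all never appear in the PSL file, which is why the total has to be supplied separately.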

The last thing that I have a question about is annotation-related. My DE analysis was completed on the transcripts, not on predicted genes. My problem is that I'm not sure how to annotate my results from the DE analysis without using something like Blast2Go (which is soooo slow). Any suggestions would be greatly appreciated!

Sorry for the abundance of questions!

Thanks for the help!

RNA-Seq Differential-Expression • 2.9k views

updated 2.6 years ago by Ram 43k • written 10.0 years ago by Robyn Edgar ▴ 20

Ram · Answer 1 · 2014-04-17

1

10.0 years ago

Charles Warden 8.2k

There are k-mer-based strategies (NIKS, RUFUS, etc.) for differential expression without a reference assembly. I think the BLAST annotations will be harder than you might imagine (because there is often not a 1:1 match between assemblies).

This message thread (and the link provided in my comment) might be helpful:

A: Trinity/Rsem/Edger Pipeline...Now What?
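To get a feel for how far from 1:1 the BLAST matches are, one option is to take the best hit per transcript and count how many transcripts land on the same subject. The sketch below is a minimal illustration, assuming BLAST was run with tabular output (`-outfmt 6`, default 12 columns); the function names are hypothetical and the best hit is chosen by bit score, which is only one of several reasonable criteria.

```python
from collections import Counter


def best_blast_hits(lines):
    """Return {query: (subject, evalue, bitscore)} keeping the top bit score."""
    best = {}
    for line in lines:
        # Skip blank lines and any comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def shared_subjects(best):
    """Return {subject: n} for subjects that are the best hit of n > 1 queries."""
    counts = Counter(subject for subject, _, _ in best.values())
    return {subject: n for subject, n in counts.items() if n > 1}
```

Subjects returned by `shared_subjects` are cases where several assembled transcripts map to the same annotated sequence (isoforms, fragments, or redundant contigs), so their DE results probably need to be interpreted together rather than one by one.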

updated 4.3 years ago by Ram 43k • written 10.0 years ago by Charles Warden 8.2k

0

Thanks for the links!

10.0 years ago by Robyn Edgar ▴ 20

Ram · Answer 2 · 2014-04-17

These are difficult questions to answer because they lack specificity.

In general, you need to recognize that even the evaluation metric you are looking for is undefined. It is difficult to quantify what "better" means when the data you are using are also incomplete. For example, is it better to find two genes that are each 60% likely to exist, or one gene that is 90% likely to be expressed?

In these cases one is left with estimates, and I recommend not sweating it too much and forging ahead with the choice that yields answers more quickly. The most important part of any analysis is getting to a point of actionable information. Along the way you learn a lot about the data and the process.

Finally, as I mentioned, this site works best when a post contains just one specific, well-defined, short question. Longer posts with multiple questions make it very difficult to contribute.