Differential Expression Analysis clarifications
10.0 years ago
Robyn Edgar

Hi!

I am working on a comparative transcriptomics project and have a few logistical questions.

I have 4 transcriptomes (paired-end Illumina) that I have been working with since last summer. I completed a combined reference assembly of these four transcriptomes (using ABySS, CD-HIT, scaffolding and a few other steps). I then generated read counts using RSEM and performed differential expression analysis with Bioconductor's DESeq package.
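For reference, the RSEM-to-DESeq step described above can be sketched as follows. This is a minimal Python sketch, assuming one RSEM `*.isoforms.results` file per sample (tab-separated, with `transcript_id` and `expected_count` columns). DESeq expects non-negative integer counts while RSEM's expected counts are fractional, so the sketch rounds them. The function names are hypothetical helpers, not part of RSEM or DESeq.

```python
import csv


def parse_expected_counts(lines):
    """Read one RSEM isoforms.results table into {transcript_id: expected_count}.

    `lines` can be an open file handle or any iterable of tab-separated lines.
    """
    counts = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        counts[row["transcript_id"]] = float(row["expected_count"])
    return counts


def build_count_matrix(per_sample):
    """Merge {sample: {transcript_id: expected_count}} into an integer matrix.

    Returns (ids, samples, rows) where rows[i][j] is the count of ids[i] in
    samples[j]. Transcripts missing from a sample get 0; values are rounded
    half up (expected counts are never negative).
    """
    samples = sorted(per_sample)
    ids = sorted(set().union(*(c.keys() for c in per_sample.values())))
    rows = [[int(per_sample[s].get(t, 0.0) + 0.5) for s in samples] for t in ids]
    return ids, samples, rows


def to_tsv(ids, samples, rows):
    """Format the matrix as TSV text that can be read into R for DESeq."""
    out = ["\t".join(["transcript_id", *samples])]
    for t, r in zip(ids, rows):
        out.append("\t".join([t, *map(str, r)]))
    return "\n".join(out) + "\n"
```

Usage: open each sample's results file, pass the handle to `parse_expected_counts`, collect the results in a dict keyed by sample name, and write `to_tsv(*build_count_matrix(per_sample))` to a file that DESeq can load as its count table.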

A few weeks ago, my lab received the 4kb draft genome for the species I am working with. It is not the finalized version, and work is still under way to obtain a higher-quality genome. My main question is: should I re-run my DE analysis against the draft genome instead of the combined transcriptome assembly? I know this is typically how DE analysis is done when a reference genome is available. My concern is that, since the genome is still only a draft, the combined reference transcriptome might still be the more accurate reference to use.

My second question is: Is there a way for me to determine which method is more accurate in this case?

My initial thought was to align the combined transcriptome assembly to the draft genome to get an idea of the coverage. If the coverage was relatively high, I would continue to use the DE analysis I have already completed. Does this make sense? I have the output from GMAP and have been viewing it in IGV, but I'm not sure how to quantify the results of the alignment.
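One way to quantify a GMAP result is to summarize, per transcript, what fraction of it aligns to the draft genome and at what identity. This is a minimal Python sketch, assuming GMAP is re-run with PSL output (`-f psl`) so that each alignment is one tab-separated line in the standard BLAT PSL layout. "Coverage" here means the fraction of each transcript's length covered by aligned blocks, not read depth; the 0.9 and 0.95 thresholds are arbitrary placeholders and the function names are hypothetical.

```python
def parse_psl_line(line):
    """Extract the fields needed for coverage/identity from one PSL record."""
    f = line.rstrip("\n").split("\t")
    matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
    q_name, q_size = f[9], int(f[10])
    # blockSizes (column 19) is a comma-separated list with a trailing comma
    aligned = sum(int(x) for x in f[18].split(",") if x)
    return q_name, q_size, matches, mismatches, rep_matches, aligned


def summarize_alignment(psl_lines, n_transcripts, min_cov=0.9, min_ident=0.95):
    """Summarize how well the assembled transcripts align to the genome.

    n_transcripts is the number of sequences in the assembly FASTA, so that
    transcripts with no alignment in the PSL count as unaligned (coverage 0).
    """
    best = {}  # transcript -> (coverage, identity) of its best-covering alignment
    for line in psl_lines:
        if not line[:1].isdigit():
            continue  # skip PSL header lines and blank lines
        name, q_size, m, mm, rep, aligned = parse_psl_line(line)
        cov = aligned / q_size
        called = m + mm + rep
        ident = (m + rep) / called if called else 0.0
        if name not in best or cov > best[name][0]:
            best[name] = (cov, ident)
    well = sum(1 for cov, ident in best.values() if cov >= min_cov and ident >= min_ident)
    return {
        "fraction_aligned": len(best) / n_transcripts,
        "fraction_well_aligned": well / n_transcripts,
        "mean_coverage": sum(cov for cov, _ in best.values()) / n_transcripts,
    }
```

Keeping only the best-covering alignment per transcript avoids double-counting transcripts that GMAP places in more than one location. A high `fraction_well_aligned` would suggest the draft genome already represents most of what the transcriptome assembly captured.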

The last thing I have a question about is annotation. My DE analysis was done on the transcripts, not on predicted genes, and I'm not sure how to annotate the DE results without using something like Blast2GO (which is soooo slow). Any suggestions would be greatly appreciated!
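A lighter-weight alternative to a full Blast2GO run is to keep only the best BLAST hit per transcript and label DE transcripts with that hit. This is a minimal Python sketch, assuming BLAST was run with tabular output (`-outfmt 6`, whose default 12 columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) against a protein database of your choice. The 1e-5 e-value cutoff and the function names are placeholders.

```python
def best_hits(blast_lines, max_evalue=1e-5):
    """Return {query: (subject, pident, evalue, bitscore)} for the top hit per query.

    Expects BLAST tabular output (-outfmt 6, default 12 columns). Hits above
    max_evalue are ignored; ties in bitscore keep the first hit seen.
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        pident, evalue, bitscore = float(f[2]), float(f[10]), float(f[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, pident, evalue, bitscore)
    return best


def annotate(de_ids, hits):
    """Attach the best-hit subject ID to each DE transcript ('' if no hit)."""
    return {t: hits[t][0] if t in hits else "" for t in de_ids}
```

This only covers the best-hit step; GO terms for the hit proteins would still have to be looked up separately (for example from their UniProt entries). Restricting the BLAST search to the DE transcripts, rather than the whole assembly, may also keep the run time manageable.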

Sorry for the abundance of questions!

Thanks for the help!

RNA-Seq Differential-Expression
10.0 years ago

There are k-mer-based strategies (NIKS, RUFUS, etc.) for differential expression without a reference assembly. I think the BLAST annotations will be harder than you might imagine (because there is often not a 1:1 match between assemblies).
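To illustrate the reference-free idea, here is a toy Python sketch that counts k-mers directly in the reads of two samples and flags k-mers whose normalized counts differ strongly. It is not the NIKS or RUFUS algorithm: it ignores reverse complements, sequencing errors, biological replicates and proper statistical testing, and `k`, `min_count` and `min_fold` are arbitrary placeholders.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every overlapping k-mer in an iterable of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def differential_kmers(reads_a, reads_b, k=5, min_count=2, min_fold=4.0):
    """Return sorted (kmer, count_a, count_b) tuples with a large fold change.

    Counts are normalized by each sample's total k-mer count, with a
    pseudocount of 1 so k-mers absent from one sample do not divide by zero.
    """
    a, b = kmer_counts(reads_a, k), kmer_counts(reads_b, k)
    total_a, total_b = sum(a.values()) or 1, sum(b.values()) or 1
    hits = []
    for kmer in set(a) | set(b):
        ca, cb = a[kmer], b[kmer]  # Counter returns 0 for missing keys
        if ca + cb < min_count:
            continue
        ratio = ((ca + 1) / total_a) / ((cb + 1) / total_b)
        if ratio >= min_fold or ratio <= 1 / min_fold:
            hits.append((kmer, ca, cb))
    return sorted(hits)
```

The flagged k-mers, not transcripts, are the unit of comparison here, which is why annotating them afterwards is a separate problem.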

This message thread (and the link provided in my comment) might be helpful:

A: Trinity/Rsem/Edger Pipeline...Now What?

Thanks for the links!

10.0 years ago

These are difficult questions to answer because they lack specificity.

In general, you need to recognize that the very evaluation metric you are looking for is undefined. It is difficult to quantify what "better" means when the data you are comparing against are also incomplete. For example, is it better to find two genes that are each 60% likely to exist, or one gene that is 90% likely to be expressed?
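The 60%/90% example can be made concrete. By linearity of expectation, two reasonable metrics rank the two outcomes in opposite order:

```latex
\begin{aligned}
\text{expected true genes:}\quad & 2 \times 0.6 = 1.2 \;>\; 1 \times 0.9 = 0.9 \\
\text{expected fraction correct:}\quad & \tfrac{2 \times 0.6}{2} = 0.6 \;<\; \tfrac{1 \times 0.9}{1} = 0.9
\end{aligned}
```

So the two-gene outcome wins if you care about how many real genes you recover, and the one-gene outcome wins if you care about how many of your calls are right; which one is "better" depends on a choice of metric that the data cannot make for you.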

In cases like these one is left with estimates, and I recommend not sweating it too much: forge ahead with whichever choice yields answers more quickly. The most important part of any analysis is getting to a point of actionable information; along the way you learn a lot about the data and the process.

Finally, as I mentioned, this site works best when a post contains just one specific, well-defined, short question. Long posts with multiple questions make it very difficult to contribute.

Thanks for your help!
