DE Genes And Isoforms?
11.0 years ago
lzsph ▴ 70

Hey guys,

We assembled our transcriptome de novo using Trinity, then aligned and quantified the reads using the RSEM Perl script run_RSEM_align_n_estimate.pl supplied with Trinity. This produced two result files, RSEM.isoforms.results and RSEM.genes.results. We performed differential expression analysis using edgeR as supplied with Trinity, on the gene-level RSEM.genes.results rather than the isoform-level file. Following the Trinity instructions here, this gave us a large number of DE genes.

So each of the gene-level DE genes can contain several isoforms (as in the tables on this page of Trinity).

MY QUESTION is: now that we have these DE genes, we want to find which GO categories or KEGG pathways they are involved in. Should we extract all isoforms of the DE genes and feed them to Blast2GO? Or is there another approach?

Any suggestions and criticism will be appreciated. Thanks!

differential-expression rna-seq ngs • 5.9k views
11.0 years ago
Assa Yeroslaviz ★ 1.8k

You can try to do it in DAVID. There you can upload your list of genes and check for enriched GO categories. I would try both options, though I guess most of the isoforms will have the same GO annotations. After uploading the list in the right identifier format, you can choose the GO annotations and run the test. Under the functional annotation chart, DAVID lists the enriched GO categories according to your thresholds. You can then upload this list (with its p-values) into REVIGO and visualize how your GO categories are connected.
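To make the idea of an "enriched" GO category concrete, here is a minimal sketch of a plain hypergeometric over-representation test. It is not DAVID's own scoring (DAVID uses a modified Fisher exact test), and the function name and all counts below are made up for illustration.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_category, n_de, n_de_in_category):
    """P(X >= n_de_in_category) when n_de genes are drawn without replacement
    from n_background genes, of which n_category carry the GO term."""
    total = comb(n_background, n_de)
    upper = min(n_category, n_de)
    tail = sum(
        comb(n_category, k) * comb(n_background - n_category, n_de - k)
        for k in range(n_de_in_category, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 10000 annotated genes, 200 in the category,
# 500 DE genes, 25 of which fall in the category (10 expected by chance).
p_value = hypergeom_enrichment_p(10000, 200, 500, 25)
print(f"{p_value:.3g}")
```

The choice of background matters a lot here: it should be the set of genes that could have been called DE in your experiment, not the whole genome by default.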

Assa

P.S. IMHO, BLAST2GO is a good alternative, but DAVID is much faster.


Thank you Assa!

DAVID is much faster than Blast2GO, and REVIGO's visualization looks great. I'll give it a try.

My DE list is tagged with internal Trinity identifiers such as comp11_c0 rather than gene IDs/accessions. I should prepare a gene list with accessions and then upload it to DAVID. Since the DE genes include some (maybe novel) genes that are not annotated, we should discard them when preparing such a gene list, right?
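Something like the following is what I have in mind for that step. It is only a sketch: the mapping from Trinity IDs to accessions is assumed to come from an earlier annotation step (for example best BLAST hits), and the IDs and accessions shown are placeholders, not real results.

```python
def split_annotated(de_gene_ids, best_hit_by_gene):
    """Separate DE genes that have an accession from those that do not."""
    annotated = {}
    unannotated = []
    for gene_id in de_gene_ids:
        accession = best_hit_by_gene.get(gene_id)
        if accession:
            annotated[gene_id] = accession
        else:
            unannotated.append(gene_id)
    return annotated, unannotated

# Hypothetical inputs: Trinity gene IDs and a placeholder ID-to-accession map.
de_gene_ids = ["comp11_c0", "comp42_c1", "comp7_c0"]
best_hit_by_gene = {"comp11_c0": "ACC_0001", "comp7_c0": "ACC_0002"}

annotated, unannotated = split_annotated(de_gene_ids, best_hit_by_gene)
print(sorted(annotated.values()))  # accessions to upload
print(unannotated)                 # kept aside rather than silently dropped
```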

Thank you!

Regards,

S.H.


If this is mammalian, I would be careful using DAVID with RNA-Seq data unless you have very deeply sequenced samples.

As I understand it, DAVID's statistical model assumes all genes in the underlying (background) gene set have an equal likelihood of being called differentially expressed. This is not true if some genes are measured with low read counts (below 10), because they have a much higher technical variance due to counting noise. So you may be able to call, e.g., most 2-fold changes in your highly expressed genes but only 5-fold changes in your low-count genes. Because expression level is correlated with function, this can make categories of highly expressed genes look falsely enriched.
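A rough way to see the size of the counting noise, assuming it is purely Poisson (my simplification; real data usually has extra biological variance on top): the coefficient of variation of a count with mean m is 1/sqrt(m).

```python
from math import sqrt

def poisson_cv(mean_count):
    """Relative noise (standard deviation / mean) of a Poisson count."""
    return 1.0 / sqrt(mean_count)

# Illustrative read counts only.
for mean_count in (5, 10, 100, 1000):
    print(mean_count, round(poisson_cv(mean_count), 3))
```

Under that assumption a gene with about 10 reads carries roughly 30% relative noise from counting alone, against about 3% at 1000 reads, which is why small fold changes are much harder to call at low counts.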

goseq, I believe, is supposed to address some of these issues. I have not used it, though, so I can't vouch for its accuracy.

edgeR will also introduce some systematic biases into your result set, in that it shrinks all variances towards the mean. This can mean that it systematically over-calls high-variance genes and under-calls low-variance genes. This isn't a mistake: it does this to improve the variance estimates, and it does increase power when there are few replicates.
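As a toy illustration of what I mean by shrinkage (this is a simple linear blend that I made up for the example, not edgeR's actual empirical Bayes procedure; the weight and variances are arbitrary):

```python
from statistics import mean

def shrink_variances(gene_variances, weight):
    """Pull each per-gene variance towards the mean of all variances.

    weight=0 leaves the estimates unchanged, weight=1 replaces them
    all with the common value.
    """
    common = mean(gene_variances)
    return [weight * common + (1 - weight) * v for v in gene_variances]

# Hypothetical per-gene variance estimates.
raw = [0.2, 1.0, 4.8]
print(shrink_variances(raw, 0.5))
```

The high-variance gene ends up with a smaller variance estimate than its own data suggest, so it is easier to call significant, and the low-variance gene gets the opposite treatment.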

While I doubt it will give false results, I prefer to use a t-test when a downstream analysis will follow, because it produces less biased results. I have found that it has pretty good power with 3 or more replicates, though it probably has too little power if you only have two.
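For reference, a per-gene t-test could look like the sketch below. I am using the Welch (unequal variance) form as one reasonable choice, applied to whatever expression values you settle on, such as log-transformed normalized counts. The replicate values are invented, and the p-value would still need to be looked up from a t distribution with the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    se_sq_a = variance(group_a) / n_a  # squared standard error, group A
    se_sq_b = variance(group_b) / n_b  # squared standard error, group B
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(se_sq_a + se_sq_b)
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1)
    )
    return t_stat, df

# Hypothetical log-expression values for one gene, three replicates each.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
t_stat, df = welch_t(control, treated)
print(round(t_stat, 3), round(df, 1))
```

With only two replicates per group the variance estimates are so unstable that this is rarely useful, which is the low power I mentioned.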
