How To Calculate Differential Gene Expression In RNA-seq Experiments?
9.9 years ago
Nebo ▴ 80

I have worked on drought tolerance in sugarcane in Brazil, and I recently came to the USA to perform some RNA-seq at the University of Illinois. I have 4 cDNA libraries from two contrasting genotypes, one very tolerant and the other very sensitive, each in two conditions: irrigated control and severe drought. The 4 RNA-seq libraries were sequenced paired-end on Illumina. The yield was high, at over 13 million reads per library. The average quality scores of all bases in the first and second reads were above 20, and the error rate of the run was <0.8%.

I want to find differentially expressed genes between the contrasting genotypes, so I used Novoalign to align the reads against the sorghum genome (gene models). Now I'm trying to find a way to normalize my data and identify the differentially expressed genes between the two genotypes and the two conditions. I've found some papers, but most of them describe doing a de novo assembly first and then calculating expression. I'm not interested in assembly, only in the differences in gene expression between genotypes.

I'd like to know whether there is a formula or an established method to normalize the data and calculate gene expression. Thanks.

rna gene • 12k views
9.9 years ago
brentp 23k

Unless you want it to be the focus of your research, rely on existing libraries to do this.

Once you have counts per gene (you can get these with HTSeq), you can use DESeq. I believe that for contrasting genotypes you can use the conditions as biological replicates, and for contrasting conditions you can use the genotypes as biological replicates. (This will give you conservative estimates of the differences.) Then pass the counts to the DESeq R package and follow this. The DESeq paper is here.

You can also use Cufflinks after adding the XS tag, if Novoalign doesn't add it. You can use the command in that link if you're using single-end reads. Otherwise, you'll need to use the bitwise and() function (also in awk) to make sure you add the +/- info correctly. Then follow the example in the tutorial. The TopHat paper is here (and the supplemental info has the statistical details).
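As a rough illustration of the bitwise check mentioned above, here is a minimal Python sketch that appends an `XS:A:+` or `XS:A:-` tag to a SAM record. The function name `add_xs_tag` is made up for this example. It assumes the strand can be taken from the read's alignment orientation (FLAG bit 0x10) and that the second mate of a pair should get the opposite of its own orientation so that both mates carry the same tag. Whether that matches your library protocol is something you would need to check; it is not taken from the Novoalign or Cufflinks documentation.

```python
def add_xs_tag(sam_line):
    """Return the SAM record with an XS:A strand tag appended (sketch only)."""
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line  # header lines are passed through unchanged
    fields = line.split("\t")
    flag = int(fields[1])
    if flag & 0x4:
        return line  # unmapped read: no strand to report
    if any(f.startswith("XS:A:") for f in fields[11:]):
        return line  # tag already present, leave it alone
    reverse = bool(flag & 0x10)      # read aligned to the reverse strand
    second_mate = bool(flag & 0x80)  # read is the second mate of a pair
    if second_mate:
        # Assumption: mates point toward each other, so flip the second
        # mate's orientation to give both mates the same tag.
        reverse = not reverse
    fields.append("XS:A:" + ("-" if reverse else "+"))
    return "\t".join(fields)
```

You could apply it line by line to a SAM file, for example `for line in open("in.sam"): print(add_xs_tag(line))`, where `in.sam` is a placeholder file name.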

Both of these will do the normalization for you. Cufflinks will also find differences in transcript use.
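To give a feel for what the normalization in a count-based tool involves, here is a small Python sketch of the median-of-ratios size-factor idea described in the DESeq paper. It is a simplified illustration, not the package's actual code, and the sample names and counts are invented. For each sample, the size factor is the median, over genes, of the sample's count divided by the gene's geometric mean across samples; genes with a zero count in any sample are skipped.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: dict of sample name -> list of per-gene counts (same gene order)."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Log geometric mean per gene; None marks genes with a zero in any sample.
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - log_geo_means[g]
                      for g in range(n_genes) if log_geo_means[g] is not None]
        factors[s] = math.exp(median(log_ratios))
    return factors

def normalize(counts):
    """Divide each sample's counts by its size factor."""
    sf = size_factors(counts)
    return {s: [c / sf[s] for c in counts[s]] for s in counts}

# Invented toy data: sample_b was sequenced twice as deeply as sample_a.
toy_counts = {
    "sample_a": [10, 20, 30, 0],
    "sample_b": [20, 40, 60, 5],
}
print(size_factors(toy_counts))
print(normalize(toy_counts))
```

In the toy data the size factor of `sample_b` comes out twice that of `sample_a`, so the first three genes end up with equal normalized counts. In practice you would let DESeq compute this for you rather than reimplementing it.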


Thanks for the answer. I was going to use TopHat and Cufflinks for DE and also for alternative splicing, but for that the alignment has to be done against the whole genome and not against gene models as I did. The problem is that the sugarcane genome is not sequenced yet, and although sorghum is close, when I aligned using the sorghum genome (not gene models) as the reference, the alignment percentage was low. Since alternative splicing is not the main focus of my work, I decided to focus on gene expression only. I'll check HTSeq and DESeq; I think they will work. Thank you very much for your help.


Does it require assembly as well?


Not sure what you are asking. Cufflinks requires a BAM or SAM file with the XS tag, and DESeq requires a file with read counts per gene.
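Since the reads in this thread were aligned directly against gene models, one simple way to get a per-gene count table is to count alignments per reference name in the SAM file. The Python sketch below shows the idea. The function name `count_reads_per_gene` and the filtering choices (skip unmapped, secondary, and supplementary records; count a pair once via its first mate) are assumptions made for this example, not what HTSeq does internally.

```python
from collections import Counter

def count_reads_per_gene(sam_lines):
    """Count fragments per reference sequence (here: per gene model)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        gene = fields[2]  # RNAME: the gene model the read aligned to
        if flag & 0x4 or gene == "*":
            continue  # unmapped
        if flag & 0x100 or flag & 0x800:
            continue  # secondary or supplementary alignment
        if flag & 0x1 and flag & 0x80:
            # Second mate of a pair: skip so each pair is counted once.
            # Caveat: pairs where only the second mate mapped are dropped.
            continue
        counts[gene] += 1
    return counts
```

Calling it as `count_reads_per_gene(open("sample.sam"))`, with `sample.sam` as a placeholder name, gives one count per gene for one library; repeating that for the four libraries gives the table a tool like DESeq expects.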


I mean, since Cufflinks recognizes alternative splicing, should I perform the transcriptome assembly before the gene expression analysis? In the Cufflinks paper they say they did the assembly before the expression analysis.

Also, my lab got a free trial of the CLC bio Genomics Workbench. It says it can replace all of these other tools. Are you familiar with it?

9.9 years ago
Benm ▴ 710

There are many approaches to this, and most of them follow the steps below:

(de novo whole-transcriptome assembly if there is no genome information =>) mapping => isoform expression levels (alternative splicing events) => normalization (TPM, RPKM, FPKM) => statistical models (hypothesis tests, assigning significance to differential expression) => further analysis (ICA, SOM, ANOVA, etc.)
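For the normalization step, the standard RPKM and TPM definitions can be written in a few lines. The Python sketch below uses invented gene names, counts, and lengths, and it assumes that the total number of mapped reads equals the sum of the counts in the table, which may not hold if some mapped reads were not assigned to any gene.

```python
def rpkm(counts, lengths):
    """Reads per kilobase of gene per million mapped reads."""
    total = sum(counts.values())  # assumption: all mapped reads are in the table
    return {g: counts[g] * 1e9 / (lengths[g] * total) for g in counts}

def tpm(counts, lengths):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = {g: counts[g] / lengths[g] for g in counts}
    denom = sum(rates.values())
    return {g: rates[g] / denom * 1e6 for g in counts}

# Invented example: geneB has 3x the reads but is also 3x longer.
example_counts = {"geneA": 100, "geneB": 300}
example_lengths = {"geneA": 1000, "geneB": 3000}  # lengths in bp

print(rpkm(example_counts, example_lengths))
print(tpm(example_counts, example_lengths))
```

Both genes get the same RPKM and the same TPM here, because the extra reads on `geneB` are explained by its length. FPKM is the same calculation as RPKM with fragments (read pairs) counted instead of individual reads.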

If you want to find more solutions, you can follow the RNA-seq blog: http://rna-seqblog.com

If you want to find software, there is a great review of RNA-seq analysis covering mapping, transcriptome reconstruction, and differential gene expression analysis: Garber M, Grabherr MG, Guttman M, Trapnell C. (2011) Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods 8(6), 469-77.


Thanks for your help, and for the blog and the paper; I'm checking them right now. One more question: why is the assembly step needed?


Because alternative splicing events cause 'read assignment uncertainty'. If you don't have the isoform information or a reference genome, you can't work out which genes or transcripts the RNA-seq reads come from, and that affects the accuracy of expression quantification. Even though you have the sorghum genome, transcriptome assembly is still needed, because the assembly software (Oases, Trans-ABySS) reports all isoforms/transcripts, and this information will guide your estimates of transcript expression levels.


Thanks again, but as I said in the question, what if I'm not interested in alternative splicing? Do I still need to perform the assembly? My main goal is to find the differentially expressed genes between the tolerant and sensitive genotypes, then select some of them and find candidate genes for functional analyses and breeding.


Put more simply: you are assuming each gene in the genome has just one form, with no splicing, right? But in fact alternative splicing occurs at a frequency of over 40% in plant genomes, so you will miss or miscalculate gene expression if you don't know all the isoforms of each gene. Relative to the genome, isoforms can be seen as two main types of 'structural variation': deletion and insertion. I recommend doing some assembly work before your differential expression analysis; it is not as complicated as you might imagine for your RNA-seq data. The two programs I mentioned above are also convenient tools.


Thank you, I'll follow your suggestions. I was discussing this with more researchers here, and since sugarcane genotypes are hybrids of 2 species, it is indeed better to perform transcriptome assembly because of the differences between the genotypes and the reference. Regarding alternative splicing, I think I should use the whole sorghum genome instead of the gene models, right?


Should I assemble both genotypes together? Or even both genotypes and the two conditions together?
