Question

Very basic for a student: How to analyse RPKM expression data to identify genes with changed expression

0

8.0 years ago

callumalmost • 0

I'm a student doing a data analysis project. I have been given a dataset containing the mean reads per kilobase per million mapped reads (RPKM) for 6 mRNA samples that have undergone high-throughput sequencing, covering about 11,000 genes. The samples have been split into 2 categories to compare, so I'm assuming the RPKM values have been averaged or something, as there are just 2 columns of RPKM values: 1 column for each category, with each row corresponding to a gene. How do I find which genes have changed expression? I assume I have to use a program like R? I have some previous experience using R, but only with the Rcmdr package, and I don't know if I have to use a different package here. Any help is much appreciated, and thank you in advance :)

RNA-Seq r analysis RPKM expression • 3.0k views

ADD COMMENT • link updated 8.0 years ago by Michael 54k • written 8.0 years ago by callumalmost • 0

2

It's rather bad that the RPKMs are averaged as you describe. You should have individual measurements for each sample so that you can estimate dispersion and the biological/technical variability unrelated to your trait/treatment of interest.
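
A minimal sketch of why the per-sample values matter, using made-up RPKM numbers (nothing here comes from the dataset in question). Both hypothetical genes have the same group means, so they look identical in a two-column table of averages, yet only one of them shows a consistent difference across replicates:

```python
from statistics import mean, stdev

# Hypothetical RPKM values, three replicates per group (all numbers invented).
consistent_gene = {"viable": [10.0, 11.0, 9.0], "nonviable": [20.0, 21.0, 19.0]}
noisy_gene = {"viable": [1.0, 2.0, 27.0], "nonviable": [5.0, 50.0, 5.0]}

for name, gene in (("consistent", consistent_gene), ("noisy", noisy_gene)):
    for group, values in gene.items():
        # The mean is all that survives averaging; the spread is what gets lost.
        print(f"{name:10s} {group:10s} mean={mean(values):5.1f} sd={stdev(values):5.1f}")
```

Once the replicates are collapsed to their means, the two genes cannot be told apart.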

ADD REPLY • link 8.0 years ago by WouterDeCoster 47k

0

Yes, all I have been given is a table with 3 columns: the 1st is the RefSeq ID for the gene in question, and the next two columns are the RPKM for the 2 categories (viable and non-viable). I haven't been told that they were averaged, but since I don't have 6 different RPKM values for each gene, I'm assuming they have been.
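
For a table of that shape, about the only thing that can be computed is a per-gene log2 fold change, which gives a ranking but no measure of significance. A rough sketch, assuming placeholder gene IDs, invented values, and an arbitrary pseudocount of 1 to avoid dividing by zero:

```python
import math

# Hypothetical rows: (gene ID, mean RPKM viable, mean RPKM non-viable).
# IDs and values are placeholders, not real data.
rows = [
    ("gene_A", 12.0, 48.0),
    ("gene_B", 30.0, 30.0),
    ("gene_C", 0.0, 3.0),
    ("gene_D", 64.0, 8.0),
]

PSEUDOCOUNT = 1.0  # assumption: avoids log(0) and damps ratios of very low values

def log2_fold_change(viable, nonviable, pseudocount=PSEUDOCOUNT):
    """log2 of non-viable over viable, after adding the pseudocount to both."""
    return math.log2((nonviable + pseudocount) / (viable + pseudocount))

# Rank genes by the size of the change, largest first.
ranked = sorted(
    ((gene, log2_fold_change(v, n)) for gene, v, n in rows),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for gene, lfc in ranked:
    print(f"{gene}\t{lfc:+.2f}")
```

A large fold change here is exploratory at best: without replicates there is no way to tell a real difference from noise, especially for genes with low RPKM.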

ADD REPLY • link 8.0 years ago by callumalmost • 0

1

I think the details here are not so important. You have 11,000 hypothesis-testing problems, one per gene. Simple tests like the t-test can be useful! I emphasize that I agree with @decosterwouter: you must not compare just the average values, and you should use the variation between samples.
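
As a rough sketch of what a per-gene t-test would look like if the individual sample values were available, here is Welch's t statistic computed with the standard library only. The gene names and numbers are invented, and three replicates per group is assumed:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Assumes each group has at least two values and nonzero total variance.
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical per-sample values: gene -> (viable replicates, non-viable replicates).
genes = {
    "gene_A": ([8.0, 10.0, 12.0], [28.0, 30.0, 32.0]),
    "gene_B": ([2.0, 10.0, 18.0], [5.0, 30.0, 55.0]),
}

results = {gene: welch_t(viable, nonviable) for gene, (viable, nonviable) in genes.items()}
for gene, (t, df) in results.items():
    print(f"{gene}: t={t:.2f}, df={df:.1f}")
```

Both genes have the same difference in means, but gene_A gets a much larger |t| because its replicates agree. Turning t into a p-value needs the t distribution (for example `scipy.stats.ttest_ind(a, b, equal_var=False)`), and with about 11,000 genes the p-values would also need a multiple-testing correction.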

There are advanced methods for finding differentially expressed genes (DEGs); most of them were developed for microarray data. But you can find some methods for finding DEGs in RNA-seq data, like DEGseq.

ADD REPLY • link 8.0 years ago by Vasei ▴ 30

0

OK, how does DEGseq work? Is it just a package in R that will compute expression differences from the RPKM values? And what do you mean by t-tests being helpful?

ADD REPLY • link 8.0 years ago by callumalmost • 0

1

I strongly encourage you to follow Michael's advice below. This analysis is a complete waste of everyone's time.

ADD REPLY • link 8.0 years ago by Devon Ryan 104k

score 2 · Answer 1 · 2016-04-17

A sensible DE analysis cannot be done with the data you have been provided. First, the information about replication has been lost; second, the data are given in a unit that is not suitable for DE analysis. The only sensible answer is to ask for the raw count data. If these are not available, ask for the raw BAM files and do the counting yourself. A student project should teach students to do things the right way, not the wrong way, and IMHO this is not your fault. I suggest that you point your teachers to this post in case there is any doubt.
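
One way to see why the unit is a problem is to write out the standard RPKM formula and feed it two very different situations. The read counts, gene lengths, and library sizes below are invented for illustration:

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical case 1: a short gene with a handful of reads in a small library.
short_gene = rpkm(read_count=5, gene_length_bp=500, total_mapped_reads=10_000_000)

# Hypothetical case 2: a long gene with many reads in a large library.
long_gene = rpkm(read_count=1000, gene_length_bp=10_000, total_mapped_reads=100_000_000)

print(short_gene, long_gene)
```

Both calls return the same RPKM, although one value rests on 5 reads and the other on 1000. The count information that count-based DE methods rely on to judge how noisy a measurement is cannot be recovered from RPKM alone, which is why asking for the raw counts is the sensible route.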