Question: What would be the best way to calculate fold changes and p-values of this RNA-seq data?
2.8 years ago, mmccarthy781 wrote:

Hey all,

This is my first time posting, so I hope this question isn't too open-ended; sorry if it's a bit long. I'm currently a bioinformatics master's student, and I've just joined a cancer research lab as an intern. They have RNA-seq data from an outsourced sequencing run, and they'd like me to produce a list of the most significantly differentially expressed genes, ranked by fold change and p-value.

The problem is that they don't have the raw data. The facility they outsourced their sequencing to did some of the data analysis for them, so what I have to work with is one data frame for each trial (the control and three separate tests). Each data frame contains the gene ID, the transcript ID(s), the length, the expected count, and the FPKM.

This type of analysis is new to me. From reading about how to do this with tools such as edgeR, it seems important to have the raw read counts, which unfortunately I don't have and don't think I can get. I don't believe the expected_count is the same thing, is it? They do supply an equation for the FPKM: FPKM = (10^6 * C) / (N * L / 10^3), where C is the number of fragments uniquely aligned to the gene, N is the total number of fragments uniquely aligned to all genes, and L is the number of bases in the gene. Would C in this equation be equal to the raw read count? It appears to be approximately equal to the expected read count.
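To make the formula above concrete, here is a minimal Python sketch that computes FPKM from C, N and L and then inverts it to recover C. The numbers are made up for illustration and the function names are hypothetical. Note that the inversion needs N, which the per-gene tables may not state directly.

```python
def fpkm(c, n, length):
    """FPKM = (10^6 * C) / (N * L / 10^3), as given by the facility."""
    return (1e6 * c) / (n * length / 1e3)


def count_from_fpkm(fpkm_value, n, length):
    """Invert the formula above to recover C from FPKM, N and L."""
    return fpkm_value * n * length / 1e9


# Made-up example values, not taken from any real data set.
c = 500            # fragments uniquely aligned to the gene
n = 20_000_000     # fragments uniquely aligned to all genes
length = 2_000     # number of bases in the gene

value = fpkm(c, n, length)
recovered = count_from_fpkm(value, n, length)
print(value, recovered)
```

If C recovered this way closely matches the expected_count column for most genes, that would support the guess that the two are approximately the same quantity.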

Any ideas on how to solve this problem are much appreciated!

rna-seq • 1.1k views
modified 2.8 years ago by Devon Ryan • written 2.8 years ago by mmccarthy781

Don't waste your time with this. Your group paid for the sequencing; whoever did it will happily give you the FASTQ files.

written 2.8 years ago by Devon Ryan

Yeah, I'm going to attempt to get the FASTQ files. I was just wondering whether what I currently have is of any use. I believe it's RSEM output.

written 2.8 years ago by mmccarthy781


In all seriousness, do you know how they produced the expected counts? If you do, you might be able to use tximport to produce counts and then use edgeR :). HOWEVER! Keep in mind that in order to publish, you will likely be asked to upload the raw data to a publicly available repository.
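Whatever tool does the import, the underlying step is to merge the per-sample tables into one gene-by-sample count matrix. The Python sketch below illustrates only that idea and is not tximport itself. The column names `gene_id` and `expected_count` are assumed from typical RSEM gene-level output, the helper names are hypothetical, the values are made up, and rounding to integers is an assumption that only matters for tools that insist on integer counts.

```python
import csv
import io


def read_expected_counts(handle):
    """Map gene_id -> expected_count from one tab-separated RSEM-style table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["expected_count"]) for row in reader}


def build_count_matrix(tables):
    """Combine per-sample dicts into {gene: [rounded count per sample]}.

    Only genes present in every sample are kept.
    """
    samples = list(tables)
    shared = set.intersection(*(set(tables[s]) for s in samples))
    matrix = {g: [round(tables[s][g]) for s in samples] for g in sorted(shared)}
    return samples, matrix


# Tiny in-memory stand-ins for the real files (made-up values).
control = io.StringIO(
    "gene_id\texpected_count\tFPKM\ngeneA\t10.4\t1.2\ngeneB\t0.0\t0.0\n"
)
test1 = io.StringIO(
    "gene_id\texpected_count\tFPKM\ngeneA\t25.6\t2.9\ngeneB\t3.3\t0.4\n"
)

tables = {
    "control": read_expected_counts(control),
    "test1": read_expected_counts(test1),
}
samples, matrix = build_count_matrix(tables)
print(samples, matrix)
```

With real data, each `io.StringIO` would be replaced by an opened results file, one per sample.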

written 2.8 years ago by biofalconch440

Looking into the problem a bit more, it looks like this is the direct output from RSEM, which describes the expected counts as follows:

"'expected_count' is the sum of the posterior probability of each read comes from this transcript over all reads. Because 1) each read aligning to this transcript has a probability of being generated from background noise; 2) RSEM may filter some alignable low quality reads, the sum of expected counts for all transcript are generally less than the total number of reads aligned."
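A toy illustration of that definition, as a Python sketch with made-up posterior probabilities (this is not RSEM's code): each read contributes its posterior probability to every transcript it may come from, so the per-transcript sums are fractional and their total can fall below the number of reads.

```python
# Each read has a posterior probability of originating from each transcript
# (made-up values). An ambiguously mapped read splits its weight.
read_posteriors = [
    {"tx1": 1.0},
    {"tx1": 0.7, "tx2": 0.3},
    {"tx1": 0.25, "tx2": 0.75},
    {"tx2": 0.9},  # the remaining 0.1 is attributed to background noise
]

# expected_count for a transcript = sum of its posteriors over all reads.
expected_count = {}
for posterior in read_posteriors:
    for transcript, prob in posterior.items():
        expected_count[transcript] = expected_count.get(transcript, 0.0) + prob

total = sum(expected_count.values())
print(expected_count, total, len(read_posteriors))
```

Here four reads yield a total expected count slightly below four, which matches the "generally less than the total number of reads aligned" remark in the quote.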

written 2.8 years ago by mmccarthy781
2.8 years ago, Devon Ryan (Freiburg, Germany) wrote:

If it's really the output of RSEM, then you can use limma/voom on it. But honestly, I wouldn't trust a company; I've seen them do completely absurd things with an analysis.
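For orientation, voom starts from a log2 counts-per-million transform, described for the method as log2((count + 0.5) / (library size + 1) * 10^6). The Python sketch below shows only that transform and a naive log2 fold change on made-up counts. It is not the limma implementation: it omits the precision weights and the linear model that actually produce p-values, and it uses the raw column sum as the library size.

```python
import math


def log2_cpm(counts):
    """log2 counts per million with voom-style offsets (0.5 and 1)."""
    lib_size = sum(counts)
    return [math.log2((c + 0.5) / (lib_size + 1) * 1e6) for c in counts]


# Made-up counts for three genes in two samples.
control = [10, 0, 989]
test1 = [26, 3, 1970]

# Difference of log-CPM values = log2 fold change (test1 vs control).
log_fold_change = [
    t - c for c, t in zip(log2_cpm(control), log2_cpm(test1))
]
print(log_fold_change)
```

A fold change computed from a single sample per condition says nothing about significance; p-values need replicates and a proper model such as the one limma fits.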

written 2.8 years ago by Devon Ryan

I second Devon's advice. Get hold of the raw files and the sample metadata.

written 2.8 years ago by cpad011212k