This is my first time posting, so I hope this question isn't too open-ended; sorry if it's a bit long. Anyway, I'm currently a bioinformatics master's student, and I've just joined a cancer research lab as an intern. They have RNA-seq data from sequencing they outsourced, and from it they'd like me to produce a list of the most significantly differentially expressed genes, ranked by fold change and p-value.
The problem is that they don't have the raw data. The facility they outsourced their sequencing to also did some of the data analysis for them, so what I have to work with is one data frame per condition: the control and three separate tests. Each data frame contains the gene ID, the transcript ID(s), the length, the expected count, and the FPKM.
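If the per-condition tables are saved as tab-separated files, a minimal sketch of merging them into a single gene-by-sample matrix (rows are genes, columns are samples, which is the layout edgeR's DGEList takes for its counts) could look like the following. The column names `gene_id` and `expected_count` and the function name `build_count_matrix` are hypothetical placeholders. Rounding the fractional expected counts to integers is an assumption about how the matrix would be used downstream, not something the facility specified.

```python
import csv
import io
from typing import Dict, List, TextIO


def build_count_matrix(samples: Dict[str, TextIO],
                       gene_col: str = "gene_id",
                       count_col: str = "expected_count") -> Dict[str, List[int]]:
    """Combine per-sample tables into {gene_id: [count for each sample]}.

    Column names are assumptions; adjust them to the provider's headers.
    Expected counts can be fractional, so each one is rounded to an int.
    A gene missing from a sample's table is left at 0 for that sample.
    """
    names = list(samples)
    matrix: Dict[str, List[int]] = {}
    for i, name in enumerate(names):
        reader = csv.DictReader(samples[name], delimiter="\t")
        for row in reader:
            counts = matrix.setdefault(row[gene_col], [0] * len(names))
            counts[i] = round(float(row[count_col]))
    return matrix


if __name__ == "__main__":
    # Tiny made-up tables standing in for the real files (hypothetical data).
    control = io.StringIO("gene_id\texpected_count\nGENE_A\t10.6\nGENE_B\t0.0\n")
    test1 = io.StringIO("gene_id\texpected_count\nGENE_A\t20.2\nGENE_B\t3.0\n")
    print(build_count_matrix({"control": control, "test1": test1}))
```

With real data you would pass open file handles instead of `io.StringIO` objects, one per condition, in a fixed order so the columns line up with your sample labels.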
This type of analysis is new to me, and from reading about how to do this with tools such as edgeR, it seems important to have the raw read counts, which unfortunately I don't have and don't think I can get. I don't believe the expected_count is the same thing, is it? The facility does supply an equation for the FPKM: $\mathrm{FPKM} = \frac{10^6 \cdot C}{N \cdot L / 10^3}$, where C is the number of fragments uniquely aligned to the gene, N is the total number of fragments uniquely aligned to all genes, and L is the number of bases in the gene. Would C in this equation be equal to the raw read count? It appears to be approximately equal to the expected count.
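Solving the supplied formula for N gives $N = 10^9 \cdot C / (\mathrm{FPKM} \cdot L)$. Because N is a single number per sample, one rough way to test whether expected_count plays the role of C is to plug it in for every gene in one sample and see whether the implied N comes out nearly constant. The sketch below assumes each row is (expected_count, FPKM, length) taken from one sample's table; the helper names `fpkm`, `implied_total` and `check_counts` are hypothetical.

```python
import statistics
from typing import List, Tuple


def fpkm(count: float, total: float, length_bp: float) -> float:
    """FPKM = (10^6 * C) / (N * L / 10^3), as given by the facility."""
    return (1e6 * count) / (total * length_bp / 1e3)


def implied_total(count: float, fpkm_value: float, length_bp: float) -> float:
    """Solve the FPKM formula for N, treating `count` as C."""
    return (1e6 * count) / (fpkm_value * length_bp / 1e3)


def check_counts(rows: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """rows: (expected_count, FPKM, length) for each gene in one sample.

    Returns (median implied N, relative spread = (max - min) / median).
    Genes with zero count or zero FPKM are skipped because they carry no
    information about N. A small spread suggests expected_count is close
    to the C in the formula. Raises statistics.StatisticsError if no gene
    has a nonzero count and FPKM.
    """
    totals = [implied_total(c, f, l) for c, f, l in rows if c > 0 and f > 0]
    med = statistics.median(totals)
    return med, (max(totals) - min(totals)) / med


if __name__ == "__main__":
    # Synthetic genes generated with N = 20 million (hypothetical numbers).
    genes = [(500, 2000), (1200, 4000), (50, 1000)]
    rows = [(c, fpkm(c, 2e7, length), length) for c, length in genes]
    print(check_counts(rows))
```

If the implied N varies a lot, especially for short genes, the FPKM may have been computed with a different length (for example an effective length) or a different C than the expected count, so this is a consistency check rather than proof.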
Any ideas on how to solve this problem are much appreciated!