Hello,
I am new in this kind of analysis and I have a .csv file containing RNA-Seq data from different cell lines (with at least 3 replicates) normalised to TPM already, unfortunately I cannot access to the raw counts.
Symbol ID C1 C2 C3 D1 D2 D3 D4
1 TSPAN6 ENSG00000000003.13 133.95 132.07 64.47 54.85 53.65 47.87 56.37
2 TNMD ENSG00000000005.5 10.39 3.47 1.11 0.58 1.74 0.36 1.68
3 DPM1 ENSG00000000419.11 67.67 124.98 33.02 8.35 12.95 12.31 13.33
4 SCYL3 ENSG00000000457.12 2.59 1.40 2.61 5.03 4.70 2.98 3.71
5 C1orf112 ENSG00000000460.15 12.32 46.18 16.49 19.54 19.20 11.72 8.55
6 FGR ENSG00000000938.11 0.00 0.00 0.04 0.36 0.08 0.00 0.00
So my question is: Is there a way I can follow to obtain the logFC, p-values, t-values and padj starting from this .csv file? I read about DESeq, DESeq2, EdgeR, limma and it looks like if all the R packages would ask for the raw counts. I would like to perform a Differential Expression Analysis.
Any suggestions about how to start? Any help is very appreciated.
you should not perform DEA on TPM, you need raw counts, you can a take look at tximport. It can generate the raw read counts. You can in a way get counts from the TPM file. Look at the two links. Link1 and Link2 . Or if you want to do it on your own you need to have effective length and length of the genes.
Just to extend or vchris_ngs comment, here are two quotes on why it is a bad idea to perform differential expression on TPM. The first is from EBSeq github page:
The second is from Conesa et al. (2016):