Hello! I wanted to reach out to the community of experts to get some advice. Two years ago this Resource paper on lung cancer from the CPTAC consortium came out: https://www.sciencedirect.com/science/article/pii/S0092867420307443#mmc1, which is really a goldmine of valuable data for the field. We frequently use this kind of resource to generate and validate hypotheses. However, in this particular dataset the RNA-seq data are reported as RPKM or z-scores instead of raw counts, which precludes count-based differential expression analysis (with DESeq2, for example). So my questions to the community are:
1) What would be the appropriate way (if any) of comparing RNA-seq expression between groups of samples using the RPKM values? Could I just run a Wilcoxon test or t-test on them? I think this is not appropriate, but I want to know whether I can work with the data as is.
2) Does anybody know of a way of either requesting the raw counts or regenerating them from the currently published data?
3) I am sure there is a good reason for publishing RPKM instead of raw counts. Would somebody be so kind as to briefly explain its advantages? If statistics cannot be run on RPKM values, I have a hard time understanding the choice.
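To make question 1 concrete, here is roughly the kind of per-gene test I have in mind. This is only a minimal sketch using the Python standard library: the RPKM values are made up for illustration (they are not from the paper), the function names `log2_rpkm` and `mann_whitney_u` are my own, and the p-value uses a plain normal approximation without tie or continuity correction, so it is rough for small groups. In practice I would probably use something like `scipy.stats.mannwhitneyu` and correct for multiple testing across genes.

```python
import math


def log2_rpkm(values, pseudocount=1.0):
    """Log-transform RPKM values; the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def mann_whitney_u(group_a, group_b):
    """Return (U statistic for group_a, approximate two-sided p-value).

    Ties get average ranks. The p-value comes from a normal
    approximation with no tie or continuity correction.
    """
    n_a, n_b = len(group_a), len(group_b)
    combined = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])

    # Assign 1-based ranks, averaging over runs of equal values.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    rank_sum_a = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2

    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value


# Hypothetical RPKM values for a single gene in two sample groups.
tumor_rpkm = [12.4, 30.1, 25.7, 18.9, 40.2]
normal_rpkm = [3.2, 5.1, 4.8, 7.5, 2.9]

u, p = mann_whitney_u(log2_rpkm(tumor_rpkm), log2_rpkm(normal_rpkm))
print(f"U = {u}, approximate p = {p:.4f}")
```

Since the rank test only depends on the ordering of the values, the log transform does not change its result here; I included it because it would matter for a t-test.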
Thanks a lot!! I would really appreciate your input!!