I want to use the SCAN-B dataset (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE81540) to classify samples into PAM50 breast cancer subtypes. However, the processed data is only available as log2-transformed FPKM values. The original raw expression data is also available, but I would like to avoid redoing all the preprocessing steps the authors already carried out.
FPKM is not suitable for cross-sample comparisons. What normalization can I apply on top of this data to make the samples comparable? I guess I first need to undo the log2 transformation and then apply another normalization strategy?
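To make concrete what I have in mind, here is a minimal sketch, not a tested pipeline. It assumes the values were stored as log2(FPKM + offset) with genes in rows and samples in columns; the offset of 0.1 and the function name `log2_fpkm_to_tpm` are my own assumptions and would need to be checked against the GEO series description. It undoes the log and rescales each sample to TPM, which only equalizes the per-sample totals and may not be enough on its own.

```python
import numpy as np

def log2_fpkm_to_tpm(log2_fpkm, offset=0.1):
    """Undo log2(FPKM + offset) and rescale each sample (column) to TPM.

    `offset` is an assumed pseudocount; verify it against the dataset notes.
    """
    # Reverse the log transform and remove the assumed pseudocount.
    fpkm = np.power(2.0, np.asarray(log2_fpkm, dtype=float)) - offset
    # Guard against tiny negative values caused by rounding.
    fpkm = np.clip(fpkm, 0.0, None)
    # TPM: scale every sample so its values sum to one million.
    # Note: a sample whose values are all zero would divide by zero here.
    col_sums = fpkm.sum(axis=0, keepdims=True)
    return fpkm / col_sums * 1e6

# Toy matrix (3 genes x 2 samples); the second sample is the first one doubled.
toy_fpkm = np.array([[10.0, 20.0],
                     [30.0, 60.0],
                     [60.0, 120.0]])
toy_log2 = np.log2(toy_fpkm + 0.1)
tpm = log2_fpkm_to_tpm(toy_log2)
```

In this toy case both samples end up with identical TPM values, because they differ only by a global scaling factor.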
Is this a valid strategy at all? Or should I try to back-transform the FPKM values to the original counts and start the normalization from scratch?
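For the back-transformation to counts, this is roughly what I imagine, as a sketch under strong assumptions. It inverts the textbook FPKM definition, so it needs per-gene lengths and per-sample mapped fragment totals, neither of which I have confirmed are available for this dataset. The function and argument names are hypothetical, and since the quantification tool may have used effective rather than annotated lengths, the result would only approximate the original counts.

```python
import numpy as np

def fpkm_to_approx_counts(fpkm, gene_length_bp, mapped_fragments):
    """Approximate fragment counts from FPKM (genes x samples).

    Inverts FPKM = count / (length_kb * millions_of_mapped_fragments).
    `gene_length_bp` has one entry per gene, `mapped_fragments` one per sample;
    both are assumed inputs that must come from elsewhere.
    """
    fpkm = np.asarray(fpkm, dtype=float)
    # Gene lengths in kilobases, shaped as a column to broadcast over samples.
    length_kb = np.asarray(gene_length_bp, dtype=float)[:, None] / 1e3
    # Library sizes in millions, shaped as a row to broadcast over genes.
    millions = np.asarray(mapped_fragments, dtype=float)[None, :] / 1e6
    return fpkm * length_kb * millions

# Toy example: 2 genes x 2 samples with made-up lengths and library sizes.
example_fpkm = np.array([[10.0, 5.0],
                         [2.0, 4.0]])
example_counts = fpkm_to_approx_counts(
    example_fpkm,
    gene_length_bp=[2000, 500],
    mapped_fragments=[1_000_000, 2_000_000],
)
```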
I appreciate your help with this!