Hi! Here is my question. I used the UCSC Xena browser to download TCGA data (log2 transformed), and I would like to classify my genes of interest according to their log2 FC (relative to normal tissue) and their respective p-values. To calculate the log2 FC, I took the mean expression value of gene X in the tumor tissue and subtracted from it the mean expression value of gene X in normal tissue. For the p-value of that FC, however, I'm not sure whether it is correct to use the p-value of a t-test comparing the expression of gene X (log2 values) between normal and tumor tissue. If it is not, could you guide me on how to calculate this p-value?
Can you specify exactly which file you downloaded? Please provide a link if you can.
If the data that you downloaded is indeed log [base 2] (log2) transformed, then the way that you calculated the log2 fold changes (mean tumour minus mean normal) is correct, because a difference on the log2 scale corresponds to a ratio on the original scale. The p-values are not calculated from the fold change; they are calculated independently, from the per-sample expression values. The correct test to use to derive the p-values will depend on the exact data distribution. For RNA-seq, I would always favour obtaining raw counts and then inputting these to edgeR or DESeq2, where I would then normalise these and derive test statistics using the functions provided by these programs. If you literally just have log2 transformed counts, you could fit a linear model to the data independently for each gene with lm() and derive a p-value from that, or just use a Welch two-sample t-test for all tumour versus all normal samples. Using a Wilcoxon signed-rank test on only the matched tumour-normal pairs is also valid, I feel.
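To make the last two options concrete, here is a minimal sketch in Python (the same idea carries over to R). It assumes NumPy and SciPy are installed; the function name `compare_gene` and the expression values are made up for illustration and are not TCGA data. It computes the log2 FC as a difference of means, a Welch t-test p-value on all samples, and, optionally, a Wilcoxon signed-rank p-value when the two arrays are matched tumour-normal pairs in the same order.

```python
import numpy as np
from scipy import stats


def compare_gene(tumour, normal, paired=False):
    """Summarise one gene from log2 expression values (hypothetical helper).

    tumour, normal: 1-D sequences of log2 expression values.
    paired: set True only if tumour[i] and normal[i] come from the same patient.
    """
    tumour = np.asarray(tumour, dtype=float)
    normal = np.asarray(normal, dtype=float)

    # On log2 data, the difference of means is the log2 fold change.
    log2_fc = tumour.mean() - normal.mean()

    # Welch's t-test: two-sample t-test that does not assume equal variances.
    welch_p = stats.ttest_ind(tumour, normal, equal_var=False).pvalue

    result = {"log2_fc": log2_fc, "welch_p": welch_p}

    if paired:
        # Wilcoxon signed-rank test on the within-pair differences.
        result["wilcoxon_p"] = stats.wilcoxon(tumour, normal).pvalue

    return result


# Made-up log2 values for one gene in four matched tumour-normal pairs.
example = compare_gene(
    tumour=[9.0, 10.0, 11.0, 10.0],
    normal=[8.0, 8.5, 7.5, 8.0],
    paired=True,
)
print(example)
```

You would call this once per gene of interest. The p-value comes from the per-sample values, never from the fold change itself, which is why the two numbers are returned separately.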