Good evening, could you please help me with the following question? Is it better to use logFC or logCPM to analyze my RNA-seq data between different treatments?
And which is better to use, the p-value or the FDR?
Thanks for your help.
Since logFC reflects the difference between your conditions, and that's what you're interested in, that is what you should pay attention to and what will be meaningful to think about. LogCPM tells you how highly a gene is expressed on average across your samples, not whether it differs between treatments, so on its own it means very little for comparing conditions in an RNA-seq experiment.

If you have RNA-seq data, you're typically measuring thousands of genes, which means you're testing thousands of hypotheses, and so it is better to use the FDR rather than raw p-values. Remember that with a p-value cutoff of 0.05, you are rejecting the null hypothesis (no difference between your conditions) in favor of the alternative hypothesis (there is an effect, a measurable difference between your conditions), while accepting that if the null hypothesis is actually true, you would see an effect at least as large as the one you measured only about 1 in 20 times.

You can apply this same logic to all the genes you're measuring. Under the null hypothesis and a p-value cutoff of 0.05, you would expect a false positive about 1 in 20 times, and since you're measuring thousands of genes, you can expect many genes to pass the threshold by chance simply because you are performing so many tests. For example, with 10,000 genes and no real differences at all, around 500 would still have p < 0.05. Thus you must "adjust" your p-values to account for this (calculate an FDR), which usually means inflating the p-values in some way. Genes with very low p-values will survive the adjustment (a very tiny number can still be very tiny even after being multiplied by another number).

If you search here for FDR or adjusted p-value, you'll find better explanations by people who actually know what they're talking about. Also, make sure you read the edgeR User's Guide; it's a good reference and explains what CPM is and isn't good for.
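To make the multiple-testing point concrete, here is a minimal sketch in plain Python. It is not edgeR's code. It assumes the Benjamini-Hochberg procedure as the adjustment (one common way to get an FDR), and the function name `benjamini_hochberg`, the 10,000 simulated genes, and the random seed are all made up for illustration. The simulated p-values are drawn uniformly, which is what you'd expect if no gene is truly different.

```python
import random


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest. Each p-value is
    # scaled by m / rank, and the running minimum keeps the adjusted values
    # monotone and capped at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical experiment: 10,000 genes, none truly different (all null).
random.seed(0)
n_genes = 10000
null_pvalues = [random.random() for _ in range(n_genes)]

# How many genes pass a raw p-value cutoff purely by chance?
raw_hits = sum(p < 0.05 for p in null_pvalues)

# How many still pass after the adjustment?
fdr = benjamini_hochberg(null_pvalues)
fdr_hits = sum(q < 0.05 for q in fdr)

print("genes with raw p < 0.05:", raw_hits)
print("genes with FDR < 0.05:", fdr_hits)

# A very small p-value stays small after adjustment.
small_example = benjamini_hochberg([0.001, 0.5, 0.04])
print("adjusted:", small_example)
```

With all-null data, the raw cutoff should flag somewhere around 500 genes, and far fewer (typically none) should remain after adjustment. In the three-value example, 0.001 only becomes 0.003, while 0.04 is pushed above 0.05.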