Question

What is the minimal recommended cutoff for fold change for such variable samples?

4.4 years ago

harelarik ▴ 90

Hi,

We have proteomics results from ~35 mammals. The samples were taken surgically from the same organ in all individuals.

Since we are working with individual mammals (as opposed to genetically identical tissue cultures, or plants), there is high variability between samples. Therefore we cannot use, for example, FDR to filter the t-test results that identify differentially expressed (DE) proteins between treated and non-treated individuals. If we do, there are hardly any significant DE proteins. As a result, we use only p < 0.05 from the t-test to identify DE proteins (no FDR), and the number of DE proteins is affected mainly by the fold change cutoff.

My questions are: 1. What is the minimal recommended fold change cutoff for such variable samples? 2. How do we choose a lower cutoff that still makes sense and is still acceptable for publication in a good journal?

I have seen, for example, that Yuan et al. (2016, Journal of Proteomics, https://doi.org/10.1016/j.jprot.2020.103683) used a cutoff of 1.3-fold change (and p-value < 0.05) for human samples (i.e., they worked with mammalian samples, as we do).

3. Does anyone know of other works with such a fold change cutoff, or a lower one, that were published in reasonable journals?

Thank you, Arik

Proteomics Cutoff • 4.4k views

updated 4.4 years ago by i.sudbery 20k • written 4.4 years ago by harelarik ▴ 90

score 5 · Answer 1 · 2020-03-05

A few things first:

If you use a p-value cutoff on t-test results across the whole proteome, many of your hits will be false positives. For example, if you detected 8,000 proteins and therefore did 8,000 tests, you would expect around 400 false positives (8000*0.05) if no protein truly differed. If you find 500 proteins with p < 0.05, then up to about 80% of them will be false positives. I suggest using FDR with a higher threshold than 0.05.
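The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and it assumes the worst case in which every tested protein is truly unchanged, so the share it returns is an upper bound rather than an estimate for your data.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of p < alpha hits if every null hypothesis were true."""
    return n_tests * alpha


def max_false_positive_share(n_tests, alpha, n_hits):
    """Upper bound on the fraction of hits that are false positives.

    Assumes all tests are true nulls (worst case); capped at 1.0.
    """
    if n_hits == 0:
        return 0.0
    return min(1.0, expected_false_positives(n_tests, alpha) / n_hits)


# Numbers taken from the example in the answer.
n_tests = 8000   # proteins tested
alpha = 0.05     # raw p-value cutoff
n_hits = 500     # proteins with p < 0.05

print(expected_false_positives(n_tests, alpha))
print(max_false_positive_share(n_tests, alpha, n_hits))
```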

Instead of using a p-value < 0.05 from a t-test, there are several other things you could try:

* The t-test is not generally suitable for proteomics data, which is count based (if it's MS/MS, anyway). You could try one of the count-based packages that are usually used for RNA-seq, like edgeR or DESeq (negative binomial models) or limma-voom. This will probably give you better FDRs.
* Failing that, you could try an FDR threshold of 0.2 or 0.3. Yes, 20%-30% of your hits will be false positives, but that is less than would be the case using a p-value.
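As a rough sketch of the relaxed FDR option, the snippet below computes Benjamini-Hochberg adjusted p-values in plain Python and keeps hits at a 0.2 threshold. Benjamini-Hochberg is an assumption here, since the answer does not say which FDR procedure to use, and the p-values are invented purely for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four proteins (not real data).
raw_p = [0.01, 0.04, 0.03, 0.5]
fdr_threshold = 0.2  # relaxed threshold suggested in the answer

adj_p = benjamini_hochberg(raw_p)
hits = [i for i, q in enumerate(adj_p) if q < fdr_threshold]
print(adj_p)
print(hits)
```

In practice a statistics package would normally do this for you; the point is only that the threshold is applied to adjusted values, not to raw p-values.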

Bearing that in mind, there is no "correct" way to choose a log Fold Change threshold. Log Fold Change thresholds are used to find genes/proteins where the change is big enough to be biologically interesting - they are determining the biological merit, not statistical merit and are thus subjective.

It is true that larger log fold changes, particularly at higher expression levels, are more likely to be real than smaller ones, but you can't set a threshold and say "above this threshold is good, below this threshold is bad".

Proper use of the fold change threshold: The correct way to use a fold change threshold, in the absence of meaningful FDR thresholds, would be to not rely on hard thresholds in your downstream analyses. There are many biologically interesting questions you can ask using analyses that rely on ranking proteins, rather than dividing them into two categories (different and not different).
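One way to picture a ranking-based approach is sketched below. The answer does not name a specific method, so the scoring rule (direction of change times -log10 of the p-value, a common input for rank-based enrichment tools), the function name, and the example proteins are all assumptions made for illustration.

```python
import math


def rank_proteins(results, p_floor=1e-300):
    """Rank proteins by a signed significance score, most up-regulated first.

    results: list of (protein_name, log2_fold_change, p_value) tuples.
    The score is -log10(p) carrying the sign of the log fold change.
    p_floor avoids log10(0) for p-values reported as exactly zero.
    """
    scored = []
    for name, log2_fc, p_value in results:
        magnitude = -math.log10(max(p_value, p_floor))
        scored.append((name, math.copysign(magnitude, log2_fc)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical results (not real data).
example = [
    ("protA", 1.2, 0.001),
    ("protB", -0.8, 0.01),
    ("protC", 0.3, 0.5),
]

for name, score in rank_proteins(example):
    print(name, round(score, 2))
```

The full ranked list, rather than a thresholded subset, is then what goes into the downstream analysis.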

Finally, to put it bluntly, any reviewer that will accept a log fold change threshold over an FDR one won't have anything useful to say about the position of that threshold anyway.