Question

Filtering by FPKM, opinions and thoughts

8.1 years ago

Biogeek ▴ 470

Hi,

In terms of FPKM filtering, why do people carry out this process? It may be an obvious question, but what does it ultimately achieve?

- Is it to remove lowly expressed reads that may be contaminants from other organisms living in the same environment?
- Is it to remove 'background noise'?

If someone were to carry out FPKM filtering, how does one decide on a threshold? Should it be based on a density plot of the FPKM values of each sample used in the assembly?

Lastly, I have seen values of 0, 0.3, 1 and 1.5 FPKM used as thresholds. Are these arbitrary, or do people select them based on a particular parameter or property of the data?

I would be keen to hear what people think, and any information people can provide on the matter.

Thanks.

Question written 8.1 years ago by Biogeek; thread last updated 8.1 years ago by Carlo Yague.

score 1 · Answer 1 · 2016-03-30

8.1 years ago

Devon Ryan 104k

There is a plethora of different reasons that people do this, but most commonly they're trying to keep just the "expressed" genes. The actual thresholds are essentially arbitrary and will vary with every experiment (so don't blindly reuse a reported threshold). To derive one, either compute zFPKMs or plot the FPKM distribution and visually choose a reasonable value. Alternatively, don't use FPKMs at all, and don't bother with this whole process unless you truly need to.
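To make the zFPKM suggestion concrete, here is a minimal pure-Python sketch of the transform as described in the zFPKM paper (Hart et al., 2013), applied to one sample at a time: log2-transform the FPKMs, take the peak of a kernel density estimate as the mean μ of the "active" genes, treat the values above the peak as half-normal so that σ = (U - μ)·√(π/2), where U is their mean, and then z-score. The Gaussian kernel, the 1.06·sd·n^(-1/5) bandwidth, the 512-point grid and the handling of zero FPKMs (dropped from the fit and scored as -inf) are assumptions of this sketch, and the function names are hypothetical; it is not the R script linked below.

```python
import math


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` at each grid point."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def zfpkm(fpkm, grid_points=512):
    """Return zFPKM scores for the FPKM values of ONE sample (hypothetical helper).

    Assumption: zero FPKMs are excluded from the density fit and scored -inf.
    """
    logs = [math.log2(x) for x in fpkm if x > 0]
    n = len(logs)
    mean = sum(logs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in logs) / (n - 1))
    # Normal-reference bandwidth (an assumption; R's density() default differs).
    bandwidth = 1.06 * sd * n ** (-1 / 5)
    lo, hi = min(logs), max(logs)
    grid = [lo + (hi - lo) * i / (grid_points - 1) for i in range(grid_points)]
    density = gaussian_kde(logs, grid, bandwidth)
    # mu: log2 FPKM at the maximum of the density.
    mu = grid[density.index(max(density))]
    # For a half-normal above mu, E[X - mu] = sigma * sqrt(2 / pi).
    upper = [x for x in logs if x > mu]
    u = sum(upper) / len(upper)
    sigma = (u - mu) * math.sqrt(math.pi / 2)
    return [(math.log2(x) - mu) / sigma if x > 0 else float("-inf") for x in fpkm]


sample = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 16.0, 32.0, 32.0, 64.0, 128.0]
print(zfpkm(sample))
```

Because the scores are on a z-scale relative to the "active" peak, a cut-off chosen on them is comparable across samples in a way that a raw FPKM cut-off is not.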

Hi Ryan, thanks for the suggestion. It was interesting, as I have the same question, and it is good to know about the zFPKM method. I used the script available online (https://github.com/severinEvo/gene_expression/blob/master/zFPKM.R). After computing zFPKM for every transcript, it gave me output values ranging from -3 to 8. How do I find the threshold from these zFPKM values? Please let me know if I have misunderstood anything. Thanks in advance.

Reply by EVR, 8.1 years ago

The zFPKM paper recommended a threshold of -3 (see Table 1 in the paper). Perhaps the script does the filtering for you; I've never used it.
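Once per-sample zFPKM values exist, applying the -3 cut-off is a simple filter. This is a hypothetical helper: the dict layout, the inclusive `>=` comparison and the `min_samples` requirement for multi-sample experiments are assumptions, not something the paper or the script prescribes.

```python
def expressed_genes(zscores, threshold=-3.0, min_samples=1):
    """Return gene IDs whose zFPKM is >= threshold in at least min_samples samples.

    zscores maps a gene ID to its per-sample zFPKM values. The -3 default
    follows the zFPKM paper; min_samples is a hypothetical knob.
    """
    return [
        gene
        for gene, values in zscores.items()
        if sum(z >= threshold for z in values) >= min_samples
    ]


example = {
    "geneA": [1.2, 0.4, -0.8],
    "geneB": [-3.5, -4.1, -2.9],
    "geneC": [-5.0, -6.2, -4.4],
}
print(expressed_genes(example))                 # ['geneA', 'geneB']
print(expressed_genes(example, min_samples=2))  # ['geneA']
```

Requiring the gene to pass in more than one sample (or in every replicate of a condition) is a common stricter variant; which one to use depends on the experiment.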

Reply by Devon Ryan, 8.1 years ago

score 0 · Answer 2 · 2016-03-30

For me, there are two main reasons to discard lowly expressed or unexpressed genes:

First, it reduces the memory requirements of subsequent analyses and speeds them up.
Second, if one carries out differential expression analysis, one is likely to lack the power to detect significant differences for lowly expressed genes. Removing them before testing reduces the number of tests, so the multiple testing correction (FDR) is less stringent on the "truly expressed" genes and detection power increases.
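A toy illustration of the second point, using made-up p-values: Benjamini-Hochberg adjusts the k-th smallest of m p-values to roughly p·m/k, so dropping underpowered genes shrinks m and lowers every remaining adjusted p-value. The filter has to be chosen independently of the p-values (for example on mean expression); otherwise FDR control is no longer guaranteed.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


expressed = [0.004, 0.012, 0.018]           # toy p-values, well-powered genes
low = [0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]  # toy p-values, lowly expressed genes
alpha = 0.05

unfiltered = bh_adjust(expressed + low)
filtered = bh_adjust(expressed)
print(sum(q < alpha for q in unfiltered))  # 1
print(sum(q < alpha for q in filtered))    # 3
```

With all ten genes tested only one passes at 5% FDR; after removing the seven low-expression genes, all three expressed genes pass.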

Is it to remove lowly expressed possible contaminating reads from other organisms which may live within the same environment?

I don't think this is the main motivation, but why not.

If someone were to carry out FPKM filtering, how does one decide on a threshold?

Not sure about FPKM. However, DESeq2 does a similar filtering of genes with low counts (counts are another measure of gene expression). They explain their method in detail in sections 3.8 and 4.7 of their manual. Look at figure 12 to see how they decide on the threshold.
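The idea behind DESeq2's independent filtering can be sketched as a search over cut-offs on the mean normalized count: for each candidate cut-off, drop the genes below it, run Benjamini-Hochberg on what remains, and count the rejections. This is a simplified, hypothetical re-implementation for illustration, not DESeq2's code. As far as I know, DESeq2 smooths the rejection curve rather than taking its raw maximum as done here. The counts and p-values are made up.

```python
def bh_rejections(pvalues, alpha):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    k = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * rank / m:
            k = rank  # largest rank passing its BH bound
    return k


def choose_filter_threshold(mean_counts, pvalues, quantiles, alpha=0.1):
    """Return (cutoff, rejections) for the mean-count cut-off with most rejections.

    Candidate cut-offs are simple quantiles of the mean normalized counts.
    """
    ranked_means = sorted(mean_counts)
    best = (None, -1)
    for q in quantiles:
        cutoff = ranked_means[int(q * (len(ranked_means) - 1))]
        kept = [p for c, p in zip(mean_counts, pvalues) if c >= cutoff]
        n_rej = bh_rejections(kept, alpha)
        if n_rej > best[1]:
            best = (cutoff, n_rej)
    return best


mean_counts = [1, 2, 3, 50, 80, 120, 200, 300]
pvalues = [0.6, 0.9, 0.4, 0.01, 0.02, 0.06, 0.001, 0.075]
print(choose_filter_threshold(mean_counts, pvalues, [0.0, 0.25, 0.5, 0.75]))  # (50, 5)
```

Here keeping only genes with a mean count of at least 50 yields five discoveries instead of three. Because the filter statistic is the mean count rather than the p-value, the selection stays (approximately) independent of the test under the null.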