How to produce a unimodal distribution of expressed genes
3.8 years ago
mail2steff ▴ 60

I ran cuffdiff on five samples and obtained FPKM values for the genes in each sample. Each sample contains a group of genes with very low FPKM values, representing low expression or background. I would like to identify a minimum expression value so that background signal is not counted as gene expression. How can I choose a cut-off that avoids these false positives?

Can I identify it by plotting a density graph?

S1       S2       S3        S4       S5
1229.46  1.52844  0         10.7805  109.81
1229.96  814.614  2109.44   1138.93  673.454
1247.19  225.283  78.9963   76.2897  607.874
1250.08  3.94648  0.349388  11.9385  65.4146
1257.49  9.58456  8.32604   21273.9  8724.36


The above are sample FPKM values for each stage. Can anyone please help me with this? I have attached a plot that I copied from a research article. How can I get a similar plot for my data, along with the cutoff value?

https://ibb.co/e5A8XG

cuffdiff RNA-Seq R density plot ggplot2 • 1.1k views
3.8 years ago
vinvan ▴ 50

As far as I know, there is no gold standard for determining a cutoff value. Moreover, depending on the goal of your experiment, you might not want to discard lowly expressed genes at all. If you would like to apply a cutoff, I would suggest making the density plots and checking whether they point to a background expression level (which would show up as a peak or shoulder on the left side of the plot). If this is not clearly visible, you could keep only genes with FPKM above a certain, arbitrary, cutoff (FPKM > 1 is often used).
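As a minimal sketch of applying such a cutoff, the snippet below filters the five rows posted in the question. The object names (`fpkm`, `cutoff`, `keep_any`, `keep_all`) are made up for illustration, and whether a gene must pass the threshold in one sample or in all samples is a choice that depends on the experiment, not something prescribed here.

```r
# FPKM values copied from the question (rows = genes, columns = samples)
fpkm <- data.frame(
  S1 = c(1229.46, 1229.96, 1247.19, 1250.08, 1257.49),
  S2 = c(1.52844, 814.614, 225.283, 3.94648, 9.58456),
  S3 = c(0, 2109.44, 78.9963, 0.349388, 8.32604),
  S4 = c(10.7805, 1138.93, 76.2897, 11.9385, 21273.9),
  S5 = c(109.81, 673.454, 607.874, 65.4146, 8724.36)
)

cutoff <- 1  # arbitrary threshold; adjust after looking at the density plots

# Option A: keep genes above the cutoff in at least one sample
keep_any <- rowSums(fpkm > cutoff) >= 1

# Option B (stricter): keep genes above the cutoff in every sample
keep_all <- rowSums(fpkm > cutoff) == ncol(fpkm)

expressed_any <- fpkm[keep_any, ]
expressed_all <- fpkm[keep_all, ]
```

Option A keeps genes that are switched on in only some stages, while option B keeps only genes expressed throughout.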

If you are familiar with R, a version of the graph you showed can be made with

plot(density(log2(data)[,1]))
lines(density(log2(data)[,x]))


where x is the number of the column you want to overlay (this line can be repeated, once per column, to plot all the columns).
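Repeating the lines call by hand can also be written as a loop. The sketch below is one way to do that, not the only one. It uses simulated values as a stand-in for a real FPKM table, so `data`, the column names, and the helper `log_density` are assumptions for illustration. Zero FPKM values are dropped before taking log2, because log2(0) is -Inf.

```r
set.seed(1)

# Simulated stand-in for an FPKM table: 2000 genes x 5 samples, mixing a
# low "background" population with a higher "expressed" population
data <- as.data.frame(replicate(5, c(rlnorm(600, meanlog = -2, sdlog = 1),
                                     rlnorm(1400, meanlog = 3, sdlog = 1.5))))
colnames(data) <- paste0("S", 1:5)

# Density of log2(FPKM) for one sample, ignoring zero values
log_density <- function(x) density(log2(x[x > 0]))

# One density estimate per column
dens <- lapply(data, log_density)

# Shared y-axis limit so that no curve is clipped
ymax <- max(sapply(dens, function(d) max(d$y)))
cols <- seq_along(dens)

# First sample opens the plot, the remaining samples are overlaid
plot(dens[[1]], col = cols[1], ylim = c(0, ymax),
     main = "FPKM density per sample", xlab = "log2(FPKM)")
for (i in cols[-1]) lines(dens[[i]], col = cols[i])
legend("topright", legend = names(dens), col = cols, lty = 1)
```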


Thank you for your hint. I have two questions: (1) I tried this in R and got separate plots for the five samples. I am new to R. How can I combine these plots into a single one? (2) The plot reports a value called bandwidth. Can I use that as the cutoff value?


I modified my answer slightly for clarity.

1) Run the plot and lines commands in succession; lines overlays a curve on the existing plot.

2) I would suggest reading the manual for the density command with ?density in R.
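On the bandwidth question: in the object returned by density, the bw element is the smoothing bandwidth of the kernel estimate, so it describes how smooth the curve is and is not an expression threshold. The sketch below shows where to find it and, as one possible heuristic that is not taken from this thread, picks the lowest point between the two highest peaks as a candidate cutoff. The data are simulated and clearly bimodal, and all variable names are made up for illustration; on real data the curve may show only a shoulder, in which case this heuristic does not apply.

```r
set.seed(42)

# Simulated FPKM values for one sample: background plus expressed genes
fpkm <- c(rlnorm(600, meanlog = -2, sdlog = 1),
          rlnorm(1400, meanlog = 3, sdlog = 1))

d <- density(log2(fpkm))

# Smoothing bandwidth in log2(FPKM) units: a tuning parameter, not a cutoff
bandwidth <- d$bw

# Indices of local maxima of the density curve
peaks <- which(diff(sign(diff(d$y))) == -2) + 1

# The two highest peaks, ordered from left to right
top2 <- sort(peaks[order(d$y[peaks], decreasing = TRUE)][1:2])

# Heuristic cutoff: lowest density between those two peaks
valley <- top2[1] + which.min(d$y[top2[1]:top2[2]]) - 1
cutoff_log2 <- d$x[valley]
cutoff_fpkm <- 2^cutoff_log2

# Draw the density and mark the candidate cutoff
plot(d, main = "log2(FPKM) density", xlab = "log2(FPKM)")
abline(v = cutoff_log2, lty = 2)
```

A fixed threshold such as FPKM > 1 can be marked the same way with abline(v = log2(1)).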