Subsampling procedure for differential analysis
4.4 years ago
Vasu ▴ 720

I have 30 tumor and 3 normal samples. The MDS plot looks like this: [Tumor vs Normal]. I used edgeR and selected differentially expressed genes based on a fold change greater than 1.2 and FDR < 0.05. The differential analysis between tumor and normal gave only two upregulated genes.

So I am thinking of applying random selection of samples: select random samples from the tumor condition, run the differential analysis on them, and repeat the process n times. This gives a different set of differentially expressed genes in each analysis.

But I am not sure how to select the final differentially expressed genes, because the same gene can be differentially expressed in different analyses with different fold change and FDR values.

1) Do you think subsampling is the right choice here? If not, when can subsampling be applied?

2) Since I get only 2 upregulated genes with FC > 1.2 and FDR < 0.05, can I raise the FDR cutoff to 0.5 or 0.1 to get more upregulated genes? Is selecting genes at FDR < 0.5 or 0.1 a reasonable choice?

RNA-Seq R differential analysis rna • 1.7k views

Looks like your tumor samples are widely different from each other. So the result of just two genes is correct. Those are the two that are consistently different between the normal/tumor condition. Any other genes may not be distinct in some group of your tumor samples.

Yes, of course I know that only two genes are consistently different between the normal and tumor conditions. But what I asked is: can I increase the FDR cutoff from 0.05 to 0.1 to get more differentially expressed genes? Or should I apply a subsampling method?

Yes of course increasing FDR cutoff from 0.05 to 0.50 will give you lots more differentially expressed genes. I think the random sub-sampling is going to give uninterpretable results. Maybe try hold-one-out, where you run 29 vs 3 with each sample held out, creating 30 different result sets, and see how many genes are commonly found. Probably just your same two will show up consistently, but with this data you could talk about N genes found in 90% of "29-selections".
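
The hold-one-out idea above can be sketched roughly as follows. This is a minimal Python sketch, not edgeR code: `toy_de` is a hypothetical stand-in for a real differential-expression call (for example an edgeR run with an FDR filter), and the sample names, gene names, expression values, and the 90% cutoff are made up for illustration.

```python
from collections import Counter
from statistics import mean


def toy_de(tumor, normal, min_diff=1.0):
    """Hypothetical stand-in for a real DE test (e.g. an edgeR run).

    tumor/normal map sample name -> {gene: log2 expression}.
    Returns the set of genes whose mean log2 difference exceeds min_diff.
    A real analysis would return FDR-filtered genes instead.
    """
    genes = next(iter(normal.values())).keys()
    hits = set()
    for g in genes:
        diff = mean(s[g] for s in tumor.values()) - mean(s[g] for s in normal.values())
        if diff > min_diff:
            hits.add(g)
    return hits


def leave_one_out(tumor, normal, de_fn=toy_de, min_fraction=0.9):
    """Run de_fn once per held-out tumor sample and tally how often each gene is called."""
    counts = Counter()
    for held_out in tumor:
        subset = {name: expr for name, expr in tumor.items() if name != held_out}
        counts.update(de_fn(subset, normal))
    n_runs = len(tumor)
    # Keep genes called in at least min_fraction of the runs.
    stable = {g for g, c in counts.items() if c / n_runs >= min_fraction}
    return counts, stable


# Made-up data: geneA is high in every tumor, geneB is high in only one tumor.
tumor = {
    "T1": {"geneA": 8.0, "geneB": 9.0},
    "T2": {"geneA": 8.2, "geneB": 2.0},
    "T3": {"geneA": 7.9, "geneB": 2.1},
    "T4": {"geneA": 8.1, "geneB": 1.9},
}
normal = {
    "N1": {"geneA": 2.0, "geneB": 2.0},
    "N2": {"geneA": 2.1, "geneB": 2.0},
}

counts, stable = leave_one_out(tumor, normal)
print(counts)  # how many of the runs called each gene
print(stable)  # genes called in at least 90% of the runs
```

In this toy data, geneB is driven by a single tumor sample, so it drops out of the one run where that sample is held out and fails the 90% cutoff, while geneA is called every time.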

If I try this hold-one-out procedure, I may get common genes, and those common genes may have different FDR and fold change values in different result sets, right? How do I select the right values from those? For example, see the following:

In one analysis

            baseMean        log2FoldChange    lfcSE       stat    pvalue    padj
AL357060.1    8.50582           6.1871       1.67335    3.54023    0.0003    0.03245


In another analysis the same gene is found differentially expressed with different values

              baseMean     log2FoldChange       lfcSE        stat    pvalue    padj
AL357060.1    10.58937424    6.552371044      1.6296950    3.85921    0.00011    0.02642


So from these two analyses I can take that gene as commonly found, but which values should I report?

They're both true in different ways. You can just say "The LFC is >6".
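
One way to report a gene that shows up in several result sets is to summarize its values across runs rather than pick a single row. Here is a minimal Python sketch using the two rows for AL357060.1 quoted above. The choice of minimum, median, maximum, and worst padj as summaries is an assumption for illustration, not an established convention.

```python
from statistics import median

# The two result rows for AL357060.1 quoted in the thread.
runs = [
    {"log2FoldChange": 6.1871, "padj": 0.03245},
    {"log2FoldChange": 6.552371044, "padj": 0.02642},
]

lfcs = [r["log2FoldChange"] for r in runs]
padjs = [r["padj"] for r in runs]

summary = {
    "n_runs": len(runs),
    "lfc_min": min(lfcs),
    "lfc_median": median(lfcs),
    "lfc_max": max(lfcs),
    "padj_worst": max(padjs),  # the largest (most conservative) adjusted p-value across runs
}
print(summary)
```

The minimum LFC across the two runs is above 6, which is the kind of statement ("The LFC is >6") that holds for every result set.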

So I should select the common genes from the different result sets based on a cutoff? And may I know when a subsampling procedure is appropriate?