Question: Subsampling procedure for differential analysis
15 months ago, Vasu410 wrote:

I have 30 tumor and 3 normal samples. The MDS plot looks like this: Tumor vs Normal. I used edgeR and selected differentially expressed genes based on fold change greater than 1.2 and FDR < 0.05. Differential analysis between tumor and normal gave only two upregulated genes.

So I am thinking of applying random selection of samples: select random samples from the tumor condition, run the differential analysis on them, and repeat the process n times. This gives a different set of differentially expressed genes in each analysis.

But I am not sure how to select the final differentially expressed genes, because the same gene can be differentially expressed in different analyses with different fold change and FDR values.

1) Do you think applying subsampling here is the right choice? If not, when can subsampling be applied?

2) As I get only 2 upregulated genes with FC > 1.2 and FDR < 0.05, can I increase the FDR to 0.5 or 0.1 to get more upregulated genes? Is selecting genes based on FDR < 0.5 or 0.1 the right choice?

written 15 months ago by Vasu410

Looks like your tumor samples are widely different from each other. So the result of just two genes is correct. Those are the two that are consistently different between the normal/tumor condition. Any other genes may not be distinct in some group of your tumor samples.

written 15 months ago by karl.stamm3.5k

Yes, of course I know that only two genes are consistently different between the normal/tumor conditions. But what I asked is: can I increase the FDR cutoff from 0.05 to 0.1 to get more differentially expressed genes? Or should I apply a subsampling method?

written 15 months ago by Vasu410

Yes, of course increasing the FDR cutoff from 0.05 to 0.50 will give you many more differentially expressed genes, but at that threshold up to half of the calls are expected to be false discoveries. I think random sub-sampling is going to give uninterpretable results. Maybe try hold-one-out, where you run 29 vs 3 with each tumor sample held out in turn, creating 30 different result sets, and see how many genes are commonly found. Probably just your same two will show up consistently, but with this data you could talk about N genes found in 90% of "29-selections".
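The hold-one-out idea can be sketched roughly as below. This is only an illustration, not anyone's actual pipeline: `leave_one_out_stability`, `run_de`, and `toy_run_de` are hypothetical names, and `run_de` stands in for whatever call wraps your real edgeR analysis and returns the set of genes passing your cutoffs. The toy function and gene names are made up so the sketch runs on its own.

```python
from collections import Counter
from typing import Callable, Dict, Sequence, Set


def leave_one_out_stability(
    tumor_ids: Sequence[str],
    normal_ids: Sequence[str],
    run_de: Callable[[Sequence[str], Sequence[str]], Set[str]],
    min_fraction: float = 0.9,
) -> Dict[str, float]:
    """Hold out each tumor sample once, rerun the DE analysis, and return
    the genes called significant in at least `min_fraction` of the runs,
    mapped to the fraction of runs in which they were called."""
    counts: Counter = Counter()
    for held_out in tumor_ids:
        kept = [s for s in tumor_ids if s != held_out]
        counts.update(set(run_de(kept, normal_ids)))
    n_runs = len(tumor_ids)
    return {g: c / n_runs for g, c in counts.items() if c / n_runs >= min_fraction}


def toy_run_de(tumor: Sequence[str], normal: Sequence[str]) -> Set[str]:
    """Made-up stand-in for a real DE call (assumption, not edgeR).
    'geneA' is always called; 'geneB' only when sample T1 is held out."""
    hits = {"geneA"}
    if "T1" not in tumor:
        hits.add("geneB")
    return hits


tumors = [f"T{i}" for i in range(1, 31)]   # 30 tumor samples
normals = ["N1", "N2", "N3"]               # 3 normal samples
stable = leave_one_out_stability(tumors, normals, toy_run_de)
print(stable)
```

In the toy run only `geneA` survives the 90% cutoff, because `geneB` is called in just one of the 30 "29-selections". The 0.9 threshold is a free choice, not something the data dictates.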

written 15 months ago by karl.stamm3.5k

If I try this hold-one-out procedure, I may get common genes, and those common genes may have different FDR and fold-change values in the different result sets, right? From that, how can I select the right values? For example, see the following:

In one analysis

            baseMean        log2FoldChange    lfcSE       stat    pvalue    padj
AL357060.1    8.50582           6.1871       1.67335    3.54023    0.0003    0.03245

In another analysis the same gene is found differentially expressed with different values

              baseMean     log2FoldChange       lfcSE        stat    pvalue    padj
AL357060.1    10.58937424    6.552371044      1.6296950    3.85921    0.00011    0.02642

So from these two analyses I can take that gene as commonly found, but which values should I consider?

written 15 months ago by Vasu410

They're both true in different ways. You can simply report that "the LFC is >6".
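One possible way to report a gene that appears in several result sets is to summarise its values conservatively instead of picking one run. The sketch below uses the two rows quoted above; the `summarise` helper and the choice of minimum LFC, median LFC, and maximum adjusted p-value are assumptions made for illustration, not an established convention.

```python
from statistics import median

# The two result rows for AL357060.1 quoted in the thread.
runs = [
    {"gene": "AL357060.1", "log2FoldChange": 6.1871, "padj": 0.03245},
    {"gene": "AL357060.1", "log2FoldChange": 6.552371044, "padj": 0.02642},
]


def summarise(rows):
    """Group rows by gene and report conservative summaries across runs."""
    by_gene = {}
    for row in rows:
        by_gene.setdefault(row["gene"], []).append(row)
    summary = {}
    for gene, gene_rows in by_gene.items():
        lfcs = [r["log2FoldChange"] for r in gene_rows]
        summary[gene] = {
            "n_runs": len(gene_rows),
            "min_lfc": min(lfcs),          # supports a statement like "LFC > 6"
            "median_lfc": median(lfcs),
            "max_padj": max(r["padj"] for r in gene_rows),  # worst case across runs
        }
    return summary


summary = summarise(runs)
print(summary["AL357060.1"])
```

The minimum LFC across runs is 6.1871, which is what backs the "LFC is >6" wording, and the largest adjusted p-value (0.03245) is still below the 0.05 cutoff.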

written 15 months ago by karl.stamm3.5k

So I should select the common genes from the different result sets based on a cutoff? And may I know when a subsampling procedure can be used?

written 15 months ago by Vasu410