Question: How to do a DGE analysis of a list of 1000 genes of interest?
nattzy94 wrote (7 months ago):

I am doing a DGE analysis of a total RNAseq dataset of 2 timepoints (5 reps each). I am particularly interested in looking for changes in expression of 1000 genes.

Currently, I have done the analysis by analysing all genes and then picking out the 1000 genes I am interested in. However, my PI has suggested that I could try running DGE on just the 1000 genes. Theoretically, this should improve the statistical significance, since the adjustment for multiple hypothesis testing would cover far fewer tests.

Is this an advisable way of doing the analysis? Since differential expression levels are fit to a negative binomial distribution (in the case of DESeq2), wouldn't this just mean most of the 1000 genes I input would end up not being differentially expressed?

Edit: We arrived at the list of 1000 genes as we were interested particularly in genes coding small proteins. Hence, we searched Uniprot for human proteins with a maximum length of 100 amino acids.

rna-seq R • 187 views

Papyrus wrote (7 months ago):

In my opinion, this is not an advisable way of doing the analysis. The main problem is how one arrives at the list of interest. In your case, it seems that these 1000 genes were selected a posteriori by their statistical significance rather than for "biological" reasons. So, for me, it is hard to justify.


Thanks for the reply Papyrus.

The list of 1000 genes was compiled by searching for small proteins. We searched the UniProt database for human proteins with a maximum length of 100 amino acids. Since we are only interested in small proteins in the analysis, would this be a sufficient reason?

(Reply by nattzy94, 7 months ago)

> Since we are only interested in small proteins in the analysis, would this be a sufficient reason?

No. You could have done a different type of experiment if you really wanted to just focus on those 1000 genes. However, you chose RNA-seq and you therefore should stick to conventions in RNA-seq.

(Reply by Kevin Blighe, 7 months ago)

I would prefer to run a pathway enrichment analysis on the complete DEG results, to see whether your list of differentially expressed genes is enriched for small proteins. In general, you can perform pathway-focused analyses (such as GSEA) to see how specific pathways behave in your data.

(Reply by Papyrus, 7 months ago)
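
The enrichment idea in the reply above can be illustrated with a simple over-representation test: given all tested genes, how surprising is the overlap between the DE genes and the small-protein list? The following is a minimal sketch using a hypergeometric upper tail in plain Python. All counts are hypothetical, and this is a different, simpler technique than the rank-based GSEA mentioned above.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the set."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible combinations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical numbers, for illustration only.
n_tested = 15000      # genes that went through the DE analysis
n_small = 1000        # small-protein genes among the tested genes
n_de = 600            # genes called differentially expressed
n_overlap = 55        # DE genes that are also small-protein genes

expected_overlap = n_de * n_small / n_tested
p_enrich = hypergeom_upper_tail(n_overlap, n_tested, n_small, n_de)
print(f"expected overlap by chance: {expected_overlap:.1f}")
print(f"P(overlap >= {n_overlap}) = {p_enrich:.4g}")
```

The background (`n_tested`) should be the set of genes that were actually tested, not the whole genome. In R, `phyper()` or `fisher.test()` can be used for the same kind of calculation.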

Kevin Blighe wrote (7 months ago):

> Is this an advisable way of doing the analysis?

In my opinion, it is not advisable. I would use the entire dataset and then check the p-values of your genes of interest, while being open to other genes that may be statistically significant, too.

Prior to normalisation, you can, of course, rigorously filter your dataset for low-count genes.

Kevin
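
To make the "test everything, then look up your genes" workflow concrete, here is a minimal sketch of a Benjamini-Hochberg adjustment applied once across all tested genes, followed by a lookup of the genes of interest. The gene names and p-values are made up for illustration, and the function is a plain-Python stand-in, not DESeq2's own code. In R, `p.adjust(p, method = "BH")` performs this adjustment.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices ordered from the largest p-value down to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values from a test run on ALL genes.
raw_p = {"geneA": 0.001, "geneB": 0.008, "geneC": 0.039,
         "geneD": 0.041, "geneE": 0.6}
genes_of_interest = ["geneB", "geneD"]  # hypothetical small-protein genes

genes = list(raw_p)
padj = dict(zip(genes, bh_adjust([raw_p[g] for g in genes])))

# Adjust once across everything tested, then look up the genes of interest.
interest_padj = {g: padj[g] for g in genes_of_interest}
print(interest_padj)
```

Genes outside the list of interest stay in the table, which fits the advice above to remain open to other significant genes.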


swbarnes2 wrote (7 months ago):

Don't filter up front, if only so that you can use data from all the genes for library normalization and dispersion estimates.

You can filter your results list afterwards, if you really want.
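
A toy illustration of why normalization benefits from all genes: the sketch below computes median-of-ratios size factors (the idea behind DESeq2's default size-factor estimation, reimplemented here in plain Python as an assumption-laden stand-in) on a full matrix and on a subset. The counts are invented so that the subset is dominated by genes that truly change.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors; counts is a list of per-gene rows."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) == 0:
            continue  # geometric mean would be zero, so skip this gene
        log_geomean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geomean)
    return [math.exp(median(r)) for r in log_ratios]

# Hypothetical counts for two samples; sample 2 was sequenced twice as deep.
all_counts = [
    [100, 200],  # unchanged gene
    [50, 100],   # unchanged gene
    [30, 60],    # unchanged gene
    [80, 160],   # unchanged gene
    [20, 160],   # gene of interest, truly up 4x
    [10, 80],    # gene of interest, truly up 4x
    [40, 80],    # gene of interest, unchanged
]
subset_counts = all_counts[4:]  # only the genes of interest

sf_all = size_factors(all_counts)
sf_sub = size_factors(subset_counts)
print("depth ratio from all genes:", sf_all[1] / sf_all[0])
print("depth ratio from subset:   ", sf_sub[1] / sf_sub[0])
```

With all genes, the estimated depth difference is close to the true twofold value. With the subset alone, the changing genes dominate the median, so the real fourfold increase is absorbed into the size factors and would be normalized away.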
