LOC genes in gene expression analysis
9.2 years ago
rs • 0

Hi,

I am doing differential gene expression analysis with "limma" on Illumina HT12 data, comparing tumors with and without a certain mutation. Among my top 25 genes, all significant after multiplicity adjustment, at least 20 are "LOC" genes, i.e. gene names starting with LOC followed by a number. As I understand it, these are genes without a name and of unknown function. What does it mean if many such genes are in my top list? Does it mean the analysis is just full of false positives, or could there be some biological meaning to this?

Thank you!

rs

gene • 7.4k views
9.2 years ago
andrew ▴ 560

It depends on the method you are using. If you are using an enrichment method, they will affect the p-values that are calculated. For example: if 10% of your genes are DEGs, then in any "pathway" you would expect to see about 10% DEGs by random chance alone. If you remove the "LOC genes" from your DEG list, bringing the DEG share down to 5%, you have effectively increased the weight of the remaining DEGs, which may cause certain pathways to appear significant when in reality they are not.
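To make the 10% versus 5% example concrete, here is a minimal sketch of a one-sided hypergeometric enrichment test in plain Python. All numbers are made up for illustration (10,000 measured genes, a hypothetical pathway of 100 genes containing 12 DEGs), and it assumes that none of the removed LOC genes are annotated to the pathway, so the pathway's DEG count stays the same after filtering. It is not the calculation of any particular tool.

```python
from math import comb


def hypergeom_upper_tail(n_genes, n_degs, pathway_size, degs_in_pathway):
    """P(X >= degs_in_pathway) for a pathway drawn at random from all genes."""
    total = comb(n_genes, pathway_size)
    upper = min(pathway_size, n_degs)
    favourable = sum(
        comb(n_degs, i) * comb(n_genes - n_degs, pathway_size - i)
        for i in range(degs_in_pathway, upper + 1)
    )
    return favourable / total


# Illustrative numbers only (assumptions, not from any real data set).
N_GENES = 10_000        # genes measured on the array
PATHWAY_SIZE = 100      # genes annotated to a hypothetical pathway
DEGS_IN_PATHWAY = 12    # DEGs that fall in that pathway

# Full DEG list: 10% of genes are DEGs, so about 10 are expected in the pathway.
p_full = hypergeom_upper_tail(N_GENES, 1_000, PATHWAY_SIZE, DEGS_IN_PATHWAY)

# LOC genes dropped from the DEG list: 5% DEGs, so only about 5 are expected.
p_filtered = hypergeom_upper_tail(N_GENES, 500, PATHWAY_SIZE, DEGS_IN_PATHWAY)

print(f"p-value with all DEGs:          {p_full:.4f}")
print(f"p-value with LOC genes removed: {p_filtered:.4f}")
```

The same 12 pathway DEGs look unremarkable against a 10% background but clearly enriched against a 5% background, which is the effect described above.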

The fact that the LOC genes are not well annotated does not diminish their importance in the analysis.

This issue also illustrates one of the key limitations of enrichment approaches. Approaches based on the hypergeometric distribution (of which enrichment is one) assume that all measurements are independent. But genes are not independent, which is precisely why pathways exist: pathways describe the interdependencies of the genes in the system. The method iPathwayGuide uses is called Impact Analysis. It uses enrichment, and in addition it considers the topology of the pathway, calculates the accumulated downstream perturbations, and scores them on an orthogonal axis. This eliminates the false positives that are prevalent with enrichment approaches.

Also, if you are using DAVID, be careful. Their pathways have not been updated since 2009.

Here's some information that describes the approach we use: Impact Analysis


Thanks, but would these LOC genes even contribute to pathway analysis? I think they wouldn't if I use DAVID.

Still, I wonder if there could be some biological meaning to having many LOC genes differentially expressed.

Any insights?

Thank you!

9.2 years ago
andrew ▴ 560

I think looking at just the top genes may be very limiting, specifically for the reason you cite. Not to mention that 25 genes may or may not be enough. We usually recommend targeting between 5% and 10% of the total number of genes measured as DEGs. That means if you measured 10,000 genes, you want about 500 to 1,000 DEGs, selected using both fold-change (FC) and p-value cutoffs.
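As a rough sketch of what applying both cutoffs looks like, the snippet below filters a toy results table and reports the DEG fraction so it can be compared against the 5% to 10% target. The gene names, values, field names (`logFC`, `adj_p`) and default thresholds are all hypothetical; with real limma output you would read the corresponding columns from your exported table and tune the cutoffs yourself.

```python
def select_degs(rows, min_abs_logfc=0.6, max_adj_p=0.05):
    """Return names of genes passing both the fold-change and the p-value cutoff."""
    return [
        row["gene"]
        for row in rows
        if abs(row["logFC"]) >= min_abs_logfc and row["adj_p"] <= max_adj_p
    ]


def deg_fraction(rows, **cutoffs):
    """Fraction of all measured genes that are called DEGs under the cutoffs."""
    return len(select_degs(rows, **cutoffs)) / len(rows)


# Toy table with made-up gene names and values (illustration only).
rows = [
    {"gene": "GENE_A", "logFC": 1.8, "adj_p": 0.001},
    {"gene": "LOC000001", "logFC": -1.2, "adj_p": 0.004},
    {"gene": "GENE_B", "logFC": 0.2, "adj_p": 0.030},   # significant, small change
    {"gene": "GENE_C", "logFC": 0.9, "adj_p": 0.200},   # large change, not significant
    {"gene": "GENE_D", "logFC": 0.1, "adj_p": 0.900},
    {"gene": "GENE_E", "logFC": -0.3, "adj_p": 0.700},
    {"gene": "GENE_F", "logFC": 0.0, "adj_p": 0.950},
    {"gene": "GENE_G", "logFC": 0.4, "adj_p": 0.400},
    {"gene": "GENE_H", "logFC": -0.1, "adj_p": 0.800},
    {"gene": "GENE_I", "logFC": 0.05, "adj_p": 0.990},
]

degs = select_degs(rows)
print(degs)
print(f"DEG fraction: {deg_fraction(rows):.0%}")
```

If the fraction lands well outside the target range, loosening or tightening `min_abs_logfc` and `max_adj_p` is the usual adjustment.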

Since you have limma output for the entire set of genes, you can quickly import that file into iPathwayGuide and get a full analysis of enriched GO terms, pathways, predicted miRNAs, diseases, and more.

It's 100% free to try and might give you some extra insight.
