Question: How To Extract The Core Genes From The OrthoMCL Output File?
6.8 years ago by
Lisa320 wrote:

Hi. I was wondering if anybody can help me figure out how to use OrthoMCL to identify the core genome of a set of E. coli genomes. I have 52 E. coli genomes that I ran through OrthoMCL to produce ortholog groups. I followed all the steps in the user guide through to the end. Now I'm left with a massive file of ortholog groups, and I'm unsure how to proceed.

This is a snippet from the middle of my output file, because the head command only shows my biggest ortholog group, which is too much information. The part before the colon is the ortholog group ID; the entries after it are the genome|gene pairs that were clustered into that group.

ecoli6370: col125|YP_006311412.1 col139|YP_007556103.1 col23|YP_001729413.1 col3|NP_286258.1 col4|NP_308598.1 col53|YP_002998320.1 col55|YP_003043686.1 col56|YP_003053130.1 col7|YP_488800.1 col73|YP_003498239.1 col92|YP_006127895.1
ecoli6371: col125|YP_006312035.1 col127|YP_006770839.1 col131|YP_006779890.1 col134|YP_006785029.1 col3|NP_286985.1 col31|YP_002271784.1 col4|NP_309246.1 col45|YP_002397150.1 col57|YP_003079099.1 col59|YP_003222735.1 col64|YP_003233659.1
ecoli6372: col125|YP_006312040.1 col127|YP_006770834.1 col131|YP_006779885.1 col134|YP_006785024.1 col3|NP_286990.1 col31|YP_002271776.1 col4|NP_309251.1 col45|YP_002397155.1 col57|YP_003079092.1 col59|YP_003222730.1 col64|YP_003233664.1

I tried converting this file to a binary matrix, following the instructions from a blog post, but I'm still unsure how to proceed.

Thanks, I appreciate any help you can give me. Please let me know if I should provide any more information.


Sorry for the delay, here's an example of what my binary matrix looks like. I just took a few lines as it's so large.

"ecoli1000" "ecoli1001" "ecoli1002" "ecoli1003" "ecoli1004" "ecoli1005" 
"col0"   1   1   0   0   1   0
"col1"   0   1   0   0   0   1
"col2"   0   0   1   0   1   1
"col3"   0   1   0   0   0   0
"col4"   1   0   0   1   1   1
"col5"   1   0   0   1   0   0
orthomcl • 5.7k views
modified 16 months ago by Dattatray Mongad350 • written 6.8 years ago by Lisa320

Could you show us the binary matrix? I believe it'll be easier to explain it from that.

written 6.7 years ago by sentausa640
6.7 years ago by
sentausa640 wrote:

Anyway, I'll try to explain it without the binary matrix.

Since you want to find the core genes, all you have to do is find the ortholog groups in the OrthoMCL results that contain at least one gene from each of the 52 strains. If a strain has no gene/protein in an ortholog group, that gene/protein is absent from the strain. It is therefore not part of the core genome, since a species' core genome is defined as the set of genes present in all strains of the species.
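As a rough illustration, this check can be done directly on the groups file without building a matrix. The sketch below assumes the layout shown in the question (a group ID, a colon, then space-separated `genome|gene` entries); the function names and the toy data are made up for the example, and the full list of genomes is taken to be every genome tag that appears anywhere in the file.

```python
def parse_groups(lines):
    """Map each ortholog group ID to the set of genome tags it contains."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        # Each member looks like "genome|gene"; keep only the genome tag.
        groups[group_id.strip()] = {m.split("|", 1)[0] for m in members.split()}
    return groups


def core_groups(groups, all_genomes):
    """Return IDs of groups with at least one gene from every genome."""
    required = set(all_genomes)
    return [gid for gid, genomes in groups.items() if required <= genomes]


# Toy data with three genomes (hypothetical, not from the real run).
example = [
    "grp1: a|g1 b|g2 c|g3",
    "grp2: a|g4 b|g5",
    "grp3: a|g6 a|g7 b|g8 c|g9",
]
groups = parse_groups(example)
# Assumption: every genome appears in at least one group.
all_genomes = set().union(*groups.values())
print(core_groups(groups, all_genomes))  # ['grp1', 'grp3']
```

For a real run, the lines would come from the OrthoMCL groups file (for example `parse_groups(open(path))`, with `path` pointing at your own output). Note that `grp3` counts as core even though genome `a` has two genes in it; if you only want single-copy core genes, you would additionally require exactly one member per genome.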

So, in the binary matrix shown on the blog, you'd be interested only in the columns that contain no 0, i.e. the ortholog groups that are present in every genome.
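A minimal sketch of that column filter, assuming the matrix is laid out like the snippet in the question: a header line of quoted ortholog group IDs, then one row per genome with a quoted row label followed by 0/1 values. The function name and the toy matrix are made up for illustration.

```python
import shlex


def core_columns(matrix_text):
    """Return the column names (ortholog groups) that contain no 0."""
    rows = [shlex.split(line) for line in matrix_text.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    core = []
    for j, name in enumerate(header):
        # Data rows start with the genome label, so values are shifted by one.
        if all(int(row[j + 1]) > 0 for row in data):
            core.append(name)
    return core


# Toy matrix: three groups (columns) across three genomes (rows).
toy = '''"grpA" "grpB" "grpC"
"col0" 1 1 0
"col1" 1 0 1
"col2" 1 1 1
'''
print(core_columns(toy))  # ['grpA']
```

With the real matrix, every row for all 52 genomes has to be included; a column only counts as core if it is non-zero in each of them.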

written 6.7 years ago by sentausa640

Thanks, that makes a bit more sense. It seems really simple when you put it like that, so I think I was just having a temporary brain melt or something.

written 6.7 years ago by Lisa320
16 months ago by
National Centre for Cell Science, Pune
Dattatray Mongad350 wrote:

Use this code. It will generate FASTA files for all core, accessory and unique genes.

written 16 months ago by Dattatray Mongad350
6.2 years ago by
United States
amanjain0 wrote:

I have a very, very simple way to find core gene clusters using Excel. Tell me if anyone needs help.

If anyone needs help on venn diagrams try   it will do your work in seconds. 

written 6.2 years ago by amanjain0

Hi, I need help with this very, very simple way to find core gene clusters using Excel. Could you explain how it works?

written 6.1 years ago by marcelokuchar0