Question: How to extract the core genes from the OrthoMCL output file?
Asked by Lisa (Ireland), 5.3 years ago:

Hi. I was wondering if anybody could help me figure out how to use OrthoMCL to identify the core genome of a set of E. coli genomes. I ran OrthoMCL on 52 E. coli genomes to produce ortholog groups and followed all the steps in the user guide to the end. Now I'm left with a massive file of ortholog groups, and I'm unsure how to proceed.

This is a snippet from the middle of my output file, because head gives far too much output: the first group in the file is my biggest ortholog group. The part before the colon is the ortholog group ID. Each entry after it is a genome|gene pair (genome ID, then protein accession), and all entries on one line are clustered together into that group.

ecoli6370: col125|YP_006311412.1 col139|YP_007556103.1 col23|YP_001729413.1 col3|NP_286258.1 col4|NP_308598.1 col53|YP_002998320.1 col55|YP_003043686.1 col56|YP_003053130.1 col7|YP_488800.1 col73|YP_003498239.1 col92|YP_006127895.1
ecoli6371: col125|YP_006312035.1 col127|YP_006770839.1 col131|YP_006779890.1 col134|YP_006785029.1 col3|NP_286985.1 col31|YP_002271784.1 col4|NP_309246.1 col45|YP_002397150.1 col57|YP_003079099.1 col59|YP_003222735.1 col64|YP_003233659.1
ecoli6372: col125|YP_006312040.1 col127|YP_006770834.1 col131|YP_006779885.1 col134|YP_006785024.1 col3|NP_286990.1 col31|YP_002271776.1 col4|NP_309251.1 col45|YP_002397155.1 col57|YP_003079092.1 col59|YP_003222730.1 col64|YP_003233664.1
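A minimal Python sketch of reading this format, assuming every line looks like "groupID: genome|protein genome|protein ..." exactly as in the snippet above. The function name parse_groups is hypothetical and not part of OrthoMCL; the example lines are shortened copies of the snippet.

```python
def parse_groups(lines):
    """Parse lines like 'groupID: genome|protein genome|protein ...'
    into a dict {group_id: [(genome, protein), ...]}."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        group_id, _, members = line.partition(":")  # split on the first colon only
        pairs = []
        for member in members.split():
            genome, _, protein = member.partition("|")  # 'col125|YP_...' -> ('col125', 'YP_...')
            pairs.append((genome, protein))
        groups[group_id.strip()] = pairs
    return groups

example = [
    "ecoli6370: col125|YP_006311412.1 col139|YP_007556103.1 col23|YP_001729413.1",
    "ecoli6371: col125|YP_006312035.1 col127|YP_006770839.1",
]
groups = parse_groups(example)
print(groups["ecoli6371"])  # [('col125', 'YP_006312035.1'), ('col127', 'YP_006770839.1')]
```

To read the whole file, passing an open file handle (for example one opened on "groups.txt", a placeholder for your own output file name) instead of the list should work, since the function only iterates over lines.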

I tried converting this file to a binary (presence/absence) matrix, following the instructions here (http://smokeandumami.com/2010/01/21/gene-accumulation-curves-in-r/), but I'm still stuck on how to proceed.
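For reference, a sketch of that conversion in Python rather than R, assuming the group-line format shown above. It puts genomes in rows and groups in columns, the same layout as the matrix Lisa posted later in the thread. The name presence_absence is made up for illustration; genomes that occur in no group at all will not appear as rows, and genome IDs are sorted as strings (so "col125" comes before "col23").

```python
def presence_absence(lines):
    """Build a genome-by-group 0/1 matrix from OrthoMCL group lines.

    Returns (genomes, group_ids, matrix); matrix[i][j] is 1 if genome i has at
    least one protein in group j, else 0.
    """
    membership = {}  # group ID -> set of genome IDs with a member in that group
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        group_id, _, members = line.partition(":")
        membership[group_id.strip()] = {m.split("|", 1)[0] for m in members.split()}
    group_ids = list(membership)
    genomes = sorted(set().union(*membership.values()))
    matrix = [[1 if genome in membership[gid] else 0 for gid in group_ids]
              for genome in genomes]
    return genomes, group_ids, matrix

lines = [
    "ecoli6370: col125|YP_006311412.1 col23|YP_001729413.1",
    "ecoli6371: col125|YP_006312035.1 col127|YP_006770839.1",
]
genomes, group_ids, matrix = presence_absence(lines)
# Tab-separated output: a header of group IDs, then one row per genome.
print("\t".join(group_ids))
for genome, row in zip(genomes, matrix):
    print(genome + "\t" + "\t".join(str(v) for v in row))
```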

Thanks, I appreciate any help you can give me. Please let me know if I should provide any more information.

Lisa

Sorry for the delay; here's an example of what my binary matrix looks like. I've only included a few rows and columns, as the full matrix is so large.

"ecoli1000" "ecoli1001" "ecoli1002" "ecoli1003" "ecoli1004" "ecoli1005" 
"col0"   1   1   0   0   1   0
"col1"   0   1   0   0   0   1
"col2"   0   0   1   0   1   1
"col3"   0   1   0   0   0   0
"col4"   1   0   0   1   1   1
"col5"   1   0   0   1   0   0
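The quoted names suggest (though this is an assumption) that the matrix was written by R's write.table with its default settings, in which case the header row has one field fewer than the data rows because it has no entry for the row-name column. A hypothetical Python reader that handles this layout, assuming whitespace-separated 0/1 values and names without internal spaces:

```python
def unquote(name):
    """Strip the double quotes R puts around names."""
    return name.strip('"')

def read_binary_matrix(text):
    """Parse a whitespace-separated 0/1 matrix whose header has no row-name field.

    Returns (columns, rows): columns is the list of group names, rows maps each
    genome name to its list of 0/1 values (one per column, same order).
    """
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    columns = [unquote(c) for c in lines[0]]
    rows = {}
    for fields in lines[1:]:
        name = unquote(fields[0])
        values = [int(v) for v in fields[1:]]
        if len(values) != len(columns):
            raise ValueError(f"row {name}: {len(values)} values, expected {len(columns)}")
        rows[name] = values
    return columns, rows

sample = '''
"ecoli1000" "ecoli1001" "ecoli1002" "ecoli1003" "ecoli1004" "ecoli1005"
"col0"   1   1   0   0   1   0
"col1"   0   1   0   0   0   1
'''
columns, rows = read_binary_matrix(sample)
print(columns[:2], rows["col1"])  # ['ecoli1000', 'ecoli1001'] [0, 1, 0, 0, 0, 1]
```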
Tags: orthomcl • 5.0k views
(written 5.3 years ago by Lisa; modified 4.7 years ago by amanjain)

Could you show us the binary matrix? I believe it'll be easier to explain it from that.

(reply by sentausa, 5.2 years ago)
Answer by sentausa (France), 5.2 years ago:

Anyway, I'll try to explain it without the binary matrix.

Since you are interested in finding the core genes, basically all you have to do is find the ortholog groups in the OrthoMCL output that contain all 52 strains. If a strain has no gene/protein in an ortholog group, that gene/protein is absent from the strain. It is therefore not part of the core genome, since a species' core genome is, by definition, the set of genes present in all strains of the species.
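A minimal Python sketch of that rule applied directly to the groups file, assuming the "groupID: genome|protein ..." line format from the question. A group counts as core when the number of distinct genome IDs in it equals the total number of genomes (52 here). The name core_groups and the optional single_copy flag are illustrative additions, not part of OrthoMCL; single_copy additionally drops groups where some genome contributes more than one protein (paralogs), which is a stricter definition some analyses use.

```python
def core_groups(lines, n_genomes, single_copy=False):
    """Return IDs of groups containing a protein from every one of n_genomes genomes.

    With single_copy=True, also require exactly one protein per genome
    (i.e. drop groups where some genome contributes paralogs).
    """
    core = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        group_id, _, members = line.partition(":")
        proteins = members.split()
        genomes = {p.split("|", 1)[0] for p in proteins}
        if len(genomes) != n_genomes:
            continue  # at least one genome is absent from this group
        if single_copy and len(proteins) != n_genomes:
            continue  # some genome has more than one protein here
        core.append(group_id.strip())
    return core

# Toy data with three genomes (A, B, C); for the real file use n_genomes=52.
toy = [
    "g1: A|p1 B|p2 C|p3",
    "g2: A|p4 B|p5",
    "g3: A|p6 A|p7 B|p8 C|p9",
]
print(core_groups(toy, 3))                    # ['g1', 'g3']
print(core_groups(toy, 3, single_copy=True))  # ['g1']
```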

So, in the binary matrix shown on the blog, you'd only be interested in the columns that contain no 0, i.e. the groups present in every genome.
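The same rule for a matrix already loaded into memory, sketched in Python under the assumption that genomes are rows and groups are columns, as in Lisa's excerpt. The names columns, rows and all_present_columns are hypothetical. Testing for "no 0" rather than "all 1" also works if the matrix holds copy numbers instead of 0/1.

```python
def all_present_columns(columns, rows):
    """Return the column (group) names that contain no 0 in any row (genome).

    columns: list of group names; rows: dict genome -> list of values,
    one value per column, in the same order as columns.
    """
    return [name for j, name in enumerate(columns)
            if all(values[j] != 0 for values in rows.values())]

# Toy matrix: only grpA is present in all three genomes.
columns = ["grpA", "grpB", "grpC"]
rows = {
    "col0": [1, 1, 0],
    "col1": [1, 0, 1],
    "col2": [1, 1, 1],
}
print(all_present_columns(columns, rows))  # ['grpA']
```

Applied to the six-by-six excerpt posted in the question, this returns an empty list, since each of the six columns there contains at least one 0.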

Thanks, that makes a bit more sense. It seems really simple when you put it like that; I think I was just having a temporary brain melt.

(reply by Lisa, 5.2 years ago)
Answer by amanjain (United States), 4.7 years ago:

I have a very simple way to find core gene clusters in Excel. Let me know if anyone needs help.

If anyone needs help with Venn diagrams, try http://bioinformatics.psb.ugent.be/webtools/Venn/; it will do the work in seconds.
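The core genome corresponds to the central region of such a Venn diagram (the groups shared by every genome's list), which is just a set intersection. With 52 genomes a diagram becomes hard to read, so computing the intersection directly may be easier. A Python sketch, assuming you already have, for each genome, the list of group IDs it appears in; the name shared_by_all is hypothetical, and the example only uses three genomes and the three groups from the question's snippet.

```python
def shared_by_all(per_genome_groups):
    """Intersect every genome's set of group IDs (the centre of a Venn diagram)."""
    sets = [set(groups) for groups in per_genome_groups.values()]
    return set.intersection(*sets) if sets else set()

# Group memberships taken from the three example lines in the question.
per_genome = {
    "col3":   ["ecoli6370", "ecoli6371", "ecoli6372"],
    "col125": ["ecoli6370", "ecoli6371", "ecoli6372"],
    "col127": ["ecoli6371", "ecoli6372"],
}
print(sorted(shared_by_all(per_genome)))  # ['ecoli6371', 'ecoli6372']
```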

Hi, I need help with this very simple way to find core gene clusters in Excel. Could you explain how it works?

(reply by marcelokuchar, 4.5 years ago)
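The Excel method itself was never posted, so what follows is only one possible spreadsheet route, not necessarily amanjain's. It is a Python sketch that summarises the groups file (assuming the "groupID: genome|protein ..." format from the question) as CSV with one row per group, a count of distinct genomes and a count of proteins. After opening the CSV in Excel, filtering n_genomes for 52 would leave the core groups. The name groups_to_csv is hypothetical.

```python
import csv
import io

def groups_to_csv(lines):
    """Summarise OrthoMCL group lines as CSV text: one row per group with the
    number of distinct genomes and the total number of proteins."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["group", "n_genomes", "n_proteins"])
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        group_id, _, members = line.partition(":")
        proteins = members.split()
        genomes = {p.split("|", 1)[0] for p in proteins}
        writer.writerow([group_id.strip(), len(genomes), len(proteins)])
    return out.getvalue()

text = groups_to_csv(["g1: A|p1 B|p2", "g2: A|p3 A|p4"])
print(text)
```

Writing the returned text to a file with a .csv extension should let Excel open it directly.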
Powered by Biostar version 2.3.0