Question: How to perform metagenomic binning using ESOM?
23 months ago by
lakhujanivijay5.1k wrote:

Hi all,

I have a couple of metagenomic samples in the form of genomic assemblies, and I want to perform binning and generate ESOMs (emergent self-organizing maps) similar to the one shown in Figure 1 of this paper. The primary objective is to generate ESOMs. From the papers I have gone through, binning can be performed based on genomic signatures (i.e. di-, tri- or tetranucleotide frequencies) computed from metagenomic contigs. There are in fact a few scripts available to generate those signatures, e.g. this one here, whose output could be used as input to the Databionics ESOM Tools.

I have a couple of questions:

  1. Apart from the metagenomic contigs, is there anything else that is required as input or to generate necessary files to be used as input to ESOM tools?

  2. The installation page for ESOM Tools mentions downloading some files required for MATLAB. Is MATLAB absolutely necessary to generate ESOMs? As far as I know, MATLAB is paid software.

  3. Finally, can someone point me to a good tutorial or other material with a step-by-step guide to generating ESOMs?

Let me know if someone requires additional details to answer my questions.

Thanks Vijay

esom binning metagenomics • 1.5k views
23 months ago by
5heikki8.9k wrote:
  1. No
  2. No, it says "if you plan to use it"
  3. That paper you posted yourself and the related git

A few years ago I did some ESOM stuff based on that very paper. In the end I realized that other programs such as MaxBin were far easier/more convenient to use and gave better results (at least for my data)


Dear 5heikki

Thank you for the response. As I mentioned, the ultimate goal is to generate ESOMs (that's the requirement of the project). MaxBin, though easier as you mentioned, will not provide emergent maps, I think; correct?

Additionally, I am looking at the "Calculation of tetranucleotide frequencies and clustering by ESOM" section of the paper. Will this script help me generate tetranucleotide frequencies? Have you tried it?

modified 23 months ago • written 23 months ago by lakhujanivijay (5.1k)

The U-matrix or other visual presentations of the ESOMs don't really serve any purpose apart from looking cool. You're correct about MaxBin not producing ESOMs. However, the ultimate goal, i.e. binning, is the same. I don't remember exactly what I did as it was many years ago. That script at least says that it will generate those frequencies. Do you have any reason to doubt that it will?
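
For orientation, here is a minimal sketch of what a tetranucleotide-frequency calculation looks like in principle. It is not the script linked above, and the function names (`revcomp`, `canonical_kmers`, `kmer_frequencies`) are made up for illustration. It assumes that each tetramer is merged with its reverse complement (giving 136 features instead of 256) and that k-mers containing ambiguous bases such as N are skipped; the linked script may make different choices, so check its documentation.

```python
from collections import Counter
from itertools import product

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an upper-case ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(k=4):
    """Sorted list of k-mers, each merged with its reverse complement.

    The lexicographically smaller of (k-mer, reverse complement) is kept.
    Assumption for this sketch: strand-independent features are wanted.
    """
    kmers = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        kmers.add(min(kmer, revcomp(kmer)))
    return sorted(kmers)


def kmer_frequencies(seq, k=4):
    """Relative frequency of every canonical k-mer in one sequence."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows with ambiguous bases (N, R, Y, ...).
        if set(kmer) <= set("ACGT"):
            counts[min(kmer, revcomp(kmer))] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in canonical_kmers(k)]


if __name__ == "__main__":
    features = canonical_kmers(4)
    freqs = kmer_frequencies("ACGTACGTTTGACCA")
    # Print only the tetramers that actually occur in the toy sequence.
    for name, value in zip(features, freqs):
        if value > 0:
            print(name, round(value, 3))
```

Each contig (or fixed-size window of a contig) would yield one such frequency vector, i.e. one row of the data matrix that the ESOM is trained on. Writing that matrix in the file format expected by the ESOM Tools is a separate step not shown here.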

modified 23 months ago • written 23 months ago by 5heikki (8.9k)

Hi 5heikki

Thanks for the input once again. I was able to follow the instructions on the GitHub page, and thankfully all the programs worked as expected. For a quick start, I tried this with just 2 files:

file name | num_seqs | sum_len   | min_len | avg_len | max_len
file1.fa  | 1        | 4641652   | 4641652 | 4641652 | 4641652
file2.fa  | 129350   | 221598580 | 400     | 1713.2  | 1056282

File 1 is the E. coli genome and file 2 is a metagenomic assembly. Now, after the training finished (which took 19 hours to complete; I posted that on their GitHub page as an issue), I have this image:


However, I want to produce an image something like this:



i.e. making it much more informative by adding taxonomic information. What should I do? Any further inputs?

modified 23 months ago • written 23 months ago by lakhujanivijay (5.1k)

AFAIR you can manually select and color regions with the Databionics GUI. The rest of that was probably made with PS or some other image editing software.

written 22 months ago by 5heikki (8.9k)