antismash 5.0 (New region concept) - counting BGCs
2.5 years ago
arshad1292 ▴ 80


I am new to antiSMASH analysis and am using the latest version (5.0), so I cannot find answers to my questions in the older threads.

Here is the detail:

I ran antiSMASH and obtained .gbk files as well as a new folder called "region1" (I believe this is new in the latest version). This folder contains several .html files with names like "ctg3_14_mibig_hits.html" and so on.

When I open this .html file, it contains the following eight columns:

  1. MIBiG Protein
  2. Description
  3. MIBiG Cluster
  4. MiBiG Product
  5. % ID
  6. % Coverage
  7. BLAST Score
  8. E-value

The fourth column (MiBiG Product) contains the name of the product, e.g. NRP, polyketide, terpene, other, etc. I am interested in counting the number of BGC types in each sample (maybe from this column?).

Q1. I am confused about which file I should use to count the BGC types: the .html files (I have several) under the "region1" folder, or the .gbk file?

Q2. In either case, I need a method/script to do so. I would really appreciate it if someone could share code for counting the BGCs in each sample, since I have several such files and tens/hundreds of MIBiG products in each file.

Please help this newbie.

Many thanks,

antismash metagenomics • 1.4k views

It looks like it would be that 4th column. See the answer in this SO thread.

You may also be able to simply cut/sort/uniq/count that column.
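If you would rather not chain shell tools, here is a rough Python sketch of the same idea. It assumes each hits file is a plain HTML table with the product in the 4th `<td>` cell of every data row; I have not checked this against real antiSMASH output, and the names `count_products` and `count_products_in_folder` are made up for this example. Note that it counts table rows, so a product is counted once per hit.

```python
from collections import Counter
from html.parser import HTMLParser
from pathlib import Path


class TableColumnParser(HTMLParser):
    """Collect the text of one column from every table row made of <td> cells."""

    def __init__(self, column_index):
        super().__init__()
        self.column_index = column_index
        self.values = []
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Header rows use <th>, so they have no <td> cells and are skipped.
            if len(self._row) > self.column_index:
                self.values.append(self._row[self.column_index])
            self._row = None
            self._cell = None


def count_products(html_text, column_index=3):
    """Count the values found in one column (0-based index) of an HTML table."""
    parser = TableColumnParser(column_index)
    parser.feed(html_text)
    parser.close()
    return Counter(parser.values)


def count_products_in_folder(folder, pattern="*_mibig_hits.html"):
    """Sum the per-file counts over every matching file in a folder."""
    total = Counter()
    for path in sorted(Path(folder).glob(pattern)):
        total += count_products(path.read_text(encoding="utf-8", errors="replace"))
    return total
```

Something like `count_products_in_folder("region1").most_common()` would then list the product labels with their totals, assuming the folder name from the question.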

The antiSMASH HTML output is thoroughly described on their help page.


Thank you for your response. I have read about the antiSMASH output but I am still confused about the output files. Sorry for my lack of knowledge.

For example, I obtained 116 html files from a single run, and each html file contains tens of MiBiG Product entries (4th column), please see image. If I have on average 10 MiBiG Product entries per html file, that comes to about 1160 entries in total for each run. Should I count "MiBiG Product" (4th column) across all these 1160 entries and then add them up to obtain the totals (NRP, polyketide etc.)?

2.0 years ago

Probably too late but:

MIBiG is a database of known biosynthetic clusters and is used to output the 'known cluster blast' tab data in the antiSMASH web portal. It's not the predicted clusters for your input genome - it's just similar hits that have been published/confirmed to some degree (compared to the antiSMASH DB, which is just predictions for all of the NCBI data without being confirmed).

I didn't dig into the html files for my stuff that much, but I think the following is correct. You have a single region, with one or more BGCs inside of it. This region has genes encoding proteins. Each of the html files corresponds to a single gene/protein in the region, and the entries in a single html file are MIBiG hits to that single protein. "ctg3_14_mibig_hits.html" would be the html file of MIBiG hits for the region 1 protein annotated '14' in whatever genome you fed into antiSMASH.

For me, my html files are inside a folder called 'knownclusterblast' - I have around 30 regions, so maybe yours aren't in this folder if you only have one region, are using a different antiSMASH version, etc.

For one region you can just click the html 'index' file and that will open a web page with a clear summary. If you have lots of regions I would set something up to parse the JSON file that is also in the output as this has all the information in the antiSMASH run.
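As a starting point for the JSON route, here is a minimal sketch. It assumes the results file has a top-level "records" list, that each record has a "features" list, and that predicted regions are features of type "region" with a "product" qualifier; that is my understanding of the layout, so check it against your own file before trusting the numbers. The function names are just for illustration.

```python
import json
from collections import Counter


def count_region_products(results):
    """Count predicted regions per product label in a parsed results dict."""
    counts = Counter()
    for record in results.get("records", []):
        for feature in record.get("features", []):
            if feature.get("type") != "region":
                continue
            products = feature.get("qualifiers", {}).get("product", [])
            # A hybrid region lists several products; join them into one label.
            label = "+".join(sorted(products)) or "unknown"
            counts[label] += 1
    return counts


def count_from_file(json_path):
    """Load one antiSMASH JSON results file and count its regions."""
    with open(json_path) as handle:
        return count_region_products(json.load(handle))
```

Running `count_from_file` once per sample and collecting the counters would give one row of BGC type counts per sample. Hybrid regions are kept as their own label here (e.g. "NRPS+T1PKS") rather than being counted once per product; change that if you want per-product totals.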

