Average Nucleotide Identity for bins
7 weeks ago
shevch2009 ▴ 20

Hello all!

I have a shotgun metagenomic dataset from which I recovered a number of bins with varying completeness and contamination. I now want to calculate Average Nucleotide Identity (ANI) to see whether our MAGs represent new species. I am planning to use FastANI, so I need reference genomes to compare against the MAGs I have.

Most of our bins are assigned (GTDB) to genus/family level, with just a few assigned to species.

For example, I have one bin that was assigned to d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiia;o__Chthoniobacterales;f__UBA10450;g__AV40;s__ ... with 100% completeness, although it has only 23 contigs.

What I think I need to do is download all the species from the g__AV40 genus, which are available on the GTDB website. However, they all differ in completeness, and they are not reference data: those genomes come from MAGs rather than isolates, and no complete genomes are available on NCBI under the original genus name.

So the issue is: I can get those genomes (from GTDB), but they are not really reference genomes.

What should I do in this situation?

Thanks, Best, Alla

7 weeks ago
Mensur Dlakic ★ 30k

> What I think I need to do is download all the species from the g__AV40 genus, which are available on the GTDB website. However, they all differ in completeness, and they are not reference data: those genomes come from MAGs rather than isolates, and no complete genomes are available on NCBI under the original genus name.

You have already answered your own question: you can't use data that is not available. Many bacterial genera lack even a single complete genome. In fact, the overwhelming majority fall into that category, because isolates with completed genomes are very few compared to the number of MAGs.

I will try to correct a few misconceptions you seem to have. First, MAG completeness is not estimated from the genome size or the number of contigs. It is estimated from the presence of expected single-copy marker genes in the MAG. That's why a MAG at 1.8 Mb in 30 contigs and a MAG at 3 Mb in 25 contigs can both be 100% complete with 0% contamination. I picked a random group of several MAGs from my computer to illustrate this.

Name       Completeness (%)  Contamination (%)  MAG size (bp)  Contigs  Largest contig (bp)
group_003  99.66             0.00               2399243        48       235868
group_009  100.00            0.00               3316968        309      145584
group_020  99.43             3.61               2829889        205      131228
group_027  96.58             4.01               4592524        224      166430
group_029  91.67             0.65               3909026        304      94718
group_059  95.73             2.37               1651387        207      36337
group_036  95.76             0.00               1947777        141      138361
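To make the marker-gene idea concrete, here is a minimal Python sketch of how completeness and contamination can be derived from marker counts alone. It is a deliberate simplification: real tools such as CheckM group markers into lineage-specific collocated sets and weight them, which this does not do. The function name and the marker names are hypothetical and only serve the illustration.

```python
from collections import Counter


def completeness_contamination(expected_markers, observed_hits):
    """Simplified estimate from single-copy marker genes.

    expected_markers: marker names expected once per genome (hypothetical list).
    observed_hits: marker names detected in the MAG, one entry per copy found.
    Genome size and contig count are never used.
    """
    expected = set(expected_markers)
    # Count copies of each expected marker; ignore anything unexpected.
    counts = Counter(hit for hit in observed_hits if hit in expected)
    found = len(counts)                                # markers seen at least once
    extra = sum(copies - 1 for copies in counts.values())  # surplus copies
    completeness = 100.0 * found / len(expected)
    contamination = 100.0 * extra / len(expected)
    return completeness, contamination


# Hypothetical marker set and two hypothetical MAGs.
markers = ["rpsB", "rplC", "recA", "gyrB"]
small_mag = ["rpsB", "rplC", "recA", "gyrB"]
large_mag = ["rpsB", "rplC", "recA", "gyrB", "gyrB"]  # gyrB found twice

print(completeness_contamination(markers, small_mag))  # (100.0, 0.0)
print(completeness_contamination(markers, large_mag))  # (100.0, 25.0)
```

Both toy MAGs come out as 100% complete regardless of how large or fragmented they are. Only the duplicated marker in the second one counts against it, as contamination.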

Second, I think you might be assigning too much significance to whether you have a new species or not. It is difficult to tell with certainty that something is a new species because, as you noted, the available references don't give us good resolution. It may feel important because "how many people really discover a new species?" but the reality is that anyone who has analyzed a metagenome yielding 50+ MAGs has probably discovered a new species.

I think what should be most important for your purpose is that you have a seemingly 100% complete Verrucomicrobiota MAG that belongs to genus AV40. Any conclusion beyond that might be stretching it, and frankly I don't think it is very important to conclude anything beyond that. But there is no harm if you go find GTDB species representatives for your group and determine their ANI against your MAG.
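If you do run that comparison, a small Python sketch along these lines could summarize the result. It assumes FastANI was run with something like `fastANI -q mag.fa --rl gtdb_reps.txt -o fastani.tsv` (the file names are placeholders) and that the output is the usual tab-delimited table of query, reference, ANI, mapped fragments, and total query fragments. The 95% cutoff is the commonly used species boundary, not a hard rule, and the example rows are made up.

```python
import csv
import io

# Commonly used species-level ANI boundary; treat it as an assumption, not a rule.
SPECIES_ANI_THRESHOLD = 95.0


def parse_fastani(text):
    """Parse FastANI output: query, reference, ANI, mapped fragments, total fragments."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        query, reference, ani, mapped, total = row[:5]
        hits.append({
            "query": query,
            "reference": reference,
            "ani": float(ani),
            # Rough share of query fragments that mapped to the reference.
            "aligned_fraction": int(mapped) / int(total),
        })
    return hits


def best_hit(hits):
    """Return the hit with the highest ANI, or None when there are no hits."""
    return max(hits, key=lambda h: h["ani"]) if hits else None


def same_species(hit, threshold=SPECIES_ANI_THRESHOLD):
    """True only when a hit exists and its ANI reaches the threshold."""
    return hit is not None and hit["ani"] >= threshold


# Made-up example rows, not real results.
example_output = (
    "mag.fa\trep_A.fna\t88.4521\t512\t790\n"
    "mag.fa\trep_B.fna\t91.2033\t601\t790\n"
)

top = best_hit(parse_fastani(example_output))
print(top["reference"], top["ani"], same_species(top))
```

Keep in mind that FastANI generally reports nothing for pairs well below about 80% ANI, so an empty output file simply means none of the representatives is close. Since the references are themselves MAGs of varying completeness, it is worth looking at the aligned fraction as well as the ANI value before reading much into a borderline number.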


Thank you. I was just wondering if there are any methods that can be applied. I do understand that metagenomics has many challenges and unresolved issues.
