Question

How can I confidently determine if my genome sequence is truly Anaplasma marginale?

12 weeks ago

ashiqullah024 • 0

I am doing my thesis on comparative whole-genome analysis of Anaplasma marginale. I am comparing my whole-genome sequence of Anaplasma marginale with 24 other strains from NCBI. Using a general bioinformatics pipeline, I assembled my genome with SPAdes. Since the assembly was fragmented, I used RagTag to scaffold the contigs. My genome coverage was 23.4x.

When I performed comparative analysis, I calculated Average Nucleotide Identity (ANI) using both JSpecies and FastANI. In both cases, the ANI value compared to Anaplasma marginale strains was around 85–86%, whereas from the literature, the species threshold is usually between 95–96%.
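To make the threshold comparison explicit, here is a minimal Python sketch that reads FastANI-style output and flags which references fall at or above a species cutoff. It assumes the tab-separated layout FastANI writes (query, reference, ANI, mapped fragments, total fragments). The file names and ANI values in `example` are invented for illustration and are not results from this dataset; the 95% cutoff is the commonly cited lower bound mentioned above.

```python
# Commonly cited species boundary (95-96% ANI); adjust as needed.
SPECIES_ANI_THRESHOLD = 95.0


def parse_fastani(lines):
    """Yield (query, reference, ani) from FastANI-style tab-separated lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        yield fields[0], fields[1], float(fields[2])


def same_species_calls(lines, threshold=SPECIES_ANI_THRESHOLD):
    """Map each reference to True if its ANI meets the threshold."""
    return {ref: ani >= threshold for _, ref, ani in parse_fastani(lines)}


# Hypothetical example rows (invented values, not real measurements).
example = [
    "my_isolate.fasta\treference_A.fasta\t85.6\t300\t400",
    "my_isolate.fasta\treference_B.fasta\t97.2\t380\t400",
]

calls = same_species_calls(example)
for ref, is_same in calls.items():
    print(ref, "same species" if is_same else "below species threshold")
```

Running the same check against references from several Anaplasma species, not only A. marginale, would show which species, if any, clears the cutoff.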

When I created a phylogenetic tree, my strain was placed far outside the cluster of the 24 A. marginale strains. For phylogenetic tree construction, I used progressiveMauve for multiple sequence alignment and IQ-TREE for tree generation. In all cases, my sequence predominantly matched Anaplasma ovis.

However, when I used the KBase automated software for phylogenetic tree construction with the same FASTA file, the result was different. Instead of including only the 24 Anaplasma marginale strains, KBase drew on a broader database of related Anaplasma organisms, and my sequence clustered with one of the Anaplasma marginale strains.

Now I am really confused about whether KBase is reliable. Can I trust the KBase result when all my Linux-based tools show significant similarity with Anaplasma ovis rather than Anaplasma marginale? I am confused about how I can be sure whether my sequence is actually Anaplasma marginale or not.

anaplasma genomics marginale omics • 654 views

ADD COMMENT • link updated 11 weeks ago by Ram 45k • written 12 weeks ago by ashiqullah024 • 0

1

Since the assembly was fragmented, I used RagTag to scaffold the contigs.

How many contigs did you end up with? You should add more coverage, especially long-read coverage (if you only used short reads), to get a better assembly.
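A quick way to answer the contig question is to count sequences and compute N50 directly from the assembly FASTA. The sketch below is a minimal pure-Python version; the three tiny contigs in `example_fasta` are made up for illustration, and in practice you would pass the lines of your own assembly file instead.

```python
def read_contig_lengths(fasta_lines):
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0  # start a new contig
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Hypothetical toy assembly (invented sequences).
example_fasta = [">c1", "ACGTACGT", ">c2", "ACGT", ">c3", "AC"]

contig_lengths = read_contig_lengths(example_fasta)
print("contigs:", len(contig_lengths))
print("total bp:", sum(contig_lengths))
print("N50:", n50(contig_lengths))
```

To use it on a real file, something like `read_contig_lengths(open("assembly.fasta"))` should work, where `assembly.fasta` is a placeholder name.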

My genome coverage was 23.4x.

Did you try to align your data against the RefSeq genome? Did you get coverage (at least a few reads) across the entire genome, or were there areas with no reads aligned?
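One way to check this, sketched below in Python, is to summarise per-base depth after mapping the reads to the reference. The sketch assumes three tab-separated columns (contig, 1-based position, depth) with zero-depth positions included, which is what `samtools depth -a` produces. The four example rows are invented, not taken from this dataset.

```python
def coverage_gaps(depth_lines, min_depth=1):
    """Return (breadth, gaps) from per-base depth lines.

    breadth: fraction of positions with depth >= min_depth.
    gaps: list of (contig, start, end) runs below min_depth, 1-based inclusive.
    """
    covered = 0
    total = 0
    gaps = []
    open_gap = None  # [contig, start, end] of the gap being extended
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        contig, pos, depth = fields[0], int(fields[1]), int(fields[2])
        total += 1
        if depth >= min_depth:
            covered += 1
            if open_gap is not None:
                gaps.append(tuple(open_gap))
                open_gap = None
        elif (open_gap is not None and open_gap[0] == contig
              and open_gap[2] == pos - 1):
            open_gap[2] = pos  # extend the current gap
        else:
            if open_gap is not None:
                gaps.append(tuple(open_gap))
            open_gap = [contig, pos, pos]
    if open_gap is not None:
        gaps.append(tuple(open_gap))
    breadth = covered / total if total else 0.0
    return breadth, gaps


# Hypothetical depth rows (invented values).
example_depth = ["ref\t1\t5", "ref\t2\t0", "ref\t3\t0", "ref\t4\t7"]

breadth, gaps = coverage_gaps(example_depth)
print("breadth of coverage:", breadth)
print("uncovered regions:", gaps)
```

Large uncovered regions against one reference but not another would be a hint about which genome the reads actually come from.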

ADD REPLY • link 12 weeks ago by GenoMax 154k

score 2 · Answer 1 · 2025-08-04

Hi,

The low(er) ANI might be a result of the 23x coverage, which seems a bit low to me.

There are a couple of tools you can also use to verify your species. For example, Kraken2 and Centrifuge work at the read level to identify the taxonomy, and sourmash can be used to classify your assembly.
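As a rough sketch of how to read the read-level result, the Python snippet below pulls the top species-level rows out of a Kraken2 report. It assumes the standard six-column report (percentage, clade reads, direct reads, rank code, taxid, indented name); reports written with minimizer data have extra columns and would need different indices. The species names, taxids, and counts in `example_report` are placeholders, not real output.

```python
def top_species(report_lines, n=3):
    """Return up to n (name, percent, clade_reads) rows with rank code 'S'."""
    rows = []
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[3].strip() != "S":
            continue  # keep only species-level rows
        name = fields[5].strip()  # names are indented by taxonomic depth
        rows.append((name, float(fields[0]), int(fields[1])))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:n]


# Hypothetical report rows (placeholder names, taxids, and counts).
example_report = [
    " 90.00\t9000\t10\tG\t1111\t    Genus X",
    " 20.00\t2000\t2000\tS\t2222\t      Species A",
    " 65.00\t6500\t6500\tS\t3333\t      Species B",
    " 10.00\t1000\t1000\tU\t0\tunclassified",
]

best = top_species(example_report)
for name, percent, reads in best:
    print(f"{name}: {percent:.2f}% ({reads} reads)")
```

If one species clearly dominates the read assignments, that is independent evidence to weigh against the ANI and tree results.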

Additionally, you can extract the 16S sequence of your assembly and run your phylogenetic analysis on that.

score 2 · Answer 2 · 2025-08-04

I would keep it simple and just BLAST contigs/genes against the general BLAST databases at NCBI/EBI. You will soon work out what you have from the hit distributions and top hits.
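The hit distribution can be tallied with a few lines of Python. This sketch assumes BLAST was run with a custom tabular format along the lines of `-outfmt "6 qseqid sseqid pident length evalue bitscore sscinames"`, so the species name is in the seventh column; that column layout is my assumption, and the rows in `example_hits` are invented placeholders.

```python
from collections import Counter


def top_hit_distribution(blast_lines):
    """Count, per species, how many queries have it as their best-bitscore hit."""
    best = {}  # query -> (bitscore, species)
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue
        query, bitscore, species = fields[0], float(fields[5]), fields[6]
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, species)
    return Counter(species for _, species in best.values())


# Hypothetical tabular hits (placeholder accessions and species names).
example_hits = [
    "contig1\tacc1\t99.1\t1500\t0.0\t2700\tSpecies B",
    "contig1\tacc2\t86.0\t1400\t0.0\t1500\tSpecies A",
    "contig2\tacc3\t98.7\t900\t0.0\t1600\tSpecies B",
    "contig3\tacc4\t97.5\t800\t0.0\t1400\tSpecies A",
]

distribution = top_hit_distribution(example_hits)
for species, count in distribution.most_common():
    print(species, count)
```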

85% is certainly very distant, though, so I suspect you've sequenced another genus at least.

I am also a fan of sourmash for quick and easy classification, as michael suggests.