Hello all! I am trying to clean my MAGs with MAGpurify. For some MAGs it worked fine, but for others I keep getting an error from the phylo-markers module:
magpurify phylo-markers /path/bin.5.fa /path/magpurify_results --threads 16
Calling genes with Prodigal
all genes: /path/magpurify_results/phylo-markers/genes.[ffn|faa]
Identifying PhyEco phylogenetic marker genes with HMMER
hmm results: /path/magpurify_results/phylo-markers/phyeco.hmmsearch
marker genes: /path/magpurify_results/phylo-markers/markers
Performing pairwise BLAST alignment of marker genes against database
blast results: /path/magpurify_results/phylo-markers/alns
Finding taxonomic outliers
Traceback (most recent call last):
File "/path/miniconda3/bin/magpurify", line 10, in <module>
sys.exit(cli())
^^^^^
File "/path/miniconda3/lib/python3.12/site-packages/magpurify/cli.py", line 116, in cli
args["func"](args)
File "/path/miniconda3/lib/python3.12/site-packages/magpurify/modules/phylo.py", line 419, in main
flagged = flag_contigs(args["db"], args["tmp_dir"], args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/miniconda3/lib/python3.12/site-packages/magpurify/modules/phylo.py", line 372, in flag_contigs
bin.genes[aln["qname"]].annotations.append(annotation)
~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'k127_4584534_11'
So the name in this KeyError ('k127_4584534_11') is different for each bin. Is it just the name of a contig that the tool doesn't recognize?
And when I try gc-content, it also gives me an error for the same bin that failed with phylo-markers:
magpurify gc-content /path/bin.5.fa /path/magpurify_results
Computing mean contig GC content
Traceback (most recent call last):
File "/path/miniconda3/bin/magpurify", line 10, in <module>
sys.exit(cli())
^^^^^
File "/path/miniconda3/lib/python3.12/site-packages/magpurify/cli.py", line 116, in cli
args["func"](args)
File "/path/miniconda3/lib/python3.12/site-packages/magpurify/modules/gc.py", line 68, in main
contig.gc = round(SeqUtils.GC(seq), 2)
^^^^^^^^^^^
AttributeError: module 'Bio.SeqUtils' has no attribute 'GC'
If I run the tetra-freq, clade-markers and known-contam modules, they work fine:
magpurify clade-markers /path/bin.5.fa /path/magpurify_results --threads 16
Reading database info
Calling genes with Prodigal
all genes: /path/magpurify_results/clade-markers/genes.[ffn|faa]
Performing pairwise alignment of genes against MetaPhlan2 database of clade-specific genes
alignments: /path/magpurify_results/clade-markers/genes.m8
Finding top hits to database
2118 genes with a database hit
Classifying genes at each taxonomic rank
kingdom: 104 classified genes
phylum: 0 classified genes
class: 0 classified genes
order: 0 classified genes
family: 0 classified genes
genus: 0 classified genes
species: 0 classified genes
Taxonomically classifying contigs
total contigs: 995
kingdom: 94 classified contigs
phylum: 0 classified contigs
class: 0 classified contigs
order: 0 classified contigs
family: 0 classified contigs
genus: 0 classified contigs
species: 0 classified contigs
Taxonomically classifying genome
consensus taxon: None
Identifying taxonomically discordant contigs
0 flagged contigs: /path/magpurify_results/clade-markers/flagged_contigs
Has anyone encountered something like this? Are there any solutions?
I would appreciate any suggestions.
Thanks, Alla
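On the gc-content error: the traceback says the installed Biopython no longer provides Bio.SeqUtils.GC. Recent Biopython releases dropped that function in favour of gc_fraction, which returns a fraction between 0 and 1 instead of a percentage. Below is a minimal sketch of a compatibility helper, assuming you are willing to patch the call in gc.py locally. The name gc_percent is hypothetical and not part of MAGpurify or Biopython. Installing an older Biopython in the environment would be the other option.

```python
def gc_percent(seq):
    """Return GC content as a percentage (0-100), like the old SeqUtils.GC.

    Hypothetical helper, not part of MAGpurify or Biopython.
    """
    try:
        # Newer Biopython: gc_fraction returns a value between 0 and 1.
        from Bio.SeqUtils import gc_fraction
        return 100.0 * gc_fraction(seq)
    except ImportError:
        pass
    try:
        # Older Biopython: GC already returns a percentage.
        from Bio.SeqUtils import GC
        return GC(seq)
    except ImportError:
        # No Biopython at all: count G, C and S by hand.
        s = str(seq).upper()
        if not s:
            return 0.0
        return 100.0 * sum(s.count(base) for base in "GCS") / len(s)
```

The three branches may treat ambiguous bases such as N slightly differently, so values for contigs with many Ns might not match exactly between Biopython versions.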
Thanks
What about the phylo-markers module error? I tried to look for it :) but haven't found anything. And why does it fail for some bins but not all?
To me that sounds like an error in sequence header formatting. Maybe some contig headers are repeated? Or their sequences are short or missing? Are there N or X characters in such contigs?
Another piece of general advice: if a tool works for some contigs but not others, see what is different about those contigs.
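The checks suggested above (repeated headers, short or missing sequences, N or X characters) can be sketched in plain Python, with no extra packages. The function names and the min_len threshold of 1000 bp are assumptions chosen for illustration, not anything MAGpurify itself uses.

```python
from collections import Counter

def read_fasta(lines):
    """Yield (header_id, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The ID is the header text up to the first whitespace.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def check_contigs(lines, min_len=1000):
    """Report contigs with suspicious headers or sequences.

    min_len is an arbitrary, assumed cutoff for "short".
    """
    records = list(read_fasta(lines))
    counts = Counter(name for name, _ in records)
    return {
        "duplicated": sorted(n for n, c in counts.items() if c > 1),
        "empty": [n for n, s in records if not s],
        "short": [n for n, s in records if 0 < len(s) < min_len],
        "ambiguous": [n for n, s in records if set(s.upper()) & set("NX")],
    }
```

To run it on a bin, open the file and pass the handle, e.g. check_contigs(open("/path/bin.5.fa")), then compare the report for a bin that fails with one that works.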
Thank you
Strange, but there are no such headers or contig names in my FASTA files.
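One thing to keep in mind when searching: the traceback fails on bin.genes[aln["qname"]], so the missing key is a gene ID, not a contig ID. Prodigal names each gene as the contig ID plus an underscore and a running number, so 'k127_4584534_11' would be gene 11 on contig 'k127_4584534', and the full string will never appear as a FASTA header. A small sketch for looking up the contig instead (both function names are hypothetical):

```python
def contig_of_gene(gene_id):
    """Strip the trailing _<number> that Prodigal appends to the contig ID."""
    contig, sep, index = gene_id.rpartition("_")
    if sep and index.isdigit():
        return contig
    # No numeric suffix: return the ID unchanged.
    return gene_id

def fasta_ids(lines):
    """Collect the header IDs (text up to the first whitespace) from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids
```

For example, contig_of_gene("k127_4584534_11") in fasta_ids(open("/path/bin.5.fa")) tells you whether that contig is in the bin at all. If it is not, it may be worth checking whether the same results directory was also used for a different bin.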