Hello all! I am trying to clean my MAGs with MAGpurify. For some MAGs it worked fine, but for others I keep getting an error from the phylo-markers module:
magpurify phylo-markers /path/bin.5.fa /path/magpurify_results --threads 16
Calling genes with Prodigal
all genes: /path/magpurify_results/phylo-markers/genes.[ffn|faa]
Identifying PhyEco phylogenetic marker genes with HMMER
hmm results: /path/magpurify_results/phylo-markers/phyeco.hmmsearch
marker genes: /path/magpurify_results/phylo-markers/markers
Performing pairwise BLAST alignment of marker genes against database
blast results: /path/magpurify_results/phylo-markers/alns
Finding taxonomic outliers
Traceback (most recent call last):
File "/path/miniconda3/bin/magpurify", line 10, in <module>
sys.exit(cli())
^^^^^
File "/path/miniconda3/lib/python3.12/site-packages/magpurify/cli.py", line 116, in cli
args["func"](args)
File "/path/miniconda3/lib/python3.12/site-packages/magpurify/modules/phylo.py", line 419, in main
flagged = flag_contigs(args["db"], args["tmp_dir"], args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/miniconda3/lib/python3.12/site-packages/magpurify/modules/phylo.py", line 372, in flag_contigs
bin.genes[aln["qname"]].annotations.append(annotation)
~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'k127_4584534_11'
So this KeyError: 'k127_4584534_11' is different for each bin. Is it just the name of a contig that the tool doesn't recognize?
And when I try the gc-content module, it also gives an error for the same bin that failed with phylo-markers:
magpurify gc-content /path/bin.5.fa /path/magpurify_results
Computing mean contig GC content
Traceback (most recent call last):
File "/path/miniconda3/bin/magpurify", line 10, in <module>
sys.exit(cli())
^^^^^
File "/path/miniconda3/lib/python3.12/site-packages/magpurify/cli.py", line 116, in cli
args["func"](args)
File "/path/miniconda3/lib/python3.12/site-packages/magpurify/modules/gc.py", line 68, in main
contig.gc = round(SeqUtils.GC(seq), 2)
^^^^^^^^^^^
AttributeError: module 'Bio.SeqUtils' has no attribute 'GC'
If I run the tetra-freq, clade-markers and known-contam modules, they work fine:
magpurify clade-markers /path/bin.5.fa /path/magpurify_results --threads 16
Reading database info
Calling genes with Prodigal
all genes: /path/magpurify_results/clade-markers/genes.[ffn|faa]
Performing pairwise alignment of genes against MetaPhlan2 database of clade-specific genes
alignments: /path/magpurify_results/clade-markers/genes.m8
Finding top hits to database
2118 genes with a database hit
Classifying genes at each taxonomic rank
kingdom: 104 classified genes
phylum: 0 classified genes
class: 0 classified genes
order: 0 classified genes
family: 0 classified genes
genus: 0 classified genes
species: 0 classified genes
Taxonomically classifying contigs
total contigs: 995
kingdom: 94 classified contigs
phylum: 0 classified contigs
class: 0 classified contigs
order: 0 classified contigs
family: 0 classified contigs
genus: 0 classified contigs
species: 0 classified contigs
Taxonomically classifying genome
consensus taxon: None
Identifying taxonomically discordant contigs
0 flagged contigs: /path/magpurify_results/clade-markers/flagged_contigs
Has anyone encountered something like this? Are there any solutions?
I would appreciate any suggestions.
Thanks, Alla
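The gc-content traceback above is consistent with a Biopython version mismatch rather than a problem with the bin itself: `Bio.SeqUtils.GC` was deprecated and later removed in newer Biopython releases in favor of `gc_fraction`, which returns a fraction between 0 and 1 instead of a percentage. A minimal sketch of a replacement for the failing call is shown below. The helper name `gc_percent` is hypothetical and not part of MAGpurify, and the pure-Python fallback is only an approximation that may treat ambiguous bases differently from Biopython.

```python
try:
    # Newer Biopython: gc_fraction returns a value in [0, 1]
    from Bio.SeqUtils import gc_fraction

    def gc_percent(seq):
        """GC content as a percentage, mimicking the old SeqUtils.GC scale."""
        return 100 * gc_fraction(seq)

except ImportError:
    # Fallback when gc_fraction is unavailable (old or missing Biopython).
    # Counts G, C and S over the full sequence length; handling of
    # ambiguous bases may differ from Biopython's.
    def gc_percent(seq):
        """GC content as a percentage, computed without Biopython."""
        seq = str(seq).upper()
        if not seq:
            return 0.0
        return 100 * sum(seq.count(base) for base in "GCS") / len(seq)


# Equivalent of the failing line in gc.py, under the assumptions above:
# contig.gc = round(gc_percent(seq), 2)
print(round(gc_percent("ATGCGC"), 2))
```

The other option, which avoids editing the installed package, is to install an older Biopython that still provides `SeqUtils.GC` in the environment used for MAGpurify.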
Thanks
What about the phylo-markers module error? I tried to look for it :) but haven't found anything. And why does it fail for some bins but not all of them?
To me that sounds like an error in sequence header formatting. Maybe some contig headers are repeated? Or their sequences are short or missing? Are there N or X characters in such contigs?
Another piece of general advice: if a tool works for some contigs but not others, see what is different about those contigs.
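The checks suggested above (repeated headers, short or missing sequences, N or X characters) can be sketched in a few lines of plain Python. This is only an illustrative helper, not part of MAGpurify; the function names and the `min_len` threshold are assumptions chosen for the example.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (contig_id, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the ID: header text up to the first whitespace
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def check_contigs(records, min_len=1000):
    """Report contigs with repeated IDs, no sequence, short length, or N/X."""
    records = list(records)
    counts = Counter(name for name, _ in records)
    return {
        "duplicate_ids": sorted(n for n, c in counts.items() if c > 1),
        "empty": sorted({n for n, s in records if not s}),
        "short": sorted({n for n, s in records if 0 < len(s) < min_len}),
        "ambiguous": sorted({n for n, s in records if set(s.upper()) & set("NX")}),
    }


# Small in-memory demo; for a real bin, pass an open file handle instead,
# e.g. check_contigs(read_fasta(open("/path/bin.5.fa")))
demo = [">c1 first", "ACGT", ">c2", ">c1", "ACNN", ">c3", "ACGTACGT"]
print(check_contigs(read_fasta(demo), min_len=5))
```

Running the same report on a bin that works and on one that fails, then comparing the two, is one way to see what is different about the problem contigs.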
Thank you
Strange, but there are no such headers or contig names in my fasta files.
There is absolutely no chance that a contig labeled k127_4584534 doesn't exist in your fasta files. These names are not made up out of thin air - they are read from the file. If you can't find what's wrong with that contig but instead you simply delete it, it is a safe bet that particular error will go away.
I have checked by eye and I didn't find it. I have only one contig that starts with k127_458, but the name is different: k127_4587812
Why should it be the name of a contig, when the error doesn't say that it's about the bin? The last line of the traceback is:
File "/path/miniconda3/lib/python3.12/site-packages/magpurify/modules/phylo.py", line 372, in flag_contigs
bin.genes[aln["qname"]].annotations.append(annotation)
I saw some issues on GitHub, but I didn't even open them because they were not answered... some of them were from 2021.
Looks like I need to find another tool then.
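On the question of why the key would relate to a contig: the traceback shows the lookup `bin.genes[aln["qname"]]`, so the missing key is most likely a gene ID, and Prodigal names each gene as the contig name plus an underscore and a gene number. Under that reading, 'k127_4584534_11' would be gene 11 on contig k127_4584534. The sketch below strips that suffix and checks whether the contig is present in a FASTA file. The helper names are hypothetical and the sample headers are made up for the demo.

```python
def contig_of(gene_id):
    """Strip a trailing _<number> (Prodigal's gene index) from a gene ID."""
    contig, sep, index = gene_id.rpartition("_")
    return contig if sep and index.isdigit() else gene_id


def fasta_ids(lines):
    """Collect contig IDs (header text up to the first whitespace)."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


# Demo with made-up headers; for a real bin use fasta_ids(open("/path/bin.5.fa"))
ids = fasta_ids([">k127_4587812 flag=1", "ACGT", ">k127_100", "GGCC"])
contig = contig_of("k127_4584534_11")
print(contig, contig in ids)
```

If the contig really is absent from the bin, one thing worth ruling out (a guess, not something confirmed in this thread) is that intermediate files from a different bin are still present in the same output directory, since all the commands shown write to /path/magpurify_results.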
By the way, these errors are not unique. If you look through the magpurify GitHub issues, both types have been reported already.
Signing off.