The problem that you describe is commonly caused by the format of the Diamond output file not matching what Blobtools expects for taxonomic assignment. Blobtools requires a hits file in tab-separated format with exactly three columns: the sequence identifier (from your assembly), the NCBI taxonomic identifier, and the alignment score (typically the bitscore). If the second column does not contain the taxonomic identifier, Blobtools will report no taxonomic hits, even if Diamond produced alignments with taxonomic information elsewhere in the output.
To resolve this, first ensure that your Diamond database was built with taxonomic mappings. Download the necessary NCBI taxonomy files:
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar -xzf taxdump.tar.gz nodes.dmp names.dmp
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
Then, build the database (using UniRef90 as an example):
diamond makedb --in uniref90.faa --db uniref90 --taxonmap prot.accession2taxid.gz --taxonnodes nodes.dmp
Run Diamond with a custom output format to include the taxonomic identifiers:
diamond blastx --db uniref90 --query your_assembly.fasta --out matches.tsv --outfmt 6 qseqid staxids bitscore
This produces the required three-column format. When creating the BlobDB, specify the path to the taxonomy dump:
blobtools create --infile your_assembly.fasta --out your_blobdb --taxrule bestsum --taxdump path/to/taxdump --hits matches.tsv --cov your_bam.cov
If your current Diamond output includes taxonomic identifiers but not in the second column, reformat it using awk:
awk '{print $1 "\t" $13 "\t" $12}' your_diamond_output.tsv > reformatted_hits.tsv
(Adjust column numbers based on your output format; $13 is typically staxids if using outfmt 6 with staxids, and $12 is bitscore.)
This should allow Blobtools to assign taxonomy correctly.
Kevin
Why do you think you have contaminant sequences in your assembly? Generally when libraries are made from an organismm there should be no extraneous sequence present.
Not true, depending on how the wet-lab experiments were conducted.