Question

Error executing process > consensus_classification in NanoCLUST

2.1 years ago

yeshirata ▴ 20

Hello, I'm trying to run NanoCLUST on my 16S sequence data. I'm running it on an Ubuntu 20.04 Linux machine.

When I use the command: nextflow run main.nf -profile docker --reads "*.fastq" --db "db/16S_ribosomal_RNA" --tax "db/taxdb/"

I get the following terminal output:

N E X T F L O W  ~  version 21.10.6
Launching `main.nf` [small_bassi] - revision: fe969e139d

----------------------------------------------------
      _   __                     ________    __  _____________
     / | / /___ _____  ____     / ____/ /   / / / / ___/_  __/
    /  |/ / __ `/ __ \/ __ \   / /   / /   / / / /\__ \ / /   
   / /|  / /_/ / / / / /_/ /  / /___/ /___/ /_/ /___/ // /    
  /_/ |_/\__,_/_/ /_/\____/   \____/_____/\____//____//_/     

  NanoCLUST v1.0dev
----------------------------------------------------

Run Name          : small_bassi
Reads             : *.fastq
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Container         : docker - [:]
Output dir        : ./results
Launch dir        : /mnt/work1/yeshirata/NanoCLUST
Working dir       : /mnt/work1/yeshirata/NanoCLUST/work
Script dir        : /mnt/work1/yeshirata/NanoCLUST
User              : yeshirata
Config Profile    : docker
----------------------------------------------------
executor >  local (116)
[43/0ff394] process > QC (1)                        [100%] 1 of 1 ✔
[7f/727c56] process > fastqc (1)                    [100%] 1 of 1 ✔
[da/c79083] process > kmer_freqs (1)                [100%] 1 of 1 ✔
[17/69ee25] process > read_clustering (1)           [100%] 1 of 1 ✔
[68/1cee81] process > split_by_cluster (1)          [100%] 1 of 1 ✔
[1a/fdd8fa] process > read_correction (17)          [100%] 21 of 21 ✔
[3e/5a56e8] process > draft_selection (21)          [100%] 21 of 21 ✔
[fd/1ec834] process > racon_pass (21)               [100%] 21 of 21 ✔
[de/66d3e0] process > medaka_pass (21)              [100%] 21 of 21 ✔
[59/5068f5] process > consensus_classification (21) [ 96%] 25 of 26, failed: 5, retries: 5
[-        ] process > join_results                  -
[-        ] process > get_abundances                -
[-        ] process > plot_abundances               -
executor >  local (116)
[43/0ff394] process > QC (1)                        [100%] 1 of 1 ✔
[7f/727c56] process > fastqc (1)                    [100%] 1 of 1 ✔
[da/c79083] process > kmer_freqs (1)                [100%] 1 of 1 ✔
[17/69ee25] process > read_clustering (1)           [100%] 1 of 1 ✔
[68/1cee81] process > split_by_cluster (1)          [100%] 1 of 1 ✔
[1a/fdd8fa] process > read_correction (17)          [100%] 21 of 21 ✔
[3e/5a56e8] process > draft_selection (21)          [100%] 21 of 21 ✔
[fd/1ec834] process > racon_pass (21)               [100%] 21 of 21 ✔
[de/66d3e0] process > medaka_pass (21)              [100%] 21 of 21 ✔
[93/1ee05d] process > consensus_classification (6)  [100%] 26 of 26, failed: 6, retries: 5 ✘
[-        ] process > join_results                  [  0%] 0 of 1
[-        ] process > get_abundances                -
[-        ] process > plot_abundances               -
[b3/d8f0f5] process > output_documentation          [100%] 1 of 1 ✔
[ee/a97bac] NOTE: Process `consensus_classification (6)` terminated with an error exit status (255) -- Execution is retried (1)
[20/4869e7] NOTE: Process `consensus_classification (6)` terminated with an error exit status (255) -- Execution is retried (2)
[e2/8afc0e] NOTE: Process `consensus_classification (6)` terminated with an error exit status (255) -- Execution is retried (3)
[1c/c98be8] NOTE: Process `consensus_classification (6)` terminated with an error exit status (255) -- Execution is retried (4)
[35/369818] NOTE: Process `consensus_classification (6)` terminated with an error exit status (255) -- Execution is retried (5)
Error executing process > 'consensus_classification (6)'

Caused by:
  Process `consensus_classification (6)` terminated with an error exit status (255)

Command executed:

  export BLASTDB=
  export BLASTDB=$BLASTDB:/mnt/work1/yeshirata/NanoCLUST/db/taxdb/
  blastn -query consensus.fasta -db /mnt/work1/yeshirata/NanoCLUST/db/16S_ribosomal_RNA -task blastn -dust no -outfmt "10 sscinames staxids evalue length pident" -evalue 11 -max_hsps 50 -max_target_seqs 5 | sed 's/,/;/g' > consensus_classification.csv
  #DECIDE FINAL CLASSIFFICATION
  cat 2_draft.log > 2_blast.log
  echo -n ";" >> 2_blast.log
  BLAST_OUT=$(cut -d";" -f1,2,4,5 consensus_classification.csv | head -n1)
  echo $BLAST_OUT >> 2_blast.log

Command exit status:
  255

Command output:
  (empty)

Command error:
  Error: NCBI C++ Exception:
      T0 "/opt/conda/conda-bld/blast_1595737360567/work/blast/c++/src/serial/objistrasnb.cpp", line 499: Error: (CSerialException::eOverflow) byte 82: overflow error ( at [].[].gi)
      T0 "/opt/conda/conda-bld/blast_1595737360567/work/blast/c++/src/serial/member.cpp", line 768: Error: (CSerialException::eOverflow) ncbi::CMemberInfoFunctions::ReadWithSetFlagMember() - error while reading seqid ( at Blast-def-line-set.[].[].seqid.[].[].gi)

Work dir:
  /mnt/work1/yeshirata/NanoCLUST/work/93/1ee05dd265f94795f6e15c766ebeb5

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`


WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.

Has anyone had this problem, and can you help me solve it?

Thanks a lot in advance

Yeshirata

Nanoclust Nanoporesequence 16SrRNA • 1.2k views

updated 7 months ago by Ram 43k • written 2.1 years ago by yeshirata ▴ 20

Hello! Did you find a solution? I have the same issue.

22 months ago by kirillkirilenko ▴ 30

Ram · Answer 1 · 2023-09-18

Hi, I recently ran into this issue as well and came across this post, so I hope I can still help somebody by posting an answer.

The error seems to come from certain NCBI taxids not being in the UniPept database: the lookup then returns an empty response, which makes the whole pipeline crash. The new code prevents this crash and reports the numerical taxid as the classification output instead. If you want to know which taxon a taxid belongs to, just look up the taxid in the NCBI Taxonomy database.
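To illustrate why an empty reply is enough to stop the script, here is a minimal sketch. It assumes that UniPept answers with an empty JSON list (`[]`) for a taxid it does not know, as described above; the value `species_name` is only an illustrative stand-in for `tax_level_tag` and is not taken from NanoCLUST.

```python
import json

# Assumption: the reply for an unknown taxid is an empty JSON list.
empty_reply = "[]"
tax_level_tag = "species_name"  # illustrative value, not from NanoCLUST

try:
    name = json.loads(empty_reply)[0][tax_level_tag]
except IndexError as err:
    # Taking element [0] of an empty list raises IndexError;
    # left uncaught, this is the kind of exception that aborts the script.
    name = None
    error_type = type(err).__name__

print(error_type)
```

Any fix therefore has to check that the parsed list is non-empty before indexing it.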

I changed the 'get_abundances.py' file in the templates directory of NanoCLUST. Starting at line 22, I removed this:

try:
    name = json.loads(complete_tax)[0][tax_level_tag]
except:
    name = str(int(tax_id))
return json.loads(complete_tax)[0][tax_level_tag]

And I replaced it with these lines:

path = 'http://api.unipept.ugent.be/api/v1/taxonomy.json?input[]=' + str(int(tax_id)) + '&extra=true&names=true'
complete_tax = requests.get(path).text
# Check if the list returned by json.loads() is not empty
tax_list = json.loads(complete_tax)
if len(tax_list) > 0:
    name = tax_list[0][tax_level_tag]
else:
    name = str(int(tax_id))
return name
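
If you want to check the fallback logic without calling the API, the parsing step can be pulled into a small function of its own. This is only a sketch: the function name `name_from_reply`, the hand-written JSON replies, and the taxid `123456` are made up for illustration and are not part of NanoCLUST or real UniPept output.

```python
import json

def name_from_reply(complete_tax, tax_id, tax_level_tag):
    """Return the name stored under tax_level_tag, or the numeric
    taxid as a string when the reply contains no entries."""
    tax_list = json.loads(complete_tax)
    if len(tax_list) > 0:
        return tax_list[0][tax_level_tag]
    return str(int(tax_id))

# Hand-written replies for illustration only (not real API output).
known_reply = '[{"taxon_id": 562, "species_name": "Escherichia coli"}]'
unknown_reply = "[]"

print(name_from_reply(known_reply, 562, "species_name"))
print(name_from_reply(unknown_reply, 123456.0, "species_name"))
```

The second call falls back to the taxid itself, which is the behaviour the replacement lines above are meant to give.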

More information can be found in issue #80, which I opened on the NanoCLUST GitHub page.