Question

Error in RGI-CARD database

3.7 years ago

flo21 • 0

Hi,

I have some metagenomic .fasta files that I'm trying to analyze with RGI and the CARD database (https://github.com/arpcard/rgi), whose documentation states that .fasta or .fasta.gz are accepted as input sequences.

My smallest paired-end file is 5.6 GB. I ran the analysis to test it, but partway through the computer ran out of disk space, and I think that caused the analysis to be cut short (a). As a result I do get the two output files, .json and .txt, but both are empty.

I tried compressing the file with

$ gzip filename

But when using the ##.fasta.gz file the analysis is not even carried out because it "doesn't support the format" (b). I have now tried in both Linux and macOS terminals and still get the same result. I don't have a clue what I'm doing wrong. Any advice or suggestion would be much appreciated.

Observations from the run: during the analysis with the .fasta file I can see 5 temporary files (##.fasta.temp, ##.fasta.temp.potentrialGenes, ##.fasta.temp.contigToORF.fsa, ##.fasta.temp.contig.fsa, ##.fasta.temp.contig.fsa.blastRes.xml). Some of them are really heavy, ~55 GB (is that normal?).

(a).

Error: [blastp] Failed s_BlastXMLAddIteration Q(0/1
Process Process-1:4:
Traceback (most recent call last):
  File "/Users/anaconda3/envs/rgi2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Users/anaconda3/envs/rgi2/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/anaconda3/envs/rgi2/lib/python3.6/site-packages/app/Filter.py", line 116, in process_rrna
    self.format_fasta()
  File "/Users/anaconda3/envs/rgi2/lib/python3.6/site-packages/app/Filter.py", line 160, in format_fasta
    fout.write(">{}\n{}\n".format(header, seq))
OSError: [Errno 28] No space left on device
WARNING 2020-08-15 15:30:56,939 : Exception: <class 'OSError'> -> [Errno 28] No space left on device -> model_type: homolog
WARNING 2020-08-15 15:31:14,101 : Exception: <class 'OSError'> -> [Errno 28] No space left on device -> model_type: overexpression
WARNING 2020-08-15 20:49:47,327 : Exception: <class 'xml.parsers.expat.ExpatError'> -> unclosed token: line -2047941080, column 23 -> model_type: variant

(b).

ERROR 2020-08-14 12:17:04,726 : gz
ERROR 2020-08-14 12:17:04,726 : application/gzip
WARNING 2020-08-14 12:17:04,726 : Sorry, no support for this format.

software error fasta fasta.gz card • 1.7k views

ADD COMMENT • link updated 3.7 years ago by h.mon 35k • written 3.7 years ago by flo21 • 0

My lowest paired-end file is 5.6GB.

If your file is paired-end and is 5.6 GB, it is probably a fastq (not fasta) with sequencing reads. You don't show the command line you used, but it seems to me you are trying to run sequencing reads with rgi main, which takes --input_type contig or --input_type protein. You are then running out of disk space:

WARNING 2020-08-15 15:30:56,939 : Exception: <class 'OSError'> -> [Errno 28] No space left on device -> model_type: homolog
WARNING 2020-08-15 15:31:14,101 : Exception: <class 'OSError'> -> [Errno 28] No space left on device -> model_type: overexpression

Even if you didn't run out of space, a BLAST search with a 5.6 GB input file would take a very, very long time.
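A quick way to catch the "No space left on device" failure before starting a long run is to compare the free space on the working filesystem with a rough estimate of what the temporary files will need. The following is only a sketch: the helper name has_free_kb is hypothetical, and the 10x factor is an assumption taken from the numbers in the question (about 55 GB of temporary files for a 5.6 GB input), not something RGI guarantees.

```shell
# Return 0 if the filesystem holding directory $1 has at least $2 KiB free.
# has_free_kb is a hypothetical helper name, not part of RGI.
has_free_kb() {
  dir=$1
  need_kb=$2
  # df -P prints one POSIX-format line per filesystem; with -k,
  # column 4 is the available space in 1024-byte blocks.
  free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ "$free_kb" -ge "$need_kb" ]
}

# Placeholder path, as in the command shown later in the thread.
input=/path/to/nucleotide_input.fasta

if [ -f "$input" ]; then
  # Input size in KiB.
  size_kb=$(du -k "$input" | cut -f1)
  # Assumed safety factor of 10 (temporary files were ~10x the input here).
  if has_free_kb "$(dirname "$input")" $((size_kb * 10)); then
    echo "Enough free space for an estimated 10x of temporary files."
  else
    echo "Probably not enough free space; expect Errno 28." >&2
  fi
fi
```

The temporary files in the question were written next to the input file, which is why the check looks at the input's directory.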

You can use fastq files with rgi bwt, which has the following warning:

This is an unpublished algorithm undergoing beta-testing.

ADD REPLY • link 3.7 years ago by h.mon 35k

Thanks for your input!

I could try the analysis using .fastq files as you recommend. However, since fastq files are larger, I didn't have much hope after seeing that disk space is already a problem with uncompressed fasta files.

Yes, this is the command line I'm trying:

rgi main --input_sequence /path/to/nucleotide_input.fasta --output_file /path/to/output_file --input_type contig --local --clean

I originally had myForward_sequence.fastq and myRevervse_sequence.fastq, which I converted to fasta and merged into my 5.6 GB fasta file. I converted each file as follows:

sed -n '1~4s/^@/>/p;2~4p' in.fastq > out.fasta

Merge them:

cat myForward_sequence.fasta myRevervse_sequence.fasta > my.fasta
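A side note on the conversion step only: the `first~step` address form (`1~4`, `2~4`) in the sed command above is a GNU sed extension, so it will not work with the BSD sed shipped on macOS. A portable sketch with awk, assuming strictly 4-line fastq records (the same assumption the sed version makes), could look like this. The function name fastq_to_fasta is hypothetical.

```shell
# Convert a 4-line-per-record fastq file to fasta.
# Line 1 of each record: replace the leading '@' with '>'.
# Line 2 of each record: the sequence, printed unchanged.
# Lines 3 and 4 ('+' separator and qualities) are dropped.
fastq_to_fasta() {
  awk 'NR % 4 == 1 { print ">" substr($0, 2) }
       NR % 4 == 2 { print }' "$1"
}

# Usage (file names as in the sed example above):
#   fastq_to_fasta in.fastq > out.fasta
```

This does not change the disk-space or runtime problem with rgi main; it only makes the conversion itself behave the same on Linux and macOS.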

ADD REPLY • link updated 3.7 years ago by h.mon 35k • written 3.7 years ago by flo21 • 0

score 0 · Answer 1 · 2020-08-17

You have to assemble the genome (Shovill is very fast and light on resources) and use the assembled contigs (then use --input_type contig), or predict the proteins after assembling the genome (then use --input_type protein). You cannot use the reads, either in fasta or fastq, with rgi main; this is not what it was designed for.

If you want to use the reads without assembling, then you need to use rgi bwt.
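The two routes could be sketched roughly as follows. This is a dry-run outline, not a tested pipeline: by default it only prints the commands. The rgi main options are the ones from the command posted above; the shovill options (--outdir, --R1, --R2), its contigs.fa output name, and the rgi bwt options (--read_one, --read_two, --output_file) are assumptions that should be checked against `shovill --help` and `rgi bwt --help` for the installed versions. The output names assembly, rgi_contigs and rgi_reads are placeholders.

```shell
# Print the command instead of running it, unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Read files as named in the thread.
R1=myForward_sequence.fastq
R2=myRevervse_sequence.fastq

# Route 1: assemble the reads, then give the contigs to rgi main.
assemble_then_rgi() {
  # Assumed shovill options; contigs.fa is assumed to be its output file.
  run shovill --outdir assembly --R1 "$R1" --R2 "$R2"
  run rgi main --input_sequence assembly/contigs.fa \
      --output_file rgi_contigs --input_type contig --local --clean
}

# Route 2: skip assembly and map the reads with rgi bwt.
reads_with_bwt() {
  # Assumed rgi bwt options; verify with `rgi bwt --help`.
  run rgi bwt --read_one "$R1" --read_two "$R2" \
      --output_file rgi_reads --local
}

# Show what would be executed for both routes.
assemble_then_rgi
reads_with_bwt
```

Setting RUN=1 in the environment makes the same functions execute the commands instead of printing them.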