Question: Problem with "split_and_run_sparc.sh" from DBG2OLC pipeline
Josué Barrera (Mexico) wrote, 3.1 years ago:

Hi everybody!

I'm having a problem in the consensus stage of the DBG2OLC pipeline. I'm using the script "split_and_run_sparc.sh" to obtain the "final_assembly.fasta" file from my backbone file (backbone_raw.fasta) and my reads (ctg_pb.fasta). I ran the script using the following command:

sh ./split_and_run_sparc.sh backbone_raw.fasta DBG2OLC_Consensus_info.txt ctg_pb.fasta /tmp/consensus_dir 2 >cns_log.txt

While the script was running, an error message appeared:

Traceback (most recent call last):
  File "./split_reads_by_backbone.py", line 131, in <module>
  File "./split_reads_by_backbone.py", line 122, in main
IOError: [Errno 24] Too many open files: '/tmp/consensus_dir/backbone-1627.reads.fasta'
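Errno 24 means the process hit its per-process limit on open file descriptors, which is what happens when a script keeps one output file open per backbone. The sketch below is not the code of split_reads_by_backbone.py; it is a generic illustration, with a hypothetical class name (BoundedAppender) and made-up file contents, of how records can be spread over many files while only a fixed number of handles are open at once.

```python
import os
import tempfile
from collections import OrderedDict


class BoundedAppender:
    """Append text to many files while keeping at most max_open handles open.

    Hypothetical helper for illustration; not part of DBG2OLC.
    """

    def __init__(self, max_open=256):
        self.max_open = max_open
        self._handles = OrderedDict()  # path -> open file, least recently used first

    def write(self, path, text):
        handle = self._handles.pop(path, None)
        if handle is None:
            if len(self._handles) >= self.max_open:
                # Close the least recently used handle to stay under the cap.
                _, oldest = self._handles.popitem(last=False)
                oldest.close()
            # Append mode, so reopening a previously closed file keeps its content.
            handle = open(path, "a")
        self._handles[path] = handle  # re-insert as most recently used
        handle.write(text)

    def close(self):
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()


def demo(n_files=50, max_open=4):
    """Write two records to each of n_files files using only max_open handles."""
    out_dir = tempfile.mkdtemp()
    writer = BoundedAppender(max_open=max_open)
    for round_no in range(2):
        for i in range(n_files):
            path = os.path.join(out_dir, "backbone-%d.reads.fasta" % i)
            writer.write(path, ">read_%d_%d\nACGT\n" % (i, round_no))
    writer.close()
    return out_dir
```

Because the files are opened in append mode, the output directory should be empty before a run; otherwise records from an earlier run would be kept.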

After the analysis, I observed some inconsistencies between the "backbone_raw.fasta" file and the "final_assembly.fasta" file:

---------------- Information for assembly 'backbone_raw.fasta' ----------------

                                       Number of contigs       1906
                          Number of contigs in scaffolds          0
                      Number of contigs not in scaffolds       1906
                                   Total size of contigs  252974640
                                          Longest contig    2502428
                                         Shortest contig       4957
                               Number of contigs > 1K nt       1906 100.0%
                              Number of contigs > 10K nt       1872  98.2%
                             Number of contigs > 100K nt        512  26.9%
                               Number of contigs > 1M nt         31   1.6%
                              Number of contigs > 10M nt          0   0.0%
                                        Mean contig size     132725
                                      Median contig size      35400
                                       N50 contig length     449759
                                        L50 contig count        147

---------------- Information for assembly 'final_assembly.fasta' ----------------

                                       Number of contigs       1020
                          Number of contigs in scaffolds          0
                      Number of contigs not in scaffolds       1020
                                   Total size of contigs  223116219
                                          Longest contig    2502428
                                         Shortest contig         83
                               Number of contigs > 1K nt       1018  99.8%
                              Number of contigs > 10K nt       1009  98.9%
                             Number of contigs > 100K nt        470  46.1%
                               Number of contigs > 1M nt         31   3.0%
                              Number of contigs > 10M nt          0   0.0%
                                        Mean contig size     218741
                                      Median contig size      82745
                                       N50 contig length     548456
                                        L50 contig count        117

The main inconsistencies between the two files are:

  • The number of contigs almost halved
  • The total size of the assembled genome is reduced (since I have 886 fewer contigs)
  • Some contigs became smaller (as observed in the "Shortest contig" section)
  • N50, mean and median contig sizes inflated (as a by-product of losing contigs)
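For anyone who wants to recompute these figures independently, here is a minimal sketch that reads contig lengths from a FASTA file and reports the contig count, total size, shortest and longest contig, and N50. The function names are my own, and assembly-statistics tools can differ slightly in how they define N50, so small differences from the tables above would not be surprising.

```python
def read_contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the running total, largest first, reaches half the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(lengths):
    """Collect the basic statistics compared in this post."""
    return {
        "contigs": len(lengths),
        "total": sum(lengths),
        "shortest": min(lengths) if lengths else 0,
        "longest": max(lengths) if lengths else 0,
        "n50": n50(lengths),
    }
```

Running summarize(read_contig_lengths("backbone_raw.fasta")) and the same call on "final_assembly.fasta" gives a side-by-side comparison of the two files.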

Does anyone know whether the inconsistencies between the two files are caused by the error that appeared while the script was running? Or is this the normal output one should expect after running the consensus stage of the pipeline?

P.S.: I could not run the command "ulimit -n unlimited" before running the script, since I don't have root privileges on the cluster I'm working on. I'm not sure whether this explains the inconsistencies or the error message.
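A note on the limit itself: an unprivileged user normally cannot raise the hard limit, but can raise the soft limit up to the hard limit. Many Linux systems default to a soft limit of 1024 open files, which would be consistent with (though it does not prove) a run that stops somewhere past the first thousand backbones. The sketch below, with function names of my own choosing, raises the soft limit as far as the hard limit allows and then launches a command, which inherits the new limit. Whether the hard limit on a given cluster is high enough for 1906 backbones is something only the reported values can tell.

```python
import resource
import subprocess
import sys


def raise_soft_nofile_limit():
    """Raise the soft open-file limit to the hard limit; no root needed.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some systems refuse the full hard value; keep the old limit then.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def run_with_raised_limit(command):
    """Raise the soft limit, then run command; child processes inherit the limit."""
    soft, hard = raise_soft_nofile_limit()
    sys.stderr.write("open-file limits: soft=%s hard=%s\n" % (soft, hard))
    return subprocess.call(command)
```

For example, run_with_raised_limit(["sh", "./split_and_run_sparc.sh", "backbone_raw.fasta", "DBG2OLC_Consensus_info.txt", "ctg_pb.fasta", "/tmp/consensus_dir", "2"]) would start the consensus script under the raised limit. If the hard limit is still below the number of backbones, the limit has to be changed by the cluster administrators.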

colindaven (Hannover Medical School) wrote, 3.1 years ago:

I had a problem with this stage too. I never got a final assembly out but was stuck at the "backbone_raw.fa" stage.

I did have root access and tried repeatedly to raise the ulimit, but it didn't work well, and there are only so many times you can restart servers in a cluster without starting to annoy people.

I got a reasonable final assembly out using Racon https://github.com/isovic/racon in the end.


I'll try it out.

Thank you very much!

(reply written 3.1 years ago by Josué Barrera)

I am also having issues with the consensus stage of DBG2OLC, but in my case the "final_assembly.fasta" that is generated is empty, even though there is no error message.

So I would like to try your suggestion and run Racon on the "backbone_raw.fasta" assembly from DBG2OLC. However, I don't know which file to use as the overlap/alignment input that Racon requires ("Racon takes as input only three files: contigs in FASTA/FASTQ format, reads in FASTA/FASTQ format and overlaps/alignments between the reads and the contigs in MHAP/PAF/SAM format"). The DBG2OLC manual is not very clear, and I'm not sure whether such a file is actually generated during the assembly. Do you remember which file you used in your case, or whether the overlap/alignment file has to be generated with different software?

(reply written 8 months ago by mths_b)
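The thread does not record which overlap file was used. One common approach, offered here as an assumption rather than as what colindaven did, is to map the long reads back to the backbone with minimap2, which writes PAF by default, and pass that PAF to Racon. In the sketch below the file names "reads.fasta", "overlaps.paf" and "polished.fasta" are placeholders, the function names are hypothetical, and the argument order shown for Racon (reads, overlaps, target) should be checked against "racon --help" for the installed version, since older releases used a different command line.

```python
import subprocess


def build_polish_commands(backbone, reads, paf):
    """Return the minimap2 and racon command lines as argument lists.

    minimap2 prints PAF to stdout and racon prints polished FASTA to stdout,
    so the caller is expected to redirect both outputs to files.
    """
    map_cmd = ["minimap2", "-x", "map-pb", backbone, reads]
    racon_cmd = ["racon", reads, paf, backbone]
    return map_cmd, racon_cmd


def run_polish(backbone, reads, paf, polished):
    """Map reads to the backbone, then polish it; requires both tools on PATH."""
    map_cmd, racon_cmd = build_polish_commands(backbone, reads, paf)
    with open(paf, "w") as out:
        subprocess.check_call(map_cmd, stdout=out)
    with open(polished, "w") as out:
        subprocess.check_call(racon_cmd, stdout=out)
```

A call such as run_polish("backbone_raw.fasta", "reads.fasta", "overlaps.paf", "polished.fasta") would perform one polishing round; the "map-pb" preset assumes PacBio reads.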