Question

assembly of fastq_pair results (4 files )

4.0 years ago

Bioinfo ▴ 20

Hello everyone

I have two files containing reads that I want to assemble.

I ran SPAdes and it shows this error message:

Error log:

  == Running assembler: K27

   0:00:00.000 4M / 4M INFO General (main.cpp : 74) Loaded config from /data/friesen/testdrive/agro-dir/spades_assembly/assembly/K27/configs/config.info
   0:00:00.000 4M / 4M INFO General (memory_limit.cpp : 49) Memory limit set to 250 Gb
   0:00:00.000 4M / 4M INFO General (main.cpp : 87) Starting SPAdes, built from refs/heads/spades_3.13.0, git revision 8ea46659e9b2aca35444a808db550ac333006f8b
   0:00:00.000 4M / 4M INFO General (main.cpp : 88) Maximum k-mer length: 128
   0:00:00.000 4M / 4M INFO General (main.cpp : 89) Assembling dataset (/data/friesen/testdrive/agro-dir/spades_assembly/assembly/dataset.info) with K=27
   0:00:00.000 4M / 4M INFO General (main.cpp : 90) Maximum # of threads to use (adjusted due to OMP capabilities): 1
   0:00:00.000 4M / 4M INFO General (launch.hpp : 51) SPAdes started
   0:00:00.000 4M / 4M INFO General (launch.hpp : 58) Starting from stage: construction
   0:00:00.000 4M / 4M INFO General (launch.hpp : 65) Two-step RR enabled: 0
   0:00:00.000 4M / 4M INFO StageManager (stage.cpp : 132) STAGE == de Bruijn graph construction
   0:00:00.008 4M / 4M INFO General (read_converter.hpp : 77) Converting reads to binary format for library #0 (takes a while)
   0:00:00.008 4M / 4M INFO General (read_converter.hpp : 78) Converting paired reads
   0:00:00.401 80M / 132M INFO General (binary_converter.hpp : 93) 16384 reads processed
   0:00:00.606 92M / 132M INFO General (binary_converter.hpp : 93) 32768 reads processed
   0:00:01.021 120M / 132M INFO General (binary_converter.hpp : 93) 65536 reads processed
   0:00:02.071 184M / 184M INFO General (binary_converter.hpp : 93) 131072 reads processed
   0:00:04.251 320M / 320M INFO General (binary_converter.hpp : 93) 262144 reads processed
   0:00:06.537 464M / 464M ERROR General (paired_readers.hpp : 56) The number of right read-pairs is larger than the number of left read-pairs
   0:00:06.537 464M / 464M ERROR General (paired_readers.hpp : 60) Unequal number of read-pairs detected in the following files:
   /data/friesen/testdrive/agro-dir/spades_assembly/corrected_1.fastq.gz
   /data/friesen/testdrive/agro-dir/spades_assembly/corrected_2.fastq.gz

   == Error == system call for: "['/home/richard.white3/SPAdes-3.13.0-Linux/bin/spades-core',
   '/data/friesen/testdrive/agro-dir/spades_assembly/assembly/K27/configs/config.info']"
   finished abnormally, err code: 255

   In case you have troubles running SPAdes, you can write to
   spades.support@cab.spbu.ru or report an issue on our GitHub
   repository github.com/ablab/spades Please provide us with params.txt
   and spades.log files from the output directory.

I have checked the inputs prior to error correction and they had the same number of reads.

I tried to check the corrected reads, but they are formatted strangely, so I can't check them with FastQC:

   uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Midline
   'AAAAAAAADDDDDDDDGGGGGFHIHFHHHHHHHHHHHHIHHHHHHHIIHHHHHHHHHHHHHHHFGGEGEDEGEGCEGGEDGGGG?DG?GGGGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGDGGGGGGGGGGGGGGGGGGGEGGGAGDGGEGGGGGGGGGGGGGGGGGAGGEGGGGGCEGGGGGGEEGGGGGGGGGGGGGGDGGGEEGGGGGGGGGGGGGGGGDA>DGGGGGAGGGGGDGGGG'
   didn't start with '+'
       at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:172)
       at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
       at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
       at java.lang.Thread.run(Thread.java:748)

   Failed to process file corrected_2.fastq.gz
   uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
       at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
       at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
       at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
       at java.lang.Thread.run(Thread.java:748)

I used fastq_pair to identify reads that have a mate and to separate out singletons. I obtained four files:

left.fastq.paired.fq
left.fastq.single.fq
right.fastq.paired.fq
right.fastq.single.fq

I should mention that the number of reads in the single files is far higher than the number of reads in the paired files:

Left paired: 37027
Right paired: 37027
Left single: 1512745
Right single: 1509165

My question is: how can I use the four files for the assembly, or is it okay to use only the paired files?

Thank you

Assembly alignment software error sequence • 959 views

ADD COMMENT • link updated 4.0 years ago by Ram 43k • written 4.0 years ago by Bioinfo ▴ 20

ID line didn't start with '@' and other errors

You appear to have corrupted the data files so they no longer are in FastQ format.
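To find where a file stops being valid FastQ, a short script can report the first malformed record. The following is only a minimal sketch: the function names `open_text` and `first_bad_record` are made up for illustration, and it assumes strict four-line records (no wrapped sequence lines).

```python
import gzip
import sys


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def first_bad_record(handle):
    """Return (record_number, reason) for the first malformed record, or None.

    Assumption: every record is exactly four lines
    (ID line, sequence, '+' line, quality).
    """
    record = 0
    while True:
        header = handle.readline()
        if not header:
            return None  # reached end of file without finding a problem
        if not header.strip():
            continue  # tolerate blank lines between records
        record += 1
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline()
        if not header.startswith("@"):
            return record, "ID line does not start with '@'"
        if not qual:
            return record, "record is truncated"
        if not plus.startswith("+"):
            return record, "third line does not start with '+'"
        if len(seq) != len(qual.rstrip("\n")):
            return record, "sequence and quality lengths differ"


if __name__ == "__main__":
    # Usage: python check_fastq.py corrected_1.fastq.gz corrected_2.fastq.gz
    for path in sys.argv[1:]:
        with open_text(path) as fh:
            print(path, first_bad_record(fh))
```

Once one record is off by a line, every later record will look wrong too, so only the first reported record number is meaningful.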

The number of right read-pairs is larger than the number of left read-pairs

You also may have trimmed these files independently (not advisable) so the number and order of read pairs in your files no longer match.

Best option would likely be to start over with raw data and re-do your trimming (in proper pairs).
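Whether two files are still in proper pairs can be checked by walking both in parallel and comparing read names. This is a rough sketch with hypothetical helper names (`read_ids`, `first_pair_mismatch`); it assumes four-line records and that mates share a name once a trailing `/1` or `/2`, or anything after the first space, is dropped.

```python
from itertools import zip_longest


def read_ids(handle):
    """Yield the read name from each four-line FastQ record."""
    for i, line in enumerate(handle):
        if i % 4 != 0:
            continue  # only the first line of each record holds the name
        parts = line[1:].split()
        name = parts[0] if parts else ""
        # Assumption: mates differ only by a trailing /1 or /2 suffix.
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_pair_mismatch(left, right):
    """Return (record_number, left_id, right_id) at the first disagreement.

    Returns None if both files list the same names in the same order.
    A missing record in the shorter file shows up as None.
    """
    pairs = zip_longest(read_ids(left), read_ids(right))
    for n, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return n, a, b
    return None
```

A result of `None` means the files agree in both number and order; anything else gives the first record at which they diverge.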

ADD REPLY • link 4.0 years ago by GenoMax 141k

Thank you very much for your message. I extracted the unmapped reads from my data, and I don't know exactly at which step the error occurred.

First, I mapped my reads to the reference using bowtie2. After that, I converted the SAM file to a BAM file using samtools, and I extracted the unmapped reads using samtools view -f 4 -h. Finally, I used this command to obtain the forward and reverse unmapped reads:

samtools fastq -1 file_1_Unmapped_reads.fastq -2 file_2_Unmapped_reads.fastq file_Unmapped_reads.bam
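One possible source of reads without a mate (an assumption, not something confirmed in this thread) is the flag filter itself. In the SAM flag field, bit 4 means "this read is unmapped" and bit 8 means "the mate is unmapped", so `-f 4` keeps an unmapped read even when its mate mapped and was therefore discarded. The sketch below imitates that bit test in plain Python; `passes_f` is a made-up helper, not part of samtools.

```python
READ_UNMAPPED = 0x4   # SAM flag bit: this read is unmapped
MATE_UNMAPPED = 0x8   # SAM flag bit: the mate of this read is unmapped


def passes_f(flag, required):
    """Imitate `samtools view -f`: keep a record only if all required bits are set."""
    return (flag & required) == required


# Example flags for two hypothetical read pairs.
# Pair A: read 1 unmapped (69 = 1 + 4 + 64), read 2 mapped (137 = 1 + 8 + 128).
# Pair B: both reads unmapped (77 = 1 + 4 + 8 + 64, 141 = 1 + 4 + 8 + 128).
pairs = {"A": (69, 137), "B": (77, 141)}

for name, flags in pairs.items():
    kept_f4 = [passes_f(f, READ_UNMAPPED) for f in flags]
    kept_f12 = [passes_f(f, READ_UNMAPPED | MATE_UNMAPPED) for f in flags]
    print(name, "kept with -f 4:", kept_f4, "kept with -f 12:", kept_f12)
```

In this toy example, pair A survives `-f 4` as a single read, while requiring both bits (`-f 12`) keeps only pairs in which both mates are unmapped. Whether that accounts for the singleton counts reported above would need to be checked against the actual BAM file.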

ADD REPLY • link 4.0 years ago by Bioinfo ▴ 20

While those steps look OK, it is hard to say where the corruption occurred. It does seem to be there, though, since the programs are complaining.

ADD REPLY • link 4.0 years ago by GenoMax 141k