Hello,
I'm trying to run a co-assembly with the following command:
megahit -1 $R1s -2 $R2s --min-contig-len $MIN_CONTIG_SIZE -m 0.85 -o 02_ASSEMBLY/ -t $NUM_THREADS
However, I get the following error:
[ERROR] [sequence_manager.cpp: 151]: File(s) R1.fastq.gz,R2.fastq.gz: Number of sequences not the same in paired files. Abort.
So I'm trying to use the iu-gen-matching-fastq-files command from illumina-utils to remove non-matching reads from my FASTQ files. However, I can't figure out how to define the --identifier-code parameter.
Here are the first 8 lines of my R1 and R2 files:
R1:
@SRR2132206.3.1 HWI-ST1238:205:C4MWHACXX:7:1101:3192:1997 length=100
ATCAGCTCCGGACATATCTGGCAGGGTGAATTCCACAACAAGCGCAAAGACGGGTCGCTGTTCTGGGAATCTGCCACCATTGCCCCGGTTATCAACGATG
+SRR2132206.3.1 HWI-ST1238:205:C4MWHACXX:7:1101:3192:1997 length=100
<BBBFFFFFFFFFIIIIFIIIIIIFIFFFIIFIIIIIIIIIIIIFIIIFIIBFFBBBBFBBBBBBBBBBBFFFFB<BBBBBBBBBBFF<<<BBFFBBBBB
@SRR2132206.7.1 HWI-ST1238:205:C4MWHACXX:7:1101:4397:1993 length=100
GTCCAGGTCCACGATGTTGTCCAGGGTGGCCCCGAAGCTGATTGCCGTTGCGGCCACGTCGATACGCTTGTCCAGGCTTCCCGGCCCGATGGCCTGGACG
+SRR2132206.7.1 HWI-ST1238:205:C4MWHACXX:7:1101:4397:1993 length=100
BBBFFFFFFFFFFIIIIIFFIIIIFIIIIIIIIIFFIIIIIFFFIIIIIIFFFFFFFFFFFFBFFFFFFFFFFFFFBFBBBBBFFFFFBBBBBBBBFBFF
R2:
@SRR2132206.3.2 HWI-ST1238:205:C4MWHACXX:7:1101:3192:1997 length=100
GTGCTTGCCAATCACGTCTTTTTCGTTCTCGAAGTTCATCAGCTTCAGAGCAGTGGGATTGACGTAGGTAAAGCGTCCTTGAACATCCGTACGATAGATG
+SRR2132206.3.2 HWI-ST1238:205:C4MWHACXX:7:1101:3192:1997 length=100
BBBFFFFFFFFFFIIIFFFFIIIIIIIIFFFFIIFFIFFFFFIIIIIIFIFFIBFFIBFFFBFFFFFF<BFFFFF<BBFFFFFFFFFFFBFFFFBBB<BB
@SRR2132206.7.2 HWI-ST1238:205:C4MWHACXX:7:1101:4397:1993 length=100
CGCCGTAATCGTTATGAGGGGGTCTTGGGCACCGCCCTGCTCCGGATCAACAACTGGAATGTGGGCCGTACCGGCCTGGGTGAACAGCAGGCGCAGGACC
+SRR2132206.7.2 HWI-ST1238:205:C4MWHACXX:7:1101:4397:1993 length=100
BBBFFFFFFFFFFIIIIIIIIIFFIIIIIIIIIIIIIIFFFFFFF<BFFFBFFFFFFFBBFBFFFFFFFFFFFFFFFFFF<<<BBFFFFBFBBB<BFBBB
Can you help me, please?
Thanks
If you had dumped these reads with the -F parameter, you would have recovered the original Illumina FASTQ headers. Can you try repair.sh from the BBMap suite? Here is a guide for it.

How did you obtain the files? Checking for SRR2132206 on NCBI yields several entries. Maybe something went wrong when converting from SRA to FASTQ. Did you validate that the files do indeed have unequal numbers of reads, e.g. with wc -l?

It works, thank you very much for your help! :)
Sincerely
Loïs Veillat
If you clarify which of the two solutions worked then I can move that to an answer.
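
For anyone landing here with the same error, the wc -l check suggested above can be sketched roughly as follows. This is only an illustration: the function names count_reads and check_pairs are made up for this example, and it assumes standard four-line FASTQ records (no wrapped sequence lines), either plain or gzip-compressed.

```shell
# Count FASTQ records in a plain or gzipped file.
# Assumption: every record spans exactly 4 lines.
count_reads() {
    case $1 in
        *.gz) lines=$(gzip -dc "$1" | wc -l) ;;
        *)    lines=$(wc -l < "$1") ;;
    esac
    echo $((lines / 4))
}

# Print both counts; the exit status is 0 only when they match.
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1: $n1 reads, R2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}
```

Running check_pairs R1.fastq.gz R2.fastq.gz before calling megahit should show whether the two files really differ in read count. If they do, the repair.sh suggestion would look something like repair.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=R1.fixed.fastq.gz out2=R2.fixed.fastq.gz outs=singletons.fastq.gz, where the output file names are placeholders; check the exact options against the BBMap guide for your version.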