Question: iu-gen-matching-fastq-files, `--identifier-code` parameter
loisveillat wrote, 7 months ago:

Hello,

I'm trying to run a co-assembly with the following command:

megahit -1 $R1s -2 $R2s --min-contig-len $MIN_CONTIG_SIZE -m 0.85 -o 02_ASSEMBLY/ -t $NUM_THREADS

However, I get the following error:

[ERROR] [sequence_manager.cpp: 151]: File(s) R1.fastq.gz,R2.fastq.gz: Number of sequences not the same in paired files. Abort.

So I'm trying to use the iu-gen-matching-fastq-files command from illumina-utils to remove non-matching reads from my FASTQ files.

However, I can't work out how to set the --identifier-code parameter.

Here are the first 8 lines of my R1 and R2 files:

R1:

@SRR2132206.3.1 HWI-ST1238:205:C4MWHACXX:7:1101:3192:1997 length=100
ATCAGCTCCGGACATATCTGGCAGGGTGAATTCCACAACAAGCGCAAAGACGGGTCGCTGTTCTGGGAATCTGCCACCATTGCCCCGGTTATCAACGATG
+SRR2132206.3.1 HWI-ST1238:205:C4MWHACXX:7:1101:3192:1997 length=100
<BBBFFFFFFFFFIIIIFIIIIIIFIFFFIIFIIIIIIIIIIIIFIIIFIIBFFBBBBFBBBBBBBBBBBFFFFB<BBBBBBBBBBFF<<<BBFFBBBBB
@SRR2132206.7.1 HWI-ST1238:205:C4MWHACXX:7:1101:4397:1993 length=100
GTCCAGGTCCACGATGTTGTCCAGGGTGGCCCCGAAGCTGATTGCCGTTGCGGCCACGTCGATACGCTTGTCCAGGCTTCCCGGCCCGATGGCCTGGACG
+SRR2132206.7.1 HWI-ST1238:205:C4MWHACXX:7:1101:4397:1993 length=100
BBBFFFFFFFFFFIIIIIFFIIIIFIIIIIIIIIFFIIIIIFFFIIIIIIFFFFFFFFFFFFBFFFFFFFFFFFFFBFBBBBBFFFFFBBBBBBBBFBFF

R2:

@SRR2132206.3.2 HWI-ST1238:205:C4MWHACXX:7:1101:3192:1997 length=100
GTGCTTGCCAATCACGTCTTTTTCGTTCTCGAAGTTCATCAGCTTCAGAGCAGTGGGATTGACGTAGGTAAAGCGTCCTTGAACATCCGTACGATAGATG
+SRR2132206.3.2 HWI-ST1238:205:C4MWHACXX:7:1101:3192:1997 length=100
BBBFFFFFFFFFFIIIFFFFIIIIIIIIFFFFIIFFIFFFFFIIIIIIFIFFIBFFIBFFFBFFFFFF<BFFFFF<BBFFFFFFFFFFFBFFFFBBB<BB
@SRR2132206.7.2 HWI-ST1238:205:C4MWHACXX:7:1101:4397:1993 length=100
CGCCGTAATCGTTATGAGGGGGTCTTGGGCACCGCCCTGCTCCGGATCAACAACTGGAATGTGGGCCGTACCGGCCTGGGTGAACAGCAGGCGCAGGACC
+SRR2132206.7.2 HWI-ST1238:205:C4MWHACXX:7:1101:4397:1993 length=100
BBBFFFFFFFFFFIIIIIIIIIFFIIIIIIIIIIIIIIFFFFFFF<BFFFBFFFFFFFBBFBFFFFFFFFFFFFFFFFFF<<<BBFFFFBFBBB<BFBBB

Can you help me, please?

Thanks


If you had dumped these reads with the -F parameter, you would have recovered the original Illumina FASTQ headers. Can you try repair.sh from the BBMap suite? Here is a guide for it.
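As a rough illustration of the repair.sh suggestion, here is a minimal sketch. It assumes BBMap is installed and that the inputs are the R1.fastq.gz and R2.fastq.gz named in the MEGAHIT error. The `repair_pairs` wrapper and the output file names are hypothetical, and the thread does not confirm whether repair.sh pairs these SRA-style read names (`.1` / `.2` suffixes) without further options.

```shell
# Hypothetical wrapper around BBMap's repair.sh.
# $1, $2: input R1/R2 files; $3: prefix for the output files (naming is an assumption).
repair_pairs() {
    # "repair" asks repair.sh to re-pair reads by name; reads that have lost
    # their mate are written to the singletons file given by outs=.
    repair.sh in1="$1" in2="$2" \
        out1="${3}_R1.fastq.gz" out2="${3}_R2.fastq.gz" \
        outs="${3}_singletons.fastq.gz" repair
}

# Run only if repair.sh is on PATH and both input files exist.
if command -v repair.sh >/dev/null 2>&1 && [ -f R1.fastq.gz ] && [ -f R2.fastq.gz ]; then
    repair_pairs R1.fastq.gz R2.fastq.gz fixed
fi
```

The repaired pair (fixed_R1.fastq.gz and fixed_R2.fastq.gz in this sketch) could then be passed to megahit in place of the original files.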

written 7 months ago by genomax

How did you obtain the files? Checking for SRR2132206 on NCBI yields several entries. Maybe something went wrong when converting from SRA to FASTQ. Did you verify that the files really contain unequal numbers of reads, e.g. with wc -l?
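The wc -l check could look roughly like the sketch below. It assumes gzip-compressed FASTQ files with exactly four lines per record (no line-wrapped sequences); the helper names `count_reads` and `compare_pairs` are made up for illustration, and the file names are taken from the error message.

```shell
# Count FASTQ records in a gzip-compressed file.
# Assumes 4 lines per record, i.e. sequences are not line-wrapped.
count_reads() {
    lines=$(gzip -cd "$1" | wc -l)
    echo $((lines / 4))
}

# Print both read counts; return success only if they are equal.
compare_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1: $n1 reads, R2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}

# Run only if both files from the error message are present.
if [ -f R1.fastq.gz ] && [ -f R2.fastq.gz ]; then
    compare_pairs R1.fastq.gz R2.fastq.gz || echo "Read counts differ between R1 and R2"
fi
```

Equal counts do not prove the reads are in the same order, but unequal counts would confirm what MEGAHIT reports.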

written 7 months ago by ATpoint

It works, thank you very much for your help! :)

Sincerely

Loïs Veillat

written 7 months ago by loisveillat

If you clarify which of the two solutions worked, I can move that to an answer.

written 7 months ago by genomax