iu-gen-matching-fastq-files, `--identifier-code` parameter
4.1 years ago
loisveillat ▴ 10

Hello,

I'm trying to run a co-assembly with the following command:

megahit -1 $R1s -2 $R2s --min-contig-len $MIN_CONTIG_SIZE -m 0.85 -o 02_ASSEMBLY/ -t $NUM_THREADS

However, I get the following error:

[ERROR] [sequence_manager.cpp: 151]: File(s) R1.fastq.gz,R2.fastq.gz: Number of sequences not the same in paired files. Abort.

So I'm trying to use the iu-gen-matching-fastq-files command from illumina-utils to remove non-matching reads from my FASTQ files.

However, I can't work out how to set the --identifier-code parameter.

Here are the first 8 lines of my R1 and R2 files:

R1:

@SRR2132206.3.1 HWI-ST1238:205:C4MWHACXX:7:1101:3192:1997 length=100
ATCAGCTCCGGACATATCTGGCAGGGTGAATTCCACAACAAGCGCAAAGACGGGTCGCTGTTCTGGGAATCTGCCACCATTGCCCCGGTTATCAACGATG
+SRR2132206.3.1 HWI-ST1238:205:C4MWHACXX:7:1101:3192:1997 length=100
<BBBFFFFFFFFFIIIIFIIIIIIFIFFFIIFIIIIIIIIIIIIFIIIFIIBFFBBBBFBBBBBBBBBBBFFFFB<BBBBBBBBBBFF<<<BBFFBBBBB
@SRR2132206.7.1 HWI-ST1238:205:C4MWHACXX:7:1101:4397:1993 length=100
GTCCAGGTCCACGATGTTGTCCAGGGTGGCCCCGAAGCTGATTGCCGTTGCGGCCACGTCGATACGCTTGTCCAGGCTTCCCGGCCCGATGGCCTGGACG
+SRR2132206.7.1 HWI-ST1238:205:C4MWHACXX:7:1101:4397:1993 length=100
BBBFFFFFFFFFFIIIIIFFIIIIFIIIIIIIIIFFIIIIIFFFIIIIIIFFFFFFFFFFFFBFFFFFFFFFFFFFBFBBBBBFFFFFBBBBBBBBFBFF

R2:

@SRR2132206.3.2 HWI-ST1238:205:C4MWHACXX:7:1101:3192:1997 length=100
GTGCTTGCCAATCACGTCTTTTTCGTTCTCGAAGTTCATCAGCTTCAGAGCAGTGGGATTGACGTAGGTAAAGCGTCCTTGAACATCCGTACGATAGATG
+SRR2132206.3.2 HWI-ST1238:205:C4MWHACXX:7:1101:3192:1997 length=100
BBBFFFFFFFFFFIIIFFFFIIIIIIIIFFFFIIFFIFFFFFIIIIIIFIFFIBFFIBFFFBFFFFFF<BFFFFF<BBFFFFFFFFFFFBFFFFBBB<BB
@SRR2132206.7.2 HWI-ST1238:205:C4MWHACXX:7:1101:4397:1993 length=100
CGCCGTAATCGTTATGAGGGGGTCTTGGGCACCGCCCTGCTCCGGATCAACAACTGGAATGTGGGCCGTACCGGCCTGGGTGAACAGCAGGCGCAGGACC
+SRR2132206.7.2 HWI-ST1238:205:C4MWHACXX:7:1101:4397:1993 length=100
BBBFFFFFFFFFFIIIIIIIIIFFIIIIIIIIIIIIIIFFFFFFF<BFFFBFFFFFFFBBFBFFFFFFFFFFFFFFFFFF<<<BBFFFFBFBBB<BFBBB

Can you help me, please?

Thanks

illumina utils anvio metagenomics megahit • 960 views

If you had dumped these reads with the -F parameter, you would have recovered the original Illumina FASTQ headers. Can you try repair.sh from the BBMap suite? Here is a guide for it.
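
To make the idea of re-pairing concrete, here is a minimal Python sketch. It is not how repair.sh or iu-gen-matching-fastq-files work internally; it only illustrates the principle on headers like the ones in the question, where the second whitespace-separated field (e.g. HWI-ST1238:205:C4MWHACXX:7:1101:3192:1997) is identical for both mates. The function names are made up for this example, and it assumes that each key occurs once per file, that surviving reads are in the same relative order in R1 and R2, and that the R2 keys fit in memory.

```python
import gzip


def read_fastq(path):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ file.

    Assumes four-line records; reads .gz files transparently.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\n")
            plus = handle.readline().rstrip("\n")
            qual = handle.readline().rstrip("\n")
            yield header, seq, plus, qual


def pair_key(header):
    """Return the second whitespace-separated header field as the pair key.

    Assumption: this field is shared by both mates, as in the headers
    shown in the question.
    """
    fields = header.split()
    if len(fields) < 2:
        raise ValueError(f"header has no second field: {header!r}")
    return fields[1]


def write_matching_pairs(r1_in, r2_in, r1_out, r2_out):
    """Write only reads whose key occurs in both inputs (uncompressed output).

    Returns the number of distinct keys kept.
    """
    # First pass over R2: collect every key it contains.
    r2_keys = {pair_key(rec[0]) for rec in read_fastq(r2_in)}

    # Pass over R1: keep records whose mate exists in R2.
    shared = set()
    with open(r1_out, "w") as out:
        for rec in read_fastq(r1_in):
            key = pair_key(rec[0])
            if key in r2_keys:
                shared.add(key)
                out.write("\n".join(rec) + "\n")

    # Second pass over R2: keep records whose mate was kept from R1.
    with open(r2_out, "w") as out:
        for rec in read_fastq(r2_in):
            if pair_key(rec[0]) in shared:
                out.write("\n".join(rec) + "\n")

    return len(shared)
```

Calling write_matching_pairs("R1.fastq.gz", "R2.fastq.gz", "R1.matched.fastq", "R2.matched.fastq") would write two files with the same number of records, provided the assumptions above hold. For real data a dedicated tool is the better choice, since it handles unordered files and large inputs.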

How did you obtain the files? Checking for SRR2132206 on NCBI yields several entries. Maybe something went wrong when converting from SRA to FASTQ. Did you validate that the files indeed have unequal numbers of reads, e.g. with wc -l?
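
Since the files in the error message are gzipped, the lines have to be counted on the decompressed stream rather than on the .gz file itself. As a rough sketch of that check in Python (the function name is made up for this example, and it assumes plain four-line FASTQ records with no wrapped sequences):

```python
import gzip


def count_fastq_records(path):
    """Count records in a FASTQ file, assuming exactly four lines per record."""
    # Read .gz files through gzip so that decompressed lines are counted.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Comparing count_fastq_records("R1.fastq.gz") with count_fastq_records("R2.fastq.gz") would show whether the two files really differ in read count, and a ValueError would hint at a truncated file.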

It works, thank you very much for your help! :)

Sincerely

Loïs Veillat

If you clarify which of the two solutions worked, I can move that to an answer.
