Question

FASTQ header editing

7.2 years ago

felipe.zoujiro ▴ 30

Hi everyone,

This is my first post here, so apologies if this question is out of place; I am really new to bioinformatics and scripting. I am working on de novo genome assemblies of some marine invertebrates, and as part of my pipeline I have error-corrected some FASTQ files using Rcorrector. According to its documentation, this software adds the following information to the FASTQ headers:

In the header line for each read, Rcorrector will append some information.

"cor": some bases of the sequence are corrected
"unfixable_error": the errors could not be corrected
"l:INT m:INT h:INT": the lowest, median and highest kmer count of the kmers from the read

So, I have some FASTQ files with the following headers:

@HWI-ST169:272:C0RCGACXX:1:1306:13471:25027 1:N:0: l:185516 m:185516 h:185516 unfixable_error
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
?@@FF<D@6DFDDIGFHFB@B?@BBB6BBBBDBDDD

@HWI-ST169:272:C0RCGACXX:1:2107:18438:124552 1:N:0: l:185516 m:185516 h:185516 unfixable_error
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
=@;::)<AFD)0?AFFFDB6;?B637:BBBB6BBBB

@HWI-ST169:272:C0RCGACXX:1:1204:15681:165032 1:N:0: l:185516 m:185516 h:185516 unfixable_error
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

This is just a snapshot of the headers; other reads carry the other annotations mentioned above. How can I remove the additional information added by Rcorrector, for example l:185516 m:185516 h:185516 unfixable_error?

I want to use Meraculous for de novo genome assembly, but I am getting an error saying that the FASTQ header is not valid. I guess this error is caused by the additional information in the FASTQ headers.

Hope someone here can help me.

Thanks in advance,

Felipe

sequence • 2.5k views

score 2 · Accepted Answer · 2017-02-01

7.2 years ago

Pierre Lindenbaum 161k

cut -d ' ' -f1,2 in.fastq > out.fastq
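
If you would rather touch only the header lines and leave everything else byte-for-byte as it is, the same idea can be sketched in Python. This is only a minimal sketch under some assumptions: the file is plain 4-line FASTQ with no blank lines between records, and the Rcorrector additions are exactly the trailing tokens quoted in the question (l:INT, m:INT, h:INT, cor, unfixable_error). The names strip_rcorrector_tags and clean_fastq are made up for this example.

```python
import re

# Assumption: Rcorrector annotations are whitespace-separated tokens at the
# END of the header line, in any order: l:INT, m:INT, h:INT, cor,
# unfixable_error (as quoted in the question).
TAGS = re.compile(r"(?:\s+(?:[lmh]:\d+|cor|unfixable_error))+\s*$")


def strip_rcorrector_tags(lines):
    """Yield FASTQ lines with the trailing Rcorrector tokens removed from headers.

    Assumes strict 4-line records, so the header is every line whose
    zero-based index is a multiple of 4. All other lines pass through unchanged.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            line = TAGS.sub("", line.rstrip("\n")) + "\n"
        yield line


def clean_fastq(in_path, out_path):
    """Hypothetical helper: stream in_path to out_path with cleaned headers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(strip_rcorrector_tags(src))
```

Calling clean_fastq("in.fastq", "out.fastq") would then play the role of the cut command above. The difference is that it keys on the annotation tokens themselves instead of on the field count, so a header with more or fewer space-separated fields before the annotations is still handled.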

Thanks Pierre, your command works smoothly. Now I can see whether Meraculous accepts the new FASTQ headers of my files. Great!!!