Question: FASTQ header editing
3.5 years ago, felipe.zoujiro wrote:

Hi everyone,

This is my first post here. Sorry if this question is out of place, but I am really new to bioinformatics and scripting. I am working on de novo genome assemblies of some marine invertebrates, and as part of my pipeline I have error-corrected some FASTQ files using Rcorrector. This software adds the following information to the FASTQ headers:

In the header line for each read, Rcorrector will append some information.

"cor": some bases of the sequence are corrected
"unfixable_error": the errors could not be corrected
"l:INT m:INT h:INT": the lowest, median and highest kmer count of the kmers from the read

So, I have some FASTQ files with the following headers:

@HWI-ST169:272:C0RCGACXX:1:1306:13471:25027 1:N:0: l:185516 m:185516 h:185516 unfixable_error
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
?@@FF<D@6DFDDIGFHFB@B?@BBB6BBBBDBDDD

@HWI-ST169:272:C0RCGACXX:1:2107:18438:124552 1:N:0: l:185516 m:185516 h:185516 unfixable_error
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
=@;::)<AFD)0?AFFFDB6;?B637:BBBB6BBBB

@HWI-ST169:272:C0RCGACXX:1:1204:15681:165032 1:N:0: l:185516 m:185516 h:185516 unfixable_error
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

This is just a snapshot of the headers; other reads carry the other tags mentioned above. How can I remove the additional information added by Rcorrector? For example: l:185516 m:185516 h:185516 unfixable_error

I want to use Meraculous for de novo genome assembly, but it fails with an error saying that the FASTQ header is not valid. I guess this error is caused by the additional information in the FASTQ headers.

Hope someone here can help me.

Thanks in advance,

Felipe

3.5 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
cut -d ' ' -f1,2 in.fastq > out.fastq
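
A note on the one-liner above: cut runs on every line of the file, which is safe here because sequence lines contain no spaces and Phred+33 quality strings cannot contain one either. If you would rather touch only the header lines, a minimal awk sketch could look like the following. It assumes strict four-line FASTQ records (no wrapped sequences, no blank lines between records), and the function name strip_rcorrector_tags is a hypothetical helper, not part of Rcorrector or of the original answer.

```shell
# Hypothetical helper (an assumption, not from the original answer):
# keep only the first two whitespace-separated fields of each FASTQ header
# and pass sequence, '+', and quality lines through unchanged.
# Assumes strict 4-line records, so headers are lines 1, 5, 9, ...
strip_rcorrector_tags() {
    awk '
        NR % 4 == 1 {
            # Header line: read ID plus the Illumina "1:N:0:" field, if present.
            print (NF >= 2 ? $1 " " $2 : $1)
            next
        }
        { print }   # every other line is copied as-is
    '
}

# Usage (file names are placeholders):
#   strip_rcorrector_tags < in.fastq > out.fastq
```

On the headers shown in the question this should give the same result as the cut command; the only difference is that non-header lines are never modified.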

Thanks Pierre, your command works so smoothly. Now I can see whether Meraculous accepts the new FASTQ headers of my files. Great!!!

written 3.5 years ago by felipe.zoujiro

You can now validate my answer (green mark on the left) to close this question.

written 3.5 years ago by Pierre Lindenbaum