This is my first post here, so apologies if this question is out of place; I am very new to bioinformatics and scripting. I am working on de novo genome assemblies of some marine invertebrates, and as part of my pipeline I error-corrected some FASTQ files with Rcorrector. This software added the following information to the FASTQ headers:
In the header line for each read, Rcorrector will append some information.
"cor": some bases of the sequence are corrected
"unfixable_error": the errors could not be corrected
"l:INT m:INT h:INT": the lowest, median and highest kmer count of the kmers from the read
So, I have some FASTQ files with the following headers:
@HWI-ST169:272:C0RCGACXX:1:1306:13471:25027 1:N:0: l:185516 m:185516 h:185516 unfixable_error
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
?@@FF<D@6DFDDIGFHFB@B?@BBB6BBBBDBDDD
@HWI-ST169:272:C0RCGACXX:1:2107:18438:124552 1:N:0: l:185516 m:185516 h:185516 unfixable_error
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
=@;::)<AFD)0?AFFFDB6;?B637:BBBB6BBBB
@HWI-ST169:272:C0RCGACXX:1:1204:15681:165032 1:N:0: l:185516 m:185516 h:185516 unfixable_error
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
This is just a snapshot of the headers; other reads carry the other annotations mentioned above. How can I remove the additional information added by Rcorrector, for example "l:185516 m:185516 h:185516 unfixable_error"?
I want to use Meraculous for the de novo genome assembly, but it fails with an error saying that the FASTQ header is not valid. I suspect this error is caused by the additional information in the FASTQ headers.
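To make the goal concrete, here is a rough Python sketch of the kind of clean-up I have in mind. It assumes that every record spans exactly four lines and that the tags always come at the end of the header in the order shown in my example ("l: m: h:" followed by an optional "cor" or "unfixable_error"). The function names are placeholders I made up, and I do not know whether the cleaned headers would actually satisfy Meraculous.

```python
import re

# Trailing annotations as they appear in my files: " l:INT m:INT h:INT",
# optionally followed by "cor" or "unfixable_error". The order is an
# assumption based on the example headers above.
RCORRECTOR_TAGS = re.compile(
    r"\s+l:\d+\s+m:\d+\s+h:\d+(?:\s+(?:cor|unfixable_error))?\s*$"
)


def strip_rcorrector_tags(header):
    """Return the header line without the trailing Rcorrector annotations."""
    return RCORRECTOR_TAGS.sub("", header.rstrip("\n"))


def clean_fastq(lines):
    """Yield FASTQ lines with cleaned headers (assumes strict 4-line records)."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Only the first line of each 4-line record is a header; sequence,
        # "+" and quality lines are passed through unchanged.
        yield strip_rcorrector_tags(line) if i % 4 == 0 else line
```

With an open file handle, this could be used as "for line in clean_fastq(handle): print(line)", writing the output to a new FASTQ file rather than overwriting the original.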
Hope someone here can help me.
Thanks in advance,