Question

Editing FASTQ headers

0

23 months ago

angelica.jara • 0

Hi! I have several FASTQ files (paired-end sequencing, so I have R1 and R2 files) with different headers. For example:

@SRR8834012.1.1 MG00HS20:989:CAKP3ANXX:1:1101:1962:1989 length=101
NGGTCCTCGGCAGGCCGAGACCGGCTTCTGCGATCAAGCTGCGCTGAACCTCGCTGCTCCCGGCGTAGATCGTCGCGGCGATGTTTCGGCGCGACATCCAC

Or

@MG00HS20:989:CAKP3ANXX:1:1101:1457:56177/1
TTCCGGATCCCGTCGCGCTGATCGCGGCCTTTTCGCGTCGAGATTGCACGAATGCCGCGTAGGTTTCGCGGTGACCGAGGCC
+
AAABCGC<<C1/@C99/EGEGGD=CGGGG/:FDEEGGGGDEGGGGGGGGGG/09FGG/C:CGG=FG0:FAGG.@DEG.?E.@

I would like to edit the headers in all my FASTQ files so that each header is the name of the file followed by a number that acts as a sequence counter. So the headers for the reads in a file named test would be test_1, test_2, test_3, and so on.

Could someone help me?

fastq • 1.8k views

ADD COMMENT • link updated 23 months ago by iraun 6.2k • written 23 months ago by angelica.jara • 0

0

You should look into bioawk (https://github.com/lh3/bioawk) - I think it will allow you to access these variables (file name, record number i.e. ordinal number of each read) in a streamlined way.

ADD REPLY • link 23 months ago by Ram 43k

score 1 · Answer 1 · 2022-05-18

1

23 months ago

rpolicastro 13k

Here's a seqkit replace answer as well; it loops over all of the FASTQ files in a directory.

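# {nr} is seqkit's record-number placeholder; existing *.renamed.fastq outputs are skipped on reruns
find . -name "*.fastq" ! -name "*.renamed.fastq" -exec sh -c \
  'seqkit replace -p ".+" -r "$(basename "$1" .fastq)_{nr}" "$1" > "${1%.fastq}.renamed.fastq"' sh {} \;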

ADD COMMENT • link 23 months ago by rpolicastro 13k

score 0 · Answer 2 · 2022-05-18

0

23 months ago

iraun 6.2k

Hi! Please consider reformatting your question to be more readable.

Have you seen the "Quick One Liner For Fastq Header Renaming" post? I think you can adapt the awk solution suggested there, modifying it slightly so that the name of your file is used. Something like this (not tested):

cat input_name.fastq | awk -v fqname="input_name" '{print (NR%4 == 1) ? "@"fqname"_" ++i : $0}' > renamed_header.fastq
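
To apply this to every file in a directory, the one-liner can be wrapped in a shell loop that derives fqname from each file name. This is only a minimal sketch, not tested on real data: it assumes uncompressed FASTQ with strictly four lines per record and files ending in .fastq. The function name rename_fastq_headers and the output suffix .renamed.fq are hypothetical choices, not something from the linked post.

```shell
# Rename the headers of every *.fastq file in a directory to <file name>_<counter>.
# Assumes strict 4-line records. The ".renamed.fq" suffix is a hypothetical choice;
# it does not end in .fastq, so a second run will not pick up the outputs.
rename_fastq_headers() {
  dir=${1:-.}
  for fq in "$dir"/*.fastq; do
    [ -e "$fq" ] || continue          # glob matched nothing
    name=$(basename "$fq" .fastq)     # file name without directory or extension
    awk -v fqname="$name" '
      NR % 4 == 1 { print "@" fqname "_" (++i); next }   # header line
      { print }                                          # sequence, "+", quality
    ' "$fq" > "$dir/$name.renamed.fq"
  done
}

# Usage: rename_fastq_headers /path/to/fastq_dir
```

Headers are found by line position rather than by a leading @, because a quality line can also start with @.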

ADD COMMENT • link 23 months ago by iraun 6.2k

1

I think cat and fqname are not necessary here. The following code should be fine:

$ awk '{print (NR%4 == 1) ? "@"FILENAME"_" ++i : $0}' test.fq
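
One caveat: FILENAME holds the argument exactly as it was given, so the command above produces headers such as @test.fq_1 rather than @test_1. A sketch that strips the directory and extension inside awk follows. It assumes four-line records and a .fq or .fastq extension, and the wrapper name fastq_headers_from_filename is hypothetical. Using FNR instead of NR restarts the counter for each file when several are passed at once, although all output still goes to a single stdout stream.

```shell
# Hypothetical wrapper around awk; prints the renamed records to stdout.
fastq_headers_from_filename() {
  awk '
    FNR == 1 {                        # first line of each input file
      name = FILENAME
      sub(/.*\//, "", name)           # drop any leading directory
      sub(/\.(fastq|fq)$/, "", name)  # drop the extension
    }
    FNR % 4 == 1 { print "@" name "_" ((FNR + 3) / 4); next }  # header line
    { print }                         # all other lines pass through unchanged
  ' "$@"
}

# Usage: fastq_headers_from_filename test.fq > test.renamed.fq
```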

ADD REPLY • link 23 months ago by cpad0112 21k

0

Yes, of course the code can be shortened :).

ADD REPLY • link 23 months ago by iraun 6.2k

0

I had not seen that post! Thanks for sharing, I'll try it.

ADD REPLY • link 23 months ago by angelica.jara • 0

score 0 · Answer 3 · 2022-05-18

0

23 months ago

cpad0112 21k

$ bioawk -c fastx '{ print "@"FILENAME"_"++c, $seq,"+",$qual }' test.fq | tr -s "\t" "\n"

ADD COMMENT • link 23 months ago by cpad0112 21k