Hopefully this hasn't been answered before; I tried googling and did not see the answer, but I apologize if this is a duplication. Essentially, I am interested in a script to replace a part of the header of a fastq file with the contents of a different fastq file. Further details below.
My sequence center has supplied me reads from a lane; these reads were demultiplex according to the read's i7 index, which is present in the header of the forward and reverse reads. I also have an i5 index, which was read separately into a corresponding file per read, since it differs per read and is made up of degenerate bases (it is used to detect clones in identically sized reads); this is the way the sequencing center has supplied me data and I would rather not ask them to do things differently on their end as they are already accommodating a few special requests for our project. As such, I have received three fastq files per lane:
Read 1 (Paired read), example (quality scores trimmed for space):
CGTCTACGTATCGGTGCGTACGGCGGCACATCCATTATGATTCGCTTTAATCTCGGTGGTGT TCGGGCCAGCGGCTTTTTGAAAATGCAGCGGTTTGTGATCCGTTTCCTGTACTACAGGTCG TCCGAATTTGTAAGAATGAAACTGTGGA
Read 2 (I5 index read for the paired reads)
Read 3 (Paired read)
TATCGTTGTAGAGTTGCCTTGAATTTACTCCTGAGACCAAACATTCTCCTGCGGCTATAAACT AAGACCAGAAAACCAGAAAGAAGCCCAACCCGCTACGCCCGCACAGCGAGTAATCAAACA AAAATAAGATGAGTCGTAAACAGCTCCA
My current workflow concatenates the I5 to the read and then later trims it after using the concatenated read for the purposes of decloning; however, a superior solution (in my mind) would be to replace the I7 read in the header of both Read 1 and Read 3 with the I5 read from inside Read 2. I have a few reasons for this, from the software I use being written to expect clone filtering from an index position, to still being able to declone reads if I lose a pair during other quality filtering.
As such, I was curious if anyone knows of an existing script that would enable me to replace the I7 index with my I5 in my 'read 2' file. A minor added complication is that the I7 index occasionally begins with a low quality base call (N) instead of A.
EDIT: Included information that I5 is a set of degenerate bases as per comments below (thanks genomax!)