Question: Replacing Illumina index in header of reads
0
gravatar for john.soghigian
19 months ago by
john.soghigian0 wrote:

Hello all,

Hopefully this hasn't been answered before; I tried googling and did not see the answer, but I apologize if this is a duplication. Essentially, I am interested in a script to replace a part of the header of a fastq file with the contents of a different fastq file. Further details below.

My sequence center has supplied me reads from a lane; these reads were demultiplex according to the read's i7 index, which is present in the header of the forward and reverse reads. I also have an i5 index, which was read separately into a corresponding file per read, since it differs per read and is made up of degenerate bases (it is used to detect clones in identically sized reads); this is the way the sequencing center has supplied me data and I would rather not ask them to do things differently on their end as they are already accommodating a few special requests for our project. As such, I have received three fastq files per lane:

Read 1 (Paired read), example (quality scores trimmed for space):

@K00162:210:HLN3HBBXX:2:1101:1326:1332 1:N:0:AGGAACCT
CGTCTACGTATCGGTGCGTACGGCGGCACATCCATTATGATTCGCTTTAATCTCGGTGGTGT TCGGGCCAGCGGCTTTTTGAAAATGCAGCGGTTTGTGATCCGTTTCCTGTACTACAGGTCG TCCGAATTTGTAAGAATGAAACTGTGGA

Read 2 (I5 index read for the paired reads)

@K00162:210:HLN3HBBXX:2:1101:1326:1332 2:N:0:AGGAACCT

CTTTAATTG

Read 3 (Paired read)

@K00162:210:HLN3HBBXX:2:1101:1326:1332 3:N:0:AGGAACCT
TATCGTTGTAGAGTTGCCTTGAATTTACTCCTGAGACCAAACATTCTCCTGCGGCTATAAACT AAGACCAGAAAACCAGAAAGAAGCCCAACCCGCTACGCCCGCACAGCGAGTAATCAAACA AAAATAAGATGAGTCGTAAACAGCTCCA

My current workflow concatenates the I5 to the read and then later trims it after using the concatenated read for the purposes of decloning; however, a superior solution (in my mind) would be to replace the I7 read in the header of both Read 1 and Read 3 with the I5 read from inside Read 2. I have a few reasons for this, from the software I use being written to expect clone filtering from an index position, to still being able to declone reads if I lose a pair during other quality filtering.

As such, I was curious if anyone knows of an existing script that would enable me to replace the I7 index with my I5 in my 'read 2' file. A minor added complication is that the I7 index occasionally begins with a low quality base call (N) instead of A.

EDIT: Included information that I5 is a set of degenerate bases as per comments below (thanks genomax!)

ADD COMMENTlink modified 19 months ago • written 19 months ago by john.soghigian0

I have to ask "why" do this sort of thing?

In spite of what you feel it would be heck lot easier for your sequencing folks to re-run bcl2fastq and generate the right demultiplexed reads.

ADD REPLYlink written 19 months ago by genomax70k

I may be wrong, but I think it has to do with the other samples multiplexed on the lane unrelated to my project, as well as the fact my I5 reads contain degenerate bases, so demultiplexing off of it would create an enormous number of files (when I just want to know what the index reads as, not demultiplex by it). I'm not familiar with if there is a setting in bcl2fastq that would enable them to record the I5 index and the I7 and only demultiplex by the I7.

edit: and I believe one of my libraries is old enough that the bcl may no longer be available.

ADD REPLYlink modified 19 months ago • written 19 months ago by john.soghigian0

the fact my I5 reads contain degenerate bases

That is new and relevant information. I agree that depending on how many unique reads you expect demultiplexing with those can get unwieldy.

a superior solution (in my mind) would be to replace the I7 read in the header of both Read 1 and Read 3 with the I5 read from inside Read 2

This is going to require some sort of custom script unless one of the awk gurus/@Pierre have a solution they can think of.

ADD REPLYlink modified 19 months ago • written 19 months ago by genomax70k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1699 users visited in the last hour