Hi, everybody. I have FASTQ headers of the form:
The "ATCACGATC" portion of these older-style headers is supposed to be the "index sequence", i.e. the molecular barcode of a multiplexed sample, according to the Wikipedia article. But I know what the barcodes are, and that isn't one of them. All the barcodes for this project are 6-8 bp, not 9, and there are "only" about 300 legitimate barcodes in the lane.
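One quick way to check whether a known barcode is hiding inside the 9 bp index is to test each observed index for a known barcode at its start. The following is a minimal Python sketch. It assumes (unverified) that a shorter 6-8 bp barcode would appear at the 5' start of a longer 9-cycle index read. The function name `matching_barcodes` and the barcode list are hypothetical.

```python
from typing import Iterable, List


def matching_barcodes(index_seq: str, barcodes: Iterable[str]) -> List[str]:
    """Return the known barcodes that occur as an exact prefix of index_seq.

    ASSUMPTION: a 6-8 bp barcode, if present, sits at the 5' start of the
    9 bp index read. No mismatches are tolerated, and an N never matches.
    """
    index_seq = index_seq.upper()
    return [bc for bc in barcodes if index_seq.startswith(bc.upper())]


if __name__ == "__main__":
    known = ["CGATGT", "ATCACG", "TTAGGCAT"]  # hypothetical barcode list
    print(matching_barcodes("ATCACGATC", known))  # ['ATCACG']
```

With the real list of ~300 barcodes in place of `known`, running this over the most frequent indexes would show whether they are barcodes plus a few extra bases or something else entirely.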
Overall, there are over 255,000 distinct index sequences across the roughly 178M reads in the file. Some of them even contain Ns, but they are all exactly 9 bp long. This particular sequence (ATCACGATC) is by far the most common: it appears on 90% of the reads, so it can't be distinguishing between samples in any meaningful way.
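Tallies like these can be reproduced with a short script. This is a minimal Python sketch. It assumes the index follows a `#` at the end of a pre-CASAVA-1.8 style header, optionally followed by `/1` or `/2`. It also assumes strict 4-line FASTQ records (no wrapped sequence lines). The regex, the function names, and the example header in the comment are hypothetical and would need adjusting to the real header layout.

```python
import gzip
import re
from collections import Counter
from typing import Iterable, Optional

# ASSUMPTION: pre-CASAVA-1.8 style header with the index after '#', e.g. the
# hypothetical "@HWUSI-EAS100R:6:73:941:1973#ATCACGATC/1". Adjust the regex
# if the real headers differ.
INDEX_RE = re.compile(r"#([ACGTN]+)(?:/[12])?$")


def extract_index(header: str) -> Optional[str]:
    """Return the index sequence from one header line, or None if absent."""
    match = INDEX_RE.search(header.rstrip())
    return match.group(1) if match else None


def count_indexes(lines: Iterable[str]) -> Counter:
    """Tally index sequences over the header line of each 4-line FASTQ record."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        # Only every 4th line is a header; quality lines may also start with '@',
        # so position, not the leading character, identifies headers.
        if i % 4 == 0:
            index = extract_index(line)
            if index is not None:
                counts[index] += 1
    return counts


def summarize(path: str) -> None:
    """Print the number of distinct indexes and the five most common ones."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        counts = count_indexes(handle)
    total = sum(counts.values())
    print(f"{len(counts)} distinct indexes over {total} reads")
    for index, n in counts.most_common(5):
        print(f"{index}\t{n}\t{n / total:.1%}")
```

Calling something like `summarize("lane.fastq.gz")` (hypothetical filename) streams the file line by line, so memory use grows with the number of distinct indexes rather than with the number of reads.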
I'm coming in late to a project, and all I have to go on is the FASTQ files, the list of barcodes, and some wet-lab protocol docs that I'm not particularly qualified to interpret. Does anyone have an idea what this odd, extraneous-looking sequence is? If so, thanks in advance!