Illumina paired-end FASTQ sequence identifiers and index primers
5 weeks ago
wormball • 0


I have some paired-end Illumina FASTQ files. In most of them, the sequence identifiers look like this:

@GENOTEK:000:311CE525F:3:1101:17996:1000 1:N:0:TCTTCACA+ATTACTCG
@GENOTEK:000:311CE525F:3:1101:21938:1000 1:N:0:TCTTCACA+ATTACTCG
@GENOTEK:000:311CE525F:3:1101:1208:1016 1:N:0:TCTTCACA+ATTACTCG
@GENOTEK:000:311CE525F:3:1101:3558:1016 1:N:0:TCTTCACA+ATTACTCG

As far as I understand, TCTTCACA+ATTACTCG represents the first and second index primers, which are attached to the fragment to differentiate one end from the other.

But at least one pair of files has identifiers like this:

@GENOTEK:000:9589D2457:7:1101:12895:1362 1:N:0:NTTACTCG
@GENOTEK:000:9589D2457:7:1101:16011:1379 1:N:0:NTTACTCG
@GENOTEK:000:9589D2457:7:1101:17381:1432 1:N:0:NTTACTCG

@GENOTEK:000:9589D2457:7:1101:12895:1362 2:N:0:NTTACTCG
@GENOTEK:000:9589D2457:7:1101:16011:1379 2:N:0:NTTACTCG
@GENOTEK:000:9589D2457:7:1101:17381:1432 2:N:0:NTTACTCG

So it contains only one index primer, and moreover, it is the same at both ends. Does this mean that it is impossible to distinguish one end of the fragment from the other, so that these are effectively single-end reads?

Also, all the files have run number 000. Is this something to worry about?

Thanks in advance.

fastq primers identifiers Illumina • 139 views
5 weeks ago
GenoMax 104k

@GENOTEK:000:9589D2457:7:1101:12895:1362 1:N:0:NTTACTCG <-- This dataset uses a single index.

@GENOTEK:000:311CE525F:3:1101:3558:1016 1:N:0:TCTTCACA+ATTACTCG <-- This dataset uses two indexes.

In Illumina sequencing, index reads are never part of the actual insert sequence; they are read independently, in their own index-read cycles. They have nothing to do with distinguishing one end of a fragment from the other. In your identifiers, that is the job of the read number, the 1 or 2 that begins the second field (1:N:0:... versus 2:N:0:...). With paired-end data, each fragment is sampled from both ends; with single-end data, it is sampled from only one end. In both cases you can have a single index or two indexes. Indexes are simply used to label samples so that reads can be separated bioinformatically (demultiplexed) after the run.
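To make the distinction concrete, here is a minimal Python sketch that splits an identifier into its fields. It assumes the headers follow the usual Illumina layout (instrument:run:flowcell:lane:tile:x:y, then read:filtered:control:index); the names IlluminaHeader, parse_header and same_fragment are made up for this illustration and do not come from any library.

```python
from typing import NamedTuple, Optional


class IlluminaHeader(NamedTuple):
    # Field names are illustrative; they mirror the usual Illumina layout.
    instrument: str
    run_number: str
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int            # 1 or 2: which end of the fragment this record is
    is_filtered: str     # "Y" or "N"
    control_number: str
    index1: str          # first (or only) index sequence
    index2: Optional[str]  # second index, None for single-index runs


def parse_header(line: str) -> IlluminaHeader:
    """Parse one FASTQ identifier line (assumed Illumina-style)."""
    line = line.strip()
    if line.startswith("@"):
        line = line[1:]
    name, comment = line.split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read, is_filtered, control, index = comment.split(":")
    # Dual-index runs write both indexes joined by "+".
    index1, _, index2 = index.partition("+")
    return IlluminaHeader(instrument, run, flowcell, int(lane), int(tile),
                          int(x), int(y), int(read), is_filtered, control,
                          index1, index2 or None)


def same_fragment(a: IlluminaHeader, b: IlluminaHeader) -> bool:
    """True if a and b are read 1 and read 2 of the same cluster."""
    return a[:7] == b[:7] and {a.read, b.read} == {1, 2}


r1 = parse_header("@GENOTEK:000:9589D2457:7:1101:12895:1362 1:N:0:NTTACTCG")
r2 = parse_header("@GENOTEK:000:9589D2457:7:1101:12895:1362 2:N:0:NTTACTCG")
print(r1.read, r2.read, r1.index1, r1.index2, same_fragment(r1, r2))
```

The two records share the same coordinates and the same index, and differ only in the read field, which is what marks them as the two ends of one fragment.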

Also, all the files have run number 000. Is this something to worry about?

That should not be a cause for concern. My assumption is that the read names may have been changed after the run.

