Illumina summary.csv file units
2.5 years ago
Kevin

We are using Illumina NovaSeq machines for sequencing, and one of the files produced is a summary.csv file with information about the run. However, it is hard to work out the units for many of the fields, and I can't find anything on Illumina's site.

I have found a couple of similar reports on their site (1 and 2), but they don't seem to include all of the fields in my summary.csv file.

Edit: I can't post a well-formatted version of the summary.csv file here for some reason, so here is a pastebin link.

Edit 2: To clarify, this file is created before demultiplexing (which we do with bcl2fastq). I believe it is generated by Illumina InterOp.


Can you clarify which summary.csv file you are referring to? Is it from demultiplexing only, or from downstream analysis and alignment (which could be done in BaseSpace)? Currently there are two programs (bcl2fastq and bcl-convert) that can pre-process NovaSeq data. I don't think either produces a summary.csv file on local servers (this may be different on BaseSpace).


Sorry, added some clarification to the original post.

2.5 years ago
GenoMax

I think this file comes from Illumina Sequencing Analysis Viewer (SAV), which extracts it from the raw data's InterOp folder. Sequencing facilities use this file to review data quality; for an end user it does not have much value. Since you are using a NovaSeq, the column titles in the screenshot below should apply in your case. The Aligned column is the data aligned to the PhiX spike-in, and it is used to calculate the error rate.

[Screenshot: SAV summary table showing the NovaSeq column titles]
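To make the link between the Aligned column and the error rate concrete, here is a minimal Python sketch of a PhiX-based error rate: the share of bases aligned to the PhiX reference that disagree with it, expressed as a percentage. The function name and the counts are hypothetical; SAV and InterOp derive this value from their own alignment data, and this is not their implementation.

```python
def phix_error_rate(mismatched_bases: int, aligned_bases: int) -> float:
    """Return the error rate as a percentage of PhiX-aligned bases.

    Sketch only: assumes you already have the number of bases aligned to
    the PhiX reference and how many of them mismatch it. The name and the
    inputs are hypothetical; SAV/InterOp compute this internally.
    """
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return 100.0 * mismatched_bases / aligned_bases


# Hypothetical counts: 25 mismatches among 10,000 PhiX-aligned bases.
print(phix_error_rate(25, 10_000))  # 0.25
```

Because only PhiX-aligned reads have a known reference to compare against, a run with no PhiX spike-in would have nothing to feed into this calculation.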


Yes, this is part of what is produced; it is the top part of the output in my pastebin link. I also get a lot of other output, similar to what is listed in the table on page 19 of this Illumina SAV user guide. I'm still confused about some of the units they use, for example:

The guide describes yield as "The number of bases sequenced, which is updated as the run progresses". The yield in my summary.csv file is listed as 8.05. That cannot be a raw base count, so it must be in gigabases or some other unit, but the unit is not specified.


Can I ask why you are trying to use this particular summary instead of the demultiplexing report produced by bcl2fastq? Did you end up with a much lower data yield than you expected? 8 gigabases is not much if that is all you got from a NovaSeq flow cell (FC) of any type, or even from a single lane. If so, you need to work with Illumina tech support to find out where the issues were. You may have very low cluster density, or overloading; either can result in only a small amount of data.


We are a sequencing institution, and our boss would like to have this summary available for all of our runs. There weren't any issues with this run; I wasn't troubleshooting, I just needed some data to post as an example.

I'm not sure whether the yield is actually in gigabases; that was my confusion and the reason for this post. The Illumina user guide for these SAV summary tables doesn't include units in the descriptions.


The yield in the table above is in gigabases. For end users, we prefer values from the demultiplexing report produced by bcl2fastq/bcl-convert, since it gives per-sample statistics, which are more useful.
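If you want to pull the yield out of summary.csv for every run and report it with explicit units, a minimal Python sketch follows. It assumes a hypothetical layout (optional `#` comment lines, then a header row containing a `Yield` column, then data rows until a blank line); check this against your own file before relying on it. The helper names are hypothetical, and only the 8.05 total comes from this thread.

```python
import csv
import io

BASES_PER_GBP = 1_000_000_000  # 1 gigabase (Gbp) = 10^9 bases


def yield_to_bases(yield_gbp: float) -> int:
    """Convert a summary 'Yield' value reported in gigabases to a base count."""
    return round(yield_gbp * BASES_PER_GBP)


def read_yields(text: str) -> dict[str, float]:
    """Map the first column of the first table with a 'Yield' column to its yield.

    Assumed (hypothetical) layout: lines before the header are skipped, and
    a blank line or a row of a different width ends the table.
    """
    header = None
    yields: dict[str, float] = {}
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in row]
        if header is None:
            if "Yield" in cells:
                header = cells  # e.g. ["Level", "Yield", ...]
            continue
        if not cells or len(cells) != len(header):
            break  # end of the first table
        record = dict(zip(header, cells))
        yields[cells[0]] = float(record["Yield"])
    return yields


# Hypothetical excerpt; only the 8.05 total is taken from the post above.
SAMPLE = """\
# hypothetical excerpt
Level,Yield
Total,8.05
"""

yields = read_yields(SAMPLE)
print(yields)                           # {'Total': 8.05}
print(yield_to_bases(yields["Total"]))  # 8050000000
```

Matching the `Yield` header by exact cell value keeps it from being confused with a `Projected Yield` column if one is present.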
