Is there a format to describe sample names and their associated flowcell(s), lane(s) and barcode(s) from Illumina sequencing experiments?
The Illumina documentation describes the following notation for multiplexed and non-multiplexed runs:
Naming

Illumina FASTQ files use the following naming scheme:

<sample name>_<barcode sequence>_L<lane (0-padded to 3 digits)>_R<read number>_<set number (0-padded to 3 digits)>.fastq.gz

For example, the following is a valid FASTQ file name:

NA10831_ATCACG_L002_R1_001.fastq.gz

In the case of non-multiplexed runs, <sample name> will be replaced with the lane numbers (lane1, lane2, ..., lane8) and <barcode sequence> will be replaced with "NoIndex".
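To make concrete what I would like a standard to capture, here is a minimal sketch of parsing that naming scheme in Python. The regular expression, the `FastqName` tuple and the `parse_fastq_name` function are my own hypothetical names, not part of any Illumina tool, and the sketch assumes single-index barcodes made up of A, C, G, T or N only.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern derived from the naming scheme quoted above.
# Assumption: barcodes are a single run of A/C/G/T/N, or the literal "NoIndex".
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[ACGTN]+|NoIndex)"
    r"_L(?P<lane>\d{3})_R(?P<read>\d+)_(?P<set>\d{3})\.fastq\.gz$"
)


class FastqName(NamedTuple):
    sample: str             # sample name, or "lane1".."lane8" if non-multiplexed
    barcode: Optional[str]  # None when the file name says "NoIndex"
    lane: int
    read: int
    set_number: int


def parse_fastq_name(filename: str) -> FastqName:
    """Split an Illumina-style FASTQ file name into its fields."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"not an Illumina-style FASTQ name: {filename!r}")
    barcode = match.group("barcode")
    return FastqName(
        sample=match.group("sample"),
        barcode=None if barcode == "NoIndex" else barcode,
        lane=int(match.group("lane")),
        read=int(match.group("read")),
        set_number=int(match.group("set")),
    )


if __name__ == "__main__":
    print(parse_fastq_name("NA10831_ATCACG_L002_R1_001.fastq.gz"))
    print(parse_fastq_name("lane1_NoIndex_L001_R1_001.fastq.gz"))
```

Note that the file name carries no flowcell ID, so this alone cannot link a sample to its flowcell(s), which is part of why I am asking about a separate description format.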
I have also seen that bcbio has some code and example YAML files that describe some of this, and it seems SciLifeLab has adopted it.
What I am looking for is a standard or something close to a standard that people have adopted for this.
Does anything like this exist? Does the Common Workflow Language (CWL) deal with this? Galaxy? Genologics?