Lanes in the context of RNA-seq data
Heya. I'm trying to understand lanes in RNA-seq. I got my data from my sequencing provider, and the file naming convention shows 4 lanes in total. Each experimental condition appears 3 times in each lane. The way I read this, I have 4 replicates. Am I correct?

I had been asking about this in another thread, but I was told to start a new post, so here I am. Also, I think I asked the wrong question.

The file naming convention looks like this (the first column is the sample name and the second is the lane):

a1 | LANE 1
a2 | LANE 1
a3 | LANE 1
b1 | LANE 1
b2 | LANE 1
b3 | LANE 1
a1 | LANE 2
a2 | LANE 2
a3 | LANE 2
b1 | LANE 2
b2 | LANE 2
b3 | LANE 2


I have 4 lanes in total, with 5 different samples (each repeated 3 times) per lane; the same samples repeat on every lane. I have also been told we have 4 replicates for each prep. So do the a1, a2 and a3 in each lane come from the sequencing?

Of those 5 different samples, 1 is the control, 1 is the negative and the other 3 are different experimental conditions. I'm really having difficulty understanding the experimental design here. Any input would be useful. Thank you!

If you have individual data files for a1, a2, a3, b1, b2 and b3 that have L001 in their file names, then you do have 3 replicates each of a and b that were pooled and then run on the flow cell. Because the same pool ran on multiple lanes, you should have a corresponding set of individual data files with L002 in their file names (and likewise L003 and L004).

a1_L001.fastq.gz
a1_L002.fastq.gz
a1_L003.fastq.gz
a1_L004.fastq.gz


These are sequencing (technical) replicates of sample a1 that ran on multiple lanes as part of the large pool. Those files can be merged (concatenated) for analysis.
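Merging here just means concatenating each sample's per-lane files, keeping R1 and R2 separate for paired-end data. Below is a minimal Python sketch, assuming file names follow either the `a1_L001.fastq.gz` pattern above or the bcl2fastq-style `a1_S1_L001_R1_001.fastq.gz`; `group_by_sample` and `merge_lanes` are hypothetical helper names. It relies on the fact that concatenated gzip files form a valid gzip stream, which is why `cat a1_L00*.fastq.gz > a1.fastq.gz` is the usual shell equivalent.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Matches "a1_L001.fastq.gz" or bcl2fastq-style "a1_S1_L001_R1_001.fastq.gz".
# The lane token (_L001 .. _L004) is dropped to build the merged name,
# so R1 and R2 files end up in separate groups.
LANE_RE = re.compile(r"^(?P<prefix>.+?)_L(?P<lane>\d{3})(?P<suffix>.*)\.fastq\.gz$")


def group_by_sample(paths):
    """Map merged output name -> list of per-lane files, sorted by lane."""
    groups = defaultdict(list)
    for p in map(Path, paths):
        m = LANE_RE.match(p.name)
        if m is None:
            continue  # not a per-lane FASTQ; ignore it
        key = f"{m['prefix']}{m['suffix']}.fastq.gz"
        groups[key].append((int(m["lane"]), p))
    return {k: [p for _, p in sorted(v)] for k, v in groups.items()}


def merge_lanes(paths, out_dir):
    """Concatenate each sample's per-lane .fastq.gz files into one file.

    Byte-wise concatenation of gzip files yields a valid multi-member
    gzip stream, so nothing needs to be decompressed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for name, lane_files in group_by_sample(paths).items():
        with open(out_dir / name, "wb") as out:
            for f in lane_files:
                with open(f, "rb") as src:
                    shutil.copyfileobj(src, out)
        written[name] = lane_files
    return written
```

Because both R1 and R2 groups are concatenated in the same lane order (1 to 4), the merged R1 and R2 files keep their reads in matching order, which paired-end aligners expect.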

I have also been told we have 4 replicates for each prep.

This part can't be explained by the information you provided here. For more, see: C: What Is A "Lane" In Next Generation Sequencing Context?

That was really useful. Now I definitely have a starting point. Is it plausible that lanes, in this case, separate different experiments?

I was not involved in biological experiments or sequencing. I'm working with what I have.

(I couldn't post any comments for a few hours and had to wait.) Cheers!

Anything is possible, but it would be extremely unwise to have two FASTQs from different samples whose names differ only in their lane assignment. We had someone submit to us like that once, and then we demanded that in the future they name their samples better. Also, some sequencers like the NextSeq have 4 'lanes', but everything goes on all 4: you can't put one sample on lane 1 and a different sample on lane 2. If those were run on a NextSeq, each set of per-lane files must come from the same sample.
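Since a NextSeq run spreads the whole pool over all four lanes, a quick sanity check is that every sample has one file for each of lanes 1 to 4. A minimal sketch, assuming the `<sample>_L00N...fastq.gz` naming shown earlier in the thread; `missing_lanes` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Same naming assumption as above: "<sample>_L00N[...].fastq.gz".
LANE_RE = re.compile(r"^(?P<prefix>.+?)_L(?P<lane>\d{3})(?P<suffix>.*)\.fastq\.gz$")
NEXTSEQ_LANES = {1, 2, 3, 4}


def missing_lanes(filenames, expected=NEXTSEQ_LANES):
    """Return {sample: [missing lane numbers]} for incomplete samples.

    An empty dict means every sample was seen on every expected lane,
    which is what a NextSeq run (pool loaded on all 4 lanes) produces.
    """
    lanes = defaultdict(set)
    for name in filenames:
        m = LANE_RE.match(name)
        if m:
            # Keep R1/R2 (suffix) in the key so each read is checked separately.
            lanes[m["prefix"] + m["suffix"]].add(int(m["lane"]))
    return {s: sorted(expected - seen) for s, seen in lanes.items() if expected - seen}
```

If this reports missing lanes for data that supposedly came off a NextSeq, that is a sign the files are incomplete or mislabeled rather than a sign that lanes separate experiments.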

Thank you. They were in fact run on a NextSeq (I looked into the library prep), and, indeed, all samples were prepared in triplicate.

So

a1_L001.fastq.gz a1_L002.fastq.gz a1_L003.fastq.gz a1_L004.fastq.gz

should be the same sample. :)

I don't believe it's possible to know which instrument a sample was run on from its library prep. You can probably tell from the read names what kind of instrument they were run on.
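To read the instrument off the read names: Illumina FASTQ headers in the CASAVA 1.8+ layout (as written by bcl2fastq) begin with the instrument ID, run number, flowcell ID and lane. A minimal sketch that splits those fields out; `parse_illumina_header` and `first_header` are hypothetical names, and the header used below is made up. As a rough rule of thumb (an assumption worth checking against your own run), NextSeq 500/550 instrument IDs often begin with `NB` or `NS`, and the lane field only takes values 1 to 4 on that instrument.

```python
import gzip


def parse_illumina_header(line):
    """Split a CASAVA 1.8+ Illumina FASTQ header into named fields.

    Layout: @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
    """
    if not line.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    read_id, _, comment = line[1:].strip().partition(" ")
    fields = read_id.split(":")
    if len(fields) != 7:
        raise ValueError(f"unexpected header layout: {line!r}")
    instrument, run, flowcell, lane, tile, x, y = fields
    info = {
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
    }
    if comment:
        # maxsplit=3 keeps dual indexes such as "ACGT+TTGG" intact
        read, filtered, control, index = comment.split(":", 3)
        info.update(read=int(read), filtered=(filtered == "Y"),
                    control=int(control), index=index)
    return info


def first_header(path):
    """Return the first header line of a gzipped FASTQ file."""
    with gzip.open(path, "rt") as fh:
        return fh.readline()
```

For example, `parse_illumina_header(first_header("a1_L001.fastq.gz"))["instrument"]` would give the instrument ID of the first read in that file.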

Sorry. The sequencing provider told me they ran it on a NextSeq 550 with a 2x75 bp High Output kit. The library prep was done with the NEB Ultra II Directional RNA Library Prep Kit for Illumina®. I expressed myself poorly.

Is it plausible that lanes, in this case, separate different experiments?

Based on the example you posted, I don't think so. All of your samples appear to be part of a large pool that ran across the entire flowcell.

That said, one can certainly separate experiments by lane with the right flowcell design.

@swbarnes2 makes a good point. Some Illumina sequencers (NextSeq, NovaSeq without the XP kit) have lanes that are only optically distinct, not physically separate, so lanes make no difference there.

Thank you, GenoMax :)