Obviously I've read this (https://www.ddbj.nig.ac.jp/faq/en/biosample-bioproject-sra-e.html):
What is the relationship between BioSamples, SRA Experiments, SRA Runs, and my data files?
BioSample is descriptive information about the biological source materials, or samples, used to generate experimental data in any of primary data archives. Biological and technical replicates need to be registered as separate BioSamples distinguished by the "replicate" attribute having values such as "biological replicate 1" and "biological replicate 2".
Each SRA Experiment is a unique sequencing library for a specific sample. Importantly, much of the descriptive information that is displayed in the public record of your data is captured at the level of the DRA Experiment.
SRA Runs are simply a manifest of data file(s) that should be linked to a given sequencing library – no information present in the Run is displayed on the public record of your project. Note that all data files listed in a Run will be merged into a single SRA archive file (and fastq file for distribution), so files from different samples should not be grouped in the same Run. Paired-end data files (forward/reverse), conversely, MUST be listed in a single run in order for the two files to be correctly processed as paired-end. Do not divide a sample for a paired-end library (for example, forward and reverse).
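To make sure I am reading the FAQ correctly, here is how I would sketch the hierarchy in Python. This is only my own mental model: the class names, accessions, and file names are made up for illustration and are not anything DDBJ or SRA defines.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    """A manifest of data files belonging to one sequencing library."""
    accession: str          # hypothetical placeholder, not a real accession
    files: List[str]        # all files listed here are merged into one archive


@dataclass
class Experiment:
    """One sequencing library prepared from one specific BioSample."""
    accession: str          # hypothetical placeholder
    biosample: str          # the single sample this library was made from
    runs: List[Run] = field(default_factory=list)


def is_paired(run: Run) -> bool:
    # Assumption for this sketch: a paired-end run lists exactly two files
    # (forward and reverse) in the same Run.
    return len(run.files) == 2


# One library from one sample, with both mates listed in a single run.
lib = Experiment(accession="EXP_1", biosample="SAMPLE_A")
lib.runs.append(Run("RUN_1", ["lib1_R1.fastq.gz", "lib1_R2.fastq.gz"]))
```

If this model is right, the number of Experiments should equal the number of distinct libraries, which is exactly what confuses me below.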
Still, I am struggling to understand. Take this study, for example: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=ERP004697
- It has 2 biosamples, 8 experiments and 8 runs
- It's clear to me that the 8 runs are 8 different data files.
- As there are 8 experiments, there must be 4 distinct sequencing libraries for each biosample. They could, e.g., have different insert sizes (this is not the case for this study, though).
- The corresponding paper states that there are two technical replicates per sample, which I also found stated under the experimental factor [replicate]. OK.
- The paper also says that they multiplexed 4 on one lane. From this I would conclude that two libraries were created for each biosample and that all four together were sequenced on two lanes. Is this wrong?
- Still, why aren't there only 2 experiments (hence 2 distinct sequencing libraries) per biosample?
- Also, I thought replicates should be registered as separate biosamples?
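For reference, this is roughly how I counted. It is a minimal sketch that assumes a run table exported as CSV with `Run`, `Experiment`, and `BioSample` columns. The rows below are made-up placeholders that only mirror the shape of the study (2 biosamples, 8 experiments, 8 runs); they are not the real accessions.

```python
import csv
import io
from collections import defaultdict

# Hypothetical run table: placeholder IDs, not the actual study metadata.
RUN_TABLE = """Run,Experiment,BioSample
RUN_1,EXP_1,SAMPLE_A
RUN_2,EXP_2,SAMPLE_A
RUN_3,EXP_3,SAMPLE_A
RUN_4,EXP_4,SAMPLE_A
RUN_5,EXP_5,SAMPLE_B
RUN_6,EXP_6,SAMPLE_B
RUN_7,EXP_7,SAMPLE_B
RUN_8,EXP_8,SAMPLE_B
"""


def summarize(table_text):
    """Return (experiments per biosample, runs per experiment) as dicts of sets."""
    experiments = defaultdict(set)
    runs = defaultdict(set)
    for row in csv.DictReader(io.StringIO(table_text)):
        experiments[row["BioSample"]].add(row["Experiment"])
        runs[row["Experiment"]].add(row["Run"])
    return experiments, runs


experiments, runs = summarize(RUN_TABLE)
for sample in sorted(experiments):
    print(sample, "has", len(experiments[sample]), "experiments")
```

With a table shaped like this, each biosample ends up with 4 experiments and each experiment with 1 run, which is the pattern I cannot reconcile with "two technical replicates per sample".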
Another example is the Illumina BodyMap data (only the 16 Tissues mixture samples): https://www.ncbi.nlm.nih.gov/Traces/study/?acc=ERP000546
- There are 3 biosamples
- There are 16 runs, so 16 data files.
- Each run is a separate experiment.
- The experimental factor I could find is LIBRARYPREP, which has 3 different values.
- Again, I don't get why each run is a separate experiment. I would say some of them are the same...
What am I not getting here? Thanks!!!