Question

How to handle more than 1 SRA Run per Experiment?

3 months ago

Janmajay • 0

While downloading some raw WGBS data from the Roadmap Epigenomics project, I noticed that multiple SRA Runs were associated with each Experiment.

Reading the FAQs here, I was under the impression that only 1 Run can be associated with each Experiment. I saw that the Runs only differed in the "Bases" and "Bytes" columns.

For example, BioSample SAMN00857854 (GEO Accession: GSM916051) was sequenced with Illumina HiSeq 2000 and has one Experiment, SRX142783, with the following SRA Runs:

Run : ['SRR1143696' , 'SRR1143697' , 'SRR1143700' , 'SRR1143702' , 'SRR1143704']

which correspond to:

Bases : [48905968410 , 49852911810 , 34303272200 , 18904063200 , 34950365000]

Bytes : [33485451056 , 32870075947 , 24289423164 , 13323536868 , 24641285065]

as the only metadata values that differ across Runs.

How do I handle these Runs? Is the correct way to:

Concatenate FastQ files from multiple Runs into one file before preprocessing? OR
Preprocess FastQ files from each Run separately and treat as technical replicates?

SRA Preprocessing • 333 views


score 1 · Answer 1 · 2024-04-13

3 months ago

ATpoint 84k

Concatenate FastQ files from multiple Runs into one file before preprocessing?

Yes, that is the common thing to do. It's the exact same library, just sequenced over multiple lanes, so there is usually no need to treat the Runs as separate technical replicates.
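As a rough illustration of the concatenation step, here is a minimal Python sketch. It assumes the Runs were already downloaded and gzipped as `<run>_1.fastq.gz` and `<run>_2.fastq.gz` (paired-end) in one directory; that naming scheme, the function name `concatenate_runs`, and the output prefix are all hypothetical and should be adapted to your files. It relies on the fact that gzip files can be joined byte-for-byte into a valid multi-member gzip file, and it appends the Runs in the same order for both mates so read pairs stay in sync.

```python
import shutil
from pathlib import Path


def concatenate_runs(run_ids, in_dir, out_prefix, mates=("1", "2")):
    """Join per-Run gzipped FASTQ files into one file per mate.

    Assumes (hypothetical naming) inputs called <run>_<mate>.fastq.gz in
    in_dir. For single-end data, pass mates=("1",) or adjust the pattern.
    """
    in_dir = Path(in_dir)
    outputs = []
    for mate in mates:
        out_path = Path(f"{out_prefix}_{mate}.fastq.gz")
        with open(out_path, "wb") as out:
            # Same Run order for every mate keeps R1/R2 records paired.
            for run in run_ids:
                src = in_dir / f"{run}_{mate}.fastq.gz"
                with open(src, "rb") as fh:
                    # Raw byte copy: concatenated gzip members remain
                    # a valid gzip stream, so no re-compression is needed.
                    shutil.copyfileobj(fh, out)
        outputs.append(out_path)
    return outputs
```

For the Experiment in the question, a call could look like `concatenate_runs(["SRR1143696", "SRR1143697", "SRR1143700", "SRR1143702", "SRR1143704"], "fastq", "SRX142783")`, where the `fastq` directory and the `SRX142783` prefix are placeholders. The merged files can then go through trimming and alignment as a single sample.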
