Question

Should I merge several SRR FASTQ files downloaded from SRA?

0

3.1 years ago

hellocita ▴ 40

Excuse me, everyone, I am new to this field and need some help.

There are several SRR files with the same suffix, such as (GSM1169462 is the sample ID):

SRR914044_GSM1169462_XXX_1w_r2_Mus_musculus_RNA-Seq.fastq.gz
SRR914045_GSM1169462_XXX_1w_r2_Mus_musculus_RNA-Seq.fastq.gz 
SRR914046_GSM1169462_XXX_1w_r2_Mus_musculus_RNA-Seq.fastq.gz

Should I merge these files before running STAR?

Also, the file names contain r1 and r2 suffixes. Should I merge r1 and r2 before running STAR? I am not sure whether they are biological replicates or paired-end files.

Thank you!

RNA-Seq • 1.5k views

ADD COMMENT • link updated 3.1 years ago by lieven.sterck 15k • written 3.1 years ago by hellocita ▴ 40

1

These appear to be 5 single-end runs, and the sample itself is called paITR_1w_r2: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SAMN02208963&o=acc_s%3Aa

ADD REPLY • link 3.1 years ago by GenoMax 141k

0

Thank you for your kind answer. I don't understand why this sample was run 5 times. Should I treat each run (SRR9140** ID) as a replicate sample?

ADD REPLY • link 3.1 years ago by hellocita ▴ 40

1

It's a common (required) practice. Those are biological replicates of the experiment and are needed to do reliable statistics on the data later on.

So yes, they are to be seen as replicates (biological, NOT technical, replicates).

ADD REPLY • link 3.1 years ago by lieven.sterck 15k

score 1 · Answer 1 · 2021-03-11

1

3.1 years ago

lieven.sterck 15k

It will in any case pay off to have a look at the metadata for that experiment (e.g. check whether they submitted paired-end sequencing).
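
As a rough illustration of that metadata check, here is a minimal Python sketch. It assumes you have exported the run table from the SRA Run Selector as CSV and that it contains the columns `Run`, `Sample Name` and `LibraryLayout`; check the header of your own export, since the column names are an assumption here. The function name `runs_by_sample` and the example rows are made up for illustration and are not taken from the real record.

```python
import csv
import io
from collections import defaultdict


def runs_by_sample(run_table_text):
    """Group run accessions by sample and collect each sample's library layout."""
    samples = defaultdict(lambda: {"runs": [], "layouts": set()})
    for row in csv.DictReader(io.StringIO(run_table_text)):
        entry = samples[row["Sample Name"]]
        entry["runs"].append(row["Run"])
        entry["layouts"].add(row["LibraryLayout"].upper())
    return dict(samples)


# Illustrative excerpt only: the layout values are assumptions, not the real metadata.
example = """Run,Sample Name,LibraryLayout
SRR914044,GSM1169462,SINGLE
SRR914045,GSM1169462,SINGLE
SRR914190,GSM1169515,PAIRED
"""

for sample, info in runs_by_sample(example).items():
    print(sample, sorted(info["layouts"]), info["runs"])
```

Several runs listed under one sample name, together with the reported layout, tell you which files belong together and whether to expect one file or two mate files per run.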

If it is paired-end sequencing, you should NOT merge the r1 and r2 files with each other. Those are the forward and reverse reads of each pair and you should keep them separate; most aligners have no problem reading from two such files.

In your example they are all named the same, though. If the XXX part says something like L001, L002 or similar, those you can and should merge. That just means your sample has been run on several lanes of the machine (a sort of technical repeat); they are in fact only one biological sample.
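
If you do end up merging files that belong to one sample, a small Python sketch along these lines could do it. It assumes the naming pattern from the question (`<SRR run>_<GSM sample>_<rest>.fastq.gz`); the helper names `group_by_sample` and `concatenate` are made up for illustration. It relies on the fact that gzip files concatenated byte for byte still form a valid gzip stream, so nothing needs to be decompressed.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming pattern: <SRR run>_<GSM sample>_<rest>.fastq.gz
NAME = re.compile(r"^(SRR\d+)_(GSM\d+)_(.+)\.fastq\.gz$")


def group_by_sample(filenames):
    """Group files by (sample ID, rest of name), ignoring the run accession.

    Keeping the rest of the name in the key means files ending in _1 and _2
    land in different groups and are never merged with each other.
    """
    groups = defaultdict(list)
    for name in sorted(filenames):
        match = NAME.match(Path(name).name)
        if match:
            groups[(match.group(2), match.group(3))].append(name)
    return dict(groups)


def concatenate(parts, merged_path):
    """Append the gzip files in `parts` byte for byte into `merged_path`."""
    with open(merged_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

Only merge groups that the metadata confirms are the same library; for paired-end data, concatenate the `_1` files and the `_2` files in the same run order so the mates stay in sync.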

ADD COMMENT • link 3.1 years ago by lieven.sterck 15k

0

Hello! There is no L001 in the file names. However, some files end with a numeric suffix:

SRR914190_GSM1169515_paTh2_StS6CT_Mus_musculus_RNA-Seq_1.fastq.gz
SRR914190_GSM1169515_paTh2_StS6CT_Mus_musculus_RNA-Seq_2.fastq.gz

Does that mean these two FASTQ files were split and should be combined before running STAR?

ADD REPLY • link 3.1 years ago by hellocita ▴ 40

0

You should look at the metadata for those samples, but from the looks of it I would say they are a paired-end dataset. If so, you don't have to (and in fact must not) combine them; instead, give them to the aligner/quantifier as two separate files and indicate that it's a paired-end dataset.
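
To make "two separate files" concrete, here is a hedged Python sketch that only builds a STAR command line without running it. `--readFilesIn` takes the mate 1 file(s) followed by the mate 2 file(s), and several files for one sample can be given as a comma-separated list, which also avoids merging single-end runs by hand. The function name `star_command`, the index directory `genome_index` and the output prefixes are placeholders for illustration.

```python
import shlex


def star_command(genome_dir, mate1, mate2=None, prefix="sample_", threads=4):
    """Build a STAR argument list for single-end or paired-end gzipped FASTQ files."""
    reads = [",".join(mate1)]
    if mate2:
        if len(mate1) != len(mate2):
            raise ValueError("mate1 and mate2 must list the same number of files")
        reads.append(",".join(mate2))
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--readFilesCommand", "zcat",  # input files are gzip-compressed
        "--outFileNamePrefix", prefix,
    ]


# Paired-end: the _1 and _2 files are passed as two separate arguments.
paired = star_command(
    "genome_index",  # placeholder path to a STAR index
    ["SRR914190_GSM1169515_paTh2_StS6CT_Mus_musculus_RNA-Seq_1.fastq.gz"],
    ["SRR914190_GSM1169515_paTh2_StS6CT_Mus_musculus_RNA-Seq_2.fastq.gz"],
    prefix="GSM1169515_",
)
print(shlex.join(paired))
```

The resulting list can be handed to `subprocess.run` once the paths point at real files and a real index.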

ADD REPLY • link 3.1 years ago by lieven.sterck 15k