Question

Batch effect consideration (re-seq the same sample twice)

0

7 months ago

jkim ▴ 170

Hello,

I would like to know how you address batch effects when re-sequencing the same samples (FASTQ files).

Our client targeted 20 million reads for each of her samples. However, the first run generated fewer than 20 million reads for a few samples (sample_2, sample_3, and sample_7), so we re-sequenced those samples.

For the 1st run

sample_id  obtained_reads (millions)
sample_1   21.4
sample_2   11
sample_3   12
sample_4   35.5
sample_5   23.8
sample_6   29.4
sample_7   10
sample_8   23.8
sample_9   24.3
sample_10  18.6

For the 2nd run

sample_id    obtained_reads (millions)
sample_2     9
sample_3     8
sample_7     10

When it comes to downstream analysis, how would you handle those samples (sample_2, sample_3, and sample_7)? Would you just merge them? E.g.:

cat sample_2.fastq.gz (from the 1st run) sample_2.fastq.gz (from the 2nd run) > sample_2.merged.fastq.gz ?

Or would you visualize PCA or hclustering to see if they cluster together or not, and then decide to drop/merge the samples from the 2nd run?

RNA-seq batch-effect • 573 views

ADD COMMENT • link updated 7 months ago by Ram 43k • written 7 months ago by jkim ▴ 170

score 3 · Accepted Answer · 2023-09-18

3

7 months ago

ATpoint 82k

Would you just merge them?

Yes, it's standard to merge sequencing replicates.
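A minimal sketch of such a merge, assuming single-end gzipped FASTQ files. The helper names `merge_runs` and `count_reads` and the tiny demo files are made up for illustration and are not from the original post. It relies on the fact that concatenated gzip files form a valid gzip stream, so nothing needs to be decompressed first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: concatenate gzipped FASTQ files from several runs.
# Usage: merge_runs OUT.fastq.gz RUN1.fastq.gz RUN2.fastq.gz [...]
merge_runs() {
    local out="$1"
    shift
    cat "$@" > "$out"
}

# Count records in a gzipped FASTQ (assumes the usual 4 lines per record).
count_reads() {
    gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Demo on tiny made-up files (not the poster's data).
demo_dir="$(mktemp -d)"
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip -c > "$demo_dir/run1.fastq.gz"
printf '@r3\nGGCA\n+\nIIII\n' | gzip -c > "$demo_dir/run2.fastq.gz"

merge_runs "$demo_dir/merged.fastq.gz" \
    "$demo_dir/run1.fastq.gz" "$demo_dir/run2.fastq.gz"

# Integrity check: gzip -t exits non-zero if the merged stream is corrupt.
gzip -t "$demo_dir/merged.fastq.gz"

echo "run1:   $(count_reads "$demo_dir/run1.fastq.gz") reads"
echo "run2:   $(count_reads "$demo_dir/run2.fastq.gz") reads"
echo "merged: $(count_reads "$demo_dir/merged.fastq.gz") reads"
```

The merged read count should equal the sum of the per-run counts. For paired-end data, merge the R1 files and the R2 files separately and list the runs in the same order both times, so that mates stay in sync.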

Or would you visualize PCA or hclustering to see if they cluster together or not, and then decide to drop/merge the samples from the 2nd run?

You can do that as a sanity check, but re-sequencing the same library is generally not expected to introduce batch effects unless the sequencers use different technologies, e.g. Illumina versus another platform. A simple PCA will tell.

0

Thanks, ATpoint. Oh, I have another quick question: does this also apply to scRNA-seq (10x) data? Would you merge sequencing replicates just as with bulk RNA-seq?

7 months ago by jkim ▴ 170

1

Running the exact same library on the same kind of instrument (assuming no instrument glitch) will not add any technical artifacts.

You absolutely should merge any kind of data with UMIs, because you don't want reads with the same cell barcode, gene, and UMI being counted as separate molecules just because they were sequenced in different runs.
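A toy illustration of the double counting, assuming each read has already been reduced to a tab-separated `cell<TAB>gene<TAB>UMI` line. That file format, the helper name `count_umis`, and the demo records are hypothetical; a real pipeline does this inside its own UMI-counting step, and the point is only that it should see the reads from both runs together.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count distinct UMIs per (cell, gene).
# Input files: one read per line, "cell<TAB>gene<TAB>UMI".
# Output: "cell<TAB>gene<TAB>n_distinct_UMIs".
count_umis() {
    # sort -u collapses duplicate (cell, gene, UMI) reads; lines sharing a
    # (cell, gene) prefix stay adjacent after the sort, so uniq -c can count them.
    cat "$@" \
        | LC_ALL=C sort -u \
        | cut -f1,2 \
        | LC_ALL=C uniq -c \
        | awk '{ print $2 "\t" $3 "\t" $1 }'
}

# Made-up reads: the AAAC and CCAT molecules were sequenced in both runs.
umi_dir="$(mktemp -d)"
printf 'cellA\tGeneX\tAAAC\ncellA\tGeneX\tGGTT\ncellB\tGeneY\tCCAT\n' > "$umi_dir/run1.tsv"
printf 'cellA\tGeneX\tAAAC\ncellA\tGeneX\tTTAG\ncellB\tGeneY\tCCAT\n' > "$umi_dir/run2.tsv"

echo "Counted per run (adding these up would double count):"
count_umis "$umi_dir/run1.tsv"
count_umis "$umi_dir/run2.tsv"

echo "Counted after merging the runs:"
count_umis "$umi_dir/run1.tsv" "$umi_dir/run2.tsv"
```

Here the per-run counts for cellA/GeneX are 2 and 2, which would sum to 4, while the merged input gives 3 distinct molecules. Likewise cellB/GeneY would be counted twice instead of once.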

7 months ago by swbarnes2 14k