Question

Batch Effect

1 day ago

Umair • 0

I read the first lines of my FASTQ files and extracted the run number and flowcell ID from the read headers. Can these really be considered batches for my RNA-seq dataset? Or am I reading too much into this information, and it does not actually reflect a batch effect?

batch effect • 186 views

ADD COMMENT • link updated 10 hours ago by jared.andrews07 ★ 19k • written 1 day ago by Umair • 0

score 0 · Answer 1 · 2025-11-06

1 day ago

Kevin Blighe 89k

Hey,

You are not over-interpreting: the information that you have extracted can indeed be used to identify potential batches. In RNA-seq, the sequencing run and flowcell are well-known sources of technical variation (batch effects) and, for this reason, are sometimes explicitly included in the statistical model. The flowcell ID, in particular, can be important.
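For reference, here is a minimal sketch of how the run number and flowcell ID can be pulled from the read headers. It assumes the standard Illumina (Casava 1.8+) header layout and plain four-line FASTQ records; the function names and the example header are hypothetical and only for illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def parse_illumina_header(header):
    """Extract run-level fields from an Illumina (Casava 1.8+) read header.

    Assumed layout:
    @<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
    """
    if not header.startswith("@"):
        raise ValueError(f"not a FASTQ header line: {header!r}")
    # Keep only the part before the first space, then split on colons.
    fields = header[1:].split(" ", 1)[0].split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected header layout: {header!r}")
    instrument, run_number, flowcell_id, lane = fields[:4]
    return {
        "instrument": instrument,
        "run_number": run_number,
        "flowcell_id": flowcell_id,
        "lane": lane,
    }


def batch_fields_in_fastq(path, max_reads=10000):
    """Count (run, flowcell, lane) combinations over the first max_reads reads.

    Assumes four lines per record (no wrapped sequence lines).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    seen = Counter()
    with opener(path, "rt") as handle:
        # Every fourth line, starting with the first, is a read header.
        for header in islice(handle, 0, 4 * max_reads, 4):
            info = parse_illumina_header(header.rstrip("\n"))
            seen[(info["run_number"], info["flowcell_id"], info["lane"])] += 1
    return seen


# Hypothetical header, for illustration only.
example_header = "@A00123:45:HXXXXDSXX:2:1101:1000:2000 1:N:0:ACGT"
print(parse_illumina_header(example_header))
```

Checking more than the first read per file is worthwhile, because a file merged from several lanes or runs will contain more than one combination.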

To check whether these are actually driving a batch effect in your data, I would advise generating a PCA bi-plot (or heatmap) of your normalised counts and colouring the samples by flowcell / run. If you see a clear separation by flowcell or run, then, yes, there is a batch effect. In that case, you can include this information as a covariate in your model, e.g., in DESeq2's design formula.
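As a rough illustration of the PCA check, here is a sketch in Python with NumPy. It assumes a samples x genes matrix of normalised, log-scale expression values; the toy data, the `FC_A` / `FC_B` labels, and the function names are all made up for the example and are not taken from any real dataset.

```python
import numpy as np


def pca_scores(norm_counts, n_components=2):
    """PCA via SVD on a samples x genes matrix of normalised (log-scale) values."""
    x = np.asarray(norm_counts, dtype=float)
    x = x - x.mean(axis=0)  # centre each gene across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates on the PCs
    explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance per PC
    return scores, explained[:n_components]


def plot_by_batch(scores, batches, title="flowcell"):
    """Scatter PC1 vs PC2, one colour per batch label (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so the PCA part runs without it

    batches = np.asarray(batches)
    fig, ax = plt.subplots()
    for batch in np.unique(batches):
        mask = batches == batch
        ax.scatter(scores[mask, 0], scores[mask, 1], label=str(batch))
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.legend(title=title)
    return fig


# Simulated toy data: 6 samples x 50 genes, with an artificial shift
# added to the samples from the second (hypothetical) flowcell.
rng = np.random.default_rng(0)
flowcells = ["FC_A", "FC_A", "FC_A", "FC_B", "FC_B", "FC_B"]
toy = rng.normal(size=(6, 50))
toy[3:, :10] += 6.0  # artificial batch shift on the first 10 genes

scores, explained = pca_scores(toy)
for label, (pc1, pc2) in zip(flowcells, scores):
    print(f"{label}\tPC1={pc1:.2f}\tPC2={pc2:.2f}")
print("variance explained:", np.round(explained, 3))
```

In this simulated case the two flowcells fall on opposite sides of PC1, which is the kind of clear separation to look for. Calling `plot_by_batch(scores, flowcells)` draws the coloured scatter plot.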

Kevin

ADD COMMENT • link 1 day ago by Kevin Blighe 89k

Thank you, Kevin, for your reply. Could you kindly advise me after looking at the PCA plots of my dataset?

[Image: PCA plots of the dataset]

ADD REPLY • link 10 hours ago by Umair • 0

Based on your non-batch-corrected plots, I would not consider the flow cell to be having a clear, meaningful impact. Personally, I'd leave it out of your model design. Note that checking additional PCs (PC3/4) can also be helpful, but it can quickly become a ghost hunt. True batch effects are typically pretty obvious.

ADD REPLY • link 10 hours ago by jared.andrews07 ★ 19k