Batch Effect
1 day ago
Umair • 0

Is it really considered a batch effect if I read the first lines of a FASTQ file and extract the run_number and flowcell_ID from them? Or am I over-interpreting by extracting this information for my RNA-seq dataset, when it does not actually represent a batch effect?

1 day ago

Hey,

It is not 'over-interpreting': the information that you have extracted can indeed be used to identify potential batches. In RNA-seq, the sequencing run and flowcell are well-known sources of technical variation (batch effects) and, for this reason, are sometimes explicitly included in the statistical model. The flowcell ID, in particular, can be important.
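For anyone wanting to pull these fields out programmatically, here is a minimal sketch in Python. It assumes the reads use the Illumina Casava 1.8+ header layout (instrument:run_number:flowcell_ID:lane:tile:x:y, followed by a space and the read-level fields); older or non-Illumina headers will not fit this pattern. The function names and the example header are made up for illustration.

```python
import gzip


def parse_illumina_header(header):
    """Return run-level fields from a Casava 1.8+ style FASTQ header line.

    Assumed layout: @instrument:run_number:flowcell_ID:lane:tile:x:y <read info>
    """
    if not header.startswith("@"):
        raise ValueError("not a FASTQ header line")
    # Drop the leading '@' and the read-level part after the first space.
    fields = header[1:].split(" ")[0].split(":")
    if len(fields) < 7:
        raise ValueError("header does not follow the assumed Illumina layout")
    return {
        "instrument": fields[0],
        "run_number": fields[1],
        "flowcell_id": fields[2],
        "lane": fields[3],
    }


def first_header(path):
    """Read the first line of a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline().rstrip("\n")


# Hypothetical header, not taken from any real dataset.
example = "@A00123:45:HXXXXXXXX:1:1101:1000:2000 1:N:0:ACGT"
info = parse_illumina_header(example)
print(info["run_number"], info["flowcell_id"], info["lane"])
```

Collecting these values for each sample (for instance via first_header on every file) gives a candidate batch column for the sample sheet. Samples that were sequenced across several lanes or flowcells would need more than the first read checked.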

To check whether these are actually driving a batch effect in your data, I would advise generating a PCA bi-plot (or heatmap) of your normalised counts and colouring the samples by flowcell / run. If the samples clearly separate by flowcell or run, that points to a batch effect. In that case, you can include this information as a covariate in your model, e.g., in DESeq2's design formula.
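As a rough illustration of that check, the sketch below computes PCA scores with NumPy and prints them next to the flowcell label. It is not DESeq2's own workflow: the data are simulated, the flowcell names are placeholders, and the matrix is assumed to already hold normalised, log-scale expression values with samples in rows and genes in columns.

```python
import numpy as np


def pca_scores(log_counts, n_components=4):
    """PCA via SVD on a samples x genes matrix of normalised, log-scale values."""
    centred = log_counts - log_counts.mean(axis=0)  # centre each gene
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    explained = (s ** 2) / np.sum(s ** 2)            # variance fraction per PC
    return scores, explained[:n_components]


# Simulated data: 8 samples, 200 genes, two placeholder flowcells.
rng = np.random.default_rng(0)
flowcell = np.array(["FC_A"] * 4 + ["FC_B"] * 4)
log_counts = rng.normal(loc=8.0, scale=1.0, size=(8, 200))
# Inject an artificial, deliberately strong shift for the second flowcell.
log_counts[flowcell == "FC_B"] += 3.0

scores, explained = pca_scores(log_counts)
for label, pc1, pc2 in zip(flowcell, scores[:, 0], scores[:, 1]):
    print(f"{label}\tPC1={pc1:8.2f}\tPC2={pc2:8.2f}")
print("variance explained:", np.round(explained, 3))
```

With the injected shift, the two flowcells fall on opposite sides of PC1, which is what an obvious batch effect looks like. The scores can be passed to any plotting library and coloured by the flowcell column.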

Kevin


Thank you, Kevin, for your reply. Could you kindly advise me now that I have checked the PCA plots of my dataset?

[Four images attached: PCA plots of the dataset]


Based on your non-batch-corrected plots, I would not say that the flow cell is clearly and meaningfully affecting the data. Personally, I'd leave it out of your model design. Note that checking additional PCs (PC3/4) can also be helpful, but it can quickly become a ghost hunt. True batch effects are typically pretty obvious.
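If you do want a number to go alongside the visual check on PC1 to PC4, one option is the fraction of each PC's variance that lies between flowcells (a one-way ANOVA R-squared, or eta-squared). The sketch below is only an illustration: the function name is made up, and the tiny score matrix is invented so the result is easy to verify by hand. With few samples this statistic can be high by chance, which is part of why the hunt through later PCs gets unreliable.

```python
import numpy as np


def variance_explained_by_batch(scores, batch):
    """Per-PC fraction of variance that lies between batch groups (eta-squared)."""
    scores = np.asarray(scores, dtype=float)
    batch = np.asarray(batch)
    grand_mean = scores.mean(axis=0)
    total_ss = ((scores - grand_mean) ** 2).sum(axis=0)
    between_ss = np.zeros(scores.shape[1])
    for level in np.unique(batch):
        group = scores[batch == level]
        between_ss += group.shape[0] * (group.mean(axis=0) - grand_mean) ** 2
    # Assumes every PC has non-zero variance, so total_ss is never 0.
    return between_ss / total_ss


# Invented scores: PC1 splits perfectly by flowcell, PC2 not at all.
toy_scores = np.array([[-2.0, 1.0], [-2.0, -1.0], [2.0, 1.0], [2.0, -1.0]])
toy_flowcell = ["FC_A", "FC_A", "FC_B", "FC_B"]
eta_sq = variance_explained_by_batch(toy_scores, toy_flowcell)
print(eta_sq)  # PC1 -> 1.0, PC2 -> 0.0
```

Values near 1 on an early PC would match a batch effect that is also obvious in the plot. Modest values scattered across later PCs are much harder to interpret.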
