Is it really considered a batch effect if I read the first lines of my FASTQ files and extracted the run_number and flowcell_ID from them? Or am I unintentionally reading too much into this information, which may not reflect a batch effect in my RNA-seq dataset at all?
It is not 'over-interpreting': the information that you have extracted can indeed be used to identify potential batches. In RNA-seq, the sequencing run and flowcell are well-known sources of technical variation (batch effects) and, for this reason, are sometimes explicitly included in the statistical model. The flowcell ID, in particular, can be important.
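For anyone wanting to pull the same fields out programmatically, here is a minimal sketch. It assumes your files use the standard Illumina (Casava 1.8+) read header, `@instrument:run_number:flowcell_ID:lane:tile:x:y`, followed by a space and the read-level fields. The function names and the example header are made up for illustration; headers rewritten by SRA or other tools will not match this layout.

```python
import gzip
from typing import NamedTuple


class RunInfo(NamedTuple):
    instrument: str
    run_number: str
    flowcell_id: str
    lane: str


def parse_illumina_header(header: str) -> RunInfo:
    """Parse one FASTQ header line, assuming the Casava 1.8+ layout."""
    if not header.startswith("@"):
        raise ValueError("not a FASTQ header line: %r" % header)
    # Drop the leading '@' and everything after the first space
    # (the part after the space holds read number, filter flag, index).
    name = header[1:].strip().split(" ")[0]
    fields = name.split(":")
    if len(fields) < 7:
        raise ValueError("unexpected header layout: %r" % header)
    return RunInfo(fields[0], fields[1], fields[2], fields[3])


def first_header(path: str) -> str:
    """Return the first line of a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline().rstrip("\n")


# Hypothetical header, not taken from the poster's data.
example = "@A00123:45:HXXXXXXXX:2:1101:1000:2000 1:N:0:ACGT"
info = parse_illumina_header(example)
print(info.run_number, info.flowcell_id, info.lane)
```

Reading only the first record assumes each file comes from a single run and flowcell; if files were merged across lanes or runs, you would need to scan more headers.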
To check whether these are actually driving a batch effect in your data, I would advise generating a PCA bi-plot (or heatmap) of your normalised counts and colouring the samples by flowcell / run. If you see a clear separation, then, yes, there is a batch effect. In that case, you can use this information as a covariate in your model, e.g., in DESeq2's design formula.
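As a rough numeric companion to eyeballing the plot, the sketch below runs a PCA on log-transformed normalised counts and then asks how much of the variance along PC1 is accounted for by the flowcell label (between-group sum of squares over total sum of squares). This is only an illustration using NumPy: the function names are my own, the log2(x + 1) transform stands in for whatever transformation you normally use (e.g. vst or rlog in DESeq2), and the toy matrix is simulated, not real data.

```python
import numpy as np


def pca_scores(norm_counts, n_components=2):
    """PCA via SVD. norm_counts is a genes x samples array."""
    logged = np.log2(np.asarray(norm_counts, dtype=float) + 1.0)
    x = logged.T                      # samples as rows, genes as columns
    x = x - x.mean(axis=0)            # centre each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


def fraction_explained_by_group(values, labels):
    """Between-group sum of squares divided by total sum of squares."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    grand_mean = values.mean()
    total = np.sum((values - grand_mean) ** 2)
    between = 0.0
    for group in np.unique(labels):
        members = values[labels == group]
        between += members.size * (members.mean() - grand_mean) ** 2
    return between / total


# Simulated toy data: 200 genes x 6 samples, with an artificial 4-fold
# shift in half of the genes for the samples on the second flowcell.
rng = np.random.default_rng(0)
counts = rng.poisson(100, size=(200, 6)).astype(float)
flowcell = np.array(["FC1", "FC1", "FC1", "FC2", "FC2", "FC2"])
counts[:100, flowcell == "FC2"] *= 4.0

scores, explained = pca_scores(counts)
print("variance explained by PC1, PC2:", explained)
print("share of PC1 variance tied to flowcell:",
      fraction_explained_by_group(scores[:, 0], flowcell))
```

The `scores` columns are what you would plot and colour by flowcell or run. With only a handful of samples the group fraction is noisy, and if flowcell is confounded with treatment it cannot tell the two apart, so treat it as a prompt to look at the plot rather than a test.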
Based on your non-batch-corrected plots, I would not consider the flow cell to be clearly and meaningfully affecting things. Personally, I'd leave it out of your model design. Note that checking additional PCs (PC3/4) can also be helpful, but it can quickly become a ghost hunt. True batch effects are typically pretty obvious.
Thank you for the reply. Would you suggest that I keep all 3 replicates for each treatment, or drop some of them? From the PCA, it appears that at least two replicates for most of my treatments cluster closely together. My PCA confuses me.
Thank you, Kevin, for your reply. Could you kindly advise me after checking the PCAs of my dataset?