I downloaded the series matrix file of a single breast cancer microarray dataset. The data were normalized and log-transformed; here is a box plot of the data. I collapsed multiple probes mapping to the same gene into a single gene-level value using limma::avereps. The box plot changed slightly after collapsing, as you can see. In your professional view, does this change matter?

I used the collapsed data to generate a PCA plot colored by cancer subtype, as you can see. Do you see any signs of a batch effect in the PCA plot, especially for the samples located in the right corner of the plot (basal subtype)? If so, could you please let me know how I can define a batch variable from this information and correct for the batch during the analysis?
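
For reference, the steps described above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the simulated matrix `expr`, the probe-to-gene vector `genes`, and the `subtype` labels are hypothetical stand-ins for the real series matrix and annotation, and only `limma::avereps` is taken from the description above.

```r
# Sketch only: simulated data stand in for the real series matrix.
library(limma)

set.seed(1)

# Hypothetical log2 expression matrix: 6 probes (rows) x 8 samples (columns)
expr <- matrix(rnorm(48, mean = 8, sd = 1), nrow = 6,
               dimnames = list(paste0("probe", 1:6), paste0("S", 1:8)))

# Hypothetical probe-to-gene mapping; geneA and geneB each have two probes
genes <- c("geneA", "geneA", "geneB", "geneB", "geneC", "geneD")

# Hypothetical subtype labels, one per sample
subtype <- factor(rep(c("Basal", "LumA"), each = 4))

# Average the probes that map to the same gene (one row per gene afterwards)
collapsed <- avereps(expr, ID = genes)

# Per-sample distributions before and after collapsing
boxplot(expr, main = "Before avereps", las = 2)
boxplot(collapsed, main = "After avereps", las = 2)

# PCA on samples: prcomp expects samples in rows, so transpose
pca <- prcomp(t(collapsed), center = TRUE, scale. = FALSE)
plot(pca$x[, 1], pca$x[, 2], col = as.integer(subtype), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(subtype),
       col = seq_along(levels(subtype)), pch = 19)
```

Because averaging replaces several probe rows with one row per gene, the set of values behind each sample's box changes, so the box plots before and after are not expected to be identical.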