I have downloaded DNA microarray data from NCBI. The data contain both control and affected samples for all genes. I want to perform downstream analyses such as clustering and classification. I know that some preprocessing steps, such as normalization, log2 transformation, and selection of differentially expressed genes, are necessary before clustering or classification.
However, I am unsure about the exact order of these preprocessing steps, although I understand that normalization is performed before log2 transformation. Please let me know the following:
1) Do normalization and then log2 transformation need to be done before selecting differentially expressed genes, and should that selection be performed on the normalized, log2-transformed data?
2) For RNA-Seq data, I learned that differential expression analysis is done on un-normalized, un-logged count data, because the statistical model is most powerful when applied to raw counts. Can differentially expressed genes likewise be selected from microarray data without normalization and log2 transformation? Please note that I will use SAM or Limma to select differentially expressed genes from the microarray data.
3) Are any other preprocessing or quality-control steps necessary before clustering? If so, please list them in their exact order.
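To make the order I am asking about concrete, here is a minimal Python sketch of "normalize, then log2-transform" on a made-up genes-by-samples matrix. The function names, the toy intensities, and the choice of quantile normalization are assumptions for illustration only; they are not taken from SAM, Limma, or any other package, and whether this order is the right one is exactly what I am asking.

```python
import math

def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Hypothetical helper for illustration; ties are broken by row order,
    which is a simplification.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    # Sort each sample's intensities independently.
    sorted_cols = [sorted(row[j] for row in matrix) for j in range(n_samples)]
    # Reference distribution: mean across samples at each rank.
    reference = [
        sum(col[r] for col in sorted_cols) / n_samples for r in range(n_genes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j in range(n_samples):
        # Order of gene indices by intensity within sample j.
        order = sorted(range(n_genes), key=lambda i: matrix[i][j])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized

def log2_transform(matrix):
    """Apply log2 to every (strictly positive) intensity."""
    return [[math.log2(v) for v in row] for row in matrix]

# Made-up raw intensities: 3 genes (rows) x 4 samples (columns).
raw = [
    [100.0, 220.0, 90.0, 210.0],
    [400.0, 800.0, 380.0, 820.0],
    [50.0, 120.0, 60.0, 100.0],
]

normalized = quantile_normalize(raw)  # step 1 in the order I assumed
logged = log2_transform(normalized)   # step 2 in the order I assumed
```

Selection of differentially expressed genes would then, under my assumption, be run on `logged` rather than on `raw`.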
Thanks in advance.