I am comparing differentially expressed (DE) genes between two categories in two data sets, one from microarray and one from RNA-seq. I processed each data set with its own workflow to get the DE genes, which gave me a list of gene features with their log fold changes. I want to compare the results from the two data sets, but the gene features differ a lot between them. So I did some trimming: I removed duplicate features in the microarray data and kept only the features that exist in both the RNA-seq and the microarray data, so that both data sets have the same gene features. That left me with around 16,000 gene features. Is this process biologically acceptable?
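To make the trimming step concrete, here is a minimal Python sketch of the idea. The gene names and values are made up, the function names are hypothetical, and keeping the duplicate with the largest absolute log fold change is only an assumption for illustration (other rules, such as averaging probes, are also common).

```python
def dedupe(rows):
    """Collapse duplicate features to one entry per gene.

    Assumed rule (for illustration only): keep the entry with the
    largest absolute log fold change.
    """
    best = {}
    for gene, lfc in rows:
        if gene not in best or abs(lfc) > abs(best[gene]):
            best[gene] = lfc
    return best


def shared_features(microarray_rows, rnaseq_rows):
    """Return {gene: (microarray_lfc, rnaseq_lfc)} for genes in both sets."""
    micro = dedupe(microarray_rows)
    rna = dedupe(rnaseq_rows)
    common = sorted(micro.keys() & rna.keys())  # intersection of gene IDs
    return {gene: (micro[gene], rna[gene]) for gene in common}


# Toy data: (gene, log fold change); TP53 appears twice in the microarray list.
microarray = [("TP53", 1.2), ("TP53", -0.4), ("BRCA1", 0.8), ("MYC", 2.0)]
rnaseq = [("TP53", 1.5), ("MYC", 1.7), ("GAPDH", 0.1)]

shared = shared_features(microarray, rnaseq)
print(shared)  # {'MYC': (2.0, 1.7), 'TP53': (1.2, 1.5)}
```

The keys of `shared` correspond to the trimmed feature set (around 16,000 genes in my real data).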
Next, I want to do a GO analysis using GOrilla or a similar tool. What should I use as the background set: the original features (from either microarray or RNA-seq), or the trimmed set of around 16,000 features? Thank you for your answers.