Hi! I'm going to merge several different high-throughput RNA-seq datasets, but the problem is that they contain different numbers of genes. For example, one has roughly 28,000 genes and another has roughly 35,000. What is the best way to merge them? Should the merged dataset contain only the genes the two datasets have in common, or is it better to also include the genes that only one of the datasets contains?
I plan to use the merged data for differential gene expression analysis, for pathway enrichment analysis, and for finding subgroups.
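To make the two options concrete, here is a minimal pandas sketch. It assumes each dataset is a genes × samples count table indexed by the same kind of gene identifier (for example, both use Ensembl IDs); the gene names and counts below are made up purely for illustration. An inner join keeps only the shared genes, while an outer join keeps every gene and fills the missing dataset's samples with NaN.

```python
import pandas as pd

# Hypothetical toy count tables: rows are gene IDs, columns are samples.
# Assumes both datasets already use the same gene identifier namespace.
set_a = pd.DataFrame(
    {"A_s1": [10, 0, 5], "A_s2": [12, 1, 7]},
    index=["GENE1", "GENE2", "GENE3"],
)
set_b = pd.DataFrame(
    {"B_s1": [8, 3, 20], "B_s2": [9, 2, 25]},
    index=["GENE2", "GENE3", "GENE4"],
)

# Option 1: keep only genes present in both datasets (intersection).
common = set_a.join(set_b, how="inner")

# Option 2: keep every gene from either dataset (union);
# genes absent from one dataset get NaN for that dataset's samples.
union = set_a.join(set_b, how="outer")

print(common.shape)  # (2, 4)
print(union.shape)   # (4, 4)
# Number of genes with at least one missing value in the union.
print(union.isna().any(axis=1).sum())  # 2
```

The union keeps more genes but introduces missing values, which would have to be handled somehow before most count-based downstream tools will accept the matrix.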