After collecting several iPSC RNA-seq samples from independent NCBI GEO datasets, I think that not all iPSCs have a similar gene expression profile. I have also read in papers that gene expression in iPSCs can vary.
So I collected 4-5 iPSC samples from multiple NCBI GEO datasets. All of them come from RNA-seq experiments on iPSC reprogramming, and they are independent of each other.
My goal is simple: to extract the genes that are similarly expressed across these different, independent samples. My hypothesis is that even with different iPSC states, there is an underlying common mechanism that shows up at the gene expression level. Having obtained those similarly expressed genes, we could conclude that they are important for the pluripotency of iPSCs, while genes whose expression levels vary among samples are not.
What kind of feature extraction would be useful for obtaining these genes?
One method I can think of is to compute a distance for each gene over every possible pairwise combination of samples. For example, with 3 samples A, B, C and a gene G, I would compute the distance for gene G for A vs B, A vs C, and B vs C. If all of these distances are small, gene G is selected.
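To make the pairwise idea concrete, here is a rough sketch in Python. Several things in it are my own assumptions and not settled choices: the distance is the absolute difference of log2-transformed values, a gene is kept only if its largest pairwise distance is under a threshold, and the expression values are assumed to be already normalized to a comparable scale (for example TPM) across the datasets. The function name `stable_genes`, the gene names and the numbers are made up for illustration.

```python
from itertools import combinations
import math


def stable_genes(expr, threshold, pseudocount=1.0):
    """Return genes whose expression is similar in every pair of samples.

    expr: dict mapping gene -> dict mapping sample -> expression value.
    threshold: largest allowed |log2 difference| between any two samples
               (an arbitrary cutoff, assumed here for illustration).
    pseudocount: added before taking log2 so zero counts are defined.
    """
    selected = []
    for gene, by_sample in expr.items():
        logs = [math.log2(v + pseudocount) for v in by_sample.values()]
        # Distance for this gene over every pair of samples (A vs B, A vs C, ...).
        max_dist = max(
            (abs(a - b) for a, b in combinations(logs, 2)),
            default=0.0,  # fewer than two samples: no pairs to compare
        )
        if max_dist <= threshold:
            selected.append(gene)
    return selected


# Toy data: hypothetical genes G1 and G2 in samples A, B, C.
expr = {
    "G1": {"A": 100.0, "B": 110.0, "C": 95.0},
    "G2": {"A": 5.0, "B": 200.0, "C": 40.0},
}
print(stable_genes(expr, threshold=0.5))  # ['G1']
```

One thing this sketch makes visible: a gene that is unexpressed (zero) in every sample also has small pairwise distances and would be selected, so a minimum expression filter would probably be needed on top of the distance criterion.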