I downloaded a breast cancer expression profile dataset from NCBI (platform GPL570), which has 54,675 rows. The rows in this dataset are probes, and I want to convert them to gene symbols so that I can pass the data to GENIE3. However, I have run into these problems:
1. 12,227 rows of this dataset have no corresponding gene symbol. How should I deal with them?
2. As far as I know, the human genome has 20,000-25,000 genes, and this dataset, excluding the rows without a corresponding gene symbol, has 21,025 rows with unique gene symbols/probe IDs. Doesn't that exceed the acceptable range?
I also had the problem of a many-to-one relationship between probe IDs and gene symbols (several probes map to the same gene symbol), but I think it would be OK to average the expression values of all probes that share a gene symbol.
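To make the averaging idea concrete, here is a minimal Python sketch of what I have in mind. It assumes the expression data is already loaded as a mapping from probe ID to per-sample values and that the annotation is a mapping from probe ID to gene symbol. The function name `collapse_probes_to_genes`, the probe IDs, and the gene names are all made up for illustration. Dropping the probes without a gene symbol is only one possible choice, and it is exactly what I am unsure about in question 1.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes_to_genes(expression, probe_to_symbol):
    """Average probe-level rows into one row per gene symbol.

    expression:      dict of probe ID -> list of values, one per sample
    probe_to_symbol: dict of probe ID -> gene symbol ("" or absent if unannotated)
    """
    grouped = defaultdict(list)
    for probe_id, values in expression.items():
        symbol = probe_to_symbol.get(probe_id)
        if not symbol:
            # Assumption: probes without a gene symbol are simply dropped.
            continue
        grouped[symbol].append(values)

    # zip(*rows) walks the samples column by column, so each gene ends up
    # with the per-sample mean over all of its probes.
    return {
        symbol: [mean(sample_values) for sample_values in zip(*rows)]
        for symbol, rows in grouped.items()
    }


# Toy data with hypothetical probe IDs and gene names (two samples each).
expression = {
    "probe_a": [2.0, 4.0],
    "probe_b": [4.0, 8.0],
    "probe_c": [1.0, 1.0],
    "probe_d": [5.0, 5.0],
}
probe_to_symbol = {
    "probe_a": "GENE1",
    "probe_b": "GENE1",  # second probe for the same gene
    "probe_c": "GENE2",
    "probe_d": "",       # no gene symbol available
}

gene_expression = collapse_probes_to_genes(expression, probe_to_symbol)
print(gene_expression)
```

With the toy data, `probe_a` and `probe_b` are merged into a single `GENE1` row and `probe_d` is discarded, so the result has one row per gene symbol.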
Can anyone help me?