I need a suggestion regarding microarray data analysis. I analyzed a GEO dataset with GEO2R. When I downloaded the file of significant genes (with logFC, p-value, adjusted p-value, reference ID, Ensembl ID, etc.), I found that some entries in the gene symbol column are missing. How should I deal with this? Should I look up the missing genes by their Ensembl or reference IDs (for example on Google), or should I remove the rows without a gene symbol and continue the analysis with the remaining data?
This isn't really an error. Some genes just don't have symbols. RefSeq IDs come from NCBI, Ensembl IDs from Ensembl, and gene symbols from HGNC (in the case of the human genome). These groups do not necessarily agree on which bits of sequence are genes or what genes they are. There are quite a few Ensembl IDs, for example, that have not been assigned gene symbols by HGNC. And if the array is old, some probes may have been designed against sequences that have since been judged not to be genes. In human (but not necessarily in other organisms), the majority of transcripts that do not have gene symbols are either non-coding RNA genes or unvalidated gene predictions, and if you are only interested in well-supported, well-understood protein-coding genes, it's generally safe to ignore them.
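If you want to see how many rows are affected before deciding whether to drop them, a minimal sketch along these lines may help. It assumes the GEO2R export is tab-separated and that the symbol column is called Gene.symbol; check the header of your own file, since the column name, the "NA" placeholder and the toy rows below are assumptions for illustration only.

```python
import csv
import io

# Values treated as "no symbol"; adjust to whatever your export actually uses.
MISSING = {"", "NA"}

def split_by_symbol(rows, symbol_col="Gene.symbol"):
    """Split result rows into those with and those without a gene symbol."""
    annotated, unannotated = [], []
    for row in rows:
        symbol = (row.get(symbol_col) or "").strip()
        if symbol in MISSING:
            unannotated.append(row)
        else:
            annotated.append(row)
    return annotated, unannotated

# Toy stand-in for a downloaded results table (made-up values).
example_tsv = (
    "ID\tlogFC\tadj.P.Val\tGene.symbol\n"
    "probe_1\t2.1\t0.001\tTP53\n"
    "probe_2\t-1.4\t0.020\t\n"
    "probe_3\t1.8\t0.004\tBRCA1\n"
)
rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))
annotated, unannotated = split_by_symbol(rows)
print(len(annotated), "rows with a symbol;", len(unannotated), "without")
```

Keeping the unannotated rows in a separate list rather than deleting them means you can still look them up by Ensembl or RefSeq ID later if any of them turn out to be interesting.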
Yes, I agree with Nitin: you can use BioMart on the Ensembl website. Alternatively, you can use the biomaRt package in R. Be sure to check which version of the Ensembl genome annotation was used. You can also download the corresponding GTF file and wrangle it in R yourself to get the Ensembl IDs and gene names, then use something like dplyr::left_join(your_data, wrangled_gtf, by = "ensembl_id") to fill in a symbol for every ID that has one in that annotation.
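For anyone who would rather not do the GTF step in R, here is a rough sketch of the same idea using only the Python standard library. It assumes an Ensembl-style GTF in which "gene" rows carry gene_id and, where a name exists, gene_name in the ninth column. The function names, the ensembl_id column and the toy IDs are hypothetical and only there to illustrate the approach.

```python
import re

# Matches attribute pairs such as: gene_id "ENSG..."
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def gene_names_from_gtf(lines):
    """Build a gene_id -> gene_name mapping from the 'gene' rows of a GTF."""
    id_to_name = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep one entry per gene, skip transcripts, exons, etc.
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id:
            # Genes without a gene_name attribute get an empty symbol.
            id_to_name[gene_id] = attrs.get("gene_name", "")
    return id_to_name

def left_join_symbols(rows, id_to_name, id_col="ensembl_id"):
    """Left join: keep every result row, add a symbol where one is known."""
    return [dict(row, gene_symbol=id_to_name.get(row[id_col], "")) for row in rows]

# Toy GTF lines and result rows (made-up IDs, not real annotation).
gtf_lines = [
    "#!genome-build toy",
    '1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "ENSG_TOY_1"; gene_name "GENE_A"; gene_biotype "protein_coding";',
    '1\ttoy\ttranscript\t100\t900\t.\t+\t.\tgene_id "ENSG_TOY_1"; transcript_id "ENST_TOY_1"; gene_name "GENE_A";',
    '2\ttoy\tgene\t50\t400\t.\t-\t.\tgene_id "ENSG_TOY_2"; gene_biotype "lncRNA";',
]
results = [
    {"ensembl_id": "ENSG_TOY_1", "logFC": "2.1"},
    {"ensembl_id": "ENSG_TOY_2", "logFC": "-1.4"},
    {"ensembl_id": "ENSG_TOY_3", "logFC": "0.9"},
]
joined = left_join_symbols(results, gene_names_from_gtf(gtf_lines))
for row in joined:
    print(row["ensembl_id"], row["gene_symbol"] or "(no symbol)")
```

As with left_join, every row of your results is kept. IDs that are absent from the GTF, or present without a gene_name, simply end up with an empty symbol, which is usually a sign that the annotation version does not match or that the gene has no assigned name.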