I am trying to pull DEG lists from multiple GEO datasets so I can cross-analyze them. Is there a way (in either R or Python 3) to convert the probe IDs to a more universal identifier, such as an Ensembl ID, HGNC ID, or NCBI Gene ID? Thanks!
How can I match gene symbols to the probe identifiers in the row names of my expression matrix? The problem is that one gene symbol can correspond to several probe identifiers. For instance, USHBP1 has 174996658, 174996659, 174996660, 174996661, 174996662, and 174996663. So I really don't know what to do now.
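If you are working in Python, one common way to get one row per gene is to group the probes by symbol and then either average them or keep a single representative probe. This is a minimal sketch, assuming you already have a probe-to-symbol mapping from one of the sources discussed in this thread. `collapse_probes`, its `"mean"`/`"max_mean"` options, and the expression values are hypothetical and for illustration only; which strategy is appropriate depends on your analysis.

```python
from statistics import mean


def collapse_probes(expr, probe_to_symbol, method="mean"):
    """Collapse a probe-level matrix to one row per gene symbol.

    expr: dict of probe ID -> list of expression values (one per sample)
    probe_to_symbol: dict of probe ID -> gene symbol; unmapped probes are dropped
    method: "mean" averages a gene's probes sample by sample;
            "max_mean" keeps the single probe with the highest average expression
    """
    groups = {}
    for probe, values in expr.items():
        symbol = probe_to_symbol.get(probe)
        if symbol:
            groups.setdefault(symbol, []).append(values)

    collapsed = {}
    for symbol, rows in groups.items():
        if method == "mean":
            # zip(*rows) yields one tuple per sample across all probes of the gene
            collapsed[symbol] = [mean(col) for col in zip(*rows)]
        elif method == "max_mean":
            collapsed[symbol] = list(max(rows, key=mean))
        else:
            raise ValueError(f"unknown method: {method}")
    return collapsed


# Toy values for illustration only (not real USHBP1 measurements).
expr = {"174996658": [5.0, 6.0], "174996659": [7.0, 8.0], "000000001": [1.0, 1.0]}
probe_to_symbol = {"174996658": "USHBP1", "174996659": "USHBP1"}
print(collapse_probes(expr, probe_to_symbol))              # {'USHBP1': [6.0, 7.0]}
print(collapse_probes(expr, probe_to_symbol, "max_mean"))  # {'USHBP1': [7.0, 8.0]}
```

Averaging keeps every probe's signal. Keeping one probe per gene avoids mixing probes that may target different transcripts. Other selection rules, such as highest variance, slot into the same loop.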
You can try two things (assuming your dataset used the Affymetrix Human Genome U133 Plus 2.0 Array):
1. Use biomaRt
2. Use GEOquery
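biomaRt is an R package. From Python 3, a rough equivalent is to send a BioMart XML query to Ensembl's martservice endpoint yourself. This is a minimal sketch, assuming the Ensembl attribute/filter names `affy_hg_u133_plus_2`, `ensembl_gene_id`, `hgnc_symbol`, and `entrezgene_id` (older Ensembl releases used `entrezgene`). Check them against `listAttributes()`/`listFilters()` in biomaRt before relying on them. The helper names and the endpoint constant are hypothetical choices for this sketch.

```python
import csv
import io
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: Ensembl's public BioMart endpoint.
MART_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(probe_ids, array_attr="affy_hg_u133_plus_2",
                        targets=("ensembl_gene_id", "hgnc_symbol", "entrezgene_id"),
                        dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query that filters on probe IDs and returns target IDs."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="0", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    # The array attribute doubles as the filter: only these probe IDs are returned.
    ET.SubElement(ds, "Filter", name=array_attr, value=",".join(probe_ids))
    # First output column is the probe ID itself, followed by the requested targets.
    for attr in (array_attr, *targets):
        ET.SubElement(ds, "Attribute", name=attr)
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def parse_tsv(text):
    """Split BioMart's TSV response into rows, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def fetch_mapping(probe_ids, **kwargs):
    """POST the query (avoids URL-length limits) and return rows:
    probe ID first, then the target attributes in the order requested."""
    data = urllib.parse.urlencode({"query": build_biomart_query(probe_ids, **kwargs)})
    with urllib.request.urlopen(MART_URL, data=data.encode(), timeout=120) as resp:
        return parse_tsv(resp.read().decode("utf-8"))
```

Only `fetch_mapping` touches the network. To map a different array, pass whatever filter name `listFilters()` reports for your platform as `array_attr`.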
What should I do if the array is not in biomaRt?
Which array is it? Try the manufacturer's website for the annotation. Also look at the Bioconductor annotation packages: https://www.bioconductor.org/packages/release/data/annotation/
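If you download the annotation CSV from the manufacturer's website, you can build the probe-to-symbol mapping in Python without extra packages. This is a sketch assuming a NetAffx-style file: `#%` metadata lines at the top, a `Probe Set ID` column, and a `Gene Symbol` column with multiple symbols separated by `///` and `---` for unannotated probes. Column names differ between array types, so they are parameters here. `read_affy_annotation` is a hypothetical helper.

```python
import csv


def read_affy_annotation(text, id_col="Probe Set ID", symbol_col="Gene Symbol"):
    """Map probe IDs to lists of gene symbols from a NetAffx-style annotation CSV.

    Assumes '#'-prefixed metadata lines at the top, '///' between multiple
    symbols, and '---' for probes without an annotation.
    """
    # Drop the '#%' metadata header; the first remaining line holds column names.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    mapping = {}
    for row in csv.DictReader(lines):
        raw = (row.get(symbol_col) or "").strip()
        if not raw or raw == "---":
            continue  # unannotated probe
        mapping[row[id_col]] = [s.strip() for s in raw.split("///") if s.strip()]
    return mapping
```

Read the downloaded file into a string and pass it in. Probes with several symbols keep all of them, so you still need to decide how to handle them before collapsing.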
It's from the Affymetrix Clariom D Assay.
If it's human, the annotation package you want is: https://www.bioconductor.org/packages/release/data/annotation/html/clariomdhumanprobeset.db.html
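Outside R, Clariom D transcript-cluster annotation files typically carry gene symbols inside a `gene_assignment` field rather than in a plain symbol column. The sketch below assumes the layout `accession // symbol // description // cytoband // Entrez ID`, with entries separated by `///`. That layout is an assumption to check against your actual file. `symbols_from_gene_assignment` is a hypothetical helper.

```python
def symbols_from_gene_assignment(value):
    """Extract unique gene symbols, in order of appearance, from a
    'gene_assignment' string.

    Assumed format: entries separated by '///', fields within an entry
    separated by '//', gene symbol in the second field; '---' means unannotated.
    """
    symbols = []
    if not value or value.strip() == "---":
        return symbols
    # Split on '///' first so the '//' field separator is not confused with it.
    for entry in value.split("///"):
        fields = [f.strip() for f in entry.split("//")]
        if len(fields) > 1 and fields[1] and fields[1] not in symbols:
            symbols.append(fields[1])
    return symbols


example = ("NM_1 // GENE1 // desc // 1p36 // 123 /// "
           "ENST1 // GENE1 // desc // 1p36 // 123 /// NM_2 // GENE2 // d // 2q // 456")
print(symbols_from_gene_assignment(example))  # ['GENE1', 'GENE2']
```

A transcript cluster can map to more than one symbol, so decide whether to drop, duplicate, or keep the first symbol before collapsing to one row per gene.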
Would I just download the annotation package, run the same script as above, and swap the attribute and filter?
Yes, I posted a solution below for that package.
Yes, but you need to know which array type you are using. Take a look at this example for the Affymetrix U133 Plus 2.0: Affymetrix Human Genome U133 Plus 2.0 Array