We are setting up a pipeline that takes in single-cell gene expression matrices (GEMs), runs them through a series of preprocessing steps, and then trains several machine learning models to build classifiers for one or more labels on those datasets.
We'd like to build a collection of about 1,000 datasets to test our pipeline against (roughly 1% of GEO's GSE series; the final number will depend on how many submitted scRNA-seq experiments come with usable labels).
We are using the Bioconductor packages GEOquery and GEOmetadb, but so far it has been hard to figure out which GSEs actually include GEMs. Some do, some don't, and some only link to their individual GSMs. Am I doing something dumb, or do most GSEs simply not include GEMs?
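One way to screen candidate series is by the names of their supplementary files. The Python sketch below is a heuristic, not a GEO convention: it builds the FTP URL of a series' suppl/ directory and sorts file names into likely matrices, GEO's per-sample _RAW.tar bundles, and everything else. The filename patterns and the helper names suppl_url, classify, and summarize are illustrative assumptions.

```python
import re

GEO_SERIES_FTP = "https://ftp.ncbi.nlm.nih.gov/geo/series"

# Heuristic filename patterns (assumptions, not a GEO standard) that often
# indicate a processed expression matrix in a series' suppl/ directory.
MATRIX_PATTERNS = [
    r"\.mtx(\.gz)?$",                                          # 10x-style sparse matrix
    r"\.(h5|h5ad|loom|rds)$",                                  # HDF5 / AnnData / loom / R objects
    r"(count|matrix|expr|umi|tpm).*\.(txt|csv|tsv)(\.gz)?$",   # dense text tables
]


def suppl_url(gse: str) -> str:
    """Return the FTP directory holding a series' supplementary files.

    GEO buckets series by replacing the last three digits of the accession
    with 'nnn' (GSE12345 -> GSE12nnn, GSE123 -> GSEnnn).
    """
    m = re.fullmatch(r"GSE(\d+)", gse)
    if m is None:
        raise ValueError(f"not a GSE accession: {gse!r}")
    digits = m.group(1)
    return f"{GEO_SERIES_FTP}/GSE{digits[:-3]}nnn/{gse}/suppl/"


def classify(filename: str) -> str:
    """Label one supplementary file name as 'matrix', 'archive', or 'other'."""
    name = filename.lower()
    if "series_matrix" in name:
        # For scRNA-seq series these usually carry sample metadata only.
        return "other"
    if name.endswith("_raw.tar"):
        # Bundle of per-GSM files; it must be unpacked to see what is inside.
        return "archive"
    if any(re.search(p, name) for p in MATRIX_PATTERNS):
        return "matrix"
    return "other"


def summarize(filenames: list[str]) -> dict[str, list[str]]:
    """Group a series' supplementary file names by their classify() label."""
    groups: dict[str, list[str]] = {"matrix": [], "archive": [], "other": []}
    for f in filenames:
        groups[classify(f)].append(f)
    return groups


if __name__ == "__main__":
    # Hypothetical file list, for illustration only.
    files = [
        "GSE12345_RAW.tar",
        "GSE12345_filtered_counts.csv.gz",
        "GSE12345_README.txt",
    ]
    print(suppl_url("GSE12345"))
    print(summarize(files))
```

The file list itself could come from browsing that suppl/ directory or from GEOquery's getGEOSuppFiles(); recent GEOquery versions accept fetch_files = FALSE to return the file names without downloading them. A series that lands only in "archive" is the "links to GSMs" case and needs its per-sample files inspected.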
Could someone with more experience using GEO offer some advice?