I would like to download TCGA and GTEx gene expression data (all genes; RSEM expected counts) from the Xena toilHub platform: ovarian cancer samples from TCGA and ovary samples from GTEx. However, I only found this web page (link), which links to the full dataset (samples for all cancer/tissue types). When I try to process this file to select only the patients/samples of interest, my computer crashes because the file is too large.
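To make the filtering step concrete, here is a minimal sketch of the kind of processing I have in mind, written so that the file is streamed line by line rather than loaded into memory. It assumes the matrix is tab-separated with genes in rows and samples in columns (first column = gene ID); the sample IDs and the tiny in-memory "file" are made up for illustration, and `filter_columns` is a hypothetical helper, not part of any Xena tool.

```python
import csv
import io


def filter_columns(lines, wanted_samples):
    """Stream a tab-separated matrix and yield rows restricted to the
    first column (gene ID) plus the columns named in wanted_samples."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    wanted = set(wanted_samples)
    # Column 0 holds the gene identifiers, so it is always kept.
    keep = [0] + [i for i, name in enumerate(header) if i > 0 and name in wanted]
    yield [header[i] for i in keep]
    for row in reader:
        yield [row[i] for i in keep]


# Tiny stand-in for the real matrix (hypothetical sample IDs and values).
demo = io.StringIO(
    "sample\tTCGA-AA-0001-01\tGTEX-BBBB-0001-SM-XXXX\tTCGA-CC-0003-01\n"
    "ENSG00000000003.14\t10.1\t9.5\t0.0\n"
    "ENSG00000000005.5\t0.0\t1.0\t2.0\n"
)
filtered = list(filter_columns(demo, {"TCGA-AA-0001-01", "TCGA-CC-0003-01"}))
for row in filtered:
    print("\t".join(row))
```

For the real download, the same function could presumably be fed from `gzip.open(path, "rt")` and the rows written out with `csv.writer` one at a time, so only one line is ever in memory. The set of wanted sample IDs would have to come from the sample annotation (phenotype) file, which I have not shown here.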
Is there a way to download from toilHub only the expression data for the samples that correspond to "TCGA Ovarian Serous Cystadenocarcinoma" (TCGA) or "Ovary" (GTEx)?
I have also tried the R package "UCSCXenaTools", but without success. So I would be interested if anyone could provide both a solution for downloading from the website and a solution using the R package.
I have already spent half a day on this, so I would be very grateful if anyone could help!
Side question: the dataset with all samples that one can see here holds expression values for more than 60,000 Ensembl gene identifiers. How is this possible (different transcripts for the same gene?)? What is the best way to map/aggregate the expression values to unique gene symbols (e.g. HGNC gene symbols)?
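To illustrate what I mean by "aggregate", here is a minimal sketch of one possible approach: strip the version suffix from each Ensembl ID, look up a symbol, and sum the values of all IDs that share a symbol. It assumes the values are stored as log2(expected_count + 1), so they are converted back to the linear scale before summing; that unit, the ID-to-symbol table, and the helper name `aggregate_to_symbols` are all assumptions for illustration. Whether summing is actually the right choice is part of my question.

```python
import math


def aggregate_to_symbols(expr, id_to_symbol, log2_plus1=True):
    """expr maps Ensembl gene ID -> list of per-sample values.
    Returns symbol -> list of per-sample values, summed over all IDs
    mapping to the same symbol. IDs without a symbol are dropped."""
    sums = {}
    for ens_id, values in expr.items():
        # "ENSG00000000003.14" -> "ENSG00000000003"
        symbol = id_to_symbol.get(ens_id.split(".")[0])
        if symbol is None:
            continue
        # Undo log2(x + 1) so that the counts are summed on the linear scale.
        linear = [2 ** v - 1 for v in values] if log2_plus1 else list(values)
        if symbol in sums:
            sums[symbol] = [a + b for a, b in zip(sums[symbol], linear)]
        else:
            sums[symbol] = linear
    if log2_plus1:
        return {s: [math.log2(v + 1) for v in vals] for s, vals in sums.items()}
    return sums


# Made-up IDs, symbols and values, two samples per gene.
expr = {
    "ENSG00000000001.1": [3.0, 0.0],
    "ENSG00000000002.2": [3.0, 1.0],
    "ENSG00000000003.3": [5.0, 5.0],  # no symbol in the table, so dropped
}
id_to_symbol = {"ENSG00000000001": "GENE_A", "ENSG00000000002": "GENE_A"}
result = aggregate_to_symbols(expr, id_to_symbol)
print(result)
```

In practice the ID-to-symbol table would come from an annotation source matching the Ensembl/GENCODE version used to build the dataset, which I have not identified here.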
Thanks a lot