Hi Folks,
I'm looking for a) strategic advice and b) public RNA-seq datasets to look at differential gene expression between cancer types/controls.
It seems like most GEO or cBioPortal datasets don't provide raw counts, which is what I'd need for edgeR.
My general "battle plan" was to analyze Dataset A and get a gene list, then analyze Dataset B and get a second gene list. I would then compare the lists, focus on the overlapping genes (assuming there is overlap) and investigate the additional genes found in only one of the analyses.
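To make the comparison step concrete, here is a minimal Python sketch of what I have in mind. The function name and the example gene symbols are placeholders I made up for illustration, not my actual results, and it assumes both lists use the same gene identifiers.

```python
def compare_gene_lists(genes_a, genes_b):
    """Split two DE gene lists into shared and dataset-specific genes.

    Assumes both lists use the same identifier type (e.g. gene symbols);
    surrounding whitespace is stripped and empty entries are ignored.
    """
    set_a = {g.strip() for g in genes_a if g.strip()}
    set_b = {g.strip() for g in genes_b if g.strip()}
    return {
        "shared": sorted(set_a & set_b),   # DE in both datasets
        "only_a": sorted(set_a - set_b),   # DE in Dataset A only
        "only_b": sorted(set_b - set_a),   # DE in Dataset B only
    }


# Placeholder gene symbols, purely for illustration
result = compare_gene_lists(["TP53", "MYC", "EGFR"], ["MYC", "EGFR", "KRAS"])
print(result["shared"])
```

The shared genes would be the first candidates to follow up on; the other two groups are the "additional genes" from each analysis.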
I did get hold of raw counts for the TCGA datasets and was able to get a set of DE genes out of those (Dataset A), but I am now stuck on Dataset B.
For a): assuming that I can only get TPM/FPKM or z-score data, does it make sense to take my DE gene list and simply confirm that those genes have high TPM/FPKM or z-score values in other cancer datasets but not in controls? Or, if a dataset has both cancer and control samples, would it be feasible to calculate the average logFC per gene and use that value to compare gene lists?
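By "average logFC" I mean something like the sketch below: the mean of log2(TPM + pseudocount) in the cancer samples minus the same mean in the controls. The function name, the pseudocount of 1 and the example values are my own assumptions. I realize this is only a descriptive effect size without the count-based variance modeling that edgeR does, so I would use it at most as a sanity check on direction and rough magnitude.

```python
import math


def mean_log2_fold_change(tumor_tpm, control_tpm, pseudocount=1.0):
    """Difference of mean log2(TPM + pseudocount) between two groups.

    The pseudocount (assumed to be 1 here) avoids log2(0) for
    unexpressed genes. No statistical test is performed.
    """
    def mean_log2(values):
        return sum(math.log2(v + pseudocount) for v in values) / len(values)

    return mean_log2(tumor_tpm) - mean_log2(control_tpm)


# Made-up TPM values for one gene: log2(8) - log2(2) = 3 - 1
lfc = mean_log2_fold_change([7.0, 7.0], [1.0, 1.0])
print(lfc)
```

A gene from Dataset A would then count as "consistent" in Dataset B if this value has the same sign as the edgeR logFC.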
b) Is there any database that provides raw counts for RNAseq cancer studies?
Thanks for any pointers! Alex
Have you seen https://amp.pharm.mssm.edu/biojupies/ ? It uses kallisto to generate a DE analysis from any GEO entry in minutes.
That looks promising! I'll look into it, thank you!