3.1 years ago
andyred419
Hi, I am new to bioinformatics. I wanted to know if it is possible to get the differential gene expression of two sample sets based on the high and low expression of a particular gene from TCGA data. I want the counts data that I can analyze in R.
A good title should be concise. "Analyze differential expression from TCGA data" is a good title. What you have here is not a title, it's a description.
So what is the question? At which step do you get stuck?
I want two sample sets: one with the individuals having high expression of a particular gene, say A, and another with the individuals having low expression of gene A, just as we do in cBioPortal for a quick analysis. However, I want the raw counts data from TCGA, which is not available from cBioPortal. Where and how can I find it?
TCGA counts can be obtained, e.g., with the TCGAbiolinks or recount packages from Bioconductor. The vignettes cover the "how". From there, just normalize the data and use something like `quantile()` to define quantiles/percentiles as you like and split the samples into groups at whatever threshold you feel is good, e.g. high (top 25%) and low (bottom 25%), to get a good separation.
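As a rough illustration of the quantile split, here is a minimal base-R sketch. It assumes you already have a genes x samples matrix of raw counts; the matrix below is simulated, and the names `counts`, `geneA`, and `log_cpm` are hypothetical placeholders, not something TCGAbiolinks or recount produces under those names. The log-CPM step is just one simple normalization choice, used here only to rank samples.

```r
# Toy stand-in for a raw count matrix (genes x samples); values are simulated,
# not real TCGA data.
set.seed(1)
counts <- matrix(rnbinom(5 * 40, mu = 200, size = 2), nrow = 5,
                 dimnames = list(paste0("gene", LETTERS[1:5]),
                                 paste0("sample", 1:40)))

# Simple library-size normalization: log2 counts per million plus a pseudocount.
lib_size <- colSums(counts)
log_cpm <- log2(t(t(counts) / lib_size) * 1e6 + 1)

# Normalized expression of the gene of interest (hypothetical name "geneA").
expr <- log_cpm["geneA", ]

# Cutoffs at the 25th and 75th percentiles.
cutoffs <- quantile(expr, probs = c(0.25, 0.75))

# Label samples; everything between the two cutoffs stays NA and is dropped.
group <- rep(NA_character_, length(expr))
names(group) <- names(expr)
group[expr <= cutoffs[1]] <- "low"
group[expr >= cutoffs[2]] <- "high"

# Keep the raw counts of the labeled samples, plus a matching group factor.
keep <- !is.na(group)
counts_subset <- counts[, keep]
group_factor <- factor(group[keep], levels = c("low", "high"))
table(group_factor)
```

The normalized values are only used to assign the groups here; `counts_subset` still holds raw counts, which is what count-based differential expression tools such as DESeq2 or edgeR expect as input, with `group_factor` as the condition.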