Normal Patient Samples from GSE62944
6.5 years ago
hAjmal ▴ 50


I am trying to run a differential expression analysis of genes in normal vs. breast cancer patients. For that purpose, I chose the GEO dataset GSE62944, as it contains 9264 tumour samples and 741 normal samples.

Question 1:

I query AnnotationHub for the dataset using this code:


library(AnnotationHub)
ah = AnnotationHub()
query(ah, "GSE62944")

What I see is:

AnnotationHub with 1 record
# snapshotDate(): 2016-03-09 
# names(): AH28855
# $dataprovider: GEO
# $species: Homo sapiens
# $rdataclass: ExpressionSet
# $title: RNA-Sequencing and clinical data for 7706 tumor samples from The Cancer Genome Atlas
# $description: TCGA RNA-seq Rsubread-summarized raw count data for 7706 tumor samples, represented as an R / Bioconductor...
# $taxonomyid: 9606
# $genome: hg19
# $sourcetype: tar.gz
# $sourceurl:
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: TCGA, RNA-seq, Expression, Count 
# retrieve record with 'object[["AH28855"]]'

Why does the title say 7706 tumor samples? Does this expression set contain normal samples at all? If so, how would I access them?

Question 2:

I subset the breast cancer patient samples using this code:

tcga_data <- ah[["AH28855"]]

brca_data <- tcga_data[, which(phenoData(tcga_data)$CancerType=="BRCA")]

How can I subset both breast cancer and normal samples from the entire dataset?

Question 3:

Is there a way to subset specific genes (i.e., rows) from the dataset?

Any help would be appreciated.

gse62944 GEO ExpressionSet DEAnalysis
