Question: Normal Patient Samples from GSE62944
4.5 years ago by
hAjmal40 wrote:


I am trying to run a differential expression analysis of genes in normal vs. breast cancer patients. For that purpose, I chose the GEO dataset GSE62944, as it contains 9264 tumour samples and 741 normal samples.

Question 1:

I load the expression set using this code:


library(AnnotationHub)
ah <- AnnotationHub()
query(ah, "GSE62944")

What I see is:

AnnotationHub with 1 record
# snapshotDate(): 2016-03-09 
# names(): AH28855
# $dataprovider: GEO
# $species: Homo sapiens
# $rdataclass: ExpressionSet
# $title: RNA-Sequencing and clinical data for 7706 tumor samples from The Cancer Genome Atlas
# $description: TCGA RNA-seq Rsubread-summarized raw count data for 7706 tumor samples, represented as an R / Bioconductor...
# $taxonomyid: 9606
# $genome: hg19
# $sourcetype: tar.gz
# $sourceurl:
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: TCGA, RNA-seq, Expression, Count 
# retrieve record with 'object[["AH28855"]]'

Why does the title say 7706 tumor samples? Does this expression set contain normal samples at all? If so, how would I access them?

Question 2:

I subset the breast cancer patient samples using code:

tcga_data <- ah[["AH28855"]]

brca_data <- tcga_data[, which(phenoData(tcga_data)$CancerType=="BRCA")]

How can I subset both breast cancer and normal samples from the entire dataset?
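
To make the goal concrete, here is a minimal sketch of group-based column subsetting on a toy ExpressionSet, not on the real AH28855 object. The `SampleType` column and all values below are hypothetical; whether the GSE62944 record carries normal samples at all, and under which phenoData column, would need to be checked with `table()` on its own phenotype data.

```r
library(Biobase)

# Toy count matrix: 4 genes x 6 samples (made-up values, not GSE62944 data)
counts <- matrix(1:24, nrow = 4,
                 dimnames = list(c("TP53", "BRCA1", "BRCA2", "ESR1"),
                                 paste0("S", 1:6)))

# Hypothetical phenotype table; the SampleType column is an assumption
pheno <- data.frame(
  CancerType = c("BRCA", "BRCA", "LUAD", "BRCA", "LUAD", "BRCA"),
  SampleType = c("Tumor", "Tumor", "Tumor", "Normal", "Normal", "Normal"),
  row.names  = colnames(counts)
)

toy <- ExpressionSet(assayData = counts,
                     phenoData = AnnotatedDataFrame(pheno))

# Check which groups are actually present before subsetting
table(toy$CancerType, toy$SampleType)

# Keep every BRCA sample, tumor and normal alike (columns are samples)
brca <- toy[, toy$CancerType == "BRCA"]

# Grouping factor for a downstream normal-vs-tumor comparison
group <- factor(brca$SampleType, levels = c("Normal", "Tumor"))
table(group)
```

If the normal samples turned out to live in a separate object, the same column indexing would apply to each object on its own.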

Question 3:

Is there a way to subset specific genes (i.e., rows) from the data set?
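
As an illustration of what row subsetting could look like, the sketch below selects genes by name from a toy ExpressionSet. It assumes the row names are gene identifiers, which for the real data should be confirmed with `head(featureNames(tcga_data))`; the gene list is hypothetical.

```r
library(Biobase)

# Toy ExpressionSet: 4 genes x 3 samples (made-up values)
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(c("TP53", "BRCA1", "BRCA2", "ESR1"),
                                 c("S1", "S2", "S3")))
toy <- ExpressionSet(assayData = counts)

# Hypothetical genes of interest; MYC is deliberately absent from the toy data
genes <- c("BRCA1", "ESR1", "MYC")

# Keep only the requested genes that exist as row names,
# so that a missing identifier does not cause a subscript error
keep <- intersect(genes, featureNames(toy))
sub <- toy[keep, ]

featureNames(sub)
exprs(sub)
```

Rows and columns can be indexed in one step as well, e.g. `toy[keep, c("S1", "S2")]`.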

Any help would be appreciated.

modified 4.5 years ago by genomax89k • written 4.5 years ago by hAjmal40