Question

Normal Patient Samples from GSE62944

8.1 years ago

hAjmal ▴ 50

Hello,

I am trying to run a differential expression analysis of genes in normal vs. breast cancer patients. For that purpose, I chose the GEO dataset GSE62944, as it contains 9264 tumour samples and 741 normal samples.

Question 1:

I load the expression set using the following code:

library(AnnotationHub)

ah = AnnotationHub()
query(ah , "GSE62944")

What I see is:

AnnotationHub with 1 record
# snapshotDate(): 2016-03-09 
# names(): AH28855
# $dataprovider: GEO
# $species: Homo sapiens
# $rdataclass: ExpressionSet
# $title: RNA-Sequencing and clinical data for 7706 tumor samples from The Cancer Genome Atlas
# $description: TCGA RNA-seq Rsubread-summarized raw count data for 7706 tumor samples, represented as an R / Bioconductor...
# $taxonomyid: 9606
# $genome: hg19
# $sourcetype: tar.gz
# $sourceurl: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62944
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: TCGA, RNA-seq, Expression, Count 
# retrieve record with 'object[["AH28855"]]'

Why does the title say 7706 tumor samples? Does this expression set contain normal samples at all? If so, how would I access them?

Question 2:

I subset the breast cancer patient samples using code:

tcga_data <- ah[["AH28855"]]

brca_data <- tcga_data[, which(phenoData(tcga_data)$CancerType=="BRCA")]

How can I subset both breast cancer and normal samples from the entire dataset?
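
To make this concrete, here is a minimal sketch on a toy ExpressionSet of the kind of subsetting I have in mind. The toy counts, the sample names and the "NORMAL" label are all made up for illustration; I do not know whether AH28855 labels normal samples this way, or whether it contains them at all.

```r
library(Biobase)

# Toy count matrix: 3 genes x 4 samples (made-up values)
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2", "s3", "s4")))

# Toy phenotype table; "NORMAL" is a hypothetical label, not one
# confirmed to exist in the real dataset
pheno <- data.frame(CancerType = c("BRCA", "LUAD", "NORMAL", "BRCA"),
                    row.names = colnames(counts))

toy <- ExpressionSet(assayData = counts,
                     phenoData = AnnotatedDataFrame(pheno))

# Keep every sample whose label is in the set of interest
keep <- pData(toy)$CancerType %in% c("BRCA", "NORMAL")
toy_subset <- toy[, keep]
sampleNames(toy_subset)
```

Using %in% rather than == lets one logical vector select several groups at once, which is the behaviour I would like to reproduce on the real data.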

Question 3:

Is there a way to subset specific genes (i.e., rows) from the data set?
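
As an illustration of what I mean, here is a small sketch on a toy ExpressionSet that selects rows by feature name. The gene names are placeholders, and I have not checked which identifiers the rows of AH28855 actually use.

```r
library(Biobase)

# Toy count matrix: 3 genes x 4 samples (made-up values)
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2", "s3", "s4")))
toy <- ExpressionSet(assayData = counts)

# Hypothetical genes of interest; "geneX" is absent on purpose
genes_of_interest <- c("geneA", "geneC", "geneX")

# Keep only the names that are actually present, then index rows by name
present <- intersect(genes_of_interest, featureNames(toy))
toy_genes <- toy[present, ]
exprs(toy_genes)
```

Intersecting with featureNames() first avoids indexing by a name that is not in the data.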

Any help would be appreciated.

gse62944 GEO ExpressionSet DEAnalysis
