Normal Patient Samples from GSE62944
8.1 years ago
hAjmal ▴ 50

Hello,

I am trying to run a differential expression analysis of genes in normal vs. breast cancer patients. For that purpose, I chose the GEO dataset GSE62944, as it contains 9264 tumour samples and 741 normal samples.

Question 1:

I load the expression set using this code:

library(AnnotationHub)

ah = AnnotationHub()
query(ah , "GSE62944")

What I see is:

AnnotationHub with 1 record
# snapshotDate(): 2016-03-09 
# names(): AH28855
# $dataprovider: GEO
# $species: Homo sapiens
# $rdataclass: ExpressionSet
# $title: RNA-Sequencing and clinical data for 7706 tumor samples from The Cancer Genome Atlas
# $description: TCGA RNA-seq Rsubread-summarized raw count data for 7706 tumor samples, represented as an R / Bioconductor...
# $taxonomyid: 9606
# $genome: hg19
# $sourcetype: tar.gz
# $sourceurl: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62944
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: TCGA, RNA-seq, Expression, Count 
# retrieve record with 'object[["AH28855"]]'

Why does the title say 7706 tumor samples? Does this expression set contain normal samples at all? If so, how would I access them?

Question 2:

I subset the breast cancer patient samples using this code:

tcga_data <- ah[["AH28855"]]

brca_data <- tcga_data[, which(phenoData(tcga_data)$CancerType=="BRCA")]

How can I subset both breast cancer and normal samples from the entire dataset?

Question 3:

Is there a way to subset specific genes (i.e. rows) from the dataset?

Any help would be appreciated.

gse62944 GEO ExpressionSet DEAnalysis • 2.3k views