Extract expression data of candidate genes from normalized microarray list
8.3 years ago
pmedi

Hi all,

I'm a complete beginner with R and Bioconductor, and I hope somebody can help me with some basic questions. I have several processed (normalized) microarray data files from which I want to extract a list of 300 candidate genes (ILMN_IDs). In the output I need not only the gene names but also the expression values and statistics (already present in the original files).

I've tried to make a data.frame for each file and compare them, but I always get an error.

I'm sorry if this has already been explained in a previous thread; I could not find one.

Paula

Bioconductor R microarray
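One way to approach the task described in the question (read several files and keep only the candidate rows, with all their columns) is sketched below. It assumes tab-delimited files with a header line and the ILMN IDs in a column called Name; the file names array1.txt and array2.txt are hypothetical, and their contents are made up by reusing IDs and values from the head(all) output posted later in the thread.

```r
# Two small tab-delimited files standing in for the processed arrays
# (hypothetical names, toy contents)
dir <- tempdir()
files <- file.path(dir, c("array1.txt", "array2.txt"))
write.table(data.frame(Name = c("ILMN_2188862", "ILMN_1757497", "ILMN_1718977"),
                       logratio = c(5.343242, 4.562298, 4.195590)),
            files[1], sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(Name = c("ILMN_1757497", "ILMN_1677402"),
                       logratio = c(4.562298, 3.999023)),
            files[2], sep = "\t", quote = FALSE, row.names = FALSE)

# The candidate IDs as a plain character vector
candidates <- c("ILMN_1757497", "ILMN_2188862")

# Read each file, keep only the candidate rows (and every column),
# and name each result after its file
hits <- lapply(files, function(f) {
  d <- read.delim(f, stringsAsFactors = FALSE)
  d[d$Name %in% candidates, ]
})
names(hits) <- basename(files)
print(hits)
```

Keeping the per-file results in a named list avoids assuming that every file contains the same IDs in the same order.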

What's the error? Some code would help us understand.


Well, I tried a very basic approach:

> all=normalizedData
> subset=candidateGenes
> x=all%in%subset
> all[x]   # returns a data.frame with 0 columns and 4000 rows; this is not correct, since normalizedData has 24 columns
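A small reproduction may help explain the empty result. A data.frame is a list of columns, so %in% applied to two data.frames compares whole columns rather than gene IDs and returns one TRUE/FALSE per column of all; a single index inside [ ] then selects columns, and none survive. The three-row all and two-row subset below are toy stand-ins, not the poster's data.

```r
# Toy stand-ins for the thread's objects
all <- data.frame(
  Name     = c("ILMN_2188862", "ILMN_1757497", "ILMN_1718977"),
  logratio = c(5.343242, 4.562298, 4.195590),
  SYMBOL   = c("GDF15", "VGF", "GADD45B")
)
subset <- data.frame(Name = c("ILMN_1757497", "ILMN_2188862"))

# Compares each column of `all` (as a whole) with each column of `subset`:
# one logical per column of `all`, none of them TRUE
x <- all %in% subset
print(x)

# A single index selects columns, so every column is dropped and only
# the rows remain
dim(all[x])
```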


Do you have a column in "all" with the gene ID? If so, you can try:
all[which(all$gene_id %in% subset)]

Dear Martombo, thanks for the suggestion, but it still does not work. I get: in [.default(all, x) : invalid type 'list' and the data.frame still has 0 columns.

What is the type of your objects? Can you show us the output of head(all) and head(subset)?

Here it is:

> head(all)
          Name meanbgt meanbgc        cvt       cvc      meant    stderrt
1 ILMN_2188862       0 0.00000 0.11798164 0.2374678  4618.4715  314.59520
2 ILMN_1757497       0 0.00000 0.09400562 0.2306049 13226.2172  717.84198
3 ILMN_1718977       0 0.00000 0.19646015 0.1977541  5560.2394  630.67748
4 ILMN_1677402       0 0.00000 0.12334626 0.1734464 17487.3497 1245.34402
       meanc    stderrc    ratio  ratiose logratio          tp         t2p
1  113.76855  15.597908 40.59533 6.214782 5.343242 0.000138868 0.004758497
2  559.81835  74.534099 23.62591 3.396868 4.562298 0.000061900 0.002937065
3  303.45555  34.646540 18.32308 2.948882 4.195590 0.001138760 0.013880536
4 1093.69965 109.522366 15.98917 1.964738 3.999023 0.000195275 0.005431832
  wilcoxonp         tq       t2q wilcoxonq   limmap  limmapa    SYMBOL
1 0.0808556 0.02560170 0.1645141  0.345836 4.03e-10 4.34e-06     GDF15
2 0.0808556 0.02429853 0.1498372  0.345836 9.14e-10 4.57e-06       VGF
3 0.0808556 0.04910382 0.2084539  0.345836 5.61e-09 1.04e-05   GADD45B
4 0.0808556 0.02802042 0.1682075  0.345836 1.37e-09 4.61e-06 LOC387763
> head(subset)
          Name
1 ILMN_1757497
2 ILMN_2188862
3 ILMN_1677402
4 ILMN_1751607

OK, then try all[which(all$Name %in% subset$Name),]

Edit: yes, sorry, as simon.pearce pointed out, I was missing a comma in the command. It should work now.

It worked! I got the subset that I wanted. Thanks!!!
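The working command compares the ID column of all with the ID column of subset, which gives one TRUE/FALSE per row of all, and the trailing comma turns that into a row selection that keeps every column. A minimal sketch with toy data built from the head() output above (which() is optional here; a logical vector works directly as a row index):

```r
all <- data.frame(
  Name     = c("ILMN_2188862", "ILMN_1757497", "ILMN_1718977", "ILMN_1677402"),
  logratio = c(5.343242, 4.562298, 4.195590, 3.999023),
  SYMBOL   = c("GDF15", "VGF", "GADD45B", "LOC387763")
)
subset <- data.frame(Name = c("ILMN_1757497", "ILMN_2188862",
                              "ILMN_1677402", "ILMN_1751607"))

# One TRUE/FALSE per row of `all`
keep <- all$Name %in% subset$Name

# [rows, ] keeps the selected rows and all columns
hits <- all[keep, ]
print(hits)
```

Two things worth noting: the rows come back in the order of all, not of subset, and IDs that are missing from all (ILMN_1751607 here) are silently skipped.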
If your Name column is unique, then you should set it as the row names when you read the data in, something like:

alldata <- read.table(filename, header = TRUE, row.names = 1, stringsAsFactors = FALSE)

which then allows you to subset the data on those names with alldata[subset,]. The comma is really important there (and is missing from your previous commands), as it says that you want those particular rows and all the columns.

@Martombo, I get the same result again: a data.frame with 0 columns and 4312 rows. @simon.pearce: I understand the idea, but it still says there is an error in rows[i]: invalid type 'list'. Here is the structure of both objects:

> class(all)
[1] "data.frame"
> dim(all)
[1] 4312   24
> str(all)
'data.frame':	4312 obs. of  24 variables

Info about subset:

> class(subset)
[1] "data.frame"
> dim(subset)
[1] 328   1
> str(subset)
'data.frame':	328 obs. of  1 variable:
 $ V1: Factor w/ 328 levels "ILMN_1651429",..: 177 286 47 169 123 109 268 284 234 186 ...
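The row-name approach can be tried without a file by passing the table as text to read.table(); with a real file you would pass its path instead. This is a sketch under the assumption that the file has a header line and whitespace-separated fields, using a few rows and columns copied from the head(all) output above.

```r
# Inline text stands in for the real file; row.names = 1 turns the Name
# column into row names and removes it from the columns
alldata <- read.table(
  text = "Name logratio SYMBOL
ILMN_2188862 5.343242 GDF15
ILMN_1757497 4.562298 VGF
ILMN_1718977 4.195590 GADD45B
ILMN_1677402 3.999023 LOC387763",
  header = TRUE, row.names = 1, stringsAsFactors = FALSE
)

# A character vector of IDs selects rows by name; the comma keeps all columns
ids <- c("ILMN_1757497", "ILMN_1677402")
alldata[ids, ]
```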


Thanks!


Sigh. Apparently I've reached a limit of 5 posts with my current account, so the longer message I had just typed out disappeared.

Basically, R sees your subset as a data.frame (which is a list of columns, hence the invalid type 'list' error), and I don't think you want it to be one. I think you want a character vector.
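Converting the one-column data.frame into a character vector could look like the sketch below; the toy subset mimics the str() output above (a factor column named V1). The conversion through as.character() matters because R's documentation for [ states that indexing by a factor uses its integer codes, not the printed IDs.

```r
# One-column data.frame with a factor column, like the posted `subset`
subset <- data.frame(V1 = factor(c("ILMN_1757497", "ILMN_2188862")))

# Toy table with the IDs as row names
alldata <- data.frame(
  logratio  = c(5.343242, 4.562298, 4.195590),
  SYMBOL    = c("GDF15", "VGF", "GADD45B"),
  row.names = c("ILMN_2188862", "ILMN_1757497", "ILMN_1718977")
)

# Pull out the column and turn the factor levels into plain strings
ids <- as.character(subset$V1)
str(ids)

# Rows selected by name, in the order of `ids`, with all columns
alldata[ids, ]
```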

I have a function I wrote ages ago to read in a list of genes (one per line) from a text file:

read.genelist<-function(string){


Then use subset <- read.genelist(filename) to read filename.txt, and use the result for your subsetting: all[subset,] (this needs the gene IDs set as the row names, as suggested above).
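The body of read.genelist is not shown in the thread, so the following is only a hypothetical stand-in, not the poster's function: read_genelist_sketch takes a full file path (rather than appending ".txt"), reads one ID per line, trims whitespace, and drops blank lines and repeats.

```r
# Hypothetical helper, not the original read.genelist
read_genelist_sketch <- function(path) {
  ids <- trimws(readLines(path, warn = FALSE))
  unique(ids[nzchar(ids)])   # drop blank lines and repeated IDs
}

# Usage with a throwaway file
tmp <- tempfile(fileext = ".txt")
writeLines(c("ILMN_1757497", "ILMN_2188862", "", "ILMN_1757497"), tmp)
ids <- read_genelist_sketch(tmp)
print(ids)   # c("ILMN_1757497", "ILMN_2188862")
```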

If that list contains genes that aren't in your table, those rows come back as NA, so you may need to do:

all[intersect(subset,rownames(all)),]
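The difference between the two forms shows up with a toy table and one ID that is not present (ILMN_1751607, taken from the head(subset) output above). drop = FALSE keeps the one-column result a data.frame.

```r
alldata <- data.frame(
  logratio  = c(5.343242, 4.562298, 4.195590),
  row.names = c("ILMN_2188862", "ILMN_1757497", "ILMN_1718977")
)
ids <- c("ILMN_1757497", "ILMN_1751607", "ILMN_2188862")  # ILMN_1751607 is absent

# An ID that is not a row name comes back as a row of NAs
alldata[ids, , drop = FALSE]

# intersect() keeps only the IDs that are present, in the order of `ids`
present <- intersect(ids, rownames(alldata))
alldata[present, , drop = FALSE]
```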