Question: extracting several columns from a file
3.0 years ago, A (3.7k) wrote:

Hi,

I have a data set like the one below:

> head(data[,1:4])
         ID GSM943243 GSM943244 GSM943245
1    EEF1A1 14.517466 14.591990 14.582881
2     GAPDH 11.925736 11.820686 11.719080
3 LOC643334  7.505173  7.494044  7.365844
4   SLC35E2  7.720945  7.642623  7.727642
5    DUSP22  7.348523  7.345953  7.385760
6 LOC642820  7.538024  7.582380  7.501941
> # watching the dimension of matrix
> dim(data)
[1] 25217   203
>

I have a list of accession numbers (for example GSM943243) corresponding to control samples. How can I extract the control columns from my data?

R • 771 views
modified 3.0 years ago by Biostar ♦♦ 20 • written 3.0 years ago by A (3.7k)

https://dzone.com/articles/learn-r-how-extract-rows
http://stats.stackexchange.com/questions/10225/extracting-multiple-columns-from-a-matrix-in-r

written 3.0 years ago by genomax (80k)

Thank you. Inspired by the second link, I did this:

> accessions <- t(accessions)
> extraction <- data[, c(accessions)]

but the result had no row names, so I added them manually.
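
One possible reason the row names were lost: in the printed data the gene symbols sit in an ordinary ID column, so selecting only the sample columns leaves them behind. The following is a minimal sketch, assuming a small toy data frame shaped like the one in the question; the `controls` vector is hypothetical.

```r
# Toy data shaped like the question's table (values copied from its first rows)
data <- data.frame(
  ID = c("EEF1A1", "GAPDH", "LOC643334"),
  GSM943243 = c(14.517466, 11.925736, 7.505173),
  GSM943244 = c(14.591990, 11.820686, 7.494044),
  GSM943245 = c(14.582881, 11.719080, 7.365844),
  stringsAsFactors = FALSE
)

# Hypothetical list of control accessions
controls <- c("GSM943243", "GSM943245")

# Move the gene symbols into the row names once, so every subset keeps them
rownames(data) <- data$ID

# drop = FALSE keeps a data frame even if only one column is selected
extraction <- data[, controls, drop = FALSE]
print(extraction)
```

Note that assigning row names fails if the ID column contains duplicates, so in real data that is worth checking first (for example with `anyDuplicated(data$ID)`).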
modified 3.0 years ago • written 3.0 years ago by A (3.7k)

First of all, you need to read the file with read.table into an object using row.names=1; that data frame can then be subset with the columns of your choice. Do you know the indexes of the columns that are controls? If so, it is fairly simple, and your row names will stay intact. Adding row names manually is not a good approach. When you are working programmatically, do everything the same way.
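
The advice above could be sketched roughly as follows. This is only an illustration: the inline text stands in for the real file, and `controls` is a hypothetical vector of accession numbers. It subsets by column name rather than by index, which avoids having to look up column positions.

```r
# Inline text stands in for the real file; with a real file, pass its path
# as the first argument instead of using text =
txt <- "ID GSM943243 GSM943244 GSM943245
EEF1A1 14.517466 14.591990 14.582881
GAPDH 11.925736 11.820686 11.719080
LOC643334 7.505173 7.494044 7.365844"

# row.names = 1 uses the first column (ID) as the row names
data <- read.table(text = txt, header = TRUE, row.names = 1)

# Hypothetical control accessions; GSM000000 is deliberately absent from the data
controls <- c("GSM943243", "GSM943245", "GSM000000")

# Keep only the accessions that exist as columns, and note the ones that do not
present <- intersect(controls, colnames(data))
missing <- setdiff(controls, colnames(data))

extraction <- data[, present, drop = FALSE]
```

Checking `missing` before subsetting shows which names would otherwise trigger an "undefined columns selected" error.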

written 3.0 years ago by ivivek_ngs (4.9k)

In another file of mine, the data look like this:

> head(data1[,1:4])
          GSE35974_Biomat_17___BioAssayImplId.212266Name.DE37_111809_rep5
LINC01128                                                        8.789351
SAMD11                                                           7.059227
KLHL17                                                           7.453778
PLEKHN1                                                          7.546892
ISG15                                                            7.091302
AGRN                                                             7.505454
          GSE35974_Biomat_139___BioAssayImplId.212294Name.AC99_111809
LINC01128                                                    8.733914
SAMD11                                                       7.120576
KLHL17                                                       7.503455
PLEKHN1                                                      7.425533
ISG15                                                        6.788893
AGRN                                                         7.269030
          GSE35974_Biomat_137___BioAssayImplId.212295Name.AC96_111809
LINC01128                                                    8.914045
SAMD11                                                       7.232991
KLHL17                                                       7.472246
PLEKHN1                                                      7.352260
ISG15                                                        7.017254
AGRN                                                         7.436749
          GSE35974_Biomat_136___BioAssayImplId.212296Name.AC95_111809
LINC01128                                                    8.977482
SAMD11                                                       7.087887
KLHL17                                                       7.436269
PLEKHN1                                                      7.321820
ISG15                                                        6.922916
AGRN                                                         7.470006

and I have a list of sample names in another file:

> head(data2[,1])

[1] GSE35974_Biomat_69___BioAssayImplId=212347Name=AC45_111109
[2] GSE35974_Biomat_67___BioAssayImplId=212345Name=AC47_111109
[3] GSE35974_Biomat_64___BioAssayImplId=212344Name=AC49_102309
[4] GSE35974_Biomat_74___BioAssayImplId=212351Name=AC41_111809
[5] GSE35974_Biomat_73___BioAssayImplId=212349Name=AC43_102309
[6] GSE35974_Biomat_68___BioAssayImplId=212348Name=AC44_111809
94 Levels: GSE35974_Biomat_10___BioAssayImplId=212273Name=DE35_102309 ...

>

I want to extract only these samples from my expression data file.

I did this:

> samples <- t(data2)
> extraction <- data1[, c(data2)]

but R reports:

Error in `[.data.frame`(data1, , c(data2)) : undefined columns selected

I could not figure out why this does not work when the previous case did.

modified 3.0 years ago • written 3.0 years ago by A (3.7k)

To be honest, this is a simple thing to do in awk, sed, or R. What I see here is that the header names of data1 are not the same as the names in data2. You need to show examples from data2 that correspond to the columns you show in data1[,1:4]; I do not see any. Also, the names are extremely long strings with lots of attributes, which makes me think they are actually not the same.

Let's say you have 10 columns in a data frame x and you want to keep only 6 of them, and you know their column indexes: 2, 4, and 7 through 10.

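# Read the file, using the first column as row names
x <- read.table("file.txt", header = TRUE, row.names = 1)
# Keep columns 2, 4, and 7 through 10
x.1 <- x[, c(2, 4, 7:10)]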
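
One way to check whether the names really differ, as suspected above: the column names printed for data1 contain `.` where the sample names printed for data2 contain `=`. A plausible cause (an assumption, not confirmed in the thread) is that read.table passed the header through `make.names`, which replaces characters such as `=` with `.`. In addition, data2[,1] prints with "Levels", so it is a factor, and a factor used as an index is treated as its integer codes rather than its labels. The sketch below uses toy stand-ins; the sample name in `samples` is hypothetical, written by analogy with the printed ones.

```r
# Toy stand-in for data1, with column names as printed in the question
data1 <- data.frame(
  a = c(8.789351, 7.059227),
  b = c(8.733914, 7.120576),
  row.names = c("LINC01128", "SAMD11")
)
colnames(data1) <- c(
  "GSE35974_Biomat_17___BioAssayImplId.212266Name.DE37_111809_rep5",
  "GSE35974_Biomat_139___BioAssayImplId.212294Name.AC99_111809"
)

# Hypothetical sample list stored as a factor, using "=" like the data2 output
samples <- factor("GSE35974_Biomat_139___BioAssayImplId=212294Name=AC99_111809")

# Convert the factor to character, then apply the same name mangling as read.table
wanted <- make.names(as.character(samples))

# Names that still do not match any column (empty if everything lines up)
unmatched <- setdiff(wanted, colnames(data1))

extraction <- data1[, intersect(wanted, colnames(data1)), drop = FALSE]
```

An alternative would be to read the expression file with `check.names = FALSE`, which leaves the header untouched so that the original names with `=` can be used directly.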
modified 3.0 years ago • written 3.0 years ago by ivivek_ngs (4.9k)

I would like to suggest editing the question title to include that you are trying to do this in R. For example: "Extracting several columns from a file using R." This will help other people looking for similar answers via search engines. As stated, the answer could be as simple as man cut.

written 3.0 years ago by ariel.balter (140)