How To Create A Matrix In R?
12.7 years ago
Sanju ▴ 90

Hi all

I have 300 sheets in an Excel file. I need only the sequence identity data from this file, which is in the 9th column. I imported all sequence identity values from the 300 sheets into R using the following code.

library(gdata)
myfile <- vector("list", 300)
for (i in 1:300) {
  myfile[[i]] <- read.xls("C://Users//Desktop//mydata.xls", sheet = i, header = FALSE)[, 9]
}

Next I have to apply a distance formula and create a matrix. The formula is

distance = 100 - sequence identity

How do I apply this formula in R, and how can I create the matrix?

r matrix • 5.5k views

You did not tell us how many rows there are on each sheet. Is there only one value per sheet, or several? If several, is it always the same number?


Sanju, what options did you already try?


@Ido Tamir The first sheet contains 230 rows and 10 columns. The second sheet contains 229 rows, the third 228 rows, and so on. The number of columns (10) is the same in all sheets.


@Egon, I tried the dist.alignment function to generate a distance matrix, but I got an error because my file is not an alignment file; it is an Excel file. My aim is to generate a distance matrix using the following formula: distance = 100 - sequence identity. My sequence identity values are in the 9th column.

12.7 years ago
Stevelor ▴ 310

Have a look here and at the documentation.

HTH!

12.7 years ago
Ido Tamir 5.2k

I think it's basic R and not quite appropriate here, but anyway.

So it seems you have the upper (or lower?) triangle of your identity matrix in a list (but why 230 rows for 300 sheets? Something is amiss). You could do something similar to the following, but watch the order of the indices, and check whether your data include the identity of each sequence with itself, etc.!

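li <- lapply(5:1, function(x) 1:x)   # toy data; li plays the role of myfile
maxl <- max(sapply(li, length))      # longest column = matrix dimension
mat <- diag(maxl)
mat[lower.tri(mat, diag = TRUE)] <- unlist(li)   # fill the lower triangle column by column
dist.mat <- 100 - mat                # upper triangle is 0 in mat, so it becomes 100 here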
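
Note that in the code above the upper triangle of mat stays at zero, so 100 - mat gives 100 there. If a symmetric distance matrix is wanted, one possible approach is to mirror the lower triangle before subtracting. This is only a minimal sketch: it assumes each list element holds one column of the lower triangle, diagonal included, which may not match the actual sheet layout in the question. identity_list and its values are made up and stand in for myfile.

```r
# Toy stand-in for `myfile` (hypothetical values): element j holds
# column j of the lower triangle, diagonal included.
identity_list <- list(
  c(100, 90, 80, 70),
  c(100, 85, 75),
  c(100, 60),
  c(100)
)

n <- length(identity_list[[1]])   # the longest column gives the matrix size
# sanity check: the list must hold exactly one lower triangle
stopifnot(sum(lengths(identity_list)) == n * (n + 1) / 2)

ident <- matrix(0, n, n)
# lower.tri() selects cells column by column, matching the order of unlist()
ident[lower.tri(ident, diag = TRUE)] <- unlist(identity_list)
# mirror the lower triangle into the upper triangle
ident[upper.tri(ident)] <- t(ident)[upper.tri(ident)]

dist_mat <- 100 - ident           # distance = 100 - sequence identity
d <- as.dist(dist_mat)            # optional: "dist" object, e.g. for hclust()
```

If the sheets do not contain the self-identity values, the diagonal would have to be handled separately (diag = FALSE in lower.tri()), so check the lengths of the list elements first.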

I think it is appropriate. A lot of biological data is in Excel; biologists and computational biologists use Excel and will probably never stop using it. Many bioinformatics people trying to do computation on data stuck in Excel files may not realize that what @sanju is asking is possible, or easy. By reading this question, they will discover that it is. He should, however, edit the question title to be more specific.


He does not have an R/Excel problem, he has an R/list to matrix problem. He managed the Excel part, but lacks basic knowledge about R ("how to apply formula in R").
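
For readers wondering what "applying the formula" amounts to: arithmetic in R is vectorised, so the subtraction works on a whole vector at once, and lapply() repeats it for every element of a list. A minimal sketch, assuming the list holds one numeric vector per sheet as in the question; identity_list and its values are made up for illustration.

```r
# Hypothetical stand-in for `myfile`: one numeric vector per sheet
identity_list <- list(c(95, 90, 80), c(88, 70), c(65))

# 100 - x is applied to every element of x;
# lapply() does this for each vector in the list
distance_list <- lapply(identity_list, function(x) 100 - x)
```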


All the better. So someone watching him struggle could learn two things. As long as he's struggling to apply computation to a biological problem, it's instructive to others. Masters in the art may see that his problem is really just some language ignorance, but for the majority of us, it may be confounding. Half the questions on this board could probably be reduced to computational ignorance (nothing really to do with biology). e.g. "Want to extract sequences from UCSC? Don't bother us, go look at an SQL forum." As long as it's in a biological context, I think it's ok to ask here.


I agree. I would be happy to see more R, Bioconductor, etc. questions being addressed here. My previous experience with R forums has been mostly negative. The documentation is sometimes confusing, and the existing forums are full of people just telling users to RTFM...
