Sequence alignment within R data frame
6.9 years ago

I am trying to loop over a data.frame in R that holds proteins and all of their subregions. Rows are grouped by geneID: the first occurrence of a geneID is always the whole protein, and the following occurrences are its subregions. I want to align each subregion against the whole protein to determine its start and stop positions, and then add those back to the data frame. The data looks like this:

http://imgur.com/a/lRSrz

The code I am working on looks like this so far, but it doesn't work. I know I have some obvious errors, but I am trying to work through them:

  for(i in 1:length(keyplayers$geneid)) {
    id <- keyplayers$geneid[[i]]
    a <- i + 1
    while(keyplayers$geneid[[a]] == keyplayers$geneid[[i]]) {
      pat <- matchPattern(keyplayers$sequence[[a]] , keyplayers$sequence[[i]])
      keyplayers$start[i] <- start(pat)
      keyplayers$end[i] <- end(pat)
    }
  }

EDIT: I have been iterating through the code trying to get a solution. The above code returns the same start and stop for every row, so I am getting close.

Thanks in advance for the help.
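
One possible way to approach this, offered as a minimal sketch rather than a definitive fix. As posted, the inner `while` loop never advances `a`, and it writes the result to row `i` (the whole protein) instead of row `a` (the subregion). The sketch below avoids the explicit loops altogether. It assumes the columns are named `geneid` and `sequence` as in the code above, that the sequences are plain character strings, and that each subregion is an exact substring of its whole protein. It swaps `matchPattern` for base R's `regexpr(..., fixed = TRUE)` so that it runs without Biostrings, and the toy data is hypothetical.

```r
# Hypothetical toy data: the first row of each geneid is the whole protein,
# later rows with the same geneid are its subregions.
keyplayers <- data.frame(
  geneid   = c("g1", "g1", "g1", "g2", "g2"),
  sequence = c("MKTAYIAKQR", "TAYI", "KQR", "MSDNELLQ", "DNEL"),
  stringsAsFactors = FALSE
)

# For every row, the index of the first row with the same geneid,
# i.e. the whole-protein row it should be compared against.
whole_idx <- match(keyplayers$geneid, keyplayers$geneid)

# Exact substring search (assumption: no mismatches or gaps are needed).
# regexpr() returns the 1-based position of the first hit, or -1 if none.
hit <- mapply(
  function(sub, whole) regexpr(sub, whole, fixed = TRUE)[1],
  keyplayers$sequence,
  keyplayers$sequence[whole_idx],
  USE.NAMES = FALSE
)

# Write the coordinates back to each row; unmatched subregions become NA.
keyplayers$start <- ifelse(hit > 0, hit, NA_integer_)
keyplayers$end   <- keyplayers$start + nchar(keyplayers$sequence) - 1L

print(keyplayers)
```

In this sketch the whole-protein rows match themselves, so they get a start of 1 and an end equal to their own length. Only the first occurrence of a subregion is reported; if a subregion can occur more than once, or if inexact matches are needed, `matchPattern` from Biostrings is probably the better tool.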


Without looking at the rest of the code, haven't you forgotten a "1:" in:

for(i in 1:length(x$geneid)){

to loop through the whole data frame?

NB: It would be easier if you could provide the first lines of the data frame.


Good catch. I did forget that.


Ah, the data is provided in the imgur link: http://imgur.com/a/lRSrz
