I am trying to run the following code on a list of IDs instead of a single ID:
    source("https://bioconductor.org/biocLite.R")
    #install.packages('reutils')
    #install.packages('Peptides')
    #biocLite(pkgs = c('GenomeInfoDb','GenomicRanges'))
    #install.packages('plyr')
    #install.packages('devtools')
    #devtools::install_github("gschofl/biofiles")
    library(Peptides)
    library(reutils)
    library(Biostrings)
    library(biofiles)
    library(plyr)
    library(stringr)
    library(tibble)
    #install.packages('data.table')
    library(data.table)

    # This is exactly the end format of the data frame I want, but for a list of UIDs instead of one UID like 124511
    fetch <- efetch(124511, db=db, rettype = 'gp', retmode = retmode, retmax = returnAmount)
    rec <- gbRecord(fetch)
    seq <- getSequence((ft(rec)))
    m <- as.data.frame(seq)
    setnames(m, "x", "sequence")
    protienName <- names(seq)
    m <- add_column(m, protienName, .after = 0)
    m$molecularweight <- mw(m$sequence)
    m$m<- str_count(m$sequence, 'm')
    m$cc <- str_count(m$sequence, 'cc')
    logvec <- grepl('(Protein)|(Region)', m$protienName)
    m <- subset(m, logvec)
The problem is that efetch() can only take one ID at a time, so I must either write a for loop or use an apply function over the list of protein IDs. If I took the code as is and ran it in a loop over the list, each iteration would overwrite the result of the previous one. I was therefore hoping someone could help me append to the data.frame on each iteration, or show me a way to keep each iteration from replacing the previous one.
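For the "each iteration replaces the previous one" part, one common base R pattern is to build one data frame per ID with lapply() and stack them afterwards with do.call(rbind, ...). The sketch below is only an illustration of that pattern: summarise_one() is a hypothetical stand-in for the per-ID pipeline above (efetch, gbRecord, and so on), and the extra IDs are placeholders, so it runs without network access.

```r
# Hypothetical stand-in for the per-ID pipeline (efetch -> gbRecord -> data frame).
# It returns a one-row data frame so the pattern can run without network access.
summarise_one <- function(uid) {
  data.frame(
    uid = uid,
    uid_length = nchar(uid),  # placeholder column, not real sequence data
    stringsAsFactors = FALSE
  )
}

# Placeholder IDs: only 124511 comes from the question, the others are made up.
uids <- c("124511", "placeholder_b", "placeholder_c")

# lapply() returns a list with one data frame per ID, so nothing is overwritten.
per_id <- lapply(uids, summarise_one)

# Stack the per-ID data frames into a single data frame.
combined <- do.call(rbind, per_id)
```

Since data.table is already loaded in the question, data.table::rbindlist(per_id) should be a drop-in alternative to the do.call(rbind, ...) step, assuming every per-ID result has the same columns.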
efetch can take a list of IDs; see the documentation: https://www.rdocumentation.org/packages/reutils/versions/0.2.2/topics/efetch
Wow, how about that. For some reason I really thought it couldn't! Thanks, I will try just feeding it a list then.