Retrieve multiple FASTA from UniProt using R
4.5 years ago

I have a huge list of UniProt protein accession numbers and need to retrieve their FASTA sequences from UniProt. Is there an R script for doing that? I know I can do it on the UniProt website, but the retrieval will be part of a script I am developing for downstream mass spec analysis.

R uniprot FASTA • 4.4k views

See if this example from biomartr can help:

Example Retrieval Uniprot

Also, the Rcpi Bioconductor package has a getFASTAFromUniProt() function.


Yes, I can download a whole proteome with Example Retrieval Uniprot, but I want to retrieve sequences for specific UniProt identifiers, such as:

ids <- c("P62754", "Q8K094", "Q64337")

I am using R 3.6.1, which doesn't seem to support Bioconductor. What should I do?
> install.packages("bioconductor")
Installing package into ‘C:/Users/Gita/Documents/R/win-library/3.6’ (as ‘lib’ is unspecified)
Warning in install.packages :   package ‘bioconductor’ is not available (for R version 3.6.1)

Bioconductor is a repository, not a package, so it cannot be installed with install.packages(). The Rcpi link above has clear install instructions:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("Rcpi")
4.5 years ago
h.mon 35k

Install Rcpi (I suggest using conda, as it can be a bit of a pain to install) and then, under R:

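library( Rcpi )
# retrieve one FASTA record per accession (returned as a list)
fastas <- getFASTAFromUniProt( c( "P62754", "Q8K094", "Q64337" ) )
# flatten the list and write all records to a single file
write( unlist( fastas ), "uniprot.fasta" )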
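
If installing Rcpi turns out to be a problem, the same retrieval can probably be done in base R without any Bioconductor package. The sketch below is only an illustration under a few assumptions: it assumes UniProt serves one FASTA record per accession at https://rest.uniprot.org/uniprotkb/<accession>.fasta, and the helper names uniprot_fasta_url() and fetch_uniprot_fasta(), as well as the read_fun argument, are made up for this example and are not part of any package.

```r
# Build the per-accession FASTA URL.
# Assumption: UniProt's REST endpoint follows this pattern.
uniprot_fasta_url <- function(accession) {
  sprintf("https://rest.uniprot.org/uniprotkb/%s.fasta", accession)
}

# Download the FASTA record for each accession and write them all to `file`.
# `read_fun` (hypothetical argument) defaults to readLines(), which can read
# directly from a URL; accessions that fail are skipped with a warning.
fetch_uniprot_fasta <- function(ids, file, read_fun = readLines) {
  records <- lapply(ids, function(id) {
    tryCatch(
      read_fun(uniprot_fasta_url(id), warn = FALSE),
      error = function(e) {
        warning("Could not retrieve ", id, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(records) <- ids
  # Failed accessions are NULL and are dropped by unlist()
  writeLines(as.character(unlist(records, use.names = FALSE)), file)
  invisible(records)
}

# Usage (needs internet access):
# ids <- c("P62754", "Q8K094", "Q64337")
# fetch_uniprot_fasta(ids, "uniprot.fasta")
```

For a really huge list this makes one request per accession, so it may be worth adding a short Sys.sleep() between requests or splitting the list into batches.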