Question: List of accession numbers for nucleotide sequences to protein sequences using R
arla_21 wrote (3.3 years ago):

Hi, I'm sure this is simple, but I am quite new to the area, so be gentle. I have a list of accession numbers corresponding to full-length nucleotide sequences. I want to use these to download the protein sequences for all of them using rentrez. I can do this easily for one accession number:

library(rentrez)
# The [ACCN] field tag belongs inside the search term string
search1 <- entrez_search(db="nuccore", term="JQ348844[ACCN]")
protein_links <- entrez_link(dbfrom='nuccore', id=search1$ids, db='all')
protein_seq <- entrez_fetch(db="protein", rettype="fasta", id=protein_links$links$nuccore_protein)

You can't input more than one accession into the term field of the first search. I'm sure this can be done with a simple loop or something similar, but I want a single file at the end containing the protein sequences for all of the input accession numbers.

Sorry if this is a stupid question! Thanks in advance.

rentrez R • 1.2k views
tarek.mohamed wrote (3.3 years ago):

Hi,

You can do this using the BSgenome package in R:

library("BSgenome")
available.genomes()   # genome packages available for installation
installed.genomes()   # genome packages already installed
hg38_genome <- getBSgenome("BSgenome.Hsapiens.NCBI.GRCh38")
hg38_genome
seq <- getSeq(hg38_genome, target_genes)

Here, "target_genes" is a character vector containing the names of the sequences in hg38_genome from which to extract the sequences. Hope this is helpful! Tarek
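
For the rentrez workflow in the question, the "simple loop" the asker mentions might look roughly like the sketch below. It repeats the same search, link, and fetch steps for each accession and writes everything to one FASTA file. The helper names accn_term() and fetch_proteins(), and the file names accessions.txt and proteins.fasta, are made up for illustration. The sketch assumes the rentrez package is installed and NCBI is reachable, and it has not been run against a real accession list.

```r
# Build the search term for one accession, e.g. "JQ348844[ACCN]".
# (accn_term is a hypothetical helper name, not part of rentrez.)
accn_term <- function(acc) paste0(acc, "[ACCN]")

# Fetch the linked protein sequences for each nucleotide accession and
# write them all to a single FASTA file. (fetch_proteins is hypothetical.)
fetch_proteins <- function(accessions, outfile) {
  fasta <- vapply(accessions, function(acc) {
    Sys.sleep(1)  # pause between accessions to stay polite to NCBI
    # Same three steps as the single-accession example in the question
    search <- rentrez::entrez_search(db = "nuccore", term = accn_term(acc))
    if (length(search$ids) == 0) return("")   # accession not found
    links <- rentrez::entrez_link(dbfrom = "nuccore", id = search$ids,
                                  db = "protein")
    prot_ids <- links$links$nuccore_protein
    if (length(prot_ids) == 0) return("")     # no linked proteins
    rentrez::entrez_fetch(db = "protein", id = prot_ids, rettype = "fasta")
  }, character(1))
  # Concatenate the non-empty FASTA chunks into one file
  cat(fasta[nzchar(fasta)], file = outfile, sep = "")
  invisible(fasta)
}

# Example call (assumes a text file with one accession per line):
# accessions <- readLines("accessions.txt")
# fetch_proteins(accessions, "proteins.fasta")
```

Accessions that return no hit or have no linked protein are skipped silently here; for a real run it may be worth logging them instead.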
