List of accession numbers for nucleotide sequences to protein sequences using R
5.0 years ago
arla_21 • 0

Hi, I'm sure this is simple, but I am quite new to the area, so please be gentle. I have a list of accession numbers corresponding to full-length nucleotide sequences. I want to use these to download the protein sequences for all of them using rentrez. I can do this easily for one accession number:

library(rentrez)
search1 <- entrez_search(db="nuccore", term="JQ348844[ACCN]")
protein_links <- entrez_link(dbfrom='nuccore', id=search1$ids, db='all')
protein_seq <- entrez_fetch(db="protein", rettype="fasta", id=protein_links$links$nuccore_protein)


I can't work out how to put more than one accession into the term field of the first search. I'm sure this can be done with a simple loop or something similar, but I want to end up with one file containing the protein sequences for all of the input accession numbers.

Sorry if this is a stupid question! Thanks in advance.

5.0 years ago
tarek.mohamed ▴ 340

Hi

You can do this using the BSgenome package in R:

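library("BSgenome")
available.genomes()   # genome packages available from Bioconductor
installed.genomes()   # genome packages already installed locally
hg38_genome <- getBSgenome("BSgenome.Hsapiens.NCBI.GRCh38")
hg38_genome
# target_genes must be defined beforehand (see below)
seqs <- getSeq(hg38_genome, target_genes)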


Here, "target_genes" is a character vector containing the names of the sequences in hg38_genome from which to get the subsequences.

Hope this is helpful! Tarek
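
For the rentrez route in the question, the single-accession steps can probably be wrapped in a function and applied to each accession in turn. The following is only a sketch under some assumptions: `fetch_proteins` and `collect_proteins` are hypothetical helper names, the 0.4-second pause between accessions is an arbitrary choice, and each accession is assumed to match a nuccore record that has `nuccore_protein` links.

```r
# Fetch the protein FASTA text linked to one nucleotide accession.
# Returns "" if the accession is not found or has no linked proteins.
fetch_proteins <- function(acc) {
  search <- rentrez::entrez_search(db = "nuccore", term = paste0(acc, "[ACCN]"))
  if (length(search$ids) == 0) return("")
  links <- rentrez::entrez_link(dbfrom = "nuccore", id = search$ids, db = "all")
  protein_ids <- links$links$nuccore_protein
  if (is.null(protein_ids)) return("")
  rentrez::entrez_fetch(db = "protein", id = protein_ids, rettype = "fasta")
}

# Apply a fetch function to every accession and join the results
# into a single FASTA string.
collect_proteins <- function(accessions, fetch = fetch_proteins, pause = 0.4) {
  records <- vapply(accessions, function(acc) {
    Sys.sleep(pause)  # brief pause so requests are not sent in a tight loop
    fetch(acc)
  }, character(1), USE.NAMES = FALSE)
  paste(records, collapse = "")
}

# Example (needs network access and the rentrez package):
# accessions <- c("JQ348844")  # replace with your own list
# fasta <- collect_proteins(accessions)
# cat(fasta, file = "proteins.fasta")
```

Passing the fetch step in as an argument is just a convenience: it lets the looping and joining be checked with a stand-in function before any requests are sent to NCBI.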