Question

Download peptide sequences from NCBI using python

Asked 17 months ago by יובל

I would like to retrieve the protein (peptide) sequence encoded by the RefSeq transcript NM_021969.2, as shown on its NCBI record page: https://www.ncbi.nlm.nih.gov/nuccore/NM_021969.2

I was able to download the nucleotide sequence with the script below, but I cannot work out how to get this peptide sequence:

MSTSQPGACPCQGAASRPAILYALLSSSLKAVPRPRSRCLCRQH
RPVQLCAPHRTCREALDVLAKTVAFLRNLPSFWQLPPQDQRRLLQGCWGPLFLLGLAQ
DAVTFEVAEAPVPSILKKILLEEPSSSGGSGQLPDRPQPSLAAVQWLQCCLESFWSLE
LSPKEYACLKGTILFNPDVPGLQAASHIGHLQQEAHWVLCEVLEPWCPAAQGRLTRVL
LTASTLKSIPTSLLGDLFFRPIIGDVDIAGLLGDMLLLR

Python script:

from Bio import Entrez

Entrez.email = 'myemail@gmail.com'  # NCBI asks for a contact e-mail address

handle = Entrez.efetch(db='nuccore', id='NM_021969.2', rettype='fasta', retmode='text')
print(handle.read())
handle.close()

I would appreciate any help, especially from anyone who has done this successfully.

Yuval

Tags: ncbi, python, biopython


score 2 · Answer 1 · 2022-11-13

Answered 17 months ago by iraun

Hi! Welcome to Biostars :).

Try querying the protein database instead of nuccore:

Entrez.efetch(db="protein", id="NM_021969.2", rettype="fasta")
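If you would rather stay with the nucleotide record, the protein is also annotated inside it: the CDS feature of a RefSeq mRNA record carries a /translation qualifier holding the encoded peptide. A minimal sketch, using only the standard library (urllib against the E-utilities efetch endpoint rather than Biopython's Entrez wrapper), could fetch the GenBank flat file and pull that qualifier out. It assumes the record has a single annotated CDS, as NM_ records normally do; fetch_genbank and cds_translations are illustrative helper names, not part of any library.

```python
import re
import urllib.parse
import urllib.request

# E-utilities efetch endpoint (the same service Biopython's Entrez.efetch calls)
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_genbank(accession: str, email: str) -> str:
    """Download a nuccore record as GenBank flat-file text."""
    query = urllib.parse.urlencode({
        "db": "nuccore",
        "id": accession,
        "rettype": "gb",   # GenBank flat file, which includes the feature table
        "retmode": "text",
        "email": email,    # NCBI asks clients to identify themselves
    })
    with urllib.request.urlopen(f"{EFETCH_URL}?{query}") as response:
        return response.read().decode("utf-8")


def cds_translations(genbank_text: str) -> list[str]:
    """Return every /translation qualifier value in the record, unwrapped.

    GenBank wraps long qualifier values over several indented lines,
    so all whitespace inside the quoted value is removed.
    """
    values = re.findall(r'/translation="([^"]*)"', genbank_text)
    return ["".join(value.split()) for value in values]
```

Usage: `print(cds_translations(fetch_genbank("NM_021969.2", "myemail@gmail.com"))[0])` should print the annotated CDS translation. The regex approach works because translation values never contain a double quote; for anything beyond this one qualifier, a real GenBank parser such as Biopython's SeqIO is the safer choice.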


Reply by יובל, 17 months ago:

Thanks for your help, it solved my problem.

Reply by Ram, 16 months ago:

A small educational note: if an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work. If an answer was not really helpful or did not work, provide detailed feedback so others know not to use that answer.

score 0 · Answer 2 · 2022-11-21

Hi,
You can use NCBI Datasets. To download the protein sequence associated with this nucleotide record, you can use the following command:

datasets download gene accession NM_021969.2 --include protein

This command downloads a zip file with the following contents:

ncbi_dataset
`-- data
    |-- data_report.jsonl
    |-- dataset_catalog.json
    `-- protein.faa

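By default, an NCBI Datasets gene data package includes transcript and protein sequences, plus metadata in JSON Lines format. The --include flag selects which files go into the package (where available); in the example above, --include protein limits the sequences to proteins, which is why protein.faa is the only FASTA file in the archive.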
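To use the downloaded protein sequences in Python without unpacking the archive by hand, a minimal sketch with the standard-library zipfile module could look like this. It assumes the zip was saved under the CLI's default name, ncbi_dataset.zip, and that protein.faa sits at the path shown in the tree above; read_fasta_from_zip is a hypothetical helper, and its FASTA parser is a simple hand-rolled one rather than Biopython's SeqIO.

```python
import zipfile


def read_fasta_from_zip(source, member="ncbi_dataset/data/protein.faa") -> dict[str, str]:
    """Parse one FASTA member of a zip archive into {header: sequence}.

    `source` may be a path or a binary file object, as accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(source) as archive:
        text = archive.read(member).decode("utf-8")

    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # finish the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are wrapped; collect and join later
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

Usage: `for header, seq in read_fasta_from_zip("ncbi_dataset.zip").items(): print(header, len(seq))`. Reading the member straight from the archive avoids leaving an extracted ncbi_dataset/ directory behind.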

Feel free to reach out if you have any additional questions. I hope it helps :)