Hello,
I would like to bulk download all available GenPept files that include fields such as LOCUS, ACCESSION, DEFINITION, VERSION, DBSOURCE, KEYWORDS and, if possible, publication information (a list of relevant PubMed entries, for example, would be amazing). A screenshot of such a file is shown here: https://www.genbeans.org/ibe/5.3/help/org-genbeans-modules-seqfiles/working_genpept.html
Can I bulk download them from somewhere? I got the nucleotide versions of these files (NCBI GenBank files) here: https://ftp.ncbi.nlm.nih.gov/genbank/
Is there a peptide version of them available somewhere?
Thank you!
No bulk download is available as far as I know. Your closest bet is to convert the GenBank files you have using a program like EMBOSS seqret. I'm curious as to why you need the GenPept format specifically.
I suppose what I really want is to have a peptide accession ID and be able to quickly (within my program) read all about how it was added to the database. Where did it come from? Who or what program discovered it? Are there publications on PubMed that reference it? That sort of thing. And I want this for millions of accession IDs, if not all of them.
I don't really care about displaying the peptide sequence, or what format it is in. It could be any format invented by humans. I can parse anything.
Is there a way to download this information in bulk somewhere?
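For whatever flat files you do end up with, here is a minimal sketch of pulling out the header fields listed above plus every PubMed ID cited in each record. It assumes the standard GenBank/GenPept flat-file layout (keyword in the first 12 columns, value from column 13, continuation lines indented by 12 spaces, each record ending in a `//` line). `parse_flatfile` and `WANTED` are hypothetical names, not part of any library; Biopython's `Bio.SeqIO.parse(handle, "genbank")` is a more complete, maintained alternative.

```python
from typing import Iterable, Iterator

# Hypothetical set of top-level header keywords to keep; everything else is skipped.
WANTED = {"LOCUS", "DEFINITION", "ACCESSION", "VERSION", "DBSOURCE", "KEYWORDS"}


def parse_flatfile(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per record from GenBank/GenPept flat-file lines.

    Assumes the standard layout: keywords in the first 12 columns, values
    from column 13, continuation lines indented by 12 spaces, and each
    record terminated by a '//' line. Records missing '//' are not yielded.
    """
    record: dict = {"PUBMED": []}
    current = None    # wanted keyword whose value may continue on later lines
    in_header = True  # False from FEATURES/ORIGIN until the record ends
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("//"):
            yield record
            record, current, in_header = {"PUBMED": []}, None, True
            continue
        if not in_header or not line.strip():
            continue
        key, value = line[:12].strip(), line[12:].strip()
        if key in ("FEATURES", "ORIGIN"):
            # Feature table and sequence are not needed here.
            in_header = False
        elif key == "PUBMED":
            # PUBMED is a sub-keyword of REFERENCE; collect every occurrence.
            record["PUBMED"].append(value)
        elif key:
            # New keyword or sub-keyword: only track the ones in WANTED.
            current = key if key in WANTED else None
            if current:
                record[current] = value
        elif current:
            # Continuation line (blank keyword column) for a wanted field.
            record[current] += " " + value
```

Because it takes an iterable of lines, an open file handle can be passed directly (for example `for rec in parse_flatfile(open(path)): ...`, where `path` is whatever file you have), so millions of records are streamed rather than loaded into memory at once. The same layout is used by the nucleotide GenBank files you already downloaded, so it can be tried on those too (they simply have no DBSOURCE line).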