Downloading the entire UniProt dataset
6.5 years ago

I need to download the entire UniProt dataset (txt files). Does anybody know whether it is possible to download the whole dataset in txt format, but without all of the information about each protein? I only need the ID, name, function, sequence, structure and resolution. For a single UniProt entry, it would look something like this. Thanks in advance.

ID   AQP1_HUMAN              Reviewed;         269 AA.
AC   P29972; B5BU39; E7EM69; E9PC21; F5GY19; Q8TBI5; Q8TDC1;
DE   RecName: Full=Aquaporin-1;
DE            Short=AQP-1;
DE   AltName: Full=Aquaporin-CHIP;
DE   AltName: Full=Urine water channel;
DE   AltName: Full=Water channel protein for red blood cells and kidney proximal tubule;
DR   PDB; 1FQY; X-ray; 3.80 A; A=1-269.
DR   PDB; 1H6I; X-ray; 3.54 A; A=1-269.
DR   PDB; 1IH5; X-ray; 3.70 A; A=1-269.
DR   PDB; 4CSK; X-ray; 3.28 A; A=1-269.
SQ   SEQUENCE   269 AA;  28526 MW;  BA204D82FB26352E CRC64;
     MASEFKKKLF WRAVVAEFLA TTLFVFISIG SALGFKYPVG NNQTAVQDNV KVSLAFGLSI
     ATLAQSVGHI SGAHLNPAVT LGLLLSCQIS IFRALMYIIA QCVGAIVATA ILSGITSSLT
     GNSLGRNDLA DGVNSGQGLG IEIIGTLQLV LCVLATTDRR RRDLGGSAPL AIGLSVALGH
     LLAIDYTGCG INPARSFGSA VITHNFSNHW IFWVGPFIGG ALAVLIYDFI LAPRSSDLTD
     RVKVWTSGQV EEYDLDADDI NSRVEMKPK
Assembly • 2.6k views
6.5 years ago
GenoMax 141k

Take a look at the README file at the UniProt FTP site, then choose the file you need.


Oh, I see. Thanks, man!

6.4 years ago

You could download the complete flat file (either from the UniProt FTP site, or from your query result page on the website) and then use grep or some scripting language to keep only these line types. If you need assistance with regular expressions to obtain exactly this data, please don't hesitate to contact the UniProt helpdesk. The flat file format is documented at http://www.uniprot.org/docs/userman.htm
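As a rough illustration of the "scripting language" route, here is a minimal Python sketch that keeps only the line types shown in the question (ID, AC, DE, PDB cross-references, SQ and the sequence itself). The function names and the file paths are hypothetical, and the sketch assumes that sequence data lines start with five spaces and that each entry ends with a `//` line, as in the example entry above. It is not an official UniProt tool.

```python
import gzip

# Line prefixes to keep, taken from the example entry in the question.
KEEP_PREFIXES = ("ID   ", "AC   ", "DE   ", "DR   PDB;", "SQ   ")


def filter_entry_lines(lines):
    """Yield only the wanted lines from an iterable of flat-file lines."""
    for line in lines:
        is_wanted_type = line.startswith(KEEP_PREFIXES)
        is_sequence_data = line.startswith("     ")  # assumed: five leading spaces
        is_entry_end = line.startswith("//")  # kept so entries stay separated
        if is_wanted_type or is_sequence_data or is_entry_end:
            yield line


def filter_flat_file(in_path, out_path):
    """Stream a (possibly gzipped) flat file and write the reduced version."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(filter_entry_lines(src))
```

Calling `filter_flat_file("input.dat.gz", "reduced.txt")` (both paths are placeholders) streams the file line by line, so the whole dataset never has to fit in memory. The function annotation the question asks for lives in comment lines that the example entry does not show, so it would need an extra rule.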

UniProt also offers a tab-delimited download from the website (http://www.uniprot.org/help/customize, http://insideuniprot.blogspot.ch/2015_03_01_archive.html).

This would work perfectly in your case, keeping columns for identifiers, protein names, PDB cross-references and sequence. Unfortunately, the tab-separated download has a limitation for cross-references: while the HTML version of the result table contains the full cross-reference information, including PDB method and resolution, the tab-separated download contains only the identifier and leaves out the other information on these lines.

We are looking into changing this, although there are some issues (separators, line length as there are entries with more than 500 PDB cross-references, etc).
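Until then, the method and resolution can be recovered from the `DR   PDB;` lines of the flat file. The sketch below is one possible way to do that in Python, assuming the field order seen in the example entry (identifier, method, resolution, chain ranges). The function name `parse_pdb_xref` is hypothetical, and any resolution that is not written as a number followed by ` A` is returned as `None`.

```python
def parse_pdb_xref(line):
    """Parse one 'DR   PDB; ...' flat-file line into a dict, or return None."""
    if not line.startswith("DR   PDB;"):
        return None
    # Drop the 'DR   ' line code and the trailing period, then split on ';'.
    body = line[5:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    if len(fields) < 5:
        return None
    _, pdb_id, method, resolution_text, chains = fields[:5]
    resolution = None
    if resolution_text.endswith(" A"):
        try:
            resolution = float(resolution_text[:-2])
        except ValueError:
            resolution = None  # unexpected format: leave the resolution unset
    return {
        "pdb_id": pdb_id,
        "method": method,
        "resolution_angstrom": resolution,
        "chains": chains,
    }
```

For the first PDB line of the example entry, this gives the identifier `1FQY`, the method `X-ray` and a resolution of 3.8. Entries with several hundred PDB cross-references simply produce one dict per line.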
