UniProt in GFF3 Format
3
2
13.2 years ago
User 9996 ▴ 840

Where can I find a valid version of the UniProt database (all isoforms of all genes) in GFF3 format? I'm interested in hg18/hg19 and mm9. Thanks.

uniprot gff gene • 3.9k views
3
13.2 years ago
Jerven ▴ 660

Building on Pierre's answer, you can then get each UniProt record in GFF using

http://www.uniprot.org/uniprot/THE_ID_YOU_FOUND.gff

Fetch them one by one, or use batch retrieve to get the entries in one go: look for the small link back to UniProt, then download the entries in GFF using the orange download button.

This is GFF, but not 100% GFF3: the Sequence Ontology does not have terms for all UniProt features, so they can't all be described in fully valid GFF3. That makes it rather hard to encode UniProt in GFF3.
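To check which feature types a downloaded file actually contains (and so which ones might lack a Sequence Ontology term), a rough sketch like the following could tally column 3 of the GFF. The function name `gff_feature_types` and the file name in the usage line are made up for illustration, and it assumes a tab-separated file with comment lines starting with `#`.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is hypothetical, not part of any UniProt tool.

# Print each feature type (GFF column 3) with its count, sorted by type name.
gff_feature_types() {
    awk -F'\t' '
        # Skip comment/pragma lines and anything without a type column.
        !/^#/ && NF >= 3 { n[$3]++ }
        END { for (t in n) printf "%s\t%d\n", t, n[t] }
    ' "$1" | sort
}
```

Usage would be something like `gff_feature_types P12345.gff`, where `P12345.gff` stands for whatever file you downloaded.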

2
13.2 years ago
  • Go to UCSC table browser
  • Select mammal/human/hg18
  • Select genes/UCSC genes/knownGene
  • Output format: BED
  • Get output

The column proteinID should be the UniProt ID.
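Once the table is saved, pulling out that column could look roughly like this. It is a sketch that assumes the output is tab-separated with a first line naming the columns (optionally prefixed with `#`) and that one of them is called `proteinID`; the function name `extract_protein_ids` is just an illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes a tab-separated table whose first line holds column names.

# Print the unique, non-empty values of the proteinID column, in file order.
extract_protein_ids() {
    awk -F'\t' '
        NR == 1 {
            sub(/^#/, "", $1)                  # drop a leading "#" on the header
            for (i = 1; i <= NF; i++) if ($i == "proteinID") col = i
            next
        }
        col && $col != "" && !seen[$col]++ { print $col }
    ' "$1"
}
```

For example, `extract_protein_ids hgTables > ids.txt`, where both file names are placeholders. Locating the column by name rather than by position avoids depending on the exact field order of the export.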

2
13.2 years ago

By taking advantage of Pierre's tip, you'll just need to get the ID list here.

With the list in hand, strip the header, the RefSeq entries, and the second column with:

grep -v -e '^#' -e 'NP_' hgTables | awk '{print $1}' > hgTablesUniProt

Then, get your files (Beware! Loooong list!):

while read -r line; do wget "http://www.uniprot.org/uniprot/$line.gff"; done < hgTablesUniProt

As Pierre says: That's it!

Note that this assumes a bash shell. Adding a delay between wget calls would also be polite to the server.
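A version with a pause between requests might look like the sketch below. The URL pattern is the one quoted in this thread and may have changed since; the function names and the default one-second delay are my own assumptions, not anything UniProt prescribes.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the default delay are hypothetical.

# Build the per-entry GFF URL for one accession (pattern taken from the thread).
uniprot_gff_url() {
    printf 'http://www.uniprot.org/uniprot/%s.gff\n' "$1"
}

# Download every accession listed in a file (one per line),
# sleeping between requests to go easy on the server.
fetch_all_gff() {
    local list="$1" delay="${2:-1}" id
    while read -r id; do
        [ -z "$id" ] && continue               # skip blank lines
        wget -q -O "$id.gff" "$(uniprot_gff_url "$id")"
        sleep "$delay"
    done < "$list"
}
```

Calling `fetch_all_gff hgTablesUniProt 2` would then fetch each entry with a two-second gap. Keeping the URL construction in its own function means only one line needs changing if the address differs.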
