How do I get data that maps the protein domains of the InterPro database to hg19 genome coordinates?
6.8 years ago
eric.kai0918 ▴ 10

Hi,

How do I get data that maps the protein domains of the InterPro database to hg19 genome coordinates? For example, I would like output in this format: Chromosome, StartPosition, EndPosition, ProteinDomain

Thanks.

Interpro Domain • 2.1k views
6.8 years ago

I wrote http://lindenb.github.io/jvarkit/MapUniProtFeatures.html, but I haven't used it since I wrote it.

$ java -jar dist/mapuniprot.jar \
    -R /path/to/human_g1k_v37.fasta \
    -u /path/uri/uniprot.org/uniprot_sprot.xml.gz \
    -k <(curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" | gunzip -c | awk -F '\t' '{if($2 ~ ".*_.*") next; OFS="\t"; gsub(/chr/,"",$2);print;}') |\
    LC_ALL=C sort -t $'\t' -k1,1 -k2,2n -k3,3n | uniq | head


1   69090   69144   topological_domain  1000    +   69090   69144   255,0,0 1   54  0
1   69144   69216   transmembrane_region    1000    +   69144   69216   255,0,0 1   72  0
1   69216   69240   topological_domain  1000    +   69216   69240   255,0,0 1   24  0
1   69240   69306   transmembrane_region    1000    +   69240   69306   255,0,0 1   66  0
1   69306   69369   topological_domain  1000    +   69306   69369   255,0,0 1   63  0
1   69357   69636   disulfide_bond  1000    +   69357   69636   255,0,0 1   279 0
1   69369   69429   transmembrane_region    1000    +   69369   69429   255,0,0 1   60  0
1   69429   69486   topological_domain  1000    +   69429   69486   255,0,0 1   57  0
1   69486   69543   transmembrane_region    1000    +   69486   69543   255,0,0 1   57  0
1   69543   69654   topological_domain  1000    +   69543   69654   255,0,0 1   111 0
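
The rows above look like 12-column BED records (chromosome, start, end, feature name, then score, strand and block fields), so the four columns asked for in the question are simply the first four fields. Below is a minimal Python sketch of that reduction. It assumes the real output is tab-separated and that coordinates are 0-based, half-open as in BED; the function names `bed_to_domain_rows` and `format_rows` are made up for this illustration and are not part of jvarkit. Note that the fourth column holds the UniProt feature type, not an InterPro identifier.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, int, int, str]


def bed_to_domain_rows(lines: Iterable[str], one_based: bool = False) -> Iterator[Row]:
    """Yield (chromosome, start, end, feature) from tab-separated BED-like lines.

    Assumption: the first four fields are chrom, start, end, name, as in BED.
    If one_based is True, the start is shifted by +1 to give 1-based,
    inclusive coordinates; the end is left unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the usual BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not enough columns to be a usable record
        start = int(fields[1]) + (1 if one_based else 0)
        yield fields[0], start, int(fields[2]), fields[3]


def format_rows(rows: Iterable[Row]) -> str:
    """Render rows as a tab-separated table with the header from the question."""
    out = ["Chromosome\tStartPosition\tEndPosition\tProteinDomain"]
    out.extend(f"{c}\t{s}\t{e}\t{n}" for c, s, e, n in rows)
    return "\n".join(out)


# Small demonstration using the first record shown above.
example = ["1\t69090\t69144\ttopological_domain\t1000\t+\t69090\t69144\t255,0,0\t1\t54\t0\n"]
print(format_rows(bed_to_domain_rows(example)))
```

To process the full output, the same functions could be fed an open file handle instead of the `example` list. Pass `one_based=True` if the downstream tool expects 1-based start positions.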

Thank you for your answer.


I notice that both the FASTA file and knownGene are based on hg19. Should I be careful about the version of uniprot_sprot.xml.gz?

I hope uniprot_sprot.xml.gz is independent of the genome build (hg19 vs. hg38).
