Question: Extracting loci index from uniprot with high confidence
olle.nordesjo (Singapore) wrote, 5.0 years ago:

I'm in the process of automating an assembly of paired multiple sequence alignments (i.e. MSAs with two proteins aligned after each other) in order to do some paired sequence processing on them.

To do this, I'm querying two protein families from Pfam, and I'm trying to associate them.

As I understand it, showing that the genes for the two proteins are adjacent in the genome is an appropriate way to associate them with high confidence in my case, since they're situated in the same operon.

So, this is the question:

Given the XML information in UniProt, what is the best way to assert genomic proximity/adjacency? (Can I find it easily using the Biopython API, for example?)

If the specific locus index information is lacking (I suspect this is often the case), is it appropriate to compare the UniProt identifiers (e.g. K0D1W6 in http://www.uniprot.org/uniprot/K0D1W6.xml) for similarity using some measure?

 

Thanks!

 

Hope to get some intelligent mind out there to help me. I'd be forever grateful!

 

Tags: api, uniprot, loci, genome

Elisabeth Gasteiger (Geneva) wrote, 5.0 years ago:

The letters in UniProtKB accession numbers have absolutely no meaning. The role of an accession number is to uniquely identify an entry, and that's all. Have a look at the user manual section about accession numbers: http://www.uniprot.org/manual/accession_numbers

You may, however, use the ordered locus names in the gene name field to assert genomic proximity:

We call ‘Ordered locus name’ (OLN) the naming systems that are used to sequentially assign an identifier to each predicted gene of a completely sequenced genome or chromosome. The OLN is generally based on a prefix representing the organism followed by a number which usually represents the sequential ordering of genes on the chromosome. Depending on the genome-sequencing center, OLNs are only attributed to protein-coding genes, or also to pseudogenes, and also to tRNA-coding genes and others. If two predicted genes have been merged to form a new gene, both OLNs are indicated, separated by a slash.

see http://www.uniprot.org/help/gene_name
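As a rough illustration of the OLN description above, here is a minimal Python sketch that splits an OLN into a prefix and a trailing number and compares two OLNs. The function names `parse_oln` and `locus_distance` are hypothetical, and the "prefix followed by digits" pattern is an assumption that will not fit every genome; how large a number gap still counts as "adjacent" depends on how the sequencing center numbered the genes, so treat any threshold as something to calibrate per organism.

```python
import re

# Assumed layout: an arbitrary prefix followed by a trailing run of digits,
# e.g. "AMBLS11_16600" -> prefix "AMBLS11_", number 16600.
_OLN_PATTERN = re.compile(r"^(?P<prefix>.*?)(?P<number>\d+)$")


def parse_oln(oln):
    """Return a list of (prefix, number) pairs for one OLN field.

    Merged genes are written as "A/B", so one field can yield two pairs.
    Parts that do not end in digits are skipped.
    """
    pairs = []
    for part in oln.split("/"):
        match = _OLN_PATTERN.match(part.strip())
        if match:
            pairs.append((match.group("prefix"), int(match.group("number"))))
    return pairs


def locus_distance(oln_a, oln_b):
    """Smallest difference in locus number between two OLNs.

    Only pairs sharing the same prefix are compared; returns None when
    there is no comparable pair (different prefixes or unparsable names).
    """
    diffs = [
        abs(num_a - num_b)
        for prefix_a, num_a in parse_oln(oln_a)
        for prefix_b, num_b in parse_oln(oln_b)
        if prefix_a == prefix_b
    ]
    return min(diffs) if diffs else None
```

A small distance between two OLNs with the same prefix is only a hint of proximity, since the number "usually", not always, reflects gene order on the chromosome.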

In your example:

<gene>
  <name type="ordered locus" evidence="10">AMBLS11_16600</name>
</gene>
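A hedged sketch of how such an element could be pulled out of an entry's XML using only the Python standard library. The helper name `ordered_locus_names` and the hand-written `SAMPLE` document are illustrative assumptions (including the namespace shown in it); the code ignores whatever namespace the file declares rather than relying on a specific one.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def ordered_locus_names(xml_text):
    """Collect the text of every <name type="ordered locus"> inside <gene>."""
    names = []
    for elem in ET.fromstring(xml_text).iter():
        if _local(elem.tag) != "gene":
            continue
        for child in elem:
            if _local(child.tag) == "name" and child.get("type") == "ordered locus":
                names.append((child.text or "").strip())
    return names


# Hand-written stand-in for a downloaded entry; a real file has many more elements.
SAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot"><entry>
<accession>K0D1W6</accession>
<gene>
  <name type="ordered locus" evidence="10">AMBLS11_16600</name>
</gene>
</entry></uniprot>"""

print(ordered_locus_names(SAMPLE))
```

In practice `xml_text` would be the content of a file such as the K0D1W6.xml entry mentioned in the question; an empty result means the entry carries no ordered locus name.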

 


Thanks a lot!

That helps, especially in the case of merged genes. However, is there a standard for exactly what the ordered locus name format should contain? I ask because I keep stumbling upon really weird-looking OLNs, for example:

POPTR_0007s09720g

POPTR_0010s11350g

Both come from the same organism, which seems slightly messy.

Reply written 5.0 years ago by olle.nordesjo

The identifiers you saw seem to be ORF names in UniProtKB, not OLNs, suggesting that they are not really ordered:

e.g. http://www.uniprot.org/uniprot/U5G7Y3

OLNs are included in UniProtKB only if they were attributed by the group that sequenced the genome.
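Since ORF names and OLNs live in the same gene name field and differ only in their `type` attribute, a script can check the type before trusting a name as an ordering hint. The sketch below is one possible way to do that; `gene_names_by_type`, `names_for_adjacency`, and the hand-written `SAMPLE_ORF` snippet are hypothetical, and the snippet simply assumes the POPTR identifier is stored with type "ORF", as suggested above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def gene_names_by_type(xml_text):
    """Map each gene-name type (e.g. "ordered locus", "ORF") to its names."""
    by_type = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        # Ignore the XML namespace, if any, and keep only <gene> elements.
        if elem.tag.rsplit("}", 1)[-1] != "gene":
            continue
        for name in elem:
            if name.tag.rsplit("}", 1)[-1] == "name" and name.text:
                by_type[name.get("type")].append(name.text.strip())
    return dict(by_type)


def names_for_adjacency(xml_text):
    """Return only ordered locus names; ORF names are deliberately excluded."""
    return gene_names_by_type(xml_text).get("ordered locus", [])


# Hand-written snippet, not a real download.
SAMPLE_ORF = '<entry><gene><name type="ORF">POPTR_0007s09720g</name></gene></entry>'

print(gene_names_by_type(SAMPLE_ORF))
print(names_for_adjacency(SAMPLE_ORF))
```

Entries for which `names_for_adjacency` returns an empty list would then need a different source of positional information.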

Reply written 5.0 years ago by Elisabeth Gasteiger

Thanks, that clarifies it!

Reply written 5.0 years ago by olle.nordesjo