Proteins without genes? Is that even possible?
9.1 years ago

Hello all,

I am looking at some mass-spec data. I found several fragments mapping to the Ig heavy chain V-II region WAH protein and want to find the corresponding gene.

Example: http://www.uniprot.org/uniprot/P01770

Uniprot Screenshot

UniProt gives the gene name as "NULL". Is this an annotation error, or is there some special aspect of Ig regions that I am missing? I want to map several proteins with these types of names to genes.

  • Cluster of Ig heavy chain V-I region HG3
  • Cluster of Ig heavy chain V-II region SESS
  • Cluster of Ig heavy chain V-III region BRO
  • Cluster of Ig lambda chain V-I region NEW
  • Cluster of Ig lambda chain V-II region BUR
  • Ig heavy chain V-II region WAH
  • Ig heavy chain V-III region BUT
  • Ig heavy chain V-III region GAL
  • Ig heavy chain V-III region NIE
  • Ig heavy chain V-III region WEA
  • Ig kappa chain V-I region Kue
  • Ig kappa chain V-I region Wes
  • Ig kappa chain V-III region VG (Fragment)
  • Ig lambda chain V-III region LOI
  • Ig lambda chain V-III region SH
  • Ig lambda chain V-V region DEL

How can I map these to corresponding gene names?

Thoughts?

annotation uniprot NCBI • 3.9k views
9.1 years ago

From the UniProt helpdesk (please contact us directly with future questions):
There seem to be 281 human UniProtKB/Swiss-Prot entries (http://www.uniprot.org/uniprot/?query=+NOT+gene%3A*+AND+organism%3A%22Homo+sapiens+%28Human%29+[9606]%22&sort=score) that do not have any gene name. This figure represents less than 1.4% of the total, which is not too bad. Sequences that have been exclusively produced by large scale sequencing programs, such as NEDO, for instance, usually do not have any satisfactory name, but rather a clone name, such as FLJxxx in the case of the NEDO project. We do not consider these names as eligible for gene names and there are no publications that could give us a hint to name these sequences properly.
There are about 65 entries of the FLJxxx type in UniProtKB/Swiss-Prot.
Although I didn't look at all entries that do not have any gene name, I expect that most follow the same rationale. At some point, these entries will either be "upgraded" if additional evidence is produced by the scientific community, or deleted from the database. However, UniProtKB/Swiss-Prot is quite conservative and we prefer to keep some dubious sequences. This provides MS users with the largest possible set of peptides for their identifications.

There are also human entries that do not have any official gene name. Currently 356 human entries do not have any link to HGNC. This represents less than 2% of the total number of human entries. Some may belong to the group of orphan sequences that do not seem to interest anyone for the time being (see above), or they may be missing from HGNC (see for instance Ovostatin homologs, Q6IE37 and Q6IE36).
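For readers who want to rerun the "human entries without a gene name" search described above, here is a minimal sketch that only builds the search URL and does not contact any server. The endpoint (`rest.uniprot.org/uniprotkb/search`), the query fields (`organism_id`, `reviewed`, `gene`) and the return-field names are assumptions about the current UniProt REST API, not something stated in the answer, and `build_no_gene_query` is a hypothetical helper name. The counts returned today will very likely differ from the 281 quoted above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed endpoint of the current UniProt REST API (not from the original answer).
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_no_gene_query(taxon_id: int = 9606, reviewed: bool = True) -> str:
    """Build a search URL for entries of one organism that have no gene name.

    The query syntax is an assumption modelled on the legacy query in the
    answer ("NOT gene:* AND organism:...").
    """
    clauses = [f"organism_id:{taxon_id}"]
    if reviewed:
        # Restrict to UniProtKB/Swiss-Prot (reviewed) entries.
        clauses.append("reviewed:true")
    query = " AND ".join(clauses) + " AND NOT gene:*"
    params = {
        "query": query,
        "format": "tsv",
        "fields": "accession,protein_name,gene_names",
        "size": 500,
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


url = build_no_gene_query()
print(url)
# Show the decoded query string to make the encoded URL easier to check.
print(parse_qs(urlparse(url).query)["query"][0])
```

Paste the printed URL into a browser or pass it to any HTTP client to fetch the table; if the field names have changed, the server should report which ones it does not recognise.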


That's awesome you're participating here, and your answer is excellent.

But that illustrates why I disagree with your statement "please contact us directly with future questions." Instead of one person getting his question answered, now many people can get an authoritative answer without you and your coworkers having to answer the same question again.


We try to monitor UniProt-related questions here and are happy to answer, but cannot guarantee that we see all of them. A quick email to the helpdesk, even if it is just to say "I have posted this UniProt question on BioStars, with a link to the post", increases your chances of receiving a reply.


Right on. Thanks for the clarification!


Great answer and that clarifies it. Thanks for reaching out to us via Biostars.

9.1 years ago
Emily 23k

I think it comes down to the fact that Ig proteins don't have "a gene", they are made up of bits of gene that undergo somatic recombination. Wikipedia is surprisingly good on it.

In Ensembl, we have Ig genes (example) which are just the smaller gene fragments. They link out to protein fragments in Uniprot, which do have a gene linked to them. However the longer ones, as you see, don't have a gene because they're not single genes.

Might be worth an email to Uniprot to see if they might consider linking their long Igs to the shorter Ig fragments, and therefore the genes.


Thanks Emily!

Uniprot folks are here, yay!

9.1 years ago

Immunoglobulin genes are created from Variable, Diversity and Joining segments via a process called V-D-J recombination. They are unique to each B-cell, unless those cells result from a proliferation of the same ancestor. Moreover, random nucleotides are inserted in V-D and D-J junctions and the sequence is diversified by somatic hypermutations, so it is only possible to get the gene sequence by de-novo sequencing and assembly.

As far as I know, the Pevzner lab has some nice algorithms for getting the antibody repertoire from mass-spec data (e.g. http://online.liebertpub.com/doi/abs/10.1089/106652799318300 ).

You can also try out IgBlast: set the algorithm to blastp, and explore the alignment to V/D/J reference sequences.

PS. Note that both individual gene IDs for those segments (IGHV1-69, ...) and an entire locus (IGH@) are present, e.g. http://www.ncbi.nlm.nih.gov/gene/28461

9.1 years ago

It has no gene name because no single gene encodes it. An Ig heavy chain is possibly a variable construct.

Depending on what analysis you need to do, there might be a stand-in gene name that will work.
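One way to get such a stand-in is to read the chain type out of the protein name and assign the locus-level symbol. The sketch below is illustrative only: the regular expression is fitted to the names listed in the question, `parse_ig_name` and `IgVRegion` are hypothetical names, and using the locus symbols IGH, IGK and IGL as stand-ins is an assumption. These symbols name whole loci, not the rearranged gene that actually encoded the protein.

```python
import re
from typing import NamedTuple, Optional

# Locus-level stand-ins (an assumption for illustration): each symbol names
# an entire immunoglobulin locus, not a specific rearranged gene.
CHAIN_TO_LOCUS = {"heavy": "IGH", "kappa": "IGK", "lambda": "IGL"}

# Matches names such as "Ig heavy chain V-II region WAH", optionally
# prefixed with "Cluster of " and followed by extra text like "(Fragment)".
NAME_PATTERN = re.compile(
    r"^(?:Cluster of )?Ig (heavy|kappa|lambda) chain V-([IVX]+) region (\S+)"
)


class IgVRegion(NamedTuple):
    chain: str     # heavy, kappa or lambda
    subgroup: str  # Roman numeral after "V-"
    label: str     # protein label, e.g. WAH
    locus: str     # stand-in locus symbol


def parse_ig_name(name: str) -> Optional[IgVRegion]:
    """Parse an Ig V-region protein name; return None if it does not match."""
    match = NAME_PATTERN.match(name.strip())
    if match is None:
        return None
    chain, subgroup, label = match.groups()
    return IgVRegion(chain, subgroup, label, CHAIN_TO_LOCUS[chain])


names = [
    "Cluster of Ig heavy chain V-I region HG3",
    "Ig heavy chain V-II region WAH",
    "Ig kappa chain V-III region VG (Fragment)",
    "Ig lambda chain V-V region DEL",
]
for name in names:
    print(name, "->", parse_ig_name(name))
```

Whether a locus-level label is good enough depends on the downstream analysis; it groups peptides by chain type but says nothing about which V segment was used.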

9.1 years ago

Just another note about UniProtKB & Immunoglobulins:

As described here http://www.uniprot.org/help/uniprotkb_coverage, UniProtKB excludes the protein sequences from most non-germline immunoglobulins and T-cell receptors.

9.1 years ago
Michael 54k

I think this can happen technically. Of course, there must be an ORF in some B cell that is expressed as a polypeptide; however, this sequence cannot be mapped to the reference genome. The protein sequences could have been acquired by de novo sequencing using MS.

The sequences you are checking are variable chains; the sequence is required for antigen binding and is therefore highly variable.


Thanks Michael!

9.1 years ago
5heikki 11k

While not related to humans (afaik), you can also have proteins without ORFs due to mRNA-free nonribosomal protein synthesis. Anyway, I believe Michael Dondrup answered your question correctly above.


Fascinating, thanks for sharing this.
