Question

Proteins without genes? Is that even possible?

3

9.1 years ago

Khader Shameer 18k

Hello all,

I am looking at some mass-spec data. I found several fragments mapping to the Ig heavy chain V-II region WAH protein and want to find the corresponding gene.

Example: http://www.uniprot.org/uniprot/P01770

UniProt gives the gene name as "NULL". Is this an annotation error, or is there some special aspect of Ig regions that I am missing? I want to map several proteins with these types of names to genes:

Cluster of Ig heavy chain V-I region HG3
Cluster of Ig heavy chain V-II region SESS
Cluster of Ig heavy chain V-III region BRO
Cluster of Ig lambda chain V-I region NEW
Cluster of Ig lambda chain V-II region BUR
Ig heavy chain V-II region WAH
Ig heavy chain V-III region BUT
Ig heavy chain V-III region GAL
Ig heavy chain V-III region NIE
Ig heavy chain V-III region WEA
Ig kappa chain V-I region Kue
Ig kappa chain V-I region Wes
Ig kappa chain V-III region VG (Fragment)
Ig lambda chain V-III region LOI
Ig lambda chain V-III region SH
Ig lambda chain V-V region DEL

How can I map these to corresponding gene names?

Thoughts?

annotation uniprot NCBI • 3.9k views

ADD COMMENT • link updated 23 months ago by Ram 43k • written 9.1 years ago by Khader Shameer 18k

3

9.1 years ago

Emily 23k

I think it comes down to the fact that Ig proteins don't have "a gene"; they are encoded by gene segments that undergo somatic recombination. Wikipedia is surprisingly good on it.

In Ensembl, we have Ig genes (example), which are just the smaller gene fragments. They link out to protein fragments in UniProt, which do have a gene linked to them. However, the longer ones, as you see, don't have a gene because they are not the product of a single gene.

It might be worth an email to UniProt to see if they would consider linking their long Igs to the shorter Ig fragments, and therefore to the genes.

ADD COMMENT • link updated 23 months ago by Ram 43k • written 9.1 years ago by Emily 23k

0

Thanks Emily!

Uniprot folks are here, yay!

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by Khader Shameer 18k

3

9.1 years ago

mikhail.shugay 3.5k

Immunoglobulin genes are assembled from Variable, Diversity and Joining segments via a process called V(D)J recombination. The resulting sequences are unique to each B cell, unless the cells descend from the same ancestor by proliferation. Moreover, random nucleotides are inserted at the V-D and D-J junctions and the sequence is further diversified by somatic hypermutation, so the only way to get the gene sequence is de novo sequencing and assembly.
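To make the point concrete, here is a toy simulation of the process described above. It is only a sketch: the segment names and sequences are made up for illustration (they are not real IGH germline sequences), and real recombination also trims segment ends, which is left out here.

```python
import random

# Hypothetical, shortened germline segments (NOT real IGH sequences).
V_SEGMENTS = {"V1": "CAGGTGCAGCTGGTG", "V2": "GAGGTGCAGCTGTTG"}
D_SEGMENTS = {"D1": "GGTATAGC", "D2": "TACTACGG"}
J_SEGMENTS = {"J1": "TACTTTGACTAC", "J2": "TGGTTCGACCCC"}


def random_nucleotides(rng, max_len=4):
    """Return 0..max_len random bases, standing in for junctional insertions."""
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))


def recombine(rng):
    """Pick one V, one D and one J segment and join them with random junction bases."""
    v_name, d_name, j_name = (
        rng.choice(sorted(segments))
        for segments in (V_SEGMENTS, D_SEGMENTS, J_SEGMENTS)
    )
    sequence = (
        V_SEGMENTS[v_name]
        + random_nucleotides(rng)  # V-D junction
        + D_SEGMENTS[d_name]
        + random_nucleotides(rng)  # D-J junction
        + J_SEGMENTS[j_name]
    )
    return (v_name, d_name, j_name), sequence


def hypermutate(sequence, rng, rate=0.05):
    """Substitute each base with probability `rate` (toy somatic hypermutation)."""
    return "".join(
        rng.choice([b for b in "ACGT" if b != base]) if rng.random() < rate else base
        for base in sequence
    )


rng = random.Random(0)  # fixed seed so the run is reproducible
for _ in range(5):
    segments, sequence = recombine(rng)
    print(segments, hypermutate(sequence, rng))
```

Even with only two choices per segment, the simulated "B cells" rarely share a sequence, and none of the products corresponds to one contiguous stretch of the germline.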

As far as I know, the Pevzner lab has some nice algorithms for getting antibody repertoires from mass-spec data (e.g. http://online.liebertpub.com/doi/abs/10.1089/106652799318300 ).

You can also try out IgBlast: set the algorithm to blastp, and explore the alignment to V/D/J reference sequences.

PS. Note that both individual gene IDs for those segments (IGHV1-69, ...) and an ID for the entire locus (IGH@) exist, e.g. http://www.ncbi.nlm.nih.gov/gene/28461

ADD COMMENT • link updated 4.5 years ago by Ram 43k • written 9.1 years ago by mikhail.shugay 3.5k

2

9.1 years ago

karl.stamm 4.1k

It isn't encoded by a single gene, so no gene name is listed. An Ig heavy chain is a variable construct.

Depending on what analysis you need to do, there might be a stand-in gene name that will work.
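As a rough sketch of what a stand-in could look like: the protein names in the question encode the chain type, so they can be parsed and assigned a locus-level symbol. The function name `parse_ig_name` and the chain-to-locus table are assumptions made for this illustration, not a mapping provided by UniProt, and a locus symbol is much coarser than a real gene assignment.

```python
import re

# Locus-level symbols used as stand-ins. This mapping is an assumption for
# illustration; it does not identify the actual V/D/J segments involved.
LOCUS_BY_CHAIN = {"heavy": "IGH", "kappa": "IGK", "lambda": "IGL"}

# Matches names such as "Ig heavy chain V-II region WAH", with an optional
# "Cluster of " prefix and an optional trailing "(Fragment)".
NAME_PATTERN = re.compile(
    r"^(?:Cluster of )?Ig (heavy|kappa|lambda) chain V-([IVX]+) region (\S+)"
)


def parse_ig_name(name):
    """Split a protein name into chain type, V subgroup, label and stand-in locus."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None  # not one of the Ig variable-region names
    chain, subgroup, label = match.groups()
    return {
        "chain": chain,
        "subgroup": subgroup,
        "label": label,
        "locus": LOCUS_BY_CHAIN[chain],
    }


for name in [
    "Ig heavy chain V-II region WAH",
    "Cluster of Ig lambda chain V-I region NEW",
    "Ig kappa chain V-III region VG (Fragment)",
]:
    print(name, "->", parse_ig_name(name))
```

Whether a locus-level label is good enough depends on the downstream analysis; for anything segment-specific, an alignment against germline references is the better route.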

ADD COMMENT • link updated 23 months ago by Ram 43k • written 9.1 years ago by karl.stamm 4.1k

2

9.1 years ago

Elisabeth Gasteiger ★ 2.4k

Just another note about UniProtKB & Immunoglobulins:

As described here http://www.uniprot.org/help/uniprotkb_coverage, UniProtKB excludes the protein sequences from most non-germline immunoglobulins and T-cell receptors.

1

9.1 years ago

Michael 54k

I think this can technically happen. Of course, there must be an ORF in some B cell that is expressed as a polypeptide; however, this sequence cannot be mapped to the reference genome. The protein sequences could have been obtained by de novo sequencing using MS.

The sequences you are checking are variable regions; this part of the chain is responsible for antigen binding and is therefore highly variable.

0

Thanks Michael!

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by Khader Shameer 18k

1

9.1 years ago

5heikki 11k

While not related to humans (afaik), you can also have peptides without ORFs due to mRNA-free, nonribosomal peptide synthesis. Anyway, I believe Michael Dondrup answered your question correctly above.

0

Fascinating, thanks for sharing this.

ADD REPLY • link 9.1 years ago by Khader Shameer 18k

Ram · Accepted Answer · 2015-03-27

7

9.1 years ago

Elisabeth Gasteiger ★ 2.4k

From the UniProt helpdesk (please contact us directly with future questions):
There seem to be 281 human UniProtKB/Swiss-Prot entries (http://www.uniprot.org/uniprot/?query=+NOT+gene%3A*+AND+organism%3A%22Homo+sapiens+%28Human%29+[9606]%22&sort=score) that do not have any gene name. This figure represents less than 1.4% of the total, which is not too bad. Sequences that have been produced exclusively by large-scale sequencing programs, such as NEDO, usually do not have any satisfactory name, but rather a clone name, such as FLJxxx in the case of the NEDO project. We do not consider these names eligible as gene names, and there are no publications that could give us a hint to name these sequences properly.
There are about 65 entries of the FLJxxx type in UniProtKB/Swiss-Prot.
Although I didn't look at all entries that do not have any gene name, I expect that most follow the same rationale. At some point, these entries will either be "upgraded", if additional evidence is produced by the scientific community, or deleted from the database. However, UniProtKB/Swiss-Prot is quite conservative and we prefer to keep some dubious sequences. This provides MS users with the largest possible set of peptides for their identifications.

There are also human entries that do not have any official gene name. Currently 356 human entries do not have any link to HGNC. This represents less than 2% of the total number of human entries. Some may belong to the group of orphan sequences that do not seem to interest anyone for the time being (see above), or they may be missing from HGNC (see for instance Ovostatin homologs, Q6IE37 and Q6IE36).
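For anyone who wants to check this programmatically, here is a minimal sketch. It assumes the current UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/<accession>.json` and a JSON layout in which gene names sit under `genes` → `geneName` → `value`; both are assumptions about the present-day service rather than something stated in this thread, and the helper names `fetch_entry` and `gene_names` are made up for the example.

```python
import json
from urllib.request import urlopen


def fetch_entry(accession):
    """Download one UniProtKB entry as a dict (assumed REST endpoint, needs network)."""
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.json"
    with urlopen(url) as response:
        return json.load(response)


def gene_names(entry):
    """Return the primary gene names in an entry dict, or [] if none are listed.

    Assumes the layout {"genes": [{"geneName": {"value": "..."}}, ...]}.
    """
    names = []
    for gene in entry.get("genes", []):
        value = gene.get("geneName", {}).get("value")
        if value:
            names.append(value)
    return names


# Offline demonstration with hand-written stand-ins for real entries.
with_gene = {"genes": [{"geneName": {"value": "INS"}}]}
without_gene = {"primaryAccession": "HYPOTHETICAL"}
print(gene_names(with_gene))     # ['INS']
print(gene_names(without_gene))  # []
```

With network access, `gene_names(fetch_entry("P01770"))` would run the same check on the entry from the question; what it returns today is not verified here.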

1

That's awesome you're participating here, and your answer is excellent.

But that illustrates why I disagree with your statement "please contact us directly with future questions." Instead of one person getting his question answered, now many people can get an authoritative answer without you and your coworkers having to answer the same question again.

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by Dan D 7.4k

5

We try to monitor UniProt-related questions here and are happy to answer, but cannot guarantee that we see all of them. A quick email to the helpdesk, even if it is just to say "I have posted this UniProt question on BioStars" with a link to the post, increases your chances of receiving a reply.

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by Elisabeth Gasteiger ★ 2.4k

0

Right on. Thanks for the clarification!

ADD REPLY • link 9.1 years ago by Dan D 7.4k

0

Great answer and that clarifies it. Thanks for reaching out to us via Biostars.

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by Khader Shameer 18k