Question: Proteins without genes? Is that even possible?
Asked by Khader Shameer (Manhattan, NY), 4.4 years ago:

Hello all,

I am looking at some mass-spec data. I found several fragments mapping to the protein "Ig heavy chain V-II region WAH" and want to find the corresponding gene.

Example: http://www.uniprot.org/uniprot/P01770

[UniProt screenshot]
UniProt lists the gene name as "NULL". Is this an annotation error, or is there some special aspect of Ig regions that I am missing? I want to map several proteins with names of this type to genes:

  • Cluster of Ig heavy chain V-I region HG3
  • Cluster of Ig heavy chain V-II region SESS
  • Cluster of Ig heavy chain V-III region BRO
  • Cluster of Ig lambda chain V-I region NEW
  • Cluster of Ig lambda chain V-II region BUR
  • Ig heavy chain V-II region WAH
  • Ig heavy chain V-III region BUT
  • Ig heavy chain V-III region GAL 
  • Ig heavy chain V-III region NIE
  • Ig heavy chain V-III region WEA
  • Ig kappa chain V-I region Kue
  • Ig kappa chain V-I region Wes
  • Ig kappa chain V-III region VG (Fragment)
  • Ig lambda chain V-III region LOI 
  • Ig lambda chain V-III region SH 
  • Ig lambda chain V-V region DEL

How can I map these to the corresponding gene names?

Thoughts?

Answer by Elisabeth Gasteiger (Geneva), 4.4 years ago:

From the UniProt helpdesk (please contact us directly with future questions):
There seem to be 281 human UniProtKB/Swiss-Prot entries (http://www.uniprot.org/uniprot/?query=+NOT+gene%3A*+AND+organism%3A%22Homo+sapiens+%28Human%29+[9606]%22&sort=score) that do not have any gene name. This figure represents less than 1.4% of the total, which is not too bad. Sequences that have been produced exclusively by large-scale sequencing programs, such as NEDO, usually do not have a satisfactory name, only a clone name, such as FLJxxx in the case of the NEDO project. We do not consider these clone names eligible as gene names, and there are no publications that could give us a hint on how to name these sequences properly.
There are about 65 entries of the FLJxxx type in UniProtKB/Swiss-Prot.
Although I didn't look at all entries that lack a gene name, I expect that most follow the same rationale. At some point, these entries will either be "upgraded", if additional evidence is produced by the scientific community, or deleted from the database. However, UniProtKB/Swiss-Prot is quite conservative, and we prefer to keep some dubious sequences. This provides MS users with the largest possible set of peptides for their identifications.

There are also human entries that do not have an official gene name. Currently, 356 human entries have no link to HGNC, which is less than 2% of the total number of human entries. Some may belong to the group of orphan sequences that do not seem to interest anyone for the time being (see above), or they may be missing from HGNC (see, for instance, the ovostatin homologs Q6IE37 and Q6IE36).
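To reproduce counts like these yourself, one option is to download the human entries from UniProtKB as a tab-separated table and filter it locally. The following is a minimal Python sketch, assuming a TSV export that contains an accession column, a gene-name column and an HGNC cross-reference column. The header names `Entry`, `Gene Names` and `HGNC` are assumptions about the export format, so adjust the constants to match your actual file.

```python
import csv
import io

# ASSUMED column headers of a UniProtKB TSV export that includes the accession,
# gene names and HGNC cross-references; check them against your own download.
ACCESSION_COL = "Entry"
GENE_COL = "Gene Names"
HGNC_COL = "HGNC"


def entries_without_names(tsv_text: str) -> tuple[list[str], list[str]]:
    """Return (accessions with no gene name, accessions with no HGNC cross-reference)."""
    no_gene: list[str] = []
    no_hgnc: list[str] = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        accession = row[ACCESSION_COL]
        # An empty (or absent) cell means the export lists no value for that field.
        if not (row.get(GENE_COL) or "").strip():
            no_gene.append(accession)
        if not (row.get(HGNC_COL) or "").strip():
            no_hgnc.append(accession)
    return no_gene, no_hgnc
```

Dividing `len(no_gene)` by the total number of rows gives a fraction comparable to the percentages quoted above, although the actual figures will have changed since this answer was written.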


That's awesome you're participating here, and your answer is excellent.

But that illustrates why I disagree with your statement "please contact us directly with future questions." Instead of one person getting his question answered, now many people can get an authoritative answer without you and your coworkers having to answer the same question again. 

Reply by Dan D, 4.4 years ago.

We try to monitor UniProt-related questions here and are happy to answer, but we cannot guarantee that we see all of them. A quick email to the helpdesk, even if it just says "I have posted this UniProt question on BioStars, with a link to the post", increases your chances of receiving a reply.

Reply by Elisabeth Gasteiger, 4.4 years ago.

Right on. Thanks for the clarification!

Reply by Dan D, 4.4 years ago.

Great answer, and that clarifies it. Thanks for reaching out to us via Biostars.

Reply by Khader Shameer, 4.4 years ago.
Answer by Emily_Ensembl (EMBL-EBI), 4.4 years ago:

I think it comes down to the fact that Ig proteins don't have "a gene": they are assembled from gene segments that undergo somatic recombination. Wikipedia is surprisingly good on this.

In Ensembl, we have Ig genes, which are just the smaller gene segments. They link out to protein fragments in UniProt, which do have a gene linked to them. However, the longer ones, as you have seen, don't have a gene because they are not encoded by a single gene.

It might be worth emailing UniProt to see if they would consider linking their long Ig entries to the shorter Ig fragments, and therefore to the genes.


Thanks, Emily!
The UniProt folks are here, yay!

Reply by Khader Shameer, 4.4 years ago.
Answer by mikhail.shugay (Czech Republic, Brno, CEITEC), 4.4 years ago:

Immunoglobulin genes are created from Variable, Diversity and Joining segments via a process called V(D)J recombination (light chains use only V and J segments). The resulting rearranged genes are unique to each B cell, unless the cells descend by proliferation from the same ancestor. Moreover, random nucleotides are inserted at the V-D and D-J junctions, and the sequence is further diversified by somatic hypermutation, so the rearranged gene sequence can only be obtained by de novo sequencing and assembly.
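The recombination process can be illustrated with a toy simulation. This Python sketch is a deliberately simplified model, not a real repertoire simulator: the segment names and sequences are hypothetical placeholders (not real IGHV/IGHD/IGHJ germline sequences), and it ignores exonuclease trimming of segment ends, light chains, and reading-frame checks. It only shows why the final rearranged sequence does not occur as one contiguous stretch of the reference genome.

```python
import random

# HYPOTHETICAL toy segment libraries; these are placeholders, not real germline sequences.
V_SEGMENTS = {"V1": "CAGGTGCAG", "V2": "GAGGTGCAG"}
D_SEGMENTS = {"D1": "GGTAC", "D2": "TATGG"}
J_SEGMENTS = {"J1": "TGGGGCCAG", "J2": "TGGGGCAAG"}


def random_n_region(rng: random.Random, max_len: int = 4) -> str:
    """Untemplated nucleotides added at a junction (a stand-in for TdT activity)."""
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))


def recombine(rng: random.Random) -> tuple[str, str]:
    """Pick one V, D and J segment and join them with random N regions in between."""
    v = rng.choice(sorted(V_SEGMENTS))
    d = rng.choice(sorted(D_SEGMENTS))
    j = rng.choice(sorted(J_SEGMENTS))
    seq = (V_SEGMENTS[v] + random_n_region(rng) + D_SEGMENTS[d]
           + random_n_region(rng) + J_SEGMENTS[j])
    return f"{v}-{d}-{j}", seq


def hypermutate(seq: str, rng: random.Random, rate: float = 0.05) -> str:
    """Substitute each base with probability `rate` (a crude somatic hypermutation stand-in)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


rng = random.Random(0)
for _ in range(3):
    label, seq = recombine(rng)
    print(label, hypermutate(seq, rng))
```

Even with only two options per segment, the random junctions and mutations mean that two runs rarely produce the same sequence, which mirrors why each B-cell clone carries its own rearranged gene.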

As far as I know, the Pevzner lab has a nice algorithm for reconstructing antibody repertoires from mass-spec data (e.g. http://online.liebertpub.com/doi/abs/10.1089/106652799318300).

You can also try out IgBlast: set the algorithm to blastp, and explore the alignment to V/D/J reference sequences.
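IgBlast performs proper alignment against curated germline references. As a much cruder stand-in (not IgBlast, and not anyone's published method), the Python sketch below assigns mass-spec peptides to germline V genes by ungapped mismatch counting. It treats I and L as identical because leucine and isoleucine have the same mass and are not distinguished by standard MS. The germline dictionary, its gene names and sequences, and the `max_mismatches` threshold are all hypothetical; in practice you would load real germline references instead.

```python
from typing import Optional


def normalize(seq: str) -> str:
    """Upper-case and merge I/L, which standard mass spectrometry cannot tell apart."""
    return seq.upper().replace("I", "L")


def best_hamming(peptide: str, reference: str) -> Optional[int]:
    """Fewest mismatches of `peptide` against any same-length window of `reference` (ungapped)."""
    n = len(peptide)
    if n == 0 or n > len(reference):
        return None
    return min(
        sum(a != b for a, b in zip(peptide, reference[i:i + n]))
        for i in range(len(reference) - n + 1)
    )


def assign_peptides(peptides, germline, max_mismatches=1):
    """Map each peptide to germline genes within `max_mismatches`, best matches first."""
    result = {}
    for pep in peptides:
        p = normalize(pep)
        scored = []
        for gene, seq in germline.items():
            dist = best_hamming(p, normalize(seq))
            if dist is not None and dist <= max_mismatches:
                scored.append((dist, gene))
        result[pep] = [gene for _, gene in sorted(scored)]
    return result


# HYPOTHETICAL germline V sequences, for illustration only.
HYPOTHETICAL_GERMLINE_V = {
    "HYPOTHETICAL_V1": "ACDEFGHIKLMNPQRSTVWY",
    "HYPOTHETICAL_V2": "ACDEFAHIKLMNPQRSTVWY",
}
print(assign_peptides(["DEFGHIK", "PQRSTV"], HYPOTHETICAL_GERMLINE_V))
```

Allowing a mismatch or two tolerates some somatic hypermutation, but peptides spanning the V-D or D-J junctions will generally not match any single germline segment, which is consistent with the point above about why these proteins have no single gene.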

PS. Note that both individual gene IDs for those segments (IGHV1-69, ...) and an entire locus (IGH@) are present, e.g. http://www.ncbi.nlm.nih.gov/gene/28461

Answer by karl.stamm (United States), 4.4 years ago:

It doesn't have a gene, so no gene is listed. An Ig heavy chain is a variable construct assembled from several gene segments rather than the product of a single gene.

Depending on what analysis you need to do, there might be a stand-in gene name that will work.

Answer by Elisabeth Gasteiger (Geneva), 4.4 years ago:

Just another note about UniProtKB & Immunoglobulins:

As described at http://www.uniprot.org/help/uniprotkb_coverage, UniProtKB excludes the protein sequences of most non-germline immunoglobulins and T-cell receptors.

4.4 years ago by
Bergen, Norway
Michael Dondrup46k wrote:

I think this can happen technically. Of course, there must be an ORF in some B cell that is expressed as a polypeptide, but this rearranged sequence cannot be mapped contiguously to the reference genome. The protein sequences could have been obtained by de novo sequencing using MS.

The sequences you are checking are variable regions; their sequence determines antigen binding and is therefore highly variable.


Thanks, Michael!

Reply by Khader Shameer, 4.4 years ago.
Answer by 5heikki (Finland), 4.4 years ago:

While not relevant to humans (as far as I know), you can also have peptides made without ORFs, through mRNA-free nonribosomal peptide synthesis. Anyway, I believe Michael Dondrup answered your question correctly above.


Fascinating, thanks for sharing this.

Reply by Khader Shameer, 4.4 years ago.