Question: Why am I getting different Ensembl gene IDs for a given gene symbol?
4
4.4 years ago by
ravihansa8260
Sri Lanka
ravihansa8260 wrote:

Dear friends,

I have a set of gene symbols. When I convert these symbols to Ensembl gene IDs, I get several different gene IDs for a single gene symbol instead of one gene ID per symbol. Why does this happen?

ensembl sequence gene genome • 8.0k views
ADD COMMENTlink modified 3.1 years ago by Biostar ♦♦ 20 • written 4.4 years ago by ravihansa8260
1

Can you give us an example please?

ADD REPLYlink written 4.4 years ago by Emily_Ensembl17k

Thank you for the reply. Here is an example.

I used the gene symbol "AGPAT1" and converted it to Ensembl Gene IDs using the online BioMart tool. It gave me seven different Ensembl Gene IDs, as follows.

HGNC symbol    Ensembl Gene ID
AGPAT1         ENSG00000228892
AGPAT1         ENSG00000235758
AGPAT1         ENSG00000227642
AGPAT1         ENSG00000204310
AGPAT1         ENSG00000236873
AGPAT1         ENSG00000226467
AGPAT1         ENSG00000206324

ADD REPLYlink written 4.4 years ago by ravihansa8260

Yes, friend, I selected the taxon carefully. My problem is that I need to extract intron sequences from a set of genes. When I convert the gene symbols to Ensembl Gene IDs, some symbols return several different Ensembl Gene IDs.

ADD REPLYlink written 4.4 years ago by ravihansa8260
13
4.4 years ago by
Emily_Ensembl17k
EMBL-EBI
Emily_Ensembl17k wrote:

The gene you're looking at, AGPAT1, is found on a haplotypic region. Haplotypes are regions of the genome which have two or more versions, which we find in full in different individuals. These may have the same genes in a different order, or even different genes. We have a help video explaining this here.

AGPAT1 is found in the haplotypic MHC region, of which there are nine possible versions of the genome, and it is found in seven of those nine. You can see all the possible Ensembl IDs for the different versions of AGPAT1 here.
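
A minimal sketch of how one might spot such symbols in a BioMart export, assuming the result is available as (symbol, Ensembl gene ID) pairs. The function name `ids_per_symbol` is illustrative only; the rows are the ones posted in the question.

```python
from collections import defaultdict

# Rows as posted in the question: (HGNC symbol, Ensembl Gene ID).
rows = [
    ("AGPAT1", "ENSG00000228892"),
    ("AGPAT1", "ENSG00000235758"),
    ("AGPAT1", "ENSG00000227642"),
    ("AGPAT1", "ENSG00000204310"),
    ("AGPAT1", "ENSG00000236873"),
    ("AGPAT1", "ENSG00000226467"),
    ("AGPAT1", "ENSG00000206324"),
]


def ids_per_symbol(pairs):
    """Group Ensembl gene IDs by gene symbol, keeping first-seen order."""
    grouped = defaultdict(list)
    for symbol, gene_id in pairs:
        if gene_id not in grouped[symbol]:
            grouped[symbol].append(gene_id)
    return dict(grouped)


mapping = ids_per_symbol(rows)

# Symbols that map to more than one Ensembl gene ID.
multi = {s: ids for s, ids in mapping.items() if len(ids) > 1}
for symbol, ids in multi.items():
    print(symbol, len(ids), "Ensembl gene IDs")  # AGPAT1 7 Ensembl gene IDs
```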

ADD COMMENTlink written 4.4 years ago by Emily_Ensembl17k
1

This could be a problem when quantifying RNA-Seq data with htseq/featureCounts: if multiple "gene_id" values share the same "gene_name", reads will fall into the ambiguous category, i.e. they overlap multiple genes.
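
One possible way to check an annotation for this before counting, sketched under loud assumptions: the records below are made-up toy values, the helper names are hypothetical, and the set of "primary" sequence names assumes Ensembl-style naming without a "chr" prefix. Whether dropping alternate-haplotype records is appropriate depends on the analysis.

```python
# Toy annotation records: (gene_id, gene_name, seqname). All values are made up.
records = [
    ("ID_1", "GENE_A", "6"),
    ("ID_2", "GENE_A", "HAP_1"),  # hypothetical alternate-haplotype scaffold
    ("ID_3", "GENE_B", "1"),
]

# Assumed names of the primary chromosomes (no "chr" prefix).
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def ambiguous_names(recs):
    """Return gene names that are attached to more than one gene_id."""
    seen = {}
    for gene_id, name, _seqname in recs:
        seen.setdefault(name, set()).add(gene_id)
    return sorted(name for name, ids in seen.items() if len(ids) > 1)


def primary_only(recs, primary=PRIMARY):
    """Keep only records located on the assumed primary chromosomes."""
    return [r for r in recs if r[2] in primary]


print(ambiguous_names(records))                # ['GENE_A']
print(ambiguous_names(primary_only(records)))  # []
```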

ADD REPLYlink written 3.1 years ago by geek_y9.3k

I ran into this problem and don't know how to handle it.

ADD REPLYlink written 6 months ago by Shixiang30

Nice video - so approximately how many proteins are multiplexed in this way?

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by cdsouthan1.8k
2

In the current database, 661. Some will only have two members, others like AGPAT1 have lots. One haplotype set has 36 different versions on chromosome 19.

At the moment the current human genome, GRCh38, only has haplotypes, but the GRC has already started making patches to repair misassembled or gapped genomic regions. We will bring these in and annotate them, so we'll be looking at more duplicate genes. However, in the case of patches, the gene on the patch is good and the gene on the primary is dodgy. This is different to haplotypes, where all genes are equally valid.

ADD REPLYlink written 4.4 years ago by Emily_Ensembl17k

Thank you for the explanation. So genes that fall in a haplotypic region have different versions of the same gene, and each version is given a different Ensembl gene ID. Am I correct?

ADD REPLYlink modified 4.4 years ago by Emily_Ensembl17k • written 4.4 years ago by ravihansa8260
1

That is correct.

ADD REPLYlink written 4.4 years ago by Emily_Ensembl17k

Dear Emily,

I need one more explanation. I extracted the first intron of a gene that falls in a haplotype region. Suppose it has seven haplotypes, so I got seven first-intron sequences. Four of the seven have the same length, but the rest differ in length. Can I consider the latter sequences in such a haplotype region to be different genes?

ADD REPLYlink written 4.3 years ago by ravihansa8260
1

The haplotypes are different to each other. Expansion/contraction of an intron between haplotypes is unsurprising. I would consider them to be the same gene if the cDNA is the same, not the introns.

ADD REPLYlink written 4.3 years ago by Emily_Ensembl17k

@Emily, can you please suggest whether two or more Ensembl IDs (and ultimately their sequences) can be used interchangeably?

Thanks in advance for the reply.

ADD REPLYlink written 20 months ago by lakshmi.bioinformatics20

It depends how different they are and what you need to do with them. I'd put the cDNA sequences into CLUSTAL and see if they are interchangeable or not.
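
As a quick pre-check before running an alignment, one could group the IDs whose cDNA sequences are exactly identical. This is only a sketch: it is an exact string comparison, not a CLUSTAL alignment, and the IDs and sequences below are made-up toy values.

```python
from collections import defaultdict


def group_identical(seqs):
    """Group IDs whose sequences match exactly (case-insensitive)."""
    groups = defaultdict(list)
    for gene_id, seq in seqs.items():
        groups[seq.upper()].append(gene_id)
    return sorted(sorted(ids) for ids in groups.values())


# Hypothetical toy cDNA sequences keyed by a made-up ID.
toy = {"id_a": "ATGGCC", "id_b": "atggcc", "id_c": "ATGGCA"}

print(group_identical(toy))  # [['id_a', 'id_b'], ['id_c']]
```

IDs that land in the same group have identical sequences; anything that lands in separate groups still needs a proper alignment to judge how different it is.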

ADD REPLYlink written 20 months ago by Emily_Ensembl17k

If we have genes with the same HUGO IDs but different Ensembl IDs, does it make sense to add up their raw counts (for RNA expression or single-cell analysis)? Does it make sense to treat them as isoforms?

ADD REPLYlink written 5 months ago by rsafavi40

You should make a new post on BioStars for this question – you'll get a lot more answers.

ADD REPLYlink written 5 months ago by Emily_Ensembl17k
2
4.4 years ago by
Brice Sarver2.5k
United States
Brice Sarver2.5k wrote:

You may be receiving IDs from other species, like NCBI's BRCA1 example. Impossible to tell without more information.

ADD COMMENTlink written 4.4 years ago by Brice Sarver2.5k

Thank you for the reply.

I used HGNC gene symbols. For example, I converted the gene symbol AGPAT1 to Ensembl Gene IDs using the online BioMart tool. As a result, it gave me seven different Ensembl Gene IDs, as follows.

HGNC symbol    Ensembl Gene ID
AGPAT1         ENSG00000228892
AGPAT1         ENSG00000235758
AGPAT1         ENSG00000227642
AGPAT1         ENSG00000204310
AGPAT1         ENSG00000236873
AGPAT1         ENSG00000226467
AGPAT1         ENSG00000206324
ADD REPLYlink written 4.4 years ago by ravihansa8260

It is quite possible. For example, the Y_RNA gene name has different ENSGs, each located on a different chromosome (chr1, 3, 4, 12, 14, 20, X).

That's why, whenever you start an analysis, you should settle on one transcript/gene annotation, for example GENCODE or Ensembl. Also treat ENSGs as the reference IDs until the end of your analysis (to avoid redundant IDs, for example gene names/symbols).
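
A small sketch of why the ID, not the symbol, is the safer key. The gene IDs and counts here are made-up placeholders; only the symbol Y_RNA comes from the example above.

```python
# Hypothetical per-gene counts keyed by (made-up) Ensembl-style gene IDs.
counts = {"ENSG_TOY_1": 10, "ENSG_TOY_2": 4}

# Both toy IDs carry the same gene symbol.
symbols = {"ENSG_TOY_1": "Y_RNA", "ENSG_TOY_2": "Y_RNA"}

# Keying by symbol silently overwrites one entry with the other.
by_symbol = {}
for gene_id, n in counts.items():
    by_symbol[symbols[gene_id]] = n


def label(counts_by_id, symbol_by_id):
    """Keep the gene ID as the key; attach the symbol only for display."""
    return [
        (gene_id, symbol_by_id.get(gene_id, ""), n)
        for gene_id, n in sorted(counts_by_id.items())
    ]


print(len(counts), len(by_symbol))  # 2 1
print(label(counts, symbols))
```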

ADD REPLYlink written 20 months ago by EagleEye6.2k
2
4.4 years ago by
EagleEye6.2k
Sweden
EagleEye6.2k wrote:
When you map IDs, always be careful to choose the right taxon. Example: Homo sapiens 9606, Mus musculus 10090.
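
A minimal sketch of that check, assuming the mapping rows carry a taxon ID column. The row values and the helper name `for_taxon` are made up for illustration; the two taxon IDs are the ones quoted above.

```python
HUMAN, MOUSE = 9606, 10090  # taxon IDs quoted above


def for_taxon(rows, taxon_id):
    """Keep (symbol, gene_id) pairs for one taxon from (symbol, gene_id, taxon_id) rows."""
    return [(symbol, gene_id) for symbol, gene_id, taxon in rows if taxon == taxon_id]


# Hypothetical toy rows mixing two species.
rows = [
    ("GENE_A", "TOY_HUMAN_ID", HUMAN),
    ("Gene_a", "TOY_MOUSE_ID", MOUSE),
]

human_rows = for_taxon(rows, HUMAN)
print(human_rows)  # [('GENE_A', 'TOY_HUMAN_ID')]
```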
ADD COMMENTlink written 4.4 years ago by EagleEye6.2k