Dear friends,
I have a set of gene symbols. When I convert these symbols to the appropriate Ensembl gene IDs, I get several different gene IDs for a given gene symbol instead of one gene ID per symbol. Why does this happen?
The gene you're looking at, AGPAT1, is found on a haplotypic region. Haplotypes are regions of the genome which have two or more versions, which we find in full in different individuals. These may have the same genes in a different order, or even different genes. We have a help video explaining this here.
AGPAT1 is found in the haplotypic MHC region, which exists in nine versions in the genome, and AGPAT1 is present in seven of those nine. You can see all the possible Ensembl IDs for the different versions of AGPAT1 here.
This could be a problem when quantifying RNA-Seq data with htseq-count/featureCounts if there are multiple "gene_id"s for the same "gene_name", as the reads will fall into the ambiguous category, i.e. they overlap multiple genes.
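A minimal sketch for spotting such cases before quantification, assuming an Ensembl/GENCODE-style GTF where column 9 carries `gene_id "..."` and `gene_name "..."` attributes. The function name and the toy records are hypothetical; for a real GTF you would stream lines from the file instead of a list.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in GTF column 9, e.g. gene_id "X"; gene_name "Y";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def names_with_multiple_ids(gtf_lines):
    """Return {gene_name: sorted gene_ids} for names carried by more than one gene_id."""
    ids_by_name = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only look at gene records
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            ids_by_name[attrs["gene_name"]].add(attrs["gene_id"])
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}

# Hypothetical records: GENEA has two gene_ids, GENEB has one.
example = [
    '1\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "GENE_A1"; gene_name "GENEA";',
    'ALT_1\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "GENE_A2"; gene_name "GENEA";',
    '2\tsrc\tgene\t300\t400\t.\t-\t.\tgene_id "GENE_B1"; gene_name "GENEB";',
]
print(names_with_multiple_ids(example))
# {'GENEA': ['GENE_A1', 'GENE_A2']}
```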
Nice video. So approximately how many proteins are multiplexed in this way?
In the current database, 661. Some will only have two members, others like AGPAT1 have lots. One haplotype set has 36 different versions on chromosome 19.
At the moment the current human genome, GRCh38, only has haplotypes, but GRC has already started making patches to repair misassembled or gapped genomic regions. We will bring these in and annotate them, so we'll be looking at more duplicate genes. However, in the case of patches, the gene on the patch is good and the gene on the primary assembly is dodgy. This is different from haplotypes, where all genes are equally valid.
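If you only want the IDs on the primary assembly, one option is to drop genes that sit on alternate haplotype scaffolds. A minimal sketch, assuming you exported symbol, gene ID and chromosome/scaffold name from BioMart; the rows, IDs and scaffold names below are hypothetical. Per the point about patches above, this filter is not appropriate for patch genes, and a symbol with several genuine loci on the primary assembly will still return several IDs.

```python
from collections import defaultdict

# Primary-assembly chromosome names as Ensembl writes them (no "chr" prefix).
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}

def ids_on_primary(rows):
    """rows: (symbol, gene_id, chromosome_or_scaffold). Returns {symbol: [gene_ids on primary]}."""
    kept = defaultdict(list)
    for symbol, gene_id, chrom in rows:
        if chrom in PRIMARY:
            kept[symbol].append(gene_id)
    return dict(kept)

# Hypothetical export: one primary copy of GENEA plus two copies on alternate scaffolds.
rows = [
    ("GENEA", "ENSG_HYPOTHETICAL_1", "6"),
    ("GENEA", "ENSG_HYPOTHETICAL_2", "HYPOTHETICAL_ALT_SCAFFOLD_1"),
    ("GENEA", "ENSG_HYPOTHETICAL_3", "HYPOTHETICAL_ALT_SCAFFOLD_2"),
    ("GENEB", "ENSG_HYPOTHETICAL_4", "19"),
]
print(ids_on_primary(rows))
# {'GENEA': ['ENSG_HYPOTHETICAL_1'], 'GENEB': ['ENSG_HYPOTHETICAL_4']}
```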
Thank you for the explanation. So genes that fall into a haplotypic region have different versions of the same gene, and each version has its own Ensembl gene ID. Am I correct?
Dear Emily,
I need one more explanation. I extracted the first intron of a gene that falls into a haplotypic region. Suppose the gene has seven haplotype versions, so I got seven first-intron sequences. Comparing sequence lengths, 4 of the 7 had the same length, but the rest had different lengths. Can I consider the latter sequences in such a haplotypic region as different genes?
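Equal length does not guarantee equal sequence, so it may help to compare the sequences themselves. A minimal sketch, assuming the extracted intron sequences are held in a dict keyed by Ensembl gene ID; the keys and sequences here are made up.

```python
from collections import defaultdict

def group_identical(seqs):
    """Group IDs whose sequences are identical (case-insensitive), largest group first."""
    groups = defaultdict(list)
    for gene_id, seq in seqs.items():
        groups[seq.upper()].append(gene_id)
    # sorted() is stable, so ties keep their first-seen order
    return sorted(groups.values(), key=len, reverse=True)

# Hypothetical introns: HAP_3 has the same length as HAP_1 but a different base.
introns = {"HAP_1": "ACGTAC", "HAP_2": "acgtac", "HAP_3": "ACGTTC", "HAP_4": "ACGTACGG"}
print(group_identical(introns))
# [['HAP_1', 'HAP_2'], ['HAP_3'], ['HAP_4']]
```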
@Emily, can you please advise whether two or more Ensembl IDs (and ultimately their sequences) can be used interchangeably?
Thanks in advance for the reply
It depends on how different they are and what you need to do with them. I'd put the cDNA sequences into CLUSTAL and see whether they are interchangeable or not.
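For a quick number on a single pair before opening CLUSTAL, a plain Needleman-Wunsch global alignment gives a percent identity. This is a swapped-in technique, not what CLUSTAL does (CLUSTAL builds a multiple alignment), and the scores (match +1, mismatch -1, gap -1) are arbitrary assumptions. The full score matrix is kept in memory, so it only suits short sequences.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of a and b.

    Returns (percent identity over the alignment length, aligned_a, aligned_b).
    """
    if not a and not b:
        return 100.0, "", ""

    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    aligned_a = "".join(reversed(out_a))
    aligned_b = "".join(reversed(out_b))
    identical = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a), aligned_a, aligned_b

print(global_identity("ACGT", "ACT"))
# (75.0, 'ACGT', 'AC-T')
```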
If we have genes with the same HUGO symbol but different Ensembl IDs, does it make sense to add up their raw counts (for RNA expression or single-cell analysis)? Does it make sense to treat them as isoforms?
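If you do decide to add them up, a minimal sketch of collapsing per-ID counts into per-symbol counts is below. It assumes counts held as {gene_id: [count per sample]} and a {gene_id: symbol} mapping from BioMart; all IDs, symbols and numbers are hypothetical. Whether summing is meaningful depends on how the reads were aligned: if the reference contained only the primary assembly, IDs on haplotype scaffolds will simply have zero counts.

```python
def sum_by_symbol(counts, symbol_of):
    """Sum per-gene-ID count vectors into per-symbol vectors.

    IDs with no symbol in symbol_of are kept under their own ID.
    """
    summed = {}
    for gene_id, row in counts.items():
        key = symbol_of.get(gene_id, gene_id)
        if key in summed:
            summed[key] = [x + y for x, y in zip(summed[key], row)]
        else:
            summed[key] = list(row)
    return summed

# Hypothetical two-sample counts; ENSG_HYP_2 is a second ID for GENEA.
counts = {"ENSG_HYP_1": [10, 5], "ENSG_HYP_2": [0, 0], "ENSG_HYP_3": [3, 4], "ENSG_HYP_4": [1, 1]}
symbol_of = {"ENSG_HYP_1": "GENEA", "ENSG_HYP_2": "GENEA", "ENSG_HYP_3": "GENEB"}
print(sum_by_symbol(counts, symbol_of))
# {'GENEA': [10, 5], 'GENEB': [3, 4], 'ENSG_HYP_4': [1, 1]}
```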
You may be receiving IDs from other species, like NCBI's BRCA1 example. Impossible to tell without more information.
Thank you for the reply.
I used HGNC gene symbols. For example, I converted the gene symbol AGPAT1 to an Ensembl Gene ID using the online BioMart tool. As a result, it gave me seven different Ensembl Gene IDs, as follows.
HGNC symbol | Ensembl Gene ID |
AGPAT1 | ENSG00000228892 |
AGPAT1 | ENSG00000235758 |
AGPAT1 | ENSG00000227642 |
AGPAT1 | ENSG00000204310 |
AGPAT1 | ENSG00000236873 |
AGPAT1 | ENSG00000226467 |
AGPAT1 | ENSG00000206324 |
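To see where each of these IDs lives, it can help to ask BioMart for the chromosome/scaffold name alongside the gene ID. A sketch that builds a BioMart XML query URL, assuming the public Ensembl `martservice` endpoint and the `hsapiens_gene_ensembl` dataset with the `hgnc_symbol`, `ensembl_gene_id` and `chromosome_name` names; compare it with the XML that the web BioMart's "XML" button produces before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public Ensembl BioMart REST endpoint.
MART_URL = "https://www.ensembl.org/biomart/martservice"

def biomart_query_url(symbols):
    """Build a GET URL asking BioMart for symbol, gene ID and chromosome/scaffold name."""
    value = ",".join(symbols)  # BioMart filters accept comma-separated values
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1">'
        '<Dataset name="hsapiens_gene_ensembl" interface="default">'
        f'<Filter name="hgnc_symbol" value="{value}"/>'
        '<Attribute name="hgnc_symbol"/>'
        '<Attribute name="ensembl_gene_id"/>'
        '<Attribute name="chromosome_name"/>'
        "</Dataset></Query>"
    )
    return MART_URL + "?" + urlencode({"query": xml})

def fetch_tsv(url):
    """Download the TSV result (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")

url = biomart_query_url(["AGPAT1"])
# rows = fetch_tsv(url).splitlines()  # uncomment to query Ensembl
```

IDs whose chromosome name is a scaffold name rather than 1 to 22, X, Y or MT sit on alternate haplotype (or patch) sequences.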
This is quite possible. For example, the gene name Y_RNA has several different ENSG IDs, each located on a different chromosome (chr1, 3, 4, 12, 14, 20, X).
That's why, whenever you start an analysis, you should stick to one transcript/gene annotation, for example GENCODE or Ensembl. Also, keep ENSG IDs as the reference IDs until the end of your analysis, to avoid redundant IDs such as gene names/symbols.
Can you give us an example please?
Thank you for the reply. Here is an example.
I used the gene symbol "AGPAT1" and converted it to Ensembl gene IDs using the online BioMart tool. It gave me seven different Ensembl Gene IDs.
Yes, I selected the taxon carefully. The problem is that I need to extract some intron sequences from a set of genes. Once I convert the gene symbols to Ensembl Gene IDs, some gene symbols give several different Ensembl Gene IDs.