What to do if my Entrez ID has been replaced by a new one that is now associated with a gene
5 weeks ago
kizzy ▴ 10

For context, this is part of an RNA-seq analysis. I have a list of genes generated by annotateMyIDs from the Entrez IDs in the output of the featureCounts tool in Galaxy. Some entries have only the Entrez ID, and the other columns (Ensembl ID, gene name, symbol) appear as NA. When I looked up those Entrez IDs in NCBI, they were marked as obsolete and had been replaced by another Entrez ID that is associated with a gene name. My question is: can I replace the Entrez ID in my list with the updated one shown in NCBI, or do I have to verify in some other way that they are the same?

featurecount entrezid galaxy genename • 328 views

Can you provide examples of the gene IDs you are referring to? If you don't query the correct NCBI database, you may get completely different information.


This is the output of annotateMyIDs. The first field (numeric) is the Entrez ID, followed by the Ensembl ID, then the symbol, and finally the gene name:

7091 / ENSG00000106829 / TLE4 / TLE family member 4, transcriptional corepressor

This is the one that I have problems with:

100507106 / NA / NA / NA

I only have the Entrez ID, which I used to search NCBI.


Using Entrez Direct (EDirect):

$ efetch -db gene -id 7091

1. TLE4
Official Symbol: TLE4 and Name: TLE family member 4, transcriptional corepressor [Homo sapiens (human)]
Other Aliases: BCE-1, BCE1, E(spI), E(spl), ESG, ESG4, GRG4, Grg-4
Other Designations: transducin-like enhancer protein 4; B lymphocyte gene 1; enhancer of split groucho 4; groucho-related protein 4; transducin like enhancer of split 4; transducin-like enhancer of split 4 (E(sp1) homolog, Drosophila); transducin-like enhancer of split 4, homolog of Drosophila E(sp1)
Chromosome: 9; Location: 9q21.31
Annotation: Chromosome 9 NC_000009.12 (79571965..79726882)
MIM: 605132
ID: 7091

$ efetch -db gene -id 100507106

1. LOC100507106
uncharacterized LOC100507106 [Homo sapiens (human)]
Chromosome: 11
Annotation: Chromosome 11 NC_000011.9 (57405850..57420606)
This record was discontinued.
ID: 100507106
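
For anyone who has many IDs to check, the status lines in this kind of text output can be picked out with a short script. The following is only a minimal sketch in Python: the function name `classify_gene_record` is hypothetical, and it assumes the wording "This record was discontinued." and "This record was replaced with GeneID: N" stays exactly as it appears in the outputs quoted in this thread.

```python
import re

def classify_gene_record(text):
    """Classify one Gene record (plain text, as printed in this thread).

    Returns (status, replacement_id), where status is 'replaced',
    'discontinued' or 'live'. replacement_id is a string for replaced
    records and None otherwise.
    Assumption: the status wording matches the outputs quoted here.
    """
    match = re.search(r"This record was replaced with GeneID:\s*(\d+)", text)
    if match:
        return "replaced", match.group(1)
    if "This record was discontinued" in text:
        return "discontinued", None
    return "live", None

# Record text copied from the outputs posted in this thread.
discontinued_record = """1. LOC100507106
uncharacterized LOC100507106 [Homo sapiens (human)]
Chromosome: 11
Annotation: Chromosome 11 NC_000011.9 (57405850..57420606)
This record was discontinued.
ID: 100507106"""

replaced_record = """1. LOC100506448
fibrinogen silencer-binding protein-like [Homo sapiens (human)]
Chromosome: 8
This record was replaced with GeneID: 25788
ID: 100506448"""

print(classify_gene_record(discontinued_record))
print(classify_gene_record(replaced_record))
```

A record that is only discontinued has no successor ID, so the script cannot suggest a replacement for it; only records flagged as replaced carry a new GeneID.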


Thanks, but I explained my previous answer badly. The first one was an example that came out well in the annotateMyIDs output, and the second one was an example that came out incomplete. Regarding the Entrez IDs that appear as discontinued, like this one (100507106): can I discard them from my data even if they have statistics in my samples, such as fold change, p-value and p-adj? I also have another question. This Entrez ID (100506448), for example, also appears as NA in my list (like the previous example), but NCBI shows that it was replaced with another Entrez ID (25788) that is associated with a gene symbol and name. Can I take the new Entrez ID (25788) and use it in place of the one in my list (100506448)?


I don't know what to say about question #1. Which annotation file/database did you use to come up with those IDs in the first place? Clearly the ID above is discontinued at this time.

As for the second question: IF the gene locations that the two IDs refer to are identical in the genome build being used, you could do that. In this case it appears that only the annotation for that location was updated.

$ efetch -db gene -id 100506448

1. LOC100506448
fibrinogen silencer-binding protein-like [Homo sapiens (human)]
Chromosome: 8
This record was replaced with GeneID: 25788
ID: 100506448

Still appears to be on chr8 but since a precise location is not mentioned in previous entry ....

$ efetch -db gene -id 25788

Other Aliases: RDH54
Other Designations: DNA repair and recombination protein RAD54B
Chromosome: 8; Location: 8q22.1
Annotation: Chromosome 8 NC_000008.11 (94371960..94475115, complement)
MIM: 604289
ID: 25788
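
The location check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive method: the helper names `parse_annotation` and `same_locus` are hypothetical, the regular expression only covers "Annotation:" lines shaped like the ones quoted in this thread, and "same location" is loosened to "intervals overlap on the same sequence accession". Coordinates are compared only when the accession (for example NC_000008.11) matches, because a different accession version means a different reference sequence.

```python
import re

# Matches lines such as:
#   Annotation: Chromosome 8 NC_000008.11 (94371960..94475115, complement)
ANNOTATION_RE = re.compile(
    r"Annotation:\s*Chromosome\s+(\S+)\s+(NC_\d+\.\d+)\s+"
    r"\((\d+)\.\.(\d+)(,\s*complement)?\)"
)

def parse_annotation(line):
    """Return the location fields as a dict, or None if the line has no location."""
    match = ANNOTATION_RE.search(line)
    if match is None:
        return None
    chrom, accession, start, end, complement = match.groups()
    return {
        "chrom": chrom,
        "accession": accession,
        "start": int(start),
        "end": int(end),
        "strand": "-" if complement else "+",
    }

def same_locus(a, b):
    """True only if both locations are known, share an accession, and overlap.

    Assumption: overlap is treated as 'same locus'; use a stricter
    start/end equality test if identical coordinates are required.
    """
    if a is None or b is None:
        return False
    if a["accession"] != b["accession"]:
        return False
    return a["start"] <= b["end"] and b["start"] <= a["end"]

# Lines copied from the outputs posted in this thread.
new_entry = parse_annotation(
    "Annotation: Chromosome 8 NC_000008.11 (94371960..94475115, complement)"
)
old_entry = parse_annotation("Chromosome: 8")  # replaced record lists no coordinates

print(new_entry)
print(same_locus(old_entry, new_entry))
```

Because the replaced record (100506448) lists no coordinates, this check cannot confirm the match on its own; the coordinates for the old ID would have to come from the annotation that was used for counting.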


Thank you. I used the built-in genome option of the featureCounts tool in Galaxy (usegalaxy.org) because I'm working with the reference genome hg19 (GRCh37), which featureCounts has built in. (To create the files, the annotations were downloaded from the NCBI RefSeq database and then adapted by merging overlapping exons from the same gene to form a set of disjoint exons for each gene. Genes with the same Entrez gene identifiers were also merged into one gene.) Do you know of a way to find where each gene in my data is located (chromosome and coordinates) using this tool in Galaxy? I'm not working with a console.