For context, this is part of an RNA-seq analysis. I have a list of genes generated by annotateMyIDs from the Entrez IDs in the output of the featureCounts tool in Galaxy. Some entries have only the Entrez ID, while the other columns (Ensembl ID, gene name, symbol) show NA. When I looked these Entrez IDs up in NCBI, they were marked as obsolete and had been replaced by another Entrez ID that is associated with a gene name. My question is: can I replace the Entrez ID in my list with the updated one shown in NCBI, or do I have to verify in some other way that they are the same gene?
Can you provide examples of the gene IDs you are referring to? If you don't select the correct database when querying NCBI, you may get completely different information.
This is the output of annotateMyIDs. The first (numeric) field is the Entrez ID, followed by the Ensembl ID, the symbol, and finally the gene name:
7091 / ENSG00000106829 / TLE4 / TLE family member 4, transcriptional corepressor
This is the one that I have problems with:
100507106 / NA / NA / NA
I only have the Entrez ID, and I used that to search NCBI.
Using Entrez Direct:
Thanks, but I explained my previous reply badly: the first one was an example that came out fine in the annotateMyIDs output, and the second was an example that came out incomplete. Regarding Entrez IDs that appear as discontinued, like this one (100507106): can I discard them from my data even if they have statistics in my samples, such as fold change, p-value and p-adj? I also have another question. This Entrez ID (100506448) also appears as NA in my list (like the previous example), but NCBI shows that it was replaced by another Entrez ID (25788), which is associated with a gene symbol and name. Can I take the new Entrez ID (25788) and use it in place of the one in my list (100506448)?
I don't know what to say about question #1. Which annotation file/database did you use to come up with those IDs in the first place? Clearly that ID above is discontinued at this time.
As for the second question: IF the gene locations that the two IDs refer to are identical in the genome build being used, you could do that. In this case it appears that the annotation for that location was simply updated.
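For anyone who does have access to a console, the bookkeeping part of this (swapping a discontinued Entrez ID for its replacement) could be sketched roughly as below. This is only a minimal sketch, not a tested pipeline. It assumes a tab-separated history table laid out like NCBI's gene_history file (tax_id, GeneID, Discontinued_GeneID, Discontinued_Symbol, Discontinue_Date, with "-" in GeneID when there is no replacement). The sample rows are built from the two IDs in this thread; the symbol and date fields are placeholders, and treating 100507106 as having no replacement is an assumption based on the discussion above. The helper names load_history and resolve are made up for illustration. It does not check that the old and new IDs refer to the same location; that still has to be verified separately.

```python
import csv
import io

# Illustrative rows only: symbol/date fields are placeholders ("NA"),
# and "-" for 100507106 is an assumption (discontinued, no replacement).
SAMPLE_HISTORY = (
    "#tax_id\tGeneID\tDiscontinued_GeneID\tDiscontinued_Symbol\tDiscontinue_Date\n"
    "9606\t25788\t100506448\tNA\tNA\n"
    "9606\t-\t100507106\tNA\tNA\n"
)


def load_history(handle):
    """Return {discontinued_id: replacement_id or None} from a history table."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip the header/comment line and any malformed short rows.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        new_id, old_id = row[1], row[2]
        mapping[old_id] = None if new_id == "-" else new_id
    return mapping


def resolve(gene_id, history):
    """Follow replacements until reaching an ID that is not discontinued.

    Returns the current ID, or None if the ID was discontinued
    without a replacement (or the chain loops back on itself).
    """
    seen = set()
    current = gene_id
    while current in history:
        if current in seen:
            return None
        seen.add(current)
        current = history[current]
        if current is None:
            return None
    return current


history = load_history(io.StringIO(SAMPLE_HISTORY))
for gene_id in ["7091", "100506448", "100507106"]:
    print(gene_id, "->", resolve(gene_id, history))
```

IDs that resolve to None are the ones with no current record, so whether to drop them is a separate decision from the remapping itself.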
It still appears to be on chr8, but since a precise location is not mentioned in the previous entry ....
Thank you. I used the built-in genome option of the featureCounts tool in Galaxy (usegalaxy.org) because I'm working with the hg19 (GRCh37) reference genome and featureCounts has it built in. (To create the files, the annotations were downloaded from the NCBI RefSeq database and then adapted by merging overlapping exons from the same gene to form a set of disjoint exons for each gene. Genes with the same Entrez gene identifiers were also merged into one gene.) Do you know of a way to find where each gene in my data is located (chromosome and coordinates) using this tool in Galaxy? I'm not working with a console.
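If the annotation table itself can be obtained, the location of each gene can be read straight from it. As far as I know, the built-in featureCounts annotations are distributed with the Subread package in SAF format, a tab-separated table with the columns GeneID, Chr, Start, End and Strand, one row per exon. Whether Galaxy exposes that file for download is something I have not checked. Below is a minimal sketch, assuming that column layout, that collapses the exon rows into one span per gene. The gene IDs and coordinates in the sample are made up purely for illustration, and gene_spans is a hypothetical helper name.

```python
import csv
import io

# Made-up SAF rows for illustration; these are NOT real hg19 coordinates.
SAMPLE_SAF = (
    "GeneID\tChr\tStart\tEnd\tStrand\n"
    "101\tchr8\t100\t200\t+\n"
    "101\tchr8\t300\t450\t+\n"
    "202\tchr8\t1000\t1200\t-\n"
)


def gene_spans(handle):
    """Collapse exon rows into {(gene_id, chrom): (start, end, strand)}.

    Keyed by gene ID and chromosome so that an ID annotated on more than
    one chromosome does not produce a meaningless combined span.
    """
    spans = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["GeneID"], row["Chr"])
        start, end = int(row["Start"]), int(row["End"])
        if key in spans:
            old_start, old_end, strand = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end), strand)
        else:
            spans[key] = (start, end, row["Strand"])
    return spans


spans = gene_spans(io.StringIO(SAMPLE_SAF))
for (gene_id, chrom), (start, end, strand) in spans.items():
    print(gene_id, chrom, start, end, strand)
```

With spans for both the old and the new Entrez ID from the same genome build, the two locations can be compared directly before swapping one ID for the other.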