Ensembl version for mapping and annotating genes
11 weeks ago
bioinfo ▴ 10

Hello,

I am mapping my data to the mouse genome GRCm38 (Ensembl release 92). However, when I use tximport and biomaRt to map my transcripts to gene IDs, I think the annotation comes from the most recent Ensembl release for mouse. Is that a problem?

I use the following code:

mart <- biomaRt::useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl", host = "uswest.ensembl.org")


Should I be using the code below to make sure that I am using the same version?

mart <- biomaRt::useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl", host = "http://apr2018.archive.ensembl.org")
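
For reference, the April 2018 archive is Ensembl release 92, so that host does match the release used for mapping. Rather than hard-coding the host, it can be looked up by release number. The following is only a minimal sketch: `archive_host_for` and `connect_release` are hypothetical helper names, and the `version` and `url` column names are an assumption about the table that `biomaRt::listEnsemblArchives()` returns.

```r
# Pick the archive host for a given Ensembl release out of the table
# returned by biomaRt::listEnsemblArchives().
# Assumption: the table has character columns `version` and `url`.
archive_host_for <- function(archives, release) {
  hit <- archives[archives$version == as.character(release), , drop = FALSE]
  if (nrow(hit) != 1) {
    stop("release ", release, " not found in the archive table")
  }
  as.character(hit$url)
}

# Connect to a pinned release (needs network access and the biomaRt package).
# Hypothetical helper; nothing is contacted until it is called.
connect_release <- function(release, dataset = "mmusculus_gene_ensembl") {
  archives <- biomaRt::listEnsemblArchives()
  biomaRt::useMart(biomart = "ensembl",
                   dataset = dataset,
                   host = archive_host_for(archives, release))
}

# Example call, left commented out because it requires a live connection:
# mart <- connect_release(92)
```

`biomaRt::useEnsembl()` also accepts a `version` argument, which may be a shorter route to the same result if the installed biomaRt supports the release you need.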


Thank you

rna-seq kallisto • 325 views
10 weeks ago

Ensembl transcript IDs and gene IDs are stable. A transcript ID mapped to a gene ID generally stays the same across releases. Newer releases assign newly identified transcripts/isoforms to a gene, but existing transcript IDs and gene IDs don't change. The key point is that anything coordinate-based can change between releases, e.g. the genomic location of a feature (exon, intron, UTR, etc.) can differ between two releases. If you are only using transcript IDs and gene IDs, with nothing that depends on coordinates, then you are fine.
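
One way to check this in practice is to see whether every transcript that was quantified has a row in the transcript-to-gene table. A minimal sketch, assuming the kallisto index was built from an Ensembl cDNA FASTA (so target IDs carry a `.N` version suffix) and that `tx2gene` has the transcript ID in its first column; `strip_version` and `missing_transcripts` are hypothetical helper names, and the IDs in the comments are placeholders.

```r
# Drop the ".N" version suffix from Ensembl stable IDs,
# e.g. "ENSMUST00000000001.4" -> "ENSMUST00000000001".
# IDs without a suffix are returned unchanged.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Return quantified transcript IDs (version suffix removed) that have
# no matching row in tx2gene. An empty result means full coverage.
# `tx2gene`: data frame with transcript ID in column 1, gene ID in column 2.
missing_transcripts <- function(target_ids, tx2gene) {
  setdiff(strip_version(target_ids), strip_version(tx2gene[[1]]))
}

# Hypothetical usage (needs network access, biomaRt, and kallisto output):
# tx2gene <- biomaRt::getBM(
#   attributes = c("ensembl_transcript_id", "ensembl_gene_id"),
#   mart = mart)
# abundance <- read.delim("abundance.tsv")  # kallisto table, column target_id
# missing_transcripts(abundance$target_id, tx2gene)
```

Only ID attributes are requested here, so no coordinates enter the mapping. A long list of missing transcripts would suggest that the annotation release differs from the one used to build the index.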


Thank you for replying. I am quantifying my data with kallisto and then using tximport and biomaRt to convert transcript IDs to gene IDs. Do you think coordinates are involved in any of these steps?


GRCm39 (mm39) became the default mouse assembly in Ensembl starting with release 103 (Ensembl is at release 106 as of today). I think biomaRt would retrieve coordinates for the default assembly, which would indeed be incorrect if your data was aligned to GRCm38. Verify which database you are using. To be consistent, it is safer to use the same genome build and release in biomaRt. While the IDs themselves may not change, their version suffixes may have.
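
A minimal sketch of that verification step: it reads the assembly string from the dataset table and stops if it is not the expected build. `assembly_of` and `check_assembly` are hypothetical helper names, and the `dataset` and `version` column names are an assumption about what `biomaRt::listDatasets()` returns.

```r
# Look up the assembly string for one dataset in the table returned by
# biomaRt::listDatasets(mart).
# Assumption: the table has columns `dataset` and `version`.
assembly_of <- function(datasets, dataset = "mmusculus_gene_ensembl") {
  as.character(datasets$version[datasets$dataset == dataset])
}

# Stop early if the mart serves a different assembly than the one the
# reads were aligned to. Returns the assembly string invisibly on success.
check_assembly <- function(datasets, expected = "GRCm38",
                           dataset = "mmusculus_gene_ensembl") {
  found <- assembly_of(datasets, dataset)
  if (length(found) != 1 || !startsWith(found, expected)) {
    stop("expected assembly ", expected, ", mart reports: ",
         paste(found, collapse = ", "))
  }
  invisible(found)
}

# Hypothetical usage (needs network access and the biomaRt package):
# check_assembly(biomaRt::listDatasets(mart))
```

Prefix matching is used so that patch releases of the same build still pass the check.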