Question: How to Deal With Multiple ZFIN IDs
Asked 2.4 years ago by sallehl:


I am making a heatmap to check the expression of some marker genes of different cell types in zebrafish.

I have used biomaRt to convert the Ensembl gene IDs to ZFIN IDs so that the heatmap is easier to interpret. However, the object that biomaRt returned contains many duplicated Ensembl IDs, i.e. some Ensembl IDs map to more than one ZFIN ID.

How exactly should I interpret this? Is it perhaps because some Ensembl genes have multiple transcripts with the same Ensembl gene ID but different ZFIN IDs? And how should I choose between ZFIN IDs when there is more than one?

I started from an Excel spreadsheet containing the counts, so maybe this has something to do with the upstream pipeline (alignment, counting, etc.)? I am planning to rerun everything from the raw data eventually, so if this is something I can fix upstream, that would also be helpful to know.

This doesn't actually affect my current heatmap, as none of the genes I am looking at are involved, but I would like to know what causes this and what best practice is, for future reference.

Thanks in advance,


Tags: zfin, convert, id, biomart, ensembl
Answer by Emily_Ensembl, 2.4 years ago:

Did you run this query in the last couple of weeks? We recently updated to the GRCz11 genome assembly, which contains many haplotypes: alternative versions of genomic regions. These contain duplicates of genes on the primary assembly, and each duplicate has its own Ensembl gene ID, but because they represent the same functional gene, they share the same ZFIN name. This video contains more information about haplotypes (it was made when we only had them for human, but everything applies to zebrafish).

What you can do in BioMart is add an additional filter: filter by chromosome, then select all the chromosomes that have proper chromosome names (e.g. 1, 2, 3) and don't select any with haplotype names (e.g. CHR_ALT_CTG1_1_1). This returns only the genes on the primary assembly.
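The same idea can also be applied locally after the query. Below is a minimal R sketch, assuming the query additionally requests the `chromosome_name` attribute and assuming primary-assembly chromosomes are named 1 to 25 and MT. The helper `keep_primary_assembly` and the toy data frame (IDs and symbols) are hypothetical stand-ins for a real getBM() result.

```r
# Hypothetical helper: keep only rows whose chromosome name looks like a
# primary-assembly chromosome (digits only, or MT); alternate haplotype
# names such as CHR_ALT_CTG1_1_1 are dropped.
keep_primary_assembly <- function(df) {
  primary <- grepl("^([0-9]+|MT)$", df$chromosome_name)
  df[primary, , drop = FALSE]
}

# Made-up stand-in for a getBM() result that includes chromosome_name
toy <- data.frame(
  ensembl_gene_id = c("ENSDARG_TOY1", "ENSDARG_TOY2", "ENSDARG_TOY3"),
  zfin_id_symbol  = c("geneX", "geneX", "geneY"),
  chromosome_name = c("3", "CHR_ALT_CTG1_1_1", "MT"),
  stringsAsFactors = FALSE
)

primary_only <- keep_primary_assembly(toy)
print(primary_only$ensembl_gene_id)
```

Doing the restriction server-side should also be possible by passing two filters to getBM(), e.g. `filters = c("ensembl_gene_id", "chromosome_name")` with `values = list(row.names(rawcounts), c(1:25, "MT"))`, again assuming those are the primary chromosome names.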


I've just realised you were asking about the opposite problem to the one I've answered here. Can you send us your biomaRt query, please? BioMart is centred on Ensembl gene objects, so it should not return more than one ZFIN ID per gene.

(comment by Emily_Ensembl, 2.4 years ago)
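A quick way to list exactly which genes are affected is to count the distinct ZFIN symbols returned for each Ensembl ID. Below is a minimal R sketch, assuming `symbols` has the two columns requested in the query (`ensembl_gene_id`, `zfin_id_symbol`); the data frame here is a made-up stand-in, not real output.

```r
# Made-up stand-in for a getBM() result in which one Ensembl gene
# comes back with two different ZFIN symbols
symbols <- data.frame(
  ensembl_gene_id = c("ENSDARG_TOY1", "ENSDARG_TOY1", "ENSDARG_TOY2"),
  zfin_id_symbol  = c("geneA", "geneA-like", "geneB"),
  stringsAsFactors = FALSE
)

# Number of distinct ZFIN symbols per Ensembl ID (named integer vector)
n_symbols <- lengths(lapply(split(symbols$zfin_id_symbol,
                                  symbols$ensembl_gene_id), unique))

# Ensembl IDs that map to more than one ZFIN symbol
multi_mapped <- names(n_symbols)[n_symbols > 1]
print(multi_mapped)

# All rows for those IDs, e.g. to include when reporting the issue
print(symbols[symbols$ensembl_gene_id %in% multi_mapped, ])
```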

Hi Emily,

Thanks for the reply. Sorry, I meant to add my R script to my original question but completely forgot. The code I used is as follows:

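library(biomaRt)

# Connect to the Ensembl BioMart zebrafish gene dataset
zfish <- useMart("ensembl", dataset = "drerio_gene_ensembl")

# Look up the ZFIN symbol for every Ensembl gene ID in the count table
symbols <- getBM(attributes = c("ensembl_gene_id", "zfin_id_symbol"),
                 filters = "ensembl_gene_id",
                 values = as.vector(row.names(rawcounts)),
                 mart = zfish)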

I saw similar code posted on a forum by someone else, so I'm not sure it's exactly how I'm meant to do it.
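If some Ensembl IDs do come back with more than one ZFIN symbol, one option (a sketch, not an official recommendation) is to collapse the symbols into a single label per gene and then look labels up in the same order as the count-table rows, falling back to the Ensembl ID when no symbol was returned. The `rawcounts` matrix and `symbols` data frame below are made-up stand-ins, and the assumption that missing symbols come back as empty strings should be checked against a real result.

```r
# Made-up stand-ins: row names of the count matrix are Ensembl IDs,
# and `symbols` mimics the getBM() output of the query above
rawcounts <- matrix(1:6, nrow = 3,
                    dimnames = list(c("ENSDARG_TOY1", "ENSDARG_TOY2", "ENSDARG_TOY3"),
                                    c("sample1", "sample2")))
symbols <- data.frame(
  ensembl_gene_id = c("ENSDARG_TOY1", "ENSDARG_TOY1", "ENSDARG_TOY2"),
  zfin_id_symbol  = c("geneA", "geneA-like", ""),
  stringsAsFactors = FALSE
)

# Drop empty symbols (assumed to mean "no ZFIN symbol for this gene")
symbols <- symbols[symbols$zfin_id_symbol != "", ]

# Collapse all symbols of one Ensembl ID into one label, e.g. "geneA/geneA-like"
collapsed <- vapply(split(symbols$zfin_id_symbol, symbols$ensembl_gene_id),
                    function(x) paste(sort(unique(x)), collapse = "/"),
                    character(1))

# One label per count-matrix row, in row order; IDs without a symbol
# keep their Ensembl ID so no row is left unlabelled
labels <- unname(collapsed[rownames(rawcounts)])
labels[is.na(labels)] <- rownames(rawcounts)[is.na(labels)]
print(labels)
```

Collapsing keeps every mapping visible on the heatmap. The simpler `symbols[!duplicated(symbols$ensembl_gene_id), ]` keeps only whichever row happens to come first for each ID, which is an arbitrary choice.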


(reply by sallehl, 2.4 years ago; edited by genomax)