Question

How to Deal With Multiple ZFIN_IDs

6.0 years ago

sallehl • 0

Hi,

I am making a heatmap to check the expression of some marker genes of different cell types in zebrafish.

I have used biomaRt to convert the Ensembl IDs to ZFIN IDs so that the heatmap is easier to interpret. However, the object that biomaRt returned contains many duplicated Ensembl IDs, i.e. some Ensembl IDs map to more than one ZFIN ID.

I was wondering how exactly to interpret this (is it perhaps because some Ensembl genes have multiple transcripts that share the same Ensembl gene ID but have different ZFIN IDs?) and how I should choose between ZFIN IDs when there is more than one.
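As a first step, it can help to see exactly which Ensembl IDs are affected. The following is a minimal sketch in base R, assuming the biomaRt result is a two-column data frame with the columns `ensembl_gene_id` and `zfin_id_symbol`. The toy data frame and its IDs and symbols are invented for illustration and are not real zebrafish genes.

```r
# Toy stand-in for a getBM() result; the IDs and symbols are invented.
symbols <- data.frame(
  ensembl_gene_id = c("ENSDARG_A", "ENSDARG_A", "ENSDARG_B", "ENSDARG_C"),
  zfin_id_symbol  = c("gene1a", "gene1b", "gene2", "gene3"),
  stringsAsFactors = FALSE
)

# Drop exact duplicate rows so repeated rows are not mistaken for multi-mapping.
symbols <- unique(symbols)

# Count how many distinct symbols each Ensembl gene ID maps to.
n_symbols <- table(symbols$ensembl_gene_id)

# Ensembl IDs that map to more than one symbol.
multi_ids <- names(n_symbols)[n_symbols > 1]

# The affected rows, for manual inspection.
multi_rows <- symbols[symbols$ensembl_gene_id %in% multi_ids, ]
print(multi_rows)
```

Inspecting `multi_rows` shows whether the extra symbols look like paralogue names, synonyms, or something else, which is usually needed before deciding how to choose between them.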

I started from an Excel spreadsheet containing the counts, so maybe it has something to do with the upstream pipeline (aligning, counting, etc.)? I am planning to eventually rerun everything from the raw data onwards, so if it's something I can fix upstream, that would also be helpful to know.

This doesn't actually affect my current heatmap as none of the genes I am looking at are involved, but I am just wondering what best practice is/what's causing this for future scenarios.

Thanks in advance,

Liam

biomaRt ENSEMBL ZFIN • 1.8k views


GenoMax · Answer 1 · 2018-04-24

6.0 years ago

Emily 23k

Did you do this in the last couple of weeks? We recently updated to the GRCz11 genome, which contains many haplotypes, i.e. alternative versions of genomic regions. These contain duplicates of the genes on the primary assembly. Each duplicate has its own Ensembl gene ID but, since they are the same functional gene, they all have the same ZFIN name. This video contains more information about haplotypes (it was made when we only had them for human, but everything applies to zebrafish).

What you can do in BioMart is add an additional filter. Filter by chromosome, then select all the chromosomes that have proper chromosome names (e.g. 1, 2, 3) and don't select any with haplotype names (e.g. CHR_ALT_CTG1_1_1). This will get you only the genes on the primary assembly.
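The same restriction can also be applied after the query, in R. This is a minimal sketch, assuming `chromosome_name` was requested as an extra attribute so that the result has one chromosome name per gene. The toy data frame is invented for illustration, and the list of primary names assumes the zebrafish assembly uses chromosomes 1 to 25 plus MT.

```r
# Toy stand-in for a query result that includes chromosome_name; IDs are invented.
genes <- data.frame(
  ensembl_gene_id = c("ENSDARG_A", "ENSDARG_B", "ENSDARG_C", "ENSDARG_D"),
  chromosome_name = c("1", "CHR_ALT_CTG1_1_1", "25", "MT"),
  stringsAsFactors = FALSE
)

# Names of the primary-assembly chromosomes (assumed: 1 to 25 plus mitochondrial).
primary_names <- c(as.character(1:25), "MT")

# Keep only genes on the primary assembly; haplotype regions are dropped.
primary <- genes[genes$chromosome_name %in% primary_names, ]
print(primary)
```

Matching against an explicit list of names, rather than excluding a `CHR_ALT` pattern, also drops unplaced scaffolds, which may or may not be what you want.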


I've just realised you were asking about the opposite problem to the one I've answered here. Can you send us your biomaRt query, please? BioMart is centred on the Ensembl gene objects, so it should not be giving more than one ZFIN ID.

Reply • 6.0 years ago by Emily 23k

Hi Emily,

Thanks for the reply. Sorry, I meant to add my R script to my original question but then completely forgot. The code I used is as follows:

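library(biomaRt)

# Connect to the Ensembl zebrafish gene dataset
zfish <- useMart("ensembl", dataset = "drerio_gene_ensembl")

# Look up the ZFIN symbol for every Ensembl gene ID in the count matrix
symbols <- getBM(attributes = c("ensembl_gene_id", "zfin_id_symbol"),
                 filters = "ensembl_gene_id",
                 values = rownames(rawcounts),
                 mart = zfish)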

I saw similar code posted on a forum by someone else, so I'm not sure it's exactly how I'm meant to do it.
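For labelling heatmap rows, one possible way to handle an Ensembl ID with several symbols is to join them into a single label rather than pick one. This is only a sketch of that option, not an established best practice. It assumes a result shaped like `symbols` above; the toy IDs, symbols and `row_ids` are invented for illustration.

```r
# Toy stand-in for the getBM() result; IDs and symbols are invented.
symbols <- data.frame(
  ensembl_gene_id = c("ENSDARG_A", "ENSDARG_A", "ENSDARG_B", "ENSDARG_C"),
  zfin_id_symbol  = c("gene1a", "gene1b", "gene2", "gene3"),
  stringsAsFactors = FALSE
)

# One label per Ensembl ID: join all distinct symbols with "/".
labels <- vapply(
  split(symbols$zfin_id_symbol, symbols$ensembl_gene_id),
  function(x) paste(unique(x), collapse = "/"),
  character(1)
)

# Hypothetical row order of the count matrix, including an ID with no symbol.
row_ids <- c("ENSDARG_C", "ENSDARG_A", "ENSDARG_X")

# Look up each row's label; fall back to the Ensembl ID when nothing was returned.
idx <- match(row_ids, names(labels))
row_labels <- ifelse(is.na(idx), row_ids, labels[idx])
print(row_labels)
```

Using `match()` keeps the labels in the same order as the count matrix rows, which matters because the biomaRt result is not guaranteed to come back in the order of the query values.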

Liam

Reply • written 6.0 years ago by sallehl • updated 6.0 years ago by GenoMax 141k