Question: How to Deal With Multiple ZFIN_IDs
sallehl wrote (14 months ago):

Hi,

I am making a heatmap to check the expression of some marker genes of different cell types in zebrafish.

I used biomaRt to convert the Ensembl IDs to ZFIN IDs so that the heatmap is easier to interpret. However, the object that biomaRt returned contains many duplicated Ensembl IDs, i.e. some Ensembl IDs map to more than one ZFIN ID.

I was wondering how to interpret this (could it be that some Ensembl genes have multiple transcripts with the same Ensembl gene ID but different ZFIN IDs?) and how I should choose between ZFIN IDs when there is more than one.

I started from an Excel spreadsheet containing the counts, so maybe it has something to do with the upstream pipeline (aligning, counting etc.)? I am planning to eventually rerun everything from the raw data onwards, so if it's something I can fix upstream, that would also be helpful to know.

This doesn't actually affect my current heatmap, as none of the genes I am looking at are involved, but I am wondering what best practice is and what's causing this, for future scenarios.

Thanks in advance,

Liam

zfin convert id biomart ensembl • 364 views
Emily_Ensembl (EMBL-EBI) wrote (14 months ago):

Did you do this in the last couple of weeks? We recently updated to the GRCz11 genome, which contains many haplotypes, i.e. alternative versions of genomic regions. These contain duplicates of genes on the primary assembly. Each duplicate has its own Ensembl gene ID but, since it is the same functional gene, the same ZFIN name. This video contains more information about haplotypes (it was made when we only had them for human, but everything applies to zebrafish).

What you can do in BioMart is add an additional filter. Filter by chromosome, then select only the chromosomes that have proper chromosome names (e.g. 1, 2, 3); don't select any with haplotype names (e.g. CHR_ALT_CTG1_1_1). This will get you only the genes on the primary assembly.
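
For results that are already in R, the same filtering could be sketched roughly as below. This is only an illustration under stated assumptions: the table is a made-up stand-in for what getBM() might return if "chromosome_name" were requested alongside the other attributes, the IDs and symbols are invented, keep_primary is a hypothetical helper, and the pattern assumes primary chromosomes are named with plain numbers or MT.

```r
# Hypothetical stand-in for a biomaRt result with the attributes
# "ensembl_gene_id", "chromosome_name" and "zfin_id_symbol".
# All IDs and symbols are invented for illustration.
genes <- data.frame(
  ensembl_gene_id = c("ENSDARG_A", "ENSDARG_B", "ENSDARG_C", "ENSDARG_D"),
  chromosome_name = c("1", "CHR_ALT_CTG1_1_1", "25", "MT"),
  zfin_id_symbol  = c("geneA", "geneA", "geneC", "geneD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose chromosome name is purely numeric
# or "MT" (assumed naming of the primary assembly), dropping haplotype
# regions such as CHR_ALT_CTG1_1_1.
keep_primary <- function(df) {
  is_primary <- grepl("^([0-9]+|MT)$", df$chromosome_name)
  df[is_primary, , drop = FALSE]
}

primary_genes <- keep_primary(genes)
```

In this toy table, the row on CHR_ALT_CTG1_1_1 is dropped, so the symbol it shares with the gene on chromosome 1 appears only once.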


I've just realised you were asking about the opposite problem to the one I've answered here. Can you send us your biomaRt query please? BioMart is centred on the Ensembl gene objects, so it should not be giving more than one ZFIN ID.

written 14 months ago by Emily_Ensembl

Hi Emily,

Thanks for the reply. Sorry, I meant to add my R script to my original question but then completely forgot. The code I used is as follows:

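library(biomaRt)

# Connect to the Ensembl zebrafish gene dataset
zfish <- useMart("ensembl", dataset = "drerio_gene_ensembl")

# Look up the ZFIN symbol(s) for each Ensembl gene ID in the count matrix
symbols <- getBM(attributes = c("ensembl_gene_id", "zfin_id_symbol"),
                 filters = "ensembl_gene_id",
                 values = rownames(rawcounts),
                 mart = zfish)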

I saw similar code posted on a forum by someone else, so I'm not sure it's exactly how I'm meant to do it.

Liam
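
To see which Ensembl IDs come back with more than one ZFIN symbol, a table like the one returned by the query above could be inspected roughly as follows. This is a minimal sketch on invented data: the IDs and symbols are made up, and joining the symbols with "; " is just one possible way to get a single label per gene, not a recommendation of which symbol is correct.

```r
# Hypothetical stand-in for the `symbols` table returned by getBM();
# IDs and symbols are invented for illustration.
symbols <- data.frame(
  ensembl_gene_id = c("ENSDARG_A", "ENSDARG_B", "ENSDARG_B", "ENSDARG_C"),
  zfin_id_symbol  = c("geneA", "geneB1", "geneB2", ""),
  stringsAsFactors = FALSE
)

# Drop rows without a symbol, then drop exact duplicate rows
has_symbol <- !is.na(symbols$zfin_id_symbol) & symbols$zfin_id_symbol != ""
symbols <- unique(symbols[has_symbol, , drop = FALSE])

# Ensembl gene IDs that still have more than one row (several symbols)
n_per_gene <- table(symbols$ensembl_gene_id)
multi_mapped <- names(n_per_gene)[as.vector(n_per_gene) > 1]

# One label per gene: join all of a gene's symbols into a single string
collapsed_symbols <- vapply(
  split(symbols$zfin_id_symbol, symbols$ensembl_gene_id),
  function(x) paste(sort(unique(x)), collapse = "; "),
  character(1)
)
```

Here multi_mapped lists the genes that need a decision, and collapsed_symbols is a named character vector (names are Ensembl IDs) that could be used for heatmap row labels.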

modified 13 months ago by genomax • written 13 months ago by sallehl