Good or bad idea to annotate with newer genome build?
6 weeks ago
Pratik Mehta ▴ 590

Hello Biostars Community,

If the probe sequences on a mouse methylation BeadChip array are based on the GRCm38 (mm10) genome build, would it be a bad idea to annotate the probes with the latest genome build, GRCm39 (mm39)?

I am thinking that yes, it would be a bad idea, but I would like some confirmation from peers, or any other helpful insight.

Thank you in advance!

array methylation beadchip build annotation • 299 views

What do you hope to gain from using the latest genome build? Usually, the biggest advantage of a new genome build is a better representation of the actual sequence; particularly difficult regions may be better represented in newer builds. That can be beneficial if you're dealing with genome-wide data and may help reduce alignment artifacts. I don't really see how that would come into play with an array, though, since arrays are presumably based on high-confidence loci to begin with.

But maybe I'm also misunderstanding what you're actually trying to do on a technical level, i.e. are you mostly referring to gene annotation? [But again, I would make the decision based on what you hope to gain from it]


Thank you for responding, Friederike.

So basically, I just wanted more accurate and higher-quality gene and promoter annotations. I have (almost) come to the conclusion that I should annotate with the newer genome build. Long story short, a gene of interest had promoter annotations in mm39 that were better than those in mm10 and generally made more sense. The promoters were annotated near every TSS (there are probably exceptions to this). Now I just have to see how the rest of the gene/promoter annotations turn out. Overall, in my case, it's probably best to re-annotate everything in mm39. I am also convinced because the sesame package author recommends the same: https://github.com/zwdzwd/sesame/issues/47#issuecomment-915715414
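
For illustration, here is a minimal sketch of what "annotating probes with promoters near a TSS" can look like. Everything in it is an assumption for the example: the gene names, the coordinates, and the 1500 bp upstream / 500 bp downstream window are hypothetical, not taken from mm10, mm39, or the sesame package. The one point it is meant to show is that probe positions and TSS positions have to come from the same genome build for the overlap to be meaningful.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int      # transcription start site, 1-based
    strand: str   # "+" or "-"


# Hypothetical genes; coordinates are made up and belong to no real build.
GENES = [
    Gene("GeneA", "chr1", 10000, "+"),
    Gene("GeneB", "chr1", 20000, "-"),
]


def promoter_window(gene: Gene, upstream: int = 1500, downstream: int = 500) -> Tuple[int, int]:
    """Return the (start, end) of a strand-aware promoter window around the TSS."""
    if gene.strand == "+":
        return gene.tss - upstream, gene.tss + downstream
    # On the minus strand, "upstream" lies at higher coordinates.
    return gene.tss - downstream, gene.tss + upstream


def annotate_probe(chrom: str, pos: int, genes: List[Gene]) -> List[str]:
    """Return names of genes whose promoter window contains the probe position."""
    hits = []
    for gene in genes:
        if gene.chrom != chrom:
            continue
        start, end = promoter_window(gene)
        if start <= pos <= end:
            hits.append(gene.name)
    return hits


if __name__ == "__main__":
    for pos in (9000, 15000, 21000):
        print(pos, annotate_probe("chr1", pos, GENES))
```

If the probe positions were still in mm10 while the TSS table came from mm39, the same comparison would run without error but could silently assign probes to the wrong promoters.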

Thank you again : )

6 weeks ago
benformatics ★ 2.6k

It's always a good idea whenever possible, right? But if it's a huge barrier to downstream analysis, then it's probably not necessary. People still regularly publish data aligned to dm3 (2006) and hg19 (2009) (e.g. https://pubmed.ncbi.nlm.nih.gov/34004147/).

The only time it would probably be a bad idea is if you are focused on investigating regions that were improved in the recent genome build (e.g. telomeres, centromeres, repeat regions). The main thing is that the coordinates of probes targeting mm10 sites that were split or don't exist in mm39 will change.
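
To make the "split or don't exist" case concrete, here is a toy sketch of coordinate conversion between two builds. The aligned blocks below are invented for illustration and are not real mm10-to-mm39 alignments; in practice a chain-file-based tool such as UCSC liftOver does this job. The function names (`lift_position`, `classify_probe`) are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    chrom: str
    old_start: int  # 0-based, inclusive, old build
    old_end: int    # exclusive, old build
    new_start: int  # where the block starts in the new build


# Made-up alignment blocks between an "old" and a "new" build.
BLOCKS: List[Block] = [
    Block("chr1", 0, 1000, 0),        # unchanged
    Block("chr1", 1000, 2000, 1500),  # 500 bp inserted before this block
    # old chr1:2000-3000 has no counterpart in the new build
    Block("chr1", 3000, 4000, 2500),
]


def lift_position(chrom: str, pos: int) -> Optional[int]:
    """Convert one old-build position, or return None if it has no counterpart."""
    for b in BLOCKS:
        if b.chrom == chrom and b.old_start <= pos < b.old_end:
            return b.new_start + (pos - b.old_start)
    return None


def classify_probe(chrom: str, start: int, end: int) -> Tuple[str, Optional[Tuple[str, int, int]]]:
    """Label a probe interval [start, end) as mapped, split, or unmapped."""
    new_first = lift_position(chrom, start)
    new_last = lift_position(chrom, end - 1)
    if new_first is None or new_last is None:
        return "unmapped", None
    new_start, new_end = new_first, new_last + 1
    if new_end - new_start != end - start:
        # Both ends convert, but the probe no longer covers one contiguous stretch.
        return "split", (chrom, new_start, new_end)
    return "mapped", (chrom, new_start, new_end)


if __name__ == "__main__":
    for probe in [(100, 150), (3100, 3150), (980, 1030), (2100, 2150)]:
        print(probe, classify_probe("chr1", *probe))
```

Probes labelled "split" or "unmapped" are the ones to drop or inspect by hand before trusting any annotation in the newer build.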


Thank you. After input from you, Friederike, and the sesame package author, I definitely feel more confident in making the change.
