My question: in most of the mouse sequencing papers I have read, no one uses mm10 as the reference for alignment, and mm10 seems to have fewer annotations than mm9. Is that true? Has anyone used mm10 for alignment and gotten good results?
That's because most people started accumulating data, insights, etc. with mm9 and are reluctant to map and annotate everything again. Then there is the data produced by others that you would like to compare with, e.g. ENCODE, and that is also on mm9.
What are the differences, and what do you gain? See http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/mouse/data/ (no tables here in markdown?)
mm9: Total Bases in Assembly: 2,745,142,291; Total Non-N Bases in Assembly: 2,648,522,751
mm10: Total Bases in Assembly: 2,798,785,524; Total Non-N Bases in Assembly: 2,719,482,043
So there are about 71 million more non-N bases in the mm10 assembly (about 54 million more total bases), which is roughly half a chromosome. There are far fewer unplaced scaffolds, and for most chromosomes they think the assembly is complete. Then there is the PAR (pseudoautosomal region) of the X and Y chromosomes.
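The arithmetic behind that comparison can be checked with a small sketch. The four numbers are the ones quoted above. The dictionary layout and helper names are just illustrative choices, not anything from the GRC page.

```python
# Assembly statistics as quoted in this answer.
assemblies = {
    "mm9": {"total": 2_745_142_291, "non_n": 2_648_522_751},
    "mm10": {"total": 2_798_785_524, "non_n": 2_719_482_043},
}


def gained(field):
    """Bases gained in mm10 relative to mm9 for the given statistic."""
    return assemblies["mm10"][field] - assemblies["mm9"][field]


def n_bases(name):
    """Number of N (gap/unknown) bases: total bases minus non-N bases."""
    stats = assemblies[name]
    return stats["total"] - stats["non_n"]


extra_total = gained("total")   # extra bases overall
extra_non_n = gained("non_n")   # extra sequenced (non-N) bases
n_removed = n_bases("mm9") - n_bases("mm10")  # N bases closed or removed

print(f"extra total bases: {extra_total:,}")
print(f"extra non-N bases: {extra_non_n:,}")
print(f"fewer N bases:     {n_removed:,}")
```

So the "70 million" figure is the gain in non-N sequence. The gain in total length is smaller because mm10 also contains fewer N bases than mm9.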
Then there is the annotation. Ensembl and, I think, UCSC (not sure) do not backport their new annotations to mm9, so this is something you lose by staying on mm9: novel miRNAs, lincRNAs, etc.
In my personal experience, mm10 is better for alignment (more reads are mapped). Also, mm10 has been out for a while, so it is the right time to make the transition. All the gene models, including UCSC, RefSeq and Ensembl, are available for mm10. And for other annotations you can always liftOver from mm9 to mm10.
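As a rough sketch of the liftOver step, the snippet below builds the command line for the UCSC `liftOver` tool, whose arguments are `oldFile map.chain newFile unMapped`. It assumes the `liftOver` binary is on your PATH and that you have downloaded the mm9-to-mm10 chain file from UCSC. The BED file names are hypothetical placeholders.

```python
import os
import shutil
import subprocess


def liftover_command(in_bed, chain, out_bed, unmapped_bed):
    """Build the argument list: liftOver oldFile map.chain newFile unMapped."""
    return ["liftOver", in_bed, chain, out_bed, unmapped_bed]


# Hypothetical file names; replace them with your own annotation file.
cmd = liftover_command(
    "peaks.mm9.bed",            # input intervals in mm9 coordinates
    "mm9ToMm10.over.chain.gz",  # chain file downloaded from UCSC
    "peaks.mm10.bed",           # output intervals in mm10 coordinates
    "peaks.unmapped.bed",       # intervals that could not be converted
)

# Only run when the tool and the input files are actually present.
ready = shutil.which("liftOver") and all(os.path.exists(p) for p in cmd[1:3])
if ready:
    subprocess.run(cmd, check=True)
else:
    print("Would run:", " ".join(cmd))
```

It is worth looking at the unmapped file afterwards, since intervals in regions that changed between the assemblies may not convert.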