I would like to know whether repeat-masking a genome or genome fragment before performing gene annotation is a good idea. The literature (http://www.nature.com/nrg/journal/v1...l/nrg3174.html, http://onlinelibrary.wiley.com/doi/1...eva.12178/full) and software such as MAKER and PASA advocate repeat-masking before annotating genes. Others say that gene annotation pipelines should be run twice, once with and once without repeat-masking. I understand the logic behind repeat-masking and then performing gene annotation. However, I don't understand why one would run gene annotation on an unmasked genome, especially when repeat-related genes are not my concern. I am therefore looking for answers to the following questions.
1) What are the chances that a non-repeat-related gene contains a repetitive region (let's say part of the gag-pol domain is present in a gene, or some exon contains a satellite repeat)? Have any such cases been reported?
2) For reference-guided transcriptome assembly, is it recommended to use a masked genome? I agree that for expression quantification this may lead to overestimation or underrepresentation in some cases.
Thank you in advance.