Repeat masking for genome annotation
5 weeks ago
liorglic ▴ 410

I am working on a plant genome annotation pipeline and would like some advice on repeat masking. The pipeline runs several ab initio gene predictors (AUGUSTUS, GlimmerHMM and SNAP), collects transcript alignment evidence (PASA), protein alignment evidence (GenomeThreader) and gene liftover (Liftoff), and finally generates gene models with EVidenceModeler.
I am wondering about the best way to go about repeat masking within this pipeline. Specifically, my questions are:

  1. When should I do it? Should the masking be done right at the beginning, before running any ab initio or alignment tool? Or should I generate gene models on the unmasked genome, intersect them with repeat annotations only at the end, and filter with a more sophisticated method?
  2. Should I apply hard or soft masking?
  3. What software should I use? I see for instance that EDTA can be used for TE detection, but should I also use a tool like RepeatMasker for other types of repetitive elements, or is this redundant in some way?
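
To make the terms in questions 1 and 2 concrete, here is a minimal sketch of what I mean by soft masking, hard masking, and filtering gene models by repeat overlap. The function names, the 0-based half-open coordinates, and the toy intervals are all made up for illustration; none of this comes from a specific tool.

```python
def apply_mask(seq, repeats, mode="soft"):
    """Mask repeat intervals (0-based, half-open) in a sequence.

    soft: repeat bases become lowercase, so the sequence is still readable.
    hard: repeat bases are replaced by 'N', so the sequence is lost.
    """
    chars = list(seq)
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower() if mode == "soft" else "N"
    return "".join(chars)


def repeat_fraction(gene, repeats):
    """Fraction of a gene interval covered by repeats.

    Overlapping repeat intervals are counted once.
    """
    g_start, g_end = gene
    covered = 0
    last_end = g_start  # rightmost position already counted
    for start, end in sorted(repeats):
        start, end = max(start, last_end), min(end, g_end)
        if end > start:
            covered += end - start
            last_end = end
    return covered / (g_end - g_start)


# Toy example (hypothetical coordinates)
soft = apply_mask("ACGTACGTAC", [(2, 5)], mode="soft")
hard = apply_mask("ACGTACGTAC", [(2, 5)], mode="hard")
frac = repeat_fraction((0, 10), [(2, 5), (4, 8)])
```

With the "filter at the end" option, a gene model would be dropped or flagged when `repeat_fraction` exceeds some threshold that I would still have to choose.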

I should mention that my main focus is protein coding genes, and I'm not so interested in TE annotation and classification at this point.
Any suggestion or advice is welcome. Thank you!

masking annotation repeat
