Hard-masked or soft-masked genome
4.3 years ago
bioinf2305 ▴ 30

Which repeat-masking strategy should be preferred before gene prediction with PASA, AUGUSTUS, SNAP and GENSCAN: soft-masking or hard-masking the genome? I read that AUGUSTUS prefers soft masking, but I am not sure about the other gene prediction tools.

Repeatmasking Gene prediction • 6.9k views

It is good to mask the genome; use soft masking (repeats in lowercase rather than replaced by "N").

4.3 years ago

Soft masked!

It depends to some extent on which gene predictor you are going to use, i.e. whether it can interpret soft-masked genomes. Most gene predictors I know and have used can, so that's not really an issue.

The key thing is that if you soft-mask the genome, you (or rather the gene prediction tool) still have all the sequence information at your disposal. If, for instance, the masking tool produces some false-positive maskings, those regions can still be recovered by the gene predictor, as they might have transcript data aligned to them and might be part of a valid gene.

If you hard-mask the genome, the prediction tool no longer has any clue about the actual sequence and can thus no longer decide for itself how to interpret the masked region.
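To make the difference concrete, here is a minimal Python sketch (the function names `hard_mask` and `unmask` are made up for illustration and do not come from any of the tools mentioned). It assumes the usual convention that soft-masked repeats are lowercase and hard-masked repeats are replaced by "N":

```python
def hard_mask(seq: str) -> str:
    """Replace every soft-masked (lowercase) base with 'N'."""
    return "".join("N" if base.islower() else base for base in seq)


def unmask(seq: str) -> str:
    """Undo soft masking by uppercasing; this cannot restore bases lost to 'N'."""
    return seq.upper()


soft = "ACGTacgtacgtTTGA"   # toy sequence, lowercase = masked repeat
hard = hard_mask(soft)

print(hard)           # ACGTNNNNNNNNTTGA
print(unmask(soft))   # ACGTACGTACGTTTGA  (original sequence recovered)
print(unmask(hard))   # ACGTNNNNNNNNTTGA  (sequence information is gone)
```

Going from soft-masked to hard-masked is always possible later on, but not the other way round, which is another reason to keep the soft-masked version around.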


This is also a question I had. For example, a utility/programme like Chromosomer (https://github.com/gtamazian/chromosomer) would (should) accept soft-masked reference genomes and query sequences, right? Otherwise, how would one "unmask" the assembled chromosomes?


I'm not familiar with that specific tool, but that sounds logical indeed. Alternatively, it also makes sense that it should work with non-masked sequences.
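On the "unmasking" part: for a soft-masked FASTA this is just a matter of uppercasing the sequence lines, whatever tool produced the file. A rough sketch, assuming plain FASTA lines without trailing newlines (the helper name `unmask_fasta` is hypothetical and says nothing about how Chromosomer itself handles masking):

```python
def unmask_fasta(lines):
    """Yield FASTA lines with sequence lines uppercased; header lines are left untouched."""
    for line in lines:
        yield line if line.startswith(">") else line.upper()


# Toy records; in practice the lines would come from a FASTA file.
example = [">chr1 example", "ACGTacgtNNnn", "ttgaACGT"]
unmasked = list(unmask_fasta(example))

for line in unmasked:
    print(line)
```

This only works because soft masking keeps the bases; a hard-masked file has nothing left to restore.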
