Hello.
I just created a fungal genome assembly from Illumina short-read (2x150) data by following these steps:
- Quality check and trimming -> de novo assembly (SPAdes) -> assembly quality assessment (QUAST & BUSCO)
- Read mapping to contigs.fasta (to check alignment quality; 99.98% of reads aligned)
Based on the literature, the next steps appear to be:
- RepeatMasker / RepeatModeler
- Annotation of the assembly (I am using funannotate)
I am running funannotate as follows, following the documentation:
#checking if my species is available
funannotate species | grep -i fusarium
#installing the databases
funannotate setup --install all --busco_db fungi --database ~/miniconda3/envs/funannotate/database/ --update --force --wget
# cleaning contigs
funannotate clean --input contigs.fasta --out contigs_clean.fasta --pident 95 --cov 95 --minlen 500 --exhaustive
#sorting contigs
funannotate sort --input contigs_clean.fasta --out contigs_clean_sorted.fasta --base contig
#Masking
funannotate mask --input contigs_clean_sorted.fasta --out contigs_clean_sorted_masked.fasta --cpus 14 --method repeatmasker --repeatmasker_species fusarium
#Prediction
funannotate predict --input contigs_clean_sorted_masked.fasta --out ./assembly_annotation/ --species "Fusarium oxysporum" --strain fungi123 --busco_seed_species fusarium
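To see which step fails to write its output, a small guard could be placed between the steps above. This is only a sketch: `require_fasta` is a helper name I made up (it is not part of funannotate), and it only checks that the file exists, is non-empty, and has at least one FASTA header.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of funannotate): fail early when a step
# did not produce a usable FASTA file.
require_fasta() {
    local f="$1"
    # -s is true only if the file exists and is not empty
    if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        return 1
    fi
    # A FASTA file should contain at least one header line starting with '>'
    if ! grep -q '^>' "$f"; then
        echo "no FASTA headers in: $f" >&2
        return 1
    fi
    echo "ok: $f"
}

# Intended use (commented out so the block runs without funannotate installed):
#   require_fasta contigs_clean_sorted.fasta         # after funannotate sort
#   require_fasta contigs_clean_sorted_masked.fasta  # after funannotate mask
```

If the guard fails right after the mask step, the problem is in masking itself rather than in predict.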
My questions are:
1: What is the difference between RepeatMasker and RepeatModeler? Is it necessary to run both on the assembly (RepeatMasker, then RepeatModeler)?
2: There is a masking step in the funannotate documentation, so what is the ideal approach: run RepeatMasker and RepeatModeler as separate steps, or just run funannotate mask?
3: When I run "funannotate mask" with --method tantan it works, but with --method repeatmasker or --method repeatmodeler it stops with this error:
FileNotFoundError: [Errno 2] No such file or directory: ./assembly_annotation/contigs_clean_sorted_masked.fasta
How should I perform this masking step, and how do I identify the right value for --repeatmasker_species? I am unable to find the list of species in RepeatMasker.
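To compare the tantan result with whatever the other methods eventually produce, a rough masked fraction can be computed with standard tools. This sketch assumes the masked output is soft-masked (repeats written in lowercase), which I have not verified for every method. `masked_fraction` is a name I made up, and it counts only lowercase a, c, g, t and n.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the fraction of bases in a FASTA file that are
# lowercase (soft-masked). Header lines (starting with '>') are skipped.
masked_fraction() {
    awk '!/^>/ {
            total += length($0)             # count all bases on this line
            masked += gsub(/[acgtn]/, "")   # gsub returns the number of matches
         }
         END {
            if (total == 0) { print "0.0000"; exit }
            printf "%.4f\n", masked / total
         }' "$1"
}

# Intended use (file name taken from the commands in this post):
#   masked_fraction contigs_clean_sorted_masked.fasta
```

A value near zero for the tantan output would suggest that very little was masked and that a repeat-library-based method is worth getting to work.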