Dear reader,
I have several fungal genomes sequenced with Illumina short reads, and a handful of them were also sequenced with Nanopore. All samples belong to one species complex.
I have generated:
- Nanopore assemblies with `flye`, followed by polishing steps and removal of small contigs.
- Illumina assemblies with `SPAdes v4.0`, followed by QC and a check of all assembly stats.
Now I want to perform repeat annotation before moving on to genome annotation.
As I understand it, RepeatModeler is used to build a de novo repeat library (database) from the query genome, and RepeatMasker is then used to actually mask the FASTA file. Correct me if I am wrong.
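The two-step workflow described above (build a de novo library, then mask) can be sketched as follows. This is a minimal sketch that only assembles the command lines and, optionally, runs them. It assumes RepeatModeler 2.0.4 or later (which takes `-threads`; older releases used `-pa`) and that the library ends up as `<db_name>-families.fa` in the working directory. The helper names `repeat_commands` and `run`, the database name `fungal_db`, and the `masked/` output folder are hypothetical choices for illustration.

```python
import shlex
import subprocess
from pathlib import Path


def repeat_commands(merged_fasta, genomes_to_mask, db_name="fungal_db",
                    threads=8, outdir="masked"):
    """Return the command lines for BuildDatabase -> RepeatModeler -> RepeatMasker."""
    # Assumption: RepeatModeler writes its consensus library as <db_name>-families.fa
    library = f"{db_name}-families.fa"
    cmds = [
        # 1) Index the (possibly merged) FASTA used for de novo repeat discovery
        ["BuildDatabase", "-name", db_name, str(merged_fasta)],
        # 2) Build the de novo repeat library (assumes RepeatModeler >= 2.0.4: -threads)
        ["RepeatModeler", "-database", db_name, "-threads", str(threads)],
    ]
    for genome in genomes_to_mask:
        # 3) Mask each genome on its own with the shared library.
        #    -xsmall soft-masks (lowercase) instead of replacing repeats with N;
        #    -pa sets parallel batch jobs (see RepeatMasker -help for cores per job).
        sample_dir = Path(outdir) / Path(genome).stem
        cmds.append(["RepeatMasker", "-lib", library, "-pa", str(threads),
                     "-xsmall", "-dir", str(sample_dir), str(genome)])
    return cmds


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if dry_run:
            continue
        if "-dir" in cmd:
            # Create the per-sample output directory before RepeatMasker runs
            Path(cmd[cmd.index("-dir") + 1]).mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

For example, `run(repeat_commands("ONE-BIG.fasta", ["sampleA.fasta", "sampleB.fasta"]))` only prints the commands; pass `dry_run=False` to execute them. Keeping library building and masking as separate steps is what allows one library to be reused for every individual genome.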
My question is:
Should I merge all my Nanopore-based assemblies into one `ONE-BIG.fasta` file and use it to generate the de novo repeat library with RepeatModeler, then mask each Nanopore assembly FASTA individually? And then do the same for the SPAdes assemblies (merge all FASTA files -> build library -> mask each individual FASTA)?
A second approach that comes to mind is to merge all Nanopore and SPAdes FASTA genomes into `ONE-REALLY-BIG.fasta`, use RepeatModeler to generate a single de novo library from it, and then mask repeats in every genome individually using this library.
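In both approaches the merged file is only the input for library building, and each assembly is still masked on its own. One technical pitfall when concatenating assemblies is duplicate sequence IDs: Flye and SPAdes number contigs per assembly (e.g. `contig_1`, `NODE_1_...`), so the same name can appear in several files and confuse downstream tools. The sketch below, a minimal example assuming plain uncompressed FASTA files named after their sample, prefixes every header with that sample name. The function name `merge_fastas` and the `__` separator are hypothetical choices. As far as I know, RepeatMasker also rejects sequence IDs longer than about 50 characters, so long SPAdes names may need further shortening.

```python
from pathlib import Path


def merge_fastas(fasta_paths, out_path):
    """Concatenate FASTA files, prefixing each header with its sample name.

    The sample name is the file name without extension (sampleA.fasta -> sampleA),
    so identical contig names from different assemblies stay unique.
    Returns the number of records written.
    """
    n_records = 0
    with open(out_path, "w") as out:
        for path in fasta_paths:
            sample = Path(path).stem
            with open(path) as fh:
                for line in fh:
                    if line.startswith(">"):
                        # Keep only the first word of the header to keep IDs short
                        fields = line[1:].split()
                        contig_id = fields[0] if fields else f"seq{n_records + 1}"
                        out.write(f">{sample}__{contig_id}\n")
                        n_records += 1
                    elif line.strip():
                        # Copy sequence lines, normalising the line ending
                        out.write(line.rstrip("\n") + "\n")
    return n_records
```

The original assemblies are left untouched, so masking each one individually afterwards still uses its own contig names.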
Will merging these different assemblies introduce any bias or other issues with my genome assemblies, either technically or biologically? Again, all samples belong to one species complex.
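One practical way to look for such bias is to mask each genome with the alternative libraries (per-technology versus combined) and compare the fraction of each genome that ends up masked. A minimal sketch, assuming RepeatMasker was run with `-xsmall` so repeats appear as lowercase bases in the `*.masked` output; the function name `masked_fraction` is hypothetical, and N and other ambiguity codes are ignored in both numerator and denominator.

```python
def masked_fraction(fasta_path):
    """Fraction of A/C/G/T bases that are soft-masked (lowercase) in a FASTA file.

    Ns and other IUPAC ambiguity codes are skipped, so assembly gaps
    do not dilute the value.
    """
    masked = 0
    total = 0
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                continue
            seq = line.strip()
            masked += sum(1 for base in seq if base in "acgt")
            total += sum(1 for base in seq if base in "ACGTacgt")
    return masked / total if total else 0.0
```

Running it on the same genome masked with each library (e.g. `masked/sampleA/sampleA.fasta.masked`) gives directly comparable numbers; a large difference between strategies for the same genome would be worth investigating.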
Kindly share your views on this. Thanks.