I have been doing repetitive sequence annotation on a newly assembled genome. About four programs are usually used for this purpose (e.g., TRF + RepeatProteinMasker + RepeatModeler & RepeatMasker), and how they fit together confuses me a lot. Are they run independently, with the results from the different programs combined afterwards? Or are they run one after another, with the masked FASTA output of one program used as the input to the next?


Hi, I have the same problem. What I do next is convert each tool's results to GFF format using this script.

Merge two annotation files (merge sequentially if there are more than two):

1. Output the lines in A.gff that overlap with lines in B.gff:
   bedtools intersect -a A.gff -b B.gff -wa > A.dup.gff
2. Output the lines that exist only in A.gff, that is, remove the lines of A.dup.gff from A.gff. comm needs both files sorted in the same order, so sort them first; sort -u also drops the repeated lines that -wa writes when one line of A.gff overlaps several lines of B.gff. comm -23 then prints only the lines unique to the first file:
   sort A.gff > A.sorted.gff
   sort -u A.dup.gff > A.dup.sorted.gff
   comm -23 A.sorted.gff A.dup.sorted.gff > A.filter.gff
3. Merge B.gff with the part of A.gff that does not overlap B.gff:
   cat B.gff A.filter.gff > sample.gff

You can also use this script to count TE information.
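To check the filter-and-merge logic of the steps above without running bedtools, here is a minimal sketch on toy data. Everything in it is an assumption made for illustration: the `merge_demo` directory, the three records in A.gff, the single record in B.gff, and a hand-written A.dup.gff that stands in for the `bedtools intersect -wa` output (with one record repeated, as can happen when a line of A.gff overlaps several lines of B.gff). It uses `comm -23`, which keeps only the lines unique to the first file.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # byte-wise collation so that sort and comm agree on line order

d=merge_demo      # hypothetical scratch directory
mkdir -p "$d"

# Toy, made-up GFF records (tab-separated, 9 columns)
printf 'chr1\tTRF\trepeat\t100\t200\t.\t+\t.\tID=a1\n'  > "$d/A.gff"
printf 'chr1\tTRF\trepeat\t500\t600\t.\t+\t.\tID=a2\n' >> "$d/A.gff"
printf 'chr2\tTRF\trepeat\t50\t80\t.\t+\t.\tID=a3\n'   >> "$d/A.gff"

printf 'chr1\tRepeatMasker\trepeat\t150\t300\t.\t+\t.\tID=b1\n' > "$d/B.gff"

# Stand-in for step 1 (bedtools intersect -wa): a1 overlaps B and is listed twice
printf 'chr1\tTRF\trepeat\t100\t200\t.\t+\t.\tID=a1\n'  > "$d/A.dup.gff"
printf 'chr1\tTRF\trepeat\t100\t200\t.\t+\t.\tID=a1\n' >> "$d/A.dup.gff"

# Step 2: sort both files, then keep the lines that occur only in A
sort    "$d/A.gff"     > "$d/A.sorted.gff"
sort -u "$d/A.dup.gff" > "$d/A.dup.sorted.gff"
comm -23 "$d/A.sorted.gff" "$d/A.dup.sorted.gff" > "$d/A.filter.gff"

# Step 3: B plus the non-overlapping part of A
cat "$d/B.gff" "$d/A.filter.gff" > "$d/sample.gff"
cat "$d/sample.gff"
```

In this toy case sample.gff ends up with the record from B.gff followed by a2 and a3, while a1 is dropped because it overlaps B.gff. With real data, A.dup.gff would come from the bedtools command in step 1 instead of being written by hand.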


