How to combine the outputs of repeat annotation software
3.0 years ago
boymin2020 ▴ 80


I have been annotating repetitive sequences in a newly assembled genome. About four programs are commonly used for this purpose (e.g., TRF, RepeatProteinMasker, and RepeatModeler with RepeatMasker), and I am confused about how they fit together. Are they run independently, with the results from the different programs combined afterwards? Or are they run one after another, with the masked FASTA output of one program used as the input to the next?


RepeatProteinMasker TRF RepeatMasker RepeatModeler • 858 views
15 months ago
zijian • 0

Hi, I have the same problem. What I do next is convert each program's results to GFF format using this script.
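For the RepeatMasker part, the conversion step might look roughly like the sketch below. It is not the linked script. It assumes the standard whitespace-separated RepeatMasker `.out` layout (score, divergence, deletions, insertions, query name, begin, end, left, strand, repeat name, class/family, ...), and the function name `rm_out_to_gff3`, the feature type `dispersed_repeat`, and the `class` attribute are illustrative choices. TRF and RepeatProteinMasker write different formats and would each need their own parser.

```python
def rm_out_to_gff3(lines, source="RepeatMasker"):
    """Convert RepeatMasker .out lines to GFF3 lines (hypothetical helper)."""
    gff = ["##gff-version 3"]
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header lines: data rows start with an
        # integer Smith-Waterman score and have at least 14 columns.
        if len(fields) < 14 or not fields[0].isdigit():
            continue
        score, seqid, start, end = fields[0], fields[4], fields[5], fields[6]
        # RepeatMasker marks the complement strand with "C".
        strand = "-" if fields[8] == "C" else "+"
        repeat_name, repeat_class = fields[9], fields[10]
        # Attribute names are an assumption; adjust to what downstream tools expect.
        attrs = f"Name={repeat_name};class={repeat_class}"
        gff.append("\t".join(
            [seqid, source, "dispersed_repeat", start, end, score, strand, ".", attrs]
        ))
    return gff
```

Calling `rm_out_to_gff3(open("genome.fa.out"))` (file name assumed) returns a list of lines that can be written to a `.gff` file and then passed to the merging step.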

Merge two annotation files (for more than two, merge them sequentially):

1. Output the lines in A.gff that overlap features in B.gff:
    bedtools intersect -a A.gff -b B.gff -wa > A.dup.gff
2. Output the lines that exist only in A.gff, that is, remove the A.dup.gff lines from A.gff. comm requires both inputs to be sorted, so sort them first, then keep only the lines unique to the first file (-23, not -1):
    sort A.gff > A.sorted.gff
    sort -u A.dup.gff > A.dup.sorted.gff
    comm -23 A.sorted.gff A.dup.sorted.gff > A.filter.gff
3. Concatenate B.gff with the part of A.gff that does not overlap B.gff:
    cat B.gff A.filter.gff > sample.gff

You can also use this script to count TE information.
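The three merging steps above can also be sketched in plain Python, which makes the logic easy to check on a small example without bedtools installed. This is a minimal sketch, not a replacement for bedtools: the function names are hypothetical, it assumes tab-separated GFF lines with 1-based inclusive start and end in columns 4 and 5, and it uses a simple scan that will be slow on whole-genome annotations.

```python
from collections import defaultdict


def parse_gff(lines):
    """Return (seqid, start, end, raw_line) for each feature line."""
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.split("\t")
        features.append((cols[0], int(cols[3]), int(cols[4]), line))
    return features


def merge_preferring_b(a_lines, b_lines):
    """Keep every feature of B, plus the features of A that overlap nothing in B."""
    a_feats = parse_gff(a_lines)
    b_feats = parse_gff(b_lines)

    # Index B intervals by sequence name.
    b_by_seq = defaultdict(list)
    for seqid, start, end, _ in b_feats:
        b_by_seq[seqid].append((start, end))

    def overlaps_b(seqid, start, end):
        # Closed intervals overlap when each starts before the other ends.
        return any(s <= end and start <= e for s, e in b_by_seq.get(seqid, ()))

    # Steps 1 and 2: drop the A features that overlap B.
    a_only = [line for seqid, start, end, line in a_feats
              if not overlaps_b(seqid, start, end)]
    # Step 3: B first, then the non-overlapping remainder of A.
    return [line for *_, line in b_feats] + a_only
```

With more than two programs, the result of one call can be passed back in as `b_lines` together with the next program's output as `a_lines`, so the order of merging decides which annotation wins where features overlap.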

