I have .out files produced by a RepeatMasker run. They look like this:
SW perc perc perc query position in query matching repeat position in repeat
score div. del. ins. sequence begin end (left) repeat class/family begin end (left) ID
5992 15.1 3.1 1.7 2L 42739057 42741982 (4540) + rnd-1_family-153 LTR/Pao 1 1206 (1) 22037
4135 13.5 2.5 1.1 2L 42742116 42743472 (3050) + rnd-1_family-122 LTR/Pao 1 729 (157) 22038
1796 0.0 3.2 0.0 2L 42743310 42743526 (2996) C rnd-1_family-334 Unknown (38) 297 74 22039 *
The columns are separated by different delimiters, and some lines end with an extra "*" symbol, so the lines differ in length. I want to extract the position begin, position end, and repeat class/family columns to visualize them. Any suggestions?
In addition, RepeatMasker can produce GFF files (Edit: and .xn files). Both are easier to parse than the .out file and can be converted to BED with a gff2bed script. In particular, the .xn files are simple tab-separated files that contain the family.
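If you would rather parse the .out file directly, splitting each line on any run of whitespace should cope with both the mixed delimiters and the trailing "*". Below is a minimal Python sketch, assuming the column order shown in the question. The function name `parse_rm_out` and the `sample` list are made up for illustration and are not part of RepeatMasker.

```python
def parse_rm_out(lines):
    """Yield (sequence, begin, end, class_family) from RepeatMasker .out lines.

    Assumes the column layout shown in the question.
    """
    for line in lines:
        # split() with no argument splits on any run of spaces or tabs,
        # so the mixed delimiters do not matter.
        fields = line.split()
        # Data rows start with an integer SW score and have at least 15
        # columns; this skips the header lines and blank lines.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        # 0-based indices: 4 = query sequence, 5 = begin, 6 = end,
        # 10 = repeat class/family. A trailing "*" is an extra last
        # field and does not shift these positions.
        yield fields[4], int(fields[5]), int(fields[6]), fields[10]


# Illustrative input: the lines from the question.
sample = [
    "SW perc perc perc query position in query matching repeat position in repeat",
    "score div. del. ins. sequence begin end (left) repeat class/family begin end (left) ID",
    "5992 15.1 3.1 1.7 2L 42739057 42741982 (4540) + rnd-1_family-153 LTR/Pao 1 1206 (1) 22037",
    "4135 13.5 2.5 1.1 2L 42742116 42743472 (3050) + rnd-1_family-122 LTR/Pao 1 729 (157) 22038",
    "1796 0.0 3.2 0.0 2L 42743310 42743526 (2996) C rnd-1_family-334 Unknown (38) 297 74 22039 *",
]

for seq, begin, end, family in parse_rm_out(sample):
    # .out coordinates are 1-based, BED starts are 0-based, hence begin - 1.
    print(seq, begin - 1, end, family, sep="\t")
```

For a real file, pass an open file handle instead of `sample`, since iterating over a file yields its lines.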
That's amazing! Thank you