Hi All
I need to summarize a RepeatMasker .out file, reporting the different repeat types with their percentages. Does anybody know of a tool or a Perl script that can do this?
Best
I understood your question like this: you want to summarize the repeats using only the RepeatMasker *.out file, and you want the output to be similar to the RepeatMasker *.tbl file.
Using this *.out file as an example:
SW perc perc perc query position in query matching repeat position in repeat
score div. del. ins. sequence begin end (left) repeat class/family begin end (left) ID
225 10.0 0.0 0.0 100016 1 30 (12) + L1P3 LINE/L1 28 57 (6404) 1
795 15.3 1.5 0.0 100071 1 131 (0) C LTR12_ LTR/ERV1 (83) 605 473 2
402 13.1 0.0 1.6 100087 1 62 (2) + HERV3-int LTR/ERV1 6068 6128 (2298) 3
276 22.5 1.4 0.0 100152 50 120 (0) + L1MDa LINE/L1 74 145 (6488) 4
257 13.9 0.0 0.0 100163 5 40 (0) C 7SLRNA srpRNA (247) 73 38 5
274 11.1 0.0 0.0 100164 5 40 (0) C 7SLRNA srpRNA (247) 73 38 6
419 15.2 2.5 1.2 100197 36 114 (0) C AluSc5 SINE/Alu (118) 191 112 7
And, since you say "I know how much (bp) was used for the analysis", let's say that was 123456 bp.
This is a quick and ugly way to get output similar to the *.tbl file:
grep -v 'SW perc perc perc\|score div. del. ins\|^$' EXAMPLE.out |
awk '{print $7-$6+1,$11"-"$10}' |
awk '{group[$2]}; {count[$2]+=$1}; END {for (i in group) print i, (count[i]*100)/123456" %"}' |
sort |
column -t
LINE/L1-L1MDa 0.0575104 %
LINE/L1-L1P3 0.0243002 %
LTR/ERV1-HERV3-int 0.0502203 %
LTR/ERV1-LTR12_ 0.106111 %
SINE/Alu-AluSc5 0.0639904 %
srpRNA-7SLRNA 0.0583204 %
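If one line per class/family is enough (closer to what the tbl file reports), a single awk pass may do. This is only a sketch under a few assumptions: `summarize_out` is a hypothetical helper name, data rows are recognised by a numeric first field rather than by matching the header text, and overlapping annotations are simply counted twice, which the real tbl summary may handle differently.

```shell
#!/bin/sh
# summarize_out FILE TOTAL_BP   (hypothetical helper, not part of RepeatMasker)
# Prints the percentage of TOTAL_BP annotated for each repeat class/family.
summarize_out() {
    awk -v total="$2" '
        # Data rows start with a numeric score; this skips the two
        # header lines and any blank line.
        $1 ~ /^[0-9]+$/ {
            # $6/$7 = begin/end in query, $11 = class/family.
            bp[$11] += $7 - $6 + 1
        }
        END {
            for (c in bp)
                printf "%s\t%.4f %%\n", c, bp[c] * 100 / total
        }
    ' "$1" | sort
}

# Example call (file name and total taken from the answer above):
#   summarize_out EXAMPLE.out 123456
```

Passing the total as an argument avoids hard-coding 123456 inside the awk program, so the same function can be reused for other assemblies.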
Thanks for your time and the fantastic reply. Just one addition: in this part of the script, "awk '{print $7-$6,$11"-"$10}'", the repeat element length should be ($7-$6)+1. The two values are the start and end positions of the repeat, so simple subtraction underestimates the length by 1; adding 1 gives the correct length. Thanks.
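To make the off-by-one concrete, here is a tiny sketch, assuming the begin/end columns are 1-based and inclusive; `interval_len` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# interval_len BEGIN END: length of a 1-based, inclusive interval.
interval_len() {
    echo $(( $2 - $1 + 1 ))
}

# First example row (L1P3, positions 1 to 30) covers 30 bases, not 29.
interval_len 1 30    # prints 30
```

A feature that begins and ends on the same base has length 1, which plain subtraction would report as 0.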
I can help you, but I can't understand your question. This is an example of a *.out file; which part do you want to summarize? It would be better if you gave an example of your *.out file. Percentage of what? Query covered by a particular repeat?
PS: Change the tag to repeatmasker.
Thanks for your reply. RepeatMasker was run by someone else and I only received the repeat-masked .out file. I want to summarise the entire .out file; I don't have the RepeatMasker summary file. I know how much sequence (bp) was used for the analysis. I want the percentage for each of the different types of elements in the file.