Question: Perl Script To Summarise The Repeat Masker Out File
figo200 wrote, 5.7 years ago:

Hi All

I need to summarize a RepeatMasker .out file, listing the different repeat types with their percentages. Does anybody know of a tool or a Perl script that can do this?

Best


I can help, but I don't understand your question. Here is an example of a *.out file; which part do you want to summarize?

   SW   perc perc perc  query     position in query    matching  repeat            position in repeat
  score   div. del. ins.  sequence  begin end   (left)   repeat    class/family    begin  end    (left)   ID
   1078   12.3  1.8  0.3  10            GENE1  509    (0) + Repeat1 RepeataA      1    516    (0)    1   
   1099   13.8  0.4  0.0  1000          GENE2   342    (0) C Repeat2 RepeatB   (61)    341      1    2

It would help if you posted an example from your own *.out file.

Percentage of what? Of the query covered by a particular repeat?

P.S.: Change the tag to repeatmasker.

written 5.7 years ago by PoGibas

Thanks for your reply. RepeatMasker was run by someone else and I only received the RepeatMasker .out file, not the summary file. I want to summarise the entire .out file. I know how much sequence (in bp) was used for the analysis, and I want the percentage of each type of element in the file.

written 5.7 years ago by figo200
PoGibas (Vilnius) wrote, 5.7 years ago:

Here is how I understand your question:

You want to summarize the repeats when all you have is the RepeatMasker .out file, and you want the output to be similar to the RepeatMasker .tbl file.

Using this *.out file as an example:

 SW   perc perc perc  query     position in query    matching      repeat              position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat        class/family      begin  end    (left)   ID

225   10.0  0.0  0.0  100016        1    30   (12) + L1P3          LINE/L1               28     57 (6404)     1  
795   15.3  1.5  0.0  100071        1   131    (0) C LTR12_        LTR/ERV1            (83)    605    473     2  
402   13.1  0.0  1.6  100087        1    62    (2) + HERV3-int     LTR/ERV1            6068   6128 (2298)     3  
276   22.5  1.4  0.0  100152       50   120    (0) + L1MDa         LINE/L1               74    145 (6488)     4  
257   13.9  0.0  0.0  100163        5    40    (0) C 7SLRNA        srpRNA             (247)     73     38     5  
274   11.1  0.0  0.0  100164        5    40    (0) C 7SLRNA        srpRNA             (247)     73     38     6  
419   15.2  2.5  1.2  100197       36   114    (0) C AluSc5        SINE/Alu           (118)    191    112     7

And since you know how much sequence (bp) was used for the analysis, let's say it was 123456 bp.

This is a quick and ugly way to get output similar to the .tbl file:

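# Drop the two header lines and blank lines, print the covered bp
# (query end - query begin + 1) and a "class/family-repeat" label for each hit,
# then sum the bp per label and express each sum as a percentage of 123456 bp.
grep -v 'SW   perc perc perc\|score   div. del. ins\|^$' EXAMPLE.out |
   awk '{print $7-$6+1,$11"-"$10}' |
   awk '{group[$2]}; {count[$2]+=$1}; END {for (i in group) print i, (count[i]*100)/123456" %"}' |
   sort |
   column -t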

 LINE/L1-L1MDa       0.0575104 %
 LINE/L1-L1P3        0.0243002 %
 LTR/ERV1-HERV3-int  0.0502203 %
 LTR/ERV1-LTR12_     0.106111 %
 SINE/Alu-AluSc5     0.0639904 %
 srpRNA-7SLRNA       0.0583204 %
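
The pipeline above reports one line per class/family-repeat pair. If a per-class summary is closer to what is wanted, the same idea could be written as a single awk program. This is only a sketch under a few assumptions: the function name summarise_rm_out is made up for illustration, hit rows are recognised by a numeric SW score in the first column, and hit lengths are simply summed, so any overlapping annotations would be counted twice.

```shell
# Sketch: summarise a RepeatMasker .out file per repeat class/family.
# Usage: summarise_rm_out FILE TOTAL_BP   (hypothetical helper name)
# Output columns (tab-separated): class/family, hits, covered bp, percent of TOTAL_BP
summarise_rm_out() {
    awk -v total="$2" '
        # Hit rows start with a numeric SW score; header and blank lines do not.
        $1 ~ /^[0-9]+$/ {
            bp[$11] += $7 - $6 + 1    # query end - query begin + 1
            hits[$11]++
        }
        END {
            for (c in bp)
                printf "%s\t%d\t%d\t%.4f\n", c, hits[c], bp[c], 100 * bp[c] / total
        }
    ' "$1" | LC_ALL=C sort
}
```

It would be called as summarise_rm_out EXAMPLE.out 123456. For the example rows above, LINE/L1 should come out as 2 hits covering 101 bp (30 bp for L1P3 plus 71 bp for L1MDa).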

Thanks for your time and the fantastic reply. One addition: in the part of the script that reads awk '{print $7-$6,$11"-"$10}', the repeat length should be ($7-$6)+1. Because $6 and $7 are the start and end positions of the repeat and both positions belong to it, plain subtraction undercounts the length by 1, so 1 must be added to get the correct length. Thanks.

written 5.7 years ago by figo200
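
To make the off-by-one point concrete, here is a tiny illustration using the L1MDa row from the example, which spans query positions 50 to 120. The variable names are made up for the illustration.

```shell
# Inclusive interval length is end - begin + 1.
q_begin=50
q_end=120
echo $(( q_end - q_begin ))        # plain subtraction: one base short
echo $(( q_end - q_begin + 1 ))    # counts both endpoint bases
```

The first line prints 70 and the second prints 71, which is the number of bases from position 50 through position 120 inclusive.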

Thanks, fixed it.

written 5.7 years ago by PoGibas

If this solution works for you, please accept the answer. Also, rename the question to "coverage from RepeatMasker out file".

written 5.7 years ago by PoGibas