Question: Meaning of the 5th column in RepeatMasker BED format results
4 months ago, Vitis (New York) wrote:

I'd like to ask about the meaning of the 5th column (an integer) in BED-format results from RepeatMasker.

9   100663131   100663387   LTR22_SS        475     +
9   70254161    70254685    ALTR2B_SSc      3460    +
9   96468811    96469391    LTR8_SSc        3756    -
9   116391614   116392469   LTR78           1152    +
9   4980341     4980930     LTR39_SSc       3300    -
9   16908116    16908359    MamGypLTR3      512     +
9   17432426    17432886    ALTR2B2_SSc     1914    -
9   18742941    18743430    LTR5_SS         2771    +
9   27131556    27132076    ERV3-1_SSc-LTR  969     -
9   30539515    30539909    LTR39B3_SSc     787     -

I searched around but did not find a definitive answer, so I'm seeking help here from RepeatMasker experts. It is clearly not the length of the repeat. Does it indicate how repetitive this feature is in the genome? Or is it some identity score indicating how well it matches a repeat consensus?
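
A quick way to confirm that column 5 is not the feature length is to compare it with end minus start for each row. Below is a minimal Python sketch using the first three rows above; the helper name `parse_bed6` is made up for illustration.

```python
# First three rows from the question, as whitespace-separated BED6.
BED_TEXT = """\
9 100663131 100663387 LTR22_SS 475 +
9 70254161 70254685 ALTR2B_SSc 3460 +
9 96468811 96469391 LTR8_SSc 3756 -
"""

def parse_bed6(text):
    """Yield (chrom, start, end, name, score, strand) from BED6 text."""
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, start, end, name, score, strand = line.split()[:6]
        yield chrom, int(start), int(end), name, int(score), strand

rows = list(parse_bed6(BED_TEXT))
for chrom, start, end, name, score, strand in rows:
    # BED intervals are half-open, so the length is end - start.
    print(name, "length:", end - start, "score:", score)
```

The lengths come out as 256, 524 and 580, against scores of 475, 3460 and 3756, so the two are clearly different quantities.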


Column 5 is an optional field in BED files. Here is UCSC's description of it:

5. Score - A score between 0 and 1000. If the track line useScore attribute is set to 1 for this annotation data set, the score value will determine the level of gray in which this feature is displayed (higher numbers = darker gray).

You may already know that. I'm not sure how the scores are determined here, sorry.

Written 4 months ago by goodez
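
Several scores in the question (3460, 3756, and others) fall outside the 0 to 1000 range in the UCSC description, so they are presumably raw values rather than display scores. If the file is meant for a browser track with useScore set to 1, one option is to cap the scores first. That is an assumption about what is wanted, not something RepeatMasker or UCSC prescribes, and `cap_score` is a hypothetical helper.

```python
def cap_score(score, lo=0, hi=1000):
    """Clamp a raw score into the 0-1000 range given in the UCSC BED description."""
    return max(lo, min(hi, score))

# Column 5 of the first four rows in the question.
raw_scores = [475, 3460, 3756, 1152]
capped = [cap_score(s) for s in raw_scores]
print(capped)  # [475, 1000, 1000, 1000]
```

Capping throws away the differences between high scores, so keep the original file if the raw values matter for filtering.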
4 months ago, Alex Reynolds (Seattle, WA USA) wrote:

It is the Smith-Waterman alignment score of the match between the genomic sequence and the repeat consensus sequence.

I think (but am not 100% certain) a higher score means a stronger match to the repeat consensus. It is used for cutoff filters, which are specific to different classes of repeats.

I don't know how useful it is to use these scores directly. Also see "How to read the results" in the RepeatMasker documentation:

"Smith-Waterman score of the match, usually complexity adjusted. The SW scores are not always directly comparable. Sometimes the complexity adjustment has been turned off, and a variety of scoring matrices are used."

It may be worthwhile to contact the developers directly.
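
For reference, each record in RepeatMasker's native .out table starts with the SW score, followed by the divergence, deletion and insertion percentages, the query name, and the query begin and end. A BED file with the score in column 5 is what a conversion like the sketch below would give. Whether the file in the question was produced this way is an assumption; the function name `out_line_to_bed` and the example record (including its divergence values and repeat class) are made up for illustration.

```python
def out_line_to_bed(line):
    """Convert one RepeatMasker .out record to a BED6 tuple.

    Assumed column order: SW score, %div, %del, %ins, query name,
    query begin, query end, (left), strand, repeat name, ...
    """
    f = line.split()
    sw_score = int(f[0])
    chrom, begin, end = f[4], int(f[5]), int(f[6])
    # .out marks the complement strand with "C" instead of "-".
    strand = "-" if f[8] == "C" else "+"
    name = f[9]
    # .out coordinates are 1-based inclusive; BED is 0-based half-open.
    return (chrom, begin - 1, end, name, sw_score, strand)

# Hypothetical record, built to match the first BED row in the question.
record = "475 20.1 3.0 1.2 9 100663132 100663387 (1000) + LTR22_SS LTR/ERV1 1 260 (300) 1"
bed_row = out_line_to_bed(record)
print(bed_row)
```

Real .out files begin with header lines, which this sketch does not skip.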
