Question: Meaning of the 5th column in RepeatMasker BED-format results
Vitis (New York) wrote, 14 months ago:

I'd like to ask about the meaning of the 5th column (an integer) in BED-format results from RepeatMasker.

9  100663131  100663387  LTR22_SS        475   +
9  70254161   70254685   ALTR2B_SSc      3460  +
9  96468811   96469391   LTR8_SSc        3756  -
9  116391614  116392469  LTR78           1152  +
9  4980341    4980930    LTR39_SSc       3300  -
9  16908116   16908359   MamGypLTR3      512   +
9  17432426   17432886   ALTR2B2_SSc     1914  -
9  18742941   18743430   LTR5_SS         2771  +
9  27131556   27132076   ERV3-1_SSc-LTR  969   -
9  30539515   30539909   LTR39B3_SSc     787   -

I searched around but did not find a definite answer, so I'm seeking help here from RepeatMasker experts. It is clearly not the length of the repeat. Does it indicate how repetitive this feature is in the genome? Or is it some identity score indicating how well the feature matches a repeat consensus?
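One way to confirm that column 5 is not the repeat length is to compute end - start for each row (BED intervals are 0-based and half-open, so that difference is the length in bp) and compare it with column 5. A minimal Python sketch using the ten rows above; the helper name parse_bed6 is hypothetical and assumes whitespace-separated, six-column lines:

```python
# The ten rows from the question: chrom, start, end, name, column 5, strand.
ROWS = """\
9 100663131 100663387 LTR22_SS 475 +
9 70254161 70254685 ALTR2B_SSc 3460 +
9 96468811 96469391 LTR8_SSc 3756 -
9 116391614 116392469 LTR78 1152 +
9 4980341 4980930 LTR39_SSc 3300 -
9 16908116 16908359 MamGypLTR3 512 +
9 17432426 17432886 ALTR2B2_SSc 1914 -
9 18742941 18743430 LTR5_SS 2771 +
9 27131556 27132076 ERV3-1_SSc-LTR 969 -
9 30539515 30539909 LTR39B3_SSc 787 -
"""


def parse_bed6(text):
    """Parse whitespace-separated BED6 lines into tuples (hypothetical helper)."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end, name, score, strand = line.split()
        records.append((chrom, int(start), int(end), name, int(score), strand))
    return records


records = parse_bed6(ROWS)
for chrom, start, end, name, score, strand in records:
    length = end - start  # 0-based, half-open interval, so this is the length in bp
    print(f"{name}\tlength={length}\tcol5={score}")
```

For every row the interval length (243 to 855 bp) differs from the column 5 value.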


Column 5 is an optional field in BED files. Here is UCSC's description of field 5:

5. Score - A score between 0 and 1000. If the track line useScore attribute is set to 1 for this annotation data set, the score value will determine the level of gray in which this feature is displayed (higher numbers = darker gray). This table shows the Genome Browser's translation of BED score values into shades of gray:

You may already know that. I'm not sure how the scores are determined here, sorry.
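Several values in the question are above 1000, so they cannot be UCSC display scores on the 0 to 1000 scale quoted above. A small Python check over the column 5 values from the question (the function name is hypothetical):

```python
def outside_ucsc_range(scores, lo=0, hi=1000):
    """Return the scores that fall outside UCSC's 0-1000 BED score range."""
    return [s for s in scores if not lo <= s <= hi]


# Column 5 values from the question, in order.
scores = [475, 3460, 3756, 1152, 3300, 512, 1914, 2771, 969, 787]
print(outside_ucsc_range(scores))  # [3460, 3756, 1152, 3300, 1914, 2771]
```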

Reply by goodez, 14 months ago.
Alex Reynolds (Seattle, WA USA) wrote, 14 months ago:

It is the Smith-Waterman alignment score of the match between the genomic sequence and the repeat library (consensus) sequence it was aligned to.

I think (but am not 100% certain) that a higher score means a better alignment to the repeat consensus, i.e. a longer and/or more similar match. The scores are used for cutoff filters, which are specific to different classes of repeats.
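For intuition about what a Smith-Waterman score measures, here is a toy local-alignment scorer in Python. It is only a sketch: the match/mismatch/gap values and the sequences are made-up assumptions, and it does not reproduce RepeatMasker's scoring matrices or complexity adjustment, so its numbers are not comparable to the values in the BED file.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a vs b (linear gap penalty)."""
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # a[i-1] aligned to a gap
            left = H[i][j - 1] + gap  # b[j-1] aligned to a gap
            # Clamping at 0 lets an alignment start anywhere, which makes it local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


reference = "ACGTACGTTT"  # hypothetical stand-in for a repeat library sequence
print(smith_waterman_score("ACGTACGTTT", reference))  # 20
print(smith_waterman_score("ACGTTCGTTT", reference))  # 17
print(smith_waterman_score("ACGTA", reference))       # 10
```

With these settings an identical 10 bp match scores 20, a single mismatch drops it to 17, and a 5 bp exact match scores only 10: longer and more similar alignments score higher.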

I don't know how useful it is to use these scores directly. Also see "How to read the results" in the RepeatMasker documentation:

Smith-Waterman score of the match, usually complexity adjusted. The SW scores are not always directly comparable. Sometimes the complexity adjustment has been turned off, and a variety of scoring-matrices are used.

It may be worthwhile to contact the developers directly.


Powered by Biostar version 2.3.0