Column explanation for repeat database from UCSC
4.8 years ago
Louis Kok ▴ 30

I was looking into the repeat database from UCSC as below:

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/rmsk.txt.gz

This file does not contain a header. Could someone tell me what the columns are, so that I can better understand their meaning? Thanks in advance.

A few lines of the file are shown below:

585  1504  13   4    13  chr1  10000  10468  -249240153  +  (CCCTAA)n  Simple_repeat  Simple_repeat  1      463   0     1
585  3612  114  270  13  chr1  10468  11447  -249239174  -  TAR1       Satellite      telo           -399   1712  483   2
585  437   235  186  35  chr1  11503  11675  -249238946  -  L1MC       LINE           L1             -2236  5646  5449  3
585  239   294  19   10  chr1  11677  11780  -249238841  -  MER5B      DNA            hAT-Charlie    -74    104   1     4
585  318   230  38   0   chr1  15264  15355  -249235266  -  MIR3       SINE           MIR            -119   143   49    5

Can you please edit your post and paste a few lines from the file?


Hi @RamRS. Added a few lines.

4.8 years ago

Hello,

If you go to the UCSC Table Browser, you can select Group -> Repeats. On the line where the table is selected, there is a button labeled "describe table schema". This is what it shows:

field      example        SQL type               description
bin        585            smallint(5) unsigned   Indexing field to speed chromosome range queries.
swScore    463            int(10) unsigned       Smith Waterman alignment score
milliDiv   13             int(10) unsigned       Base mismatches in parts per thousand
milliDel   6              int(10) unsigned       Bases deleted in parts per thousand
milliIns   17             int(10) unsigned       Bases inserted in parts per thousand
genoName   chr1           varchar(255)           Genomic sequence name
genoStart  10000          int(10) unsigned       Start in genomic sequence
genoEnd    10468          int(10) unsigned       End in genomic sequence
genoLeft   -248945954     int(11)                -#bases after match in genomic sequence
strand     +              char(1)                Relative orientation + or -
repName    (TAACCC)n      varchar(255)           Name of repeat
repClass   Simple_repeat  varchar(255)           Class of repeat
repFamily  Simple_repeat  varchar(255)           Family of repeat
repStart   1              int(11)                Start (if strand is +) or -#bases after match (if strand is -) in repeat sequence
repEnd     471            int(11)                End in repeat sequence
repLeft    0              int(11)                -#bases after match (if strand is +) or start (if strand is -) in repeat sequence
id         1              char(1)                First digit of id field in RepeatMasker .out file. Best ignored.
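
If it helps, here is a minimal Python sketch of how these names could be attached to the headerless file. It assumes the file is tab-separated with the 17 columns in the order listed above. The helper names (`read_rmsk`, `open_rmsk`, `consensus_span`) are made up for illustration and are not part of any UCSC tool, and the two sample rows are copied from the question.

```python
import csv
import gzip
import io

# Column names in the order given by the UCSC "describe table schema" page.
RMSK_COLUMNS = [
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
    "repName", "repClass", "repFamily", "repStart", "repEnd",
    "repLeft", "id",
]

# Columns the schema declares as integer types ("id" is char(1), left as text).
INT_COLUMNS = [
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoStart", "genoEnd", "genoLeft", "repStart", "repEnd", "repLeft",
]


def read_rmsk(handle):
    """Yield one dict per row from an open, tab-separated rmsk text handle."""
    reader = csv.DictReader(
        handle, fieldnames=RMSK_COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        for name in INT_COLUMNS:
            row[name] = int(row[name])
        yield row


def open_rmsk(path):
    """Open a gzipped rmsk file (hypothetical local path) in text mode."""
    return gzip.open(path, "rt", newline="")


def consensus_span(row):
    """Return (start, end) in the repeat sequence, following the schema:
    the start is repStart on the + strand and repLeft on the - strand."""
    start = row["repStart"] if row["strand"] == "+" else row["repLeft"]
    return start, row["repEnd"]


# Two rows from the question, rebuilt with explicit tabs for a quick check.
SAMPLE = "\n".join([
    "\t".join(["585", "1504", "13", "4", "13", "chr1", "10000", "10468",
               "-249240153", "+", "(CCCTAA)n", "Simple_repeat",
               "Simple_repeat", "1", "463", "0", "1"]),
    "\t".join(["585", "3612", "114", "270", "13", "chr1", "10468", "11447",
               "-249239174", "-", "TAR1", "Satellite", "telo",
               "-399", "1712", "483", "2"]),
]) + "\n"

if __name__ == "__main__":
    for record in read_rmsk(io.StringIO(SAMPLE)):
        print(record["genoName"], record["genoStart"], record["genoEnd"],
              record["repName"], consensus_span(record))
```

For the real download, something like `with open_rmsk("rmsk.txt.gz") as fh: rows = read_rmsk(fh)` should work, assuming the file has been saved locally under that name.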