Question: how to find consensus sequence for repeats
dli230 (WUSTL) wrote 4.7 years ago:

Hi,

I downloaded rmsk.txt from the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgTables?db=hg38&hgta_group=rep&hgta_track=rmsk&hgta_table=rmsk&hgta_doSchema=describe+table+schema) and got the following:

bin  swScore  milliDiv  milliDel  milliIns  genoName  genoStart  genoEnd  genoLeft    strand  repName     repClass       repFamily      repStart  repEnd  repLeft  id
585  463      13        6         17        chr1      10000      10468    -248945954  +       (TAACCC)n   Simple_repeat  Simple_repeat  1         471     0        1
585  3612     114       215       13        chr1      10468      11447    -248944975  -       TAR1        Satellite      telo           -399      1712    483      2
585  484      251       132       0         chr1      11504      11675    -248944747  -       L1MC5a      LINE           L1             -2382     395     199      3
585  239      294       19        10        chr1      11677      11780    -248944642  -       MER5B       DNA            hAT-Charlie    -74       104     1        4
585  318      230       37        0         chr1      15264      15355    -248941067  -       MIR3        SINE           MIR            -119      143     49       5
585  18       232       0         19        chr1      15797      15849    -248940573  +       (TGCTCC)n   Simple_repeat  Simple_repeat  1         52      0        6
585  18       137       0         0         chr1      16712      16744    -248939678  +       (TGG)n      Simple_repeat  Simple_repeat  1         32      0        7
585  239      338       129       0         chr1      18906      19048    -248937374  +       L2a         LINE           L2             2942      3104    -322     8
585  994      312       60        25        chr1      19971      20405    -248936017  +       L3          LINE           CR1            2680      3129    -970     9
585  270      331       7         27        chr1      20530      20679    -248935743  +       Plat_L3     LINE           CR1            2802      2947    -639     1

For example, take the repeat named L1MC5a. If I want the sequence of this repeat, should I get it from Repbase? I could not find it there: http://www.girinst.org/repbase/update/browse.php?type=All&format=EMBL&autonomous=on&nonautonomous=on&simple=on&division=Homo+sapiens&letter=L

Does anyone have suggestions on how to do this? Thanks a lot in advance.
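
The rmsk rows do not contain the consensus sequence itself, but the last columns say which part of the consensus each genomic copy matches. Below is a minimal Python sketch that reads rows in the column order shown in the question and works out that span. It rests on an assumption taken from my reading of the UCSC schema description for this table: on the minus strand, repStart and repLeft swap roles, so repLeft holds the start and repStart holds the negated number of consensus bases left after the match. The names parse_rmsk and consensus_span are made up for this example.

```python
import csv
import io

# Column names copied from the header line shown in the question.
RMSK_COLUMNS = [
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
    "repName", "repClass", "repFamily", "repStart", "repEnd", "repLeft", "id",
]
INT_COLUMNS = ("genoStart", "genoEnd", "repStart", "repEnd", "repLeft")


def parse_rmsk(handle):
    """Yield one dict per data row of a tab-separated rmsk file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # Skip blank lines and the header line ("bin" or "#bin").
        if not fields or fields[0].lstrip("#") == "bin":
            continue
        row = dict(zip(RMSK_COLUMNS, fields))
        for key in INT_COLUMNS:
            row[key] = int(row[key])
        yield row


def consensus_span(row):
    """Return (start, end, implied consensus length) in consensus coordinates.

    Assumption: on the minus strand repStart and repLeft swap roles.
    """
    if row["strand"] == "+":
        start, bases_left = row["repStart"], -row["repLeft"]
    else:
        start, bases_left = row["repLeft"], -row["repStart"]
    return start, row["repEnd"], row["repEnd"] + bases_left


# Two rows copied from the question, with tabs restored between fields.
SAMPLE = (
    "585\t484\t251\t132\t0\tchr1\t11504\t11675\t-248944747\t-\t"
    "L1MC5a\tLINE\tL1\t-2382\t395\t199\t3\n"
    "585\t239\t338\t129\t0\tchr1\t18906\t19048\t-248937374\t+\t"
    "L2a\tLINE\tL2\t2942\t3104\t-322\t8\n"
)

for row in parse_rmsk(io.StringIO(SAMPLE)):
    print(row["repName"], consensus_span(row))
```

Under that assumption, the L1MC5a copy on chr1 matches consensus positions 199 to 395 of a consensus that would be 2777 bases long (395 + 2382). For a real file, pass an open file handle to parse_rmsk instead of the StringIO sample.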

GenoMax93k (United States) wrote 4.7 years ago:

See if the Table Browser at UCSC works.
Select the human(?) genome --> Group (Repeats) --> Track (RepeatMasker) --> Region (whole genome/region?) --> Output format (Sequence) --> give a file name to save the data to a file.
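
If the chromosome sequence is already available locally, the same kind of output (the genomic sequence of each copy) can be sketched in a few lines of Python. This is only an illustration: copy_sequence is a hypothetical helper, the chromosome string is made-up toy data, and genoStart/genoEnd are treated as 0-based, half-open coordinates, which is the convention UCSC tables use.

```python
# Translation table for complementing DNA, keeping N and case as they are.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def copy_sequence(chrom_seq, geno_start, geno_end, strand):
    """Slice one repeat copy out of a chromosome string.

    Coordinates are 0-based and half-open, so they map directly onto a
    Python slice. Minus-strand copies are reverse-complemented.
    """
    seq = chrom_seq[geno_start:geno_end]
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


# Made-up toy chromosome, not real genome data.
toy_chrom = "TTTTACGTAACCGGTTTT"

print(copy_sequence(toy_chrom, 4, 10, "+"))
print(copy_sequence(toy_chrom, 4, 10, "-"))
```

With a real genome, chrom_seq would come from a FASTA reader of your choice. Note that this returns individual copies, not a consensus.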


Thanks for your reply, @genomax2.

I am not actually looking for the genomic sequences of the individual copies; I am looking for the consensus sequences.

written 4.7 years ago by dli230

I don't think there is a consensus. The repeats, by their nature, will differ from copy to copy. For example, if I restrict the output to the L2a LINE repeat, I get this summary from the Table Browser:

item count  174,058
item bases  43,353,715 (1.42%)
item total  43,373,283 (1.42%)
smallest item   11
average item    249
biggest item    3,283
modified 4.7 years ago, written 4.7 years ago by GenoMax93k
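
A summary like the one above can be approximated from the rmsk rows themselves, since each copy's length is genoEnd minus genoStart. The sketch below uses only the ten rows shown in the question, so its numbers will not match the genome-wide figures. summarize is a hypothetical helper, and "item total" is taken here to mean the sum of copy lengths, which is my assumption about how the Table Browser defines it.

```python
from statistics import mean

# (repName, genoStart, genoEnd) for the ten rows shown in the question.
ROWS = [
    ("(TAACCC)n", 10000, 10468),
    ("TAR1", 10468, 11447),
    ("L1MC5a", 11504, 11675),
    ("MER5B", 11677, 11780),
    ("MIR3", 15264, 15355),
    ("(TGCTCC)n", 15797, 15849),
    ("(TGG)n", 16712, 16744),
    ("L2a", 18906, 19048),
    ("L3", 19971, 20405),
    ("Plat_L3", 20530, 20679),
]


def summarize(rows, rep_name=None):
    """Summarize copy lengths, optionally restricted to one repName.

    Returns None when no row matches.
    """
    lengths = [
        end - start
        for name, start, end in rows
        if rep_name is None or name == rep_name
    ]
    if not lengths:
        return None
    return {
        "item count": len(lengths),
        "item total": sum(lengths),  # assumed meaning: sum of copy lengths
        "smallest item": min(lengths),
        "average item": mean(lengths),
        "biggest item": max(lengths),
    }


print(summarize(ROWS, "L2a"))
print(summarize(ROWS))
```

The wide spread between the smallest and biggest item in the genome-wide L2a summary reflects that most copies are fragments of the full-length element.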