Question: how to find consensus sequence for repeats
3.1 years ago by
dli220
WUSTL
dli220 wrote:

Hi,

I downloaded rmsk.txt from the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgTables?db=hg38&hgta_group=rep&hgta_track=rmsk&hgta_table=rmsk&hgta_doSchema=describe+table+schema) and got the following:

bin  swScore  milliDiv  milliDel  milliIns  genoName  genoStart  genoEnd  genoLeft    strand  repName    repClass       repFamily      repStart  repEnd  repLeft  id
585  463      13        6         17        chr1      10000      10468    -248945954  +       (TAACCC)n  Simple_repeat  Simple_repeat  1         471     0        1
585  3612     114       215       13        chr1      10468      11447    -248944975  -       TAR1       Satellite      telo           -399      1712    483      2
585  484      251       132       0         chr1      11504      11675    -248944747  -       L1MC5a     LINE           L1             -2382     395     199      3
585  239      294       19        10        chr1      11677      11780    -248944642  -       MER5B      DNA            hAT-Charlie    -74       104     1        4
585  318      230       37        0         chr1      15264      15355    -248941067  -       MIR3       SINE           MIR            -119      143     49       5
585  18       232       0         19        chr1      15797      15849    -248940573  +       (TGCTCC)n  Simple_repeat  Simple_repeat  1         52      0        6
585  18       137       0         0         chr1      16712      16744    -248939678  +       (TGG)n     Simple_repeat  Simple_repeat  1         32      0        7
585  239      338       129       0         chr1      18906      19048    -248937374  +       L2a        LINE           L2             2942      3104    -322     8
585  994      312       60        25        chr1      19971      20405    -248936017  +       L3         LINE           CR1            2680      3129    -970     9
585  270      331       7         27        chr1      20530      20679    -248935743  +       Plat_L3    LINE           CR1            2802      2947    -639     1
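
For readers who want to work with these rows programmatically, here is a minimal Python sketch that picks out all rows for one repeat name. It assumes the downloaded rmsk.txt is tab-separated and uses the column names from the header row above; the function names `read_rmsk` and `rows_for_repeat` are made up for this illustration, and two rows from the sample stand in for the real file.

```python
import csv
import io

# Column names copied from the header row of the table above.
RMSK_COLUMNS = [
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
    "repName", "repClass", "repFamily", "repStart", "repEnd",
    "repLeft", "id",
]


def read_rmsk(handle):
    """Yield one dict per row from a tab-separated rmsk file handle."""
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and a header or comment line if one is present.
        if not fields or fields[0].startswith("#") or fields[0] == "bin":
            continue
        yield dict(zip(RMSK_COLUMNS, fields))


def rows_for_repeat(handle, rep_name):
    """Return every row whose repName equals rep_name."""
    return [row for row in read_rmsk(handle) if row["repName"] == rep_name]


# Two rows from the sample above, standing in for the real file (assumption:
# the real file uses tabs between fields).
sample = (
    "585\t3612\t114\t215\t13\tchr1\t10468\t11447\t-248944975\t-\t"
    "TAR1\tSatellite\ttelo\t-399\t1712\t483\t2\n"
    "585\t484\t251\t132\t0\tchr1\t11504\t11675\t-248944747\t-\t"
    "L1MC5a\tLINE\tL1\t-2382\t395\t199\t3\n"
)

hits = rows_for_repeat(io.StringIO(sample), "L1MC5a")
for row in hits:
    print(row["genoName"], row["genoStart"], row["genoEnd"], row["strand"])
```

With the real download, `io.StringIO(sample)` would be replaced by something like `open("rmsk.txt", newline="")`, where the path is wherever the file was saved. Note that this only locates the genomic copies of a repeat; it does not give a consensus sequence.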

For example, take the repeat name L1MC5a. If I want to get the sequence of this repeat, should I look for it in RepBase? I could not find it there: http://www.girinst.org/repbase/update/browse.php?type=All&format=EMBL&autonomous=on&nonautonomous=on&simple=on&division=Homo+sapiens&letter=L

Does anyone have suggestions on how to solve this? Thanks a lot in advance.

3.1 years ago by
genomax65k
United States
genomax65k wrote:

See if the Table Browser at UCSC works.
Select the human(?) genome --> Group (Repeats) --> Track (RepeatMasker) --> Region (whole genome or a specific region?) --> Output format (Sequence) --> give a file name to save the data to a file.


Thanks for your reply, @genomax2.

I am not actually looking for the genomic sequences of the individual copies; I am looking for consensus sequences.

written 3.0 years ago by dli220
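
To make the distinction between genomic copies and a consensus concrete, here is a toy Python sketch of a majority-rule consensus over copies that are already aligned. It is only an illustration of the idea: the aligned sequences are made up, the function name `majority_consensus` is hypothetical, and this is not a description of how curated libraries such as RepBase build their consensus sequences.

```python
from collections import Counter


def majority_consensus(aligned_copies, gap="-"):
    """Return the most common non-gap base at each alignment column."""
    if not aligned_copies:
        return ""
    length = len(aligned_copies[0])
    if any(len(seq) != length for seq in aligned_copies):
        raise ValueError("all aligned copies must have the same length")

    consensus = []
    for column in zip(*aligned_copies):
        counts = Counter(base for base in column if base != gap)
        # A column made up entirely of gaps contributes nothing.
        if counts:
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Made-up aligned copies, purely illustrative (not real genomic data).
copies = [
    "TAACCCTAACC",
    "TAACCTTAACC",
    "TAAC-CTAACC",
    "TGACCCTAACC",
]
print(majority_consensus(copies))
```

Each individual copy differs from the others at some position, yet the column-wise majority still yields a single representative sequence, which is what a consensus entry for a repeat family is meant to capture.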

I don't think there is a consensus. Repeats by their nature will differ from copy to copy. For example, if I restrict the output to the L2a LINE repeat, I get this summary from the Table Browser:

item count  174,058
item bases  43,353,715 (1.42%)
item total  43,373,283 (1.42%)
smallest item   11
average item    249
biggest item    3,283
modified 3.0 years ago • written 3.0 years ago by genomax65k
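
Figures of the same kind (count, smallest, average, biggest) can be approximated from the rmsk rows themselves. The sketch below assumes that an item's length is genoEnd minus genoStart, which holds for UCSC's 0-based, half-open table coordinates; the function name `summarize_lengths` is hypothetical, and the three intervals are the Simple_repeat rows from the sample in the question, not the L2a data summarized above.

```python
def summarize_lengths(intervals):
    """Summarize item lengths for a list of (genoStart, genoEnd) pairs."""
    lengths = [end - start for start, end in intervals]
    if not lengths:
        return None
    return {
        "count": len(lengths),
        "total_bases": sum(lengths),  # simple sum; overlaps are not merged
        "smallest": min(lengths),
        "average": round(sum(lengths) / len(lengths)),
        "biggest": max(lengths),
    }


# (genoStart, genoEnd) for the three Simple_repeat rows in the sample table.
simple_repeats = [(10000, 10468), (15797, 15849), (16712, 16744)]
summary = summarize_lengths(simple_repeats)
for key, value in summary.items():
    print(key, value)
```

The wide spread between the smallest and biggest item is the point being made in the reply: individual copies of a repeat are often fragments of very different lengths.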