RepeatMasker output does not match input
4.3 years ago
Kian ▴ 50

Hi, my input file to UCSC for repeat elements looks like this:

chrom  txStart  txEnd
chr1    15079913    35209257

Output:

genoName    genoStart    genoEnd    strand    repName    repClass    repFamily
chr1    16777160    16777470    +    AluSp    SINE    Alu
chr1    25165800    25166089    -    AluY    SINE    Alu
chr1    33553606    33554646    +    L2b    LINE    L2

The number of rows in the input file and the output file is not the same. The output has more rows, and none of them covers the same interval as the row in the input file. How can I match the two files?

How can I get the repeat elements for exactly the interval given in the input file, and nothing more? Thanks

4.3 years ago

Why would you believe that a ~20 megabase region would all be exactly one type of repeat? That wouldn't be biologically plausible. The output from UCSC is correct; your assumption and your goal are what's wrong.

Thanks for your response. I know that this interval does not contain just one type of repeat. What I actually want to know is how many repeats, and which ones, exist within the interval. I think UCSC splits my interval and tells me which repeat exists in each section. That is good, but it is not my goal! I have an interval and want to know how many LINEs, how many SINEs, and so on, it contains.

It's giving you the individual entries, so count them. Alternatively, download the RepeatMasker file, convert it to BED, run bedtools intersect against your region, and use whatever method you prefer to finish summarizing things.
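For the "count them" route, a minimal Python sketch might look like the following. It assumes the UCSC output has already been parsed into tuples in the column order shown in the question, and that the coordinates are half-open (BED-style). The function name `count_repeats_by_class` and the in-memory rows are illustrative only and are not part of any UCSC or bedtools API.

```python
from collections import Counter


def count_repeats_by_class(region, repeat_rows):
    """Count repeat entries per repClass that overlap a (chrom, start, end) region."""
    chrom, start, end = region
    counts = Counter()
    for geno_name, geno_start, geno_end, _strand, _rep_name, rep_class, _rep_family in repeat_rows:
        # Overlap test for half-open intervals (assumed coordinate convention).
        if geno_name == chrom and geno_start < end and geno_end > start:
            counts[rep_class] += 1
    return counts


# The input interval and the three output rows quoted in the question.
region = ("chr1", 15079913, 35209257)
repeat_rows = [
    ("chr1", 16777160, 16777470, "+", "AluSp", "SINE", "Alu"),
    ("chr1", 25165800, 25166089, "-", "AluY", "SINE", "Alu"),
    ("chr1", 33553606, 33554646, "+", "L2b", "LINE", "L2"),
]

counts = count_repeats_by_class(region, repeat_rows)
print(dict(counts))  # {'SINE': 2, 'LINE': 1}
```

Swapping `rep_class` for the family or name field in the counter key would give per-family or per-element counts instead.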

Thanks, I will try it!
