Question: Identify the repeat type of genomic regions in a BED file
11 months ago, jing.mengrabbit wrote:

Hi, I have a BED file containing different human genomic regions. I would like to know whether these regions fall within repeat regions and, if so, which type of repeat they belong to, such as microsatellites, minisatellites, LINEs, SINEs, etc. Is there a tool that can do this? Thanks!

11 months ago, Alex Reynolds (Seattle, WA, USA) wrote:

Get RepeatMasker annotations in BED format, either by converting RepeatMasker output with BEDOPS convert2bed or by downloading the UCSC rmsk table and permuting its columns into BED. Then sort your own BED file with BEDOPS sort-bed, and map the IDs of the RepeatMasker elements onto your regions with BEDOPS bedmap.

For example, for reference genome hg38:

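# Download the UCSC RepeatMasker table and permute its columns into sorted BED:
# chrom, start, end, repClass (ID column), repName (score column), strand.
$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/rmsk.txt.gz \
  | gunzip -c - \
  | awk -v OFS="\t" '{ print $6, $7, $8, $12, $11, $10 }' - \
  | sort-bed - \
  > rmsk.bed

# Sort your own regions; bedmap expects sorted input.
$ sort-bed my-regions.unsorted.bed > my-regions.bed

# Print each region followed by the unique IDs (repeat classes) of overlapping repeats.
$ bedmap --echo --echo-map-id-uniq my-regions.bed rmsk.bed > answer.bed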
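
Once answer.bed exists, a quick tally of repeat classes can help with a first look at the result. The following is only a sketch: it assumes bedmap's default output layout of region, a "|" delimiter, then a semicolon-separated ID list that is empty when nothing overlaps. The file toy-answer.bed and the helper tally_repeat_classes are hypothetical names, and the toy coordinates are made up for illustration rather than taken from real bedmap output.

```shell
# Toy stand-in for answer.bed (made-up lines, not real bedmap output).
# Assumed layout: <region>|<id1>;<id2>;...  (empty after "|" if no overlap)
printf 'chr1\t100\t200|LINE;SINE\nchr1\t500\t600|\nchr2\t50\t80|Simple_repeat\n' \
  > toy-answer.bed

# Hypothetical helper: count how many regions overlap each repeat class.
# Regions with nothing after the "|" are counted under "none".
tally_repeat_classes() {
  awk -F'|' '{
    if ($2 == "") { count["none"]++; next }
    n = split($2, ids, ";")
    for (i = 1; i <= n; i++) count[ids[i]]++
  }
  END { for (c in count) print c "\t" count[c] }' "$1" | LC_ALL=C sort
}

tally_repeat_classes toy-answer.bed
```

To use it on real output, run tally_repeat_classes answer.bed. Regions counted under "none" did not overlap any RepeatMasker element.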

Thank you. I'll give this a shot.

Reply written 11 months ago by jing.mengrabbit