Identify the repeat type of genomic regions in a BED file
6.5 years ago

Hi, I have a BED file containing different human genomic regions. I would like to know whether these regions fall in repeat regions and, if so, what type of repeat they belong to, such as microsatellites, minisatellites, LINEs, SINEs, etc. Is there a tool that can do this? Thanks!

genome sequence • 3.0k views
6.5 years ago

Get the RepeatMasker annotations for your genome and convert them to BED, either with BEDOPS convert2bed (for RepeatMasker output) or by downloading the UCSC rmsk table and reordering its columns into BED. Sort your own BED file with BEDOPS sort-bed, then use BEDOPS bedmap to map the IDs of the overlapping RepeatMasker regions onto your regions.

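For example, for reference genome hg38 (the awk step puts the repeat class, column 12 of rmsk.txt, in the BED ID column, followed by the repeat name and strand):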

$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/rmsk.txt.gz \
  | gunzip -c - \
  | awk -v OFS="\t" '{ print $6, $7, $8, $12, $11, $10 }' - \
  | sort-bed - \
  > rmsk.bed

$ sort-bed my-regions.unsorted.bed > my-regions.bed

$ bedmap --echo --echo-map-id-uniq my-regions.bed rmsk.bed > answer.bed
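With its default delimiters, bedmap joins the echoed region and the mapped IDs with a `|` and separates multiple IDs with `;`. If you then want a quick tally of how many of your regions overlap each repeat class, an awk pass along these lines may be enough. This is only a sketch: the sample lines and the file name answer.sample.bed are made up for illustration, and it assumes a region with no overlapping repeat has an empty field after the `|`. In practice you would point the awk command at answer.bed.

```shell
# Made-up sample in the format bedmap writes: region|class1;class2
# (assumption: an empty field after the | means no repeat overlapped)
{
  printf 'chr1\t100\t200|LINE;SINE\n'
  printf 'chr1\t500\t600|\n'
  printf 'chr2\t50\t80|Simple_repeat\n'
  printf 'chr2\t300\t400|LINE\n'
} > answer.sample.bed

# Count regions per repeat class; regions with no overlap are tallied as "none"
class_counts=$(awk -F'|' '
  {
    if ($2 == "") { counts["none"]++; next }
    n = split($2, classes, ";")
    for (i = 1; i <= n; i++) counts[classes[i]]++
  }
  END { for (c in counts) print c "\t" counts[c] }
' answer.sample.bed | sort)

printf '%s\n' "$class_counts"
```

Each output line is a repeat class and the number of regions that overlap it. A region that overlaps several classes is counted once under each of them.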

Thank you. I'll give this a shot.
