Question: Convert a BED file to a GRangesList object
mariamari693693 (Finland) wrote 2.5 years ago:

I have a BED file as follows:

chr1    100194350       100194710       ARID3A  1000 
chr1    151430604       151430964       ARID3A  1000  
chr1    20301327        20301687        ARID3A  1000 
chr1    229267393       229267573       ARID3A  1000  
chr1    8802108         8802375         ARID3A  1000
chr1    109289093       109289349       ATF1    1000  
chr1    110527180       110527436       ATF1    1000
chr1    110950342       110950486       ATF1    1000  
chr1    115124275       115124409       ATF1    1000  
chr1    115259380       115259491       ATF1    1000  
...

and I would like to convert it to a GRangesList object in R, as follows:

load("data.rda")
head(data)
$ARID3A
GRanges object with 8999 ranges and 0 metadata columns:
       seqnames                 ranges strand
          <Rle>              <IRanges>  <Rle>
     1     chr1     [1307917, 1308277]      *
     2     chr1     [1407080, 1407440]      *
     3     chr1     [1858670, 1859030]      *
     4     chr1     [2175900, 2176260]      *
     5     chr1     [2290655, 2291015]      *
   ...      ...                    ...    ...
  8966     chrX [154495495, 154495855]      *
  8967     chrX [154799333, 154799693]      *
  8968     chrX [154819952, 154820312]      *
  8969     chrX [154840885, 154841245]      *
  8970     chrX [155434904, 155435264]      *
  -------
  seqinfo: 23 sequences from an unspecified genome; no seqlengths

$ATF1
GRanges object with 14883 ranges and 0 metadata columns:
        seqnames                 ranges strand
           <Rle>              <IRanges>  <Rle>
      1     chr1     [ 778593,  778805]      *
      2     chr1     [1000794, 1001007]      *
      3     chr1     [1032962, 1033218]      *
      4     chr1     [1109781, 1110037]      *
      5     chr1     [1185572, 1185828]      *
    ...      ...                    ...    ...
  14846     chrX [155026863, 155027119]      *
  14847     chrX [155057436, 155057692]      *
  14848     chrX [155881105, 155881361]      *
  14849     chrX [155881673, 155881929]      *
  14850     chrX [155893620, 155893876]      *
  -------
  seqinfo: 23 sequences from an unspecified genome; no seqlengths

  ...

I checked GenomicRanges, but I could not find a way to build this from a BED file.

Thank you so much for helping me.

zx8754 (London) wrote 2.5 years ago:

Split the data frame by the name column, then use lapply to build one GRanges object per group, which gives a list of ranges:

# dummy data
df1 <- read.table(text = "chr1    100194350       100194710       ARID3A  1000
chr1    151430604       151430964       ARID3A  1000
chr1    20301327        20301687        ARID3A  1000
chr1    229267393       229267573       ARID3A  1000
chr1    8802108         8802375         ARID3A  1000
chr1    109289093       109289349       ATF1    1000
chr1    110527180       110527436       ATF1    1000
chr1    110950342       110950486       ATF1    1000
chr1    115124275       115124409       ATF1    1000
chr1    115259380       115259491       ATF1    1000", header = FALSE)


library(GenomicRanges)

# split by name (column V4) and build one GRanges object per group
res <- 
  lapply(split(df1, df1$V4), function(i){
    GRanges(seqnames = i$V1,
            ranges = IRanges(start = i$V2,
                             end = i$V3,
                             names = i$V4))
  })

# result
res

$ARID3A
GRanges object with 5 ranges and 0 metadata columns:
         seqnames                 ranges strand
            <Rle>              <IRanges>  <Rle>
  ARID3A     chr1 [100194350, 100194710]      *
  ARID3A     chr1 [151430604, 151430964]      *
  ARID3A     chr1 [ 20301327,  20301687]      *
  ARID3A     chr1 [229267393, 229267573]      *
  ARID3A     chr1 [  8802108,   8802375]      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

$ATF1
GRanges object with 5 ranges and 0 metadata columns:
       seqnames                 ranges strand
          <Rle>              <IRanges>  <Rle>
  ATF1     chr1 [109289093, 109289349]      *
  ATF1     chr1 [110527180, 110527436]      *
  ATF1     chr1 [110950342, 110950486]      *
  ATF1     chr1 [115124275, 115124409]      *
  ATF1     chr1 [115259380, 115259491]      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
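
Two things worth noting about res above: it is a plain R list of GRanges objects rather than a GRangesList, and it uses the BED start values unchanged, whereas BED starts are 0-based and GRanges starts are 1-based. Below is a minimal sketch of one way to handle both, assuming the file follows the standard BED convention. The column names chrom, start, end, name and score are labels chosen here for illustration and do not come from the original data.

```r
library(GenomicRanges)

# three of the example lines; column names are illustrative assumptions
bed <- read.table(text = "chr1 100194350 100194710 ARID3A 1000
chr1 8802108 8802375 ARID3A 1000
chr1 109289093 109289349 ATF1 1000",
  header = FALSE,
  col.names = c("chrom", "start", "end", "name", "score"))

# BED starts are 0-based, GRanges starts are 1-based, hence the + 1
gr <- GRanges(seqnames = bed$chrom,
              ranges = IRanges(start = bed$start + 1L, end = bed$end))

# split() on a GRanges object returns a GRangesList, one element per name
grl <- split(gr, bed$name)
grl
```

If the coordinates should stay exactly as in res, GRangesList(res) wraps the existing list into a GRangesList without changing them.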
igor (United States) wrote 2.5 years ago:

You can do this with the rtracklayer package:

library(rtracklayer)
gr_obj =  import("file.bed")
library(GenomicRanges)
gr_list = split(gr_obj, gr_obj$name)

More info here: https://www.bioconductor.org/packages/release/bioc/vignettes/rtracklayer/inst/doc/rtracklayer.pdf
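
As a quick check of what import returns, here is a small self-contained sketch that writes two of the example lines to a temporary file (the temporary file and the variable names are only for illustration). On import, rtracklayer converts the 0-based BED starts to 1-based GRanges starts, so each start comes out one higher than in the file.

```r
library(rtracklayer)
library(GenomicRanges)

# write a tiny tab-separated BED file to a temporary location
bed_path <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100194350\t100194710\tARID3A\t1000",
             "chr1\t109289093\t109289349\tATF1\t1000"),
           bed_path)

# the name and score columns become metadata columns of the GRanges
gr <- import(bed_path, format = "BED")

# one GRangesList element per distinct value of the name column
grl <- split(gr, gr$name)

names(grl)
start(gr)  # BED start 100194350 is imported as 100194351
```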

bernatgel (Barcelona, Spain) wrote 4 months ago:

The toGRanges function from the regioneR package works with either a data frame or a file with a BED-like structure (internally, it calls rtracklayer::import, as used in igor's answer, to import the data):

library(regioneR)

dd <- toGRanges("data.bed")
dd <- split(dd, f = dd$name)