Can anybody suggest how to write a GRanges object (or a list of GRanges objects) to a BED file?
Thanks a lot.
Given a GRanges object:
library(GenomicRanges)
gr <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
              ranges = IRanges(1:10, end = 7:16, names = head(letters, 10)),
              strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)))
You can simply:
df <- data.frame(seqnames=seqnames(gr),
                 starts=start(gr)-1,
                 ends=end(gr),
                 names=rep(".", length(gr)),
                 scores=rep(".", length(gr)),
                 strands=strand(gr))
write.table(df, file="foo.bed", quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
to write that to foo.bed. The only trick is remembering that BED uses 0-based start coordinates (the end coordinate stays the same), whereas GRanges is 1-based. If you have a GRangesList rather than a GRanges object, just use unlist(gr) in place of gr (things should still be in the same order).
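A minimal sketch of the GRangesList case, assuming GenomicRanges is installed. The list grl and its contents are made-up example data, and out is a temporary file standing in for foo.bed. After unlist(), the list element names show up in names(gr), which can fill the BED name column:

```r
library(GenomicRanges)

# Made-up example GRangesList with two elements
grl <- GRangesList(
  a = GRanges("chr1", IRanges(start = c(1, 20), width = 5), strand = "+"),
  b = GRanges("chr2", IRanges(start = 100, end = 150), strand = "-")
)

gr <- unlist(grl)  # a single GRanges; names(gr) carry the list element names

df <- data.frame(seqnames = as.character(seqnames(gr)),
                 starts = start(gr) - 1L,  # BED starts are 0-based; 1L keeps the column integer
                 ends = end(gr),           # BED ends equal the 1-based inclusive ends
                 names = names(gr),
                 scores = 0L,
                 strands = as.character(strand(gr)))

out <- tempfile(fileext = ".bed")  # replace with "foo.bed" or your own path
write.table(df, file = out, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)
```

Any "*" strands would still need converting to "." for BED; every range in this example has a strand, so that step is skipped here.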
Dear dpryan,
I have a GRanges object similar to the one mentioned above:
GRanges with 2515 ranges and 6 metadata columns:
seqnames ranges strand | Conc
<Rle> <IRanges> <Rle> | <numeric>
851 chrI [15059848, 15071787] * | 16.4150832178115
1412 chrIII [ 249517, 252803] * | 9.93391864180872
3416 chrX [ 108921, 114715] * | 14.5870600573661
2224 chrIV [ 851252, 855627] * | 10.5743489064907
1604 chrIII [ 4431526, 4439773] * | 11.0537011405054
... ... ... ... ... ...
2453 chrIV [ 7011494, 7013670] * | 9.54973169373811
2897 chrV [ 1743061, 1744363] * | 8.42611771396342
3075 chrV [ 8460316, 8461383] * | 8.19169221695555
2163 chrIII [13529231, 13531151] * | 9.38284126048039
2655 chrIV [11863005, 11864250] * | 8.41453042457874
But I am finding it difficult to convert this one to a BED file. Can you suggest anything?
Thanks a lot for your help!
The following works for me:
$ cat foo.txt
name start stop Conc
chrI 15059848 15071787 16.4150832178115
chrIII 249517 252803 9.93391864180872
chrX 108921 114715 14.5870600573661
chrIV 851252 855627 10.5743489064907
chrIII 4431526 4439773 11.0537011405054
chrIV 7011494 7013670 9.54973169373811
chrV 1743061 1744363 8.42611771396342
chrV 8460316 8461383 8.19169221695555
chrIII 13529231 13531151 9.38284126048039
chrIV 11863005 11864250 8.41453042457874
chrIV 11863006 11864251 9.41453042457874
chrIV 11863007 11864252 7.41453042457874
And then in R:
library(GenomicRanges)
d <- read.delim("foo.txt", header=TRUE)
gr <- GRanges(seqnames=Rle(d$name),
              ranges=IRanges(d$start, end=d$stop),
              strand="*",
              Conc=d$Conc)
df <- data.frame(seqnames=seqnames(gr),
                 starts=start(gr)-1,
                 ends=end(gr),
                 names=rep(".", length(gr)),
                 scores=mcols(gr)$Conc,
                 strands=strand(gr))
write.table(df, file="foo.bed", quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
You will probably have to convert the "*" strands to ".", since BED uses "." rather than "*" for features without a strand.
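A minimal base-R sketch of that conversion, using two made-up rows shaped like the df built above. Because strand(gr) stores its values as a factor, the strands column may well end up as a factor, and assigning "." into a factor that has no "." level produces NA, so the sketch converts the column to character first. Separately, the UCSC BED description defines the score column as an integer from 0 to 1000, so some genome browsers may reject fractional Conc values there even if command-line tools accept them.

```r
# Made-up rows shaped like `df` above; the strands column is a factor here
df <- data.frame(seqnames = c("chrI", "chrIII"),
                 starts = c(15059847, 249516),
                 ends = c(15071787, 252803),
                 names = ".",
                 scores = c(16.4150832178115, 9.93391864180872),
                 strands = factor(c("*", "*"), levels = c("+", "-", "*")))

# Convert to character first: assigning "." into a factor without a "." level gives NA
df$strands <- as.character(df$strands)
df$strands[df$strands == "*"] <- "."  # BED's "no strand" value

out <- tempfile(fileext = ".bed")  # replace with "foo.bed" or your own path
write.table(df, file = out, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)
```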
Hi, this is an old thread, but I have a naive question about it that I couldn't find answered anywhere.
If one uses an input table from UCSC and does:
UCSC_table_GR <- makeGRangesFromDataFrame(UCSC_table, seqnames.field="chr", start.field="Start", end.field="End", ignore.strand=TRUE, starts.in.df.are.0based=TRUE, keep.extra.columns=TRUE)
is it still necessary to use
starts=start(gr)-1
as explained in your comment, given that starts.in.df.are.0based=TRUE was set?
Thank you
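As I read the makeGRangesFromDataFrame() documentation, starts.in.df.are.0based=TRUE only affects how the table is read in: the starts are shifted by +1, so the resulting GRanges is 1-based like any other. Writing it back out as BED would therefore still need start(gr)-1. A quick round-trip check on a made-up two-row table, assuming GenomicRanges is installed (the column names chr/Start/End are copied from the call above):

```r
library(GenomicRanges)

# Made-up UCSC-style table with 0-based starts
UCSC_table <- data.frame(chr = c("chr1", "chr2"),
                         Start = c(0, 99),
                         End = c(10, 150))

UCSC_table_GR <- makeGRangesFromDataFrame(UCSC_table,
                                          seqnames.field = "chr",
                                          start.field = "Start",
                                          end.field = "End",
                                          ignore.strand = TRUE,
                                          starts.in.df.are.0based = TRUE)

start(UCSC_table_GR)  # 1-based inside the GRanges: 1 and 100

# Going back to BED therefore still needs the -1
back <- data.frame(chr = as.character(seqnames(UCSC_table_GR)),
                   Start = start(UCSC_table_GR) - 1L,
                   End = end(UCSC_table_GR))
```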
Great tip. I don't know if it's only me, but with this conversion, round numbers such as 1000000 ended up in scientific notation (1e+06), which causes trouble in a BED file. My workaround was to force non-scientific notation when computing the starts and ends columns (trim=TRUE stops format() from padding shorter numbers with leading spaces):
df <- data.frame(seqnames=seqnames(gr),
                 starts=format(start(gr)-1, scientific=FALSE, trim=TRUE),
                 ends=format(end(gr), scientific=FALSE, trim=TRUE))
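Another option, in case it helps: start(gr) and end(gr) return integer vectors, and R does not write integers in scientific notation. The starts column probably only becomes a double because start(gr)-1 subtracts the double 1; subtracting the integer 1L keeps it integer, with no format() call needed. A base-R sketch with made-up coordinates standing in for start(gr) and end(gr):

```r
# Made-up 1-based integer coordinates, as start(gr) and end(gr) would return them
starts_1based <- c(1000001L, 3000001L)
ends <- c(1000500L, 3000500L)

# "- 1" (a double) turns the column into doubles; "- 1L" keeps it integer
bed_double <- data.frame(chr = "chr1", start = starts_1based - 1, end = ends)
bed_int <- data.frame(chr = "chr1", start = starts_1based - 1L, end = ends)

out_double <- tempfile(fileext = ".bed")
out_int <- tempfile(fileext = ".bed")
write.table(bed_double, file = out_double, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)
write.table(bed_int, file = out_int, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)

readLines(out_double)  # the double start column may be written as 1e+06
readLines(out_int)     # integer columns are written as plain digits
```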
You can try the rtracklayer package. Its export() function writes GRanges objects to various formats, including BED.
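A minimal sketch with rtracklayer's export() and import(), assuming both GenomicRanges and rtracklayer are installed; the ranges and scores are made up. export() takes care of the 0-based start shift itself, so there is no need for start(gr)-1 here. As far as I know, a metadata column named score is used for the BED score field.

```r
library(GenomicRanges)
library(rtracklayer)

# Made-up example ranges with a "score" metadata column
gr <- GRanges(c("chr1", "chr2"),
              IRanges(start = c(1, 100), end = c(10, 150)),
              strand = c("+", "-"))
mcols(gr)$score <- c(5, 10)

out <- tempfile(fileext = ".bed")  # replace with "foo.bed" or your own path
export(gr, out, format = "bed")    # starts are written 0-based

back <- import(out, format = "bed")  # and read back as 1-based
```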
Get a DataFrame object with mcols(gr) and then write it out.
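Note that mcols(gr) returns only the metadata columns, not the coordinates, so those have to be put back next to it before writing. A minimal sketch, assuming GenomicRanges is installed; the ranges are made up, and the names chrom/chromStart/chromEnd are only labels (they are not written when col.names=FALSE):

```r
library(GenomicRanges)

# Made-up ranges with one metadata column, similar to the Conc example above
gr <- GRanges(c("chrI", "chrIII"),
              IRanges(start = c(15059848, 249517), end = c(15071787, 252803)),
              Conc = c(16.4150832178115, 9.93391864180872))

meta <- as.data.frame(mcols(gr))  # metadata columns only (here: Conc)

bed <- data.frame(chrom = as.character(seqnames(gr)),
                  chromStart = start(gr) - 1L,  # 0-based BED start
                  chromEnd = end(gr),
                  meta)

out <- tempfile(fileext = ".bed")  # replace with "foo.bed" or your own path
write.table(bed, file = out, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)
```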
If you only have one metadata column and you would like to keep it, this modification of Devon Ryan's answer works:
df <- data.frame(seqnames=seqnames(gr),
starts=start(gr)-1,
ends=end(gr),
names=c(rep(".", length(gr))),
scores=elementMetadata(gr)[,1],
strands=strand(gr))
For my data this gives:
seqnames starts ends names scores strands
1 chrY 10515750 10515760 . 1 *
2 chrY 10519610 10519620 . 1 *
3 chrY 10534770 10534780 . 1 *
4 chrY 10540160 10540170 . 1 *
5 chrY 10554860 10554870 . 1 *
6 chrY 10560630 10560640 . 1 *
This worked for me.
BiocManager::install("Repitools")
library('Repitools')
df <- annoGR2DF(gr)
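Since this thread is pretty old and GRanges has been updated since, I just use the following, where gr is your object (strictly a GRangesList here, since the unlistData slot belongs to the compressed list class; unlist(gr) returns the same ranges without accessing the slot directly):
df=data.frame(gr@unlistData)
which gives a data frame with seqnames, start, end, width and strand columns. Your data frame may have more columns (one per metadata column); my GRanges doesn't have anything else in it. The start column is still 1-based, so subtract 1 before writing it out as BED.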