Question: How to generate start & end coordinates for each chromosome of human ref genome in bed. format (0-based index)
0
gravatar for ekwame
4 months ago by
ekwame40
NYC
ekwame40 wrote:

Dear Biostars,

My aim is to generate start and end cordinates for each chromosome in human reference genome to use for other downstream analysis. so using R packages BSgenome and GenomicFeatures I generated the cordinates of hg38 genome and saved as .bed file. However, i think my cordinates (.bed) i generated are inaccurate because .bed files are 0-based index

Question; Any susggestions on how to generate a 0-based index .bed file for chromosomes of human reference genome?

codes used

################begingining of codes#################
library(BSgenome)
library(GenomicFeatures)
library(GenomeInfoDbData)


# download a local copy of BSgenome.Hsapiens.UCSC.hg38 using bioconductor
if (interactive()) {
  if (!require("BiocManager"))
    install.packages("BiocManager")
  BiocManager::install("BSgenome.Hsapiens.UCSC.hg38")
}

##load BSgenome.Hsapiens.UCSC.hg38 on current workspace
library(BSgenome.Hsapiens.UCSC.hg38)

##assign hg38 genone to `hg38` variable to save on typing the long name
hg38 <- getChromInfoFromUCSC("hg38")

##get cordinates of hg38 in GRaanges formart
hg38.cordinates <- GRanges(seqnames = hg38[,1],ranges=IRanges(end=hg38[,2],width = hg38[,2]))
ranges(hg38.cordinates)

###save as df and write out as a bed file
hg38.cordinates.dt <- cbind(hg38[,1],as.data.frame(ranges(hg38.cordinates)[,c(1,2)]))
dim(hg38.cordinates.dt)

write.table(file="/path/to/write_to/hg38.cordinates.bed",hg38.cordinates.dt,sep="\t",quote=F,row.names=F, col.names = c("chr","start","end","width"))

##########################end of codes################################
ADD COMMENTlink modified 4 months ago by Alex Reynolds30k • written 4 months ago by ekwame40
1
gravatar for Alex Reynolds
4 months ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

Instead of using R, I suggest using the UCSC Kent Utilities kit. It includes a binary called fetchChromSizes that is perfect for this task. The link will take you to folders you can pick from for your platform (Linux or OS X) containing pre-compiled binaries, so it is a quick download.

You give fetchChromSizes the name of your assembly, and it gives you a list of chromosomes and their sizes. This can be piped to other tools to make a BED file easily and quickly.

For instance, to make a sorted, 0-indexed BED file:

$ fetchChromSizes hg38 | awk -v FS="\t" -v OFS="\t" '{ print $1, "0", $2; }' | sort-bed - > hg38.all.bed

If you want to remove non-nuclear chromosomes (partial chromosomes and pseudochromosomes), you can add a grep statement in between to filter them out:

$ fetchChromSizes hg38 | awk -v FS="\t" -v OFS="\t" '{ print $1, "0", $2; }' | grep -v ".*_.*" | sort-bed - > hg38.nuclearOnly.bed

This also makes use of BEDOPS sort-bed, for creating a sorted BED file, which is more efficient for use with set operations.

ADD COMMENTlink written 4 months ago by Alex Reynolds30k
1

This is great! Thanks!

ADD REPLYlink written 4 months ago by ekwame40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1588 users visited in the last hour