Python script for generating Genomic windows of intervals 50kbp for Human reference genome Hg19
3
0
Entering edit mode
5.4 years ago
nk778 • 0

Hello all, Where can I download all the chromosomal coordinates for the entire hg19 reference genome? Additionally, can anyone help me generate a python script that will split the whole genome into 50kbp windows? My draft python code is given below

step = 50000 with open('Chr.bed', 'w') as outfp: for val in range(0, 300000000, step):
outfp.write('{0}\t{1}\t{2}\n'.format('chr', val, val+step))

I would appreciate any help and suggestions. Thank you

genome Assembly • 2.2k views
ADD COMMENT
0
Entering edit mode

Thank you for your suggestions! I will check out bedtools for making the genomic windows. Thank you once again

ADD REPLY
0
Entering edit mode

Thank you all for your expert insights! Bedtools and bedops are a better option than just writing my own python script. My purpose of generating these windows is to find the CNV frequency in each genomic intervals for 200,000 CNV coordinates in my dataset. Bedtools intersect function and -c flag seems a good option to find the overlap of CNV coordinates within those windows. The end goal is to see which interval or coordinates of genomes do not have any CNVs at all. And thank you once again guys for your expert insights. Really appreciate it.

ADD REPLY
1
Entering edit mode

If you want bins without CNVs:

$ fetchChromSizes hg19 | awk '{print $1"\t0\t"$2}' | sort-bed - | bedops --chop 50000 -x - > hg19.bins.bed
$ bedmap --count --echo --delim '\t' hg19.bins.bed cnvs.bed | awk '($1 == 0)' | cut -f2- > hg19.bins.cnvLess.bed

That awk bit can be adjusted to get bins with varied numbers of CNVs.

ADD REPLY
0
Entering edit mode

Thank you so much for your help. Realy appreciate it. I will give it a try. Thanks once again

ADD REPLY
2
Entering edit mode
5.4 years ago
ATpoint 81k

The genome is available from e.g. UCSC and Ensembl and only a google search away plus there are plenty of questions here already on that matter, please use the search function. For the binning, take a look at bedtools makewindows. Not python but hey, why reinventing the wheel if a superfast option already exists.

ADD COMMENT
2
Entering edit mode
5.4 years ago

You can use Kent tools and BEDOPS bedops --chop with standard Unix streams to do this quickly and efficiently:

$ fetchChromSizes hg19 | awk '{print $1"\t0\t"$2}' | sort-bed - | bedops --chop 50000 - > hg19.50k.bed

Note that the bounds of chromosomes will almost certainly not split evenly by 50kb units. In other words, chopping up a chromosome will leave some chunk at the end that is less than the desired length.

If you're doing calculations that involve aggregation, you may want to leave out that last < 50kb bin, by adding the -x operator to bedops --chop:

$ fetchChromSizes hg19 | awk '{print $1"\t0\t"$2}' | sort-bed - | bedops --chop 50000 -x - > hg19.50k.chunkless.bed

As a bonus, the output is sorted and ready to use for fast set operations with BEDOPS.

ADD COMMENT
2
Entering edit mode
5.4 years ago

for hg19 chromosome sizes:

https://genome.ucsc.edu/goldenPath/help/hg19.chrom.sizes

for making windows of 10kb:

$ wget -np -r -nd https://genome.ucsc.edu/goldenPath/help/hg19.chrom.sizes
$ bedtools makewindows -g hg19.chrom.sizes -w 10000

try using pybedtools. https://daler.github.io/pybedtools/

ADD COMMENT
0
Entering edit mode

Thank you once again! Your suggestions really helped!

ADD REPLY

Login before adding your answer.

Traffic: 1477 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6