How to bin a genome to different regions
3.5 years ago
Apex92 ▴ 280

Hi, this is my first time binning a genome, and it would be great to get some help from you experts.

I have the mouse genome GRCm38.p6.genome.fa and I would like to bin it into ~2.5 million regions, each 1000 nt long.

After a bit of online searching, I figured out that I probably have to use bedtools, but I am still not sure how to set the parameters.

Any help will be appreciated.

rna-seq RNA-Seq genome sequencing binning • 1.9k views

Use bedtools makewindows. Please read its documentation and then state what is unclear.


Could you please provide a link? It seems that there is no detailed manual for bedtools makewindows.


Type bedtools makewindows into your console and the help will pop up.
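
For anyone landing here later, this is a minimal sketch of what fixed-size windowing does, assuming a two-column chromosome sizes file (name, length). The file names and the toy chromosome lengths are made up for illustration. If bedtools is installed, the sketch calls bedtools makewindows with -g (sizes file) and -w (window size); otherwise it falls back to a plain awk loop that should produce the same intervals.

```shell
# Toy chromosome sizes (hypothetical values, not real mouse chromosome lengths).
# For a real FASTA, and assuming samtools is installed, a sizes file could be made with:
#   samtools faidx GRCm38.p6.genome.fa
#   cut -f1,2 GRCm38.p6.genome.fa.fai > GRCm38.p6.sizes
printf 'chr1\t2500\nchrM\t900\n' > toy.sizes

if command -v bedtools >/dev/null 2>&1; then
    # -g: chromosome sizes file, -w: window size in bases
    bedtools makewindows -g toy.sizes -w 1000 > toy.windows.bed
else
    # Fallback: emit 0-based, half-open windows, clipping the last one
    # at the chromosome end
    awk -v FS='\t' -v OFS='\t' -v w=1000 '{
        for (s = 0; s < $2; s += w) {
            e = (s + w < $2) ? s + w : $2
            print $1, s, e
        }
    }' toy.sizes > toy.windows.bed
fi

cat toy.windows.bed
```

The last window of each chromosome is shorter than 1000 bases whenever the chromosome length is not a multiple of the window size.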

3.5 years ago

You can use BEDOPS with UCSC Kentutils:

$ fetchChromSizes mm10 | awk -v FS="\t" -v OFS="\t" '{ print $1, "0", $2 }' | sort-bed - | bedops --chop 1000 - > answer.bed

If you just want nuclear and mitochondrial chromosomes from mm10, add a grep in the middle:

$ fetchChromSizes mm10 | awk -v FS="\t" -v OFS="\t" '{ print $1, "0", $2 }' | grep -v "_" | sort-bed - | bedops --chop 1000 - > answer.bed

The file answer.bed will be a properly-sorted BED file containing your bins.

Change 1000 to whatever window size you want.

Because genomes will not divide evenly by 1000 bases, you can add -x to the bedops --chop 1000 statement if you want to leave out the trailing bin for each chromosome, which will usually be less than 1000 bases long.
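
To make the trailing-bin point concrete, here is a small sketch that only does the arithmetic: for each chromosome it reports how many full 1000-base bins fit and how many bases are left over. The sizes file and its values are hypothetical toy data, not real mouse chromosome lengths.

```shell
# Toy chromosome sizes (hypothetical values for illustration only)
printf 'chr1\t2500\nchrM\t900\n' > toy.sizes

# Columns written: chromosome, number of full bins, leftover bases in the trailing bin
awk -v FS='\t' -v OFS='\t' -v w=1000 '{
    full = int($2 / w)   # bins that are exactly w bases long
    rest = $2 % w        # length of the trailing, shorter bin (0 if none)
    print $1, full, rest
}' toy.sizes > toy.bin_counts.tsv

cat toy.bin_counts.tsv
```

In this toy case chr1 has two full bins plus a 500-base remainder, and chrM is shorter than one bin, so leaving out short trailing bins would drop chrM entirely.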


Dear @Alex, thank you for your detailed answer. I was wondering whether it is possible to convert the binned genome to a GTF file. I explained my purpose for binning the genome in this post: C: How to make a customized gtf file for differential expression purposes


BED is generally an unbound format after the third column, with some very specific exceptions, so BED-to-GTF is not straightforward. Usually you can go the other way, however; see gtf2bed for an example.

That said, going from bins in a BED file to GTF doesn't really make sense. GTF is used to represent gene annotation data. A bin is just an (empty) genomic interval that would first need annotation and curation steps to get to something that could be turned into sensible GTF.

Maybe take a look at GenePred to GTF conversion.

Alternatively, maybe tell us what you're really trying to do. Using BEDOPS/Kentutils might be the right answer, but it depends on what you're trying to do.


Basically, my purpose is to make my own GTF file so that I can find differentially expressed regions in the genome. That is why I want to do the binning and then make a GTF file out of it.


You can use the bin file directly as a BED file, using mapping operations via bedmap (https://bedops.readthedocs.io/en/latest/content/reference/statistics/bedmap.html). Conversion to GTF is unnecessary.
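
As a rough sketch of one such mapping operation, the example below counts how many reads overlap each bin. It assumes the reads are already available as a sorted BED file; the file names and intervals are hypothetical toy data. If bedmap is installed it is called with --echo, --count and --delim; otherwise a small awk loop (only suitable for tiny inputs) computes the same per-bin counts.

```shell
# Toy bins and toy read intervals (hypothetical data, already sorted)
printf 'chr1\t0\t1000\nchr1\t1000\t2000\nchr1\t2000\t2500\n' > toy.bins.bed
printf 'chr1\t100\t200\nchr1\t950\t1050\nchr1\t2100\t2200\n' > toy.reads.bed

if command -v bedmap >/dev/null 2>&1; then
    # --echo prints each bin, --count appends the number of overlapping reads
    bedmap --echo --count --delim '\t' toy.bins.bed toy.reads.bed > toy.counts.bed
else
    # Fallback: brute-force overlap count, one pass over all bins per read
    awk -v OFS='\t' '
        NR == FNR { chrom[NR] = $1; s[NR] = $2; e[NR] = $3; n = NR; next }
        { for (i = 1; i <= n; i++)
              if ($1 == chrom[i] && $2 < e[i] && $3 > s[i]) c[i]++ }
        END { for (i = 1; i <= n; i++) print chrom[i], s[i], e[i], c[i] + 0 }
    ' toy.bins.bed toy.reads.bed > toy.counts.bed
fi

cat toy.counts.bed
```

A per-bin count table like this, built once per sample, is the kind of input a differential expression tool could take without any GTF file.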
