chromosome binning to average expression of genes in each bin
8.1 years ago
cg ▴ 10

Hi all

In an analysis framework, I need to calculate the average expression of all genes within each chromosomal bin (let's say each bin is 10 kb). Doing this in bash using a GFF3 file is easy, but genes that overlap the boundary between two consecutive bins are difficult to place. Can anyone point me to a tool or Perl script that can handle this?

Thanks.

Microarray genes perl binning • 2.4k views

Do you want to bin the genome into 10 kb windows, or bin the gene coordinates?


I want to bin each chromosome into 10 kb bins and pull all the genes within each bin.

8.1 years ago

One possible workflow:

  1. Convert genomic annotations from GFF or GTF to BED format via gff2bed or gtf2bed, using grep to filter for genes
  2. Add expression data to score column of BED-formatted gene annotations
  3. Download or create chromosome extents for your genome of interest. For example, for hg19: hg19.extents.bed
  4. Bin the extents by the desired 10kb increment with bedops --chop 10000
  5. Pipe the bins to bedmap --echo --mean --fraction-map 0.51. The --fraction-map 0.51 option requires that at least 51% of a gene overlaps a bin for the gene to be mapped to it; the --mean option calculates the mean expression score of the genes that map to each bin
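
For step 3, if no ready-made extents file is at hand, one option is to build it from a two-column table of chromosome names and lengths (the first two columns of a samtools .fai index have this layout). A minimal sketch, assuming a hypothetical tab-separated input file chrom.sizes; the function name sizes_to_extents is made up for illustration and is not part of BEDOPS:

```shell
# Convert a "chromosome<TAB>length" table into BED extents (chrom, 0, length),
# sorted lexicographically by chromosome name.
# sizes_to_extents and chrom.sizes are hypothetical names used for illustration.
sizes_to_extents() {
    awk 'BEGIN { FS = OFS = "\t" } NF >= 2 { print $1, 0, $2 }' "$1" \
        | LC_ALL=C sort -k1,1 -k2,2n
}

# Example usage (chrom.sizes is an assumed input file):
# sizes_to_extents chrom.sizes > extents.bed
```

The sort is there because the BEDOPS tools expect sorted input; running the result through sort-bed instead should serve the same purpose.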

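Greatly simplified, something like the following (this assumes expression.txt holds one expression value per line, in the same order as the filtered gene annotations):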

$ gff2bed < annotations.gff | grep -w 'gene' | cut -f1-4 - | paste - expression.txt > genes.bed5
$ bedops --chop 10000 hg19.extents.bed | bedmap --echo --mean --fraction-map 0.51 - genes.bed5 > answer.bed

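A 51% threshold ensures that a gene straddling two disjoint, adjacent bins is mapped to only one of them: the bin that contains the larger part of the gene. Note that the fraction is measured against the gene's length, so a gene long enough to span several bins (longer than roughly 19.6 kb with 10 kb bins) cannot have 51% of its length inside any single bin and will not be mapped at all.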
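
To make the overlap rule concrete, here is a small awk sketch that reports, for each gene in a BED file, which fixed-width bin holds at least 51% of the gene's length, or "none" when no bin reaches the threshold. It only illustrates the arithmetic; it is not how bedmap is implemented, and the helper name majority_bin is hypothetical:

```shell
# For each gene (chrom, start, end, ...) print the fixed-width bin that
# contains at least 51% of the gene's length, or "none" if no bin does.
# majority_bin is a hypothetical helper, not part of BEDOPS.
# Usage: majority_bin genes.bed [bin_size]   (bin_size defaults to 10000)
majority_bin() {
    awk -v size="${2:-10000}" -v frac=0.51 'BEGIN { FS = OFS = "\t" }
    {
        len = $3 - $2
        if (len <= 0) next                      # skip empty or malformed intervals
        first = int($2 / size)                  # index of first bin touched
        last  = int(($3 - 1) / size)            # index of last bin touched
        hit = "none"
        for (b = first; b <= last; b++) {
            lo = b * size; hi = lo + size       # half-open bin [lo, hi)
            s = ($2 > lo) ? $2 : lo
            e = ($3 < hi) ? $3 : hi
            if ((e - s) / len >= frac) { hit = lo "-" hi; break }
        }
        print $1, $2, $3, hit
    }' "$1"
}
```

For example, a gene at chr1:9000-12000 has 1,000 bases in the first bin and 2,000 in the second, so it is assigned to the 10000-20000 bin.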


This works. Sorry I forgot to upvote!
