Question

extract gene names from bigwig information

9.2 years ago

tomasscheitel ▴ 30

Hi,

I have some bigWig files downloaded from the ENCODE project for specific ChIP-seq data for the human genome.

I would like to extract the genes that overlap the regions in these bigWig files.

I know I can convert the bigWig files into bedGraph using the UCSC tools, and I have also found bwtools on GitHub, which can also manipulate bigWig files (and convert them to BED too).

But what do I do next?

How do I compare the resulting BED (or bedGraph) file with my annotation files?

Does anyone have an idea as to how this can be done?

Thanks in advance

Tomas

genome bigwig GenomicRanges • 4.6k views

Updated 2.5 years ago by Ram 43k • written 9.2 years ago by tomasscheitel ▴ 30

Ram · Answer 1 · 2015-02-19

9.2 years ago

Alex Reynolds 35k

Consider using bigWigToWig to convert bigWig to wig, and then use wig2bed to convert to a sorted BED:

$ bigWigToWig data.bigWig data.wig
$ wig2bed < data.wig > data.bed
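
The overlap step further down relies on sorted input, and wig2bed sorts its output by default. If a BED file comes from another tool, a quick order check along the following lines may be worthwhile. This is only a sketch: is_sorted_bed and the two demo files are hypothetical names, not part of BEDOPS, and the check assumes tab-delimited input ordered by chromosome (compared as a string), then start, then end. Depending on the awk implementation, string comparison may follow the locale.

```shell
# Hypothetical helper: exit 0 if a tab-delimited BED file is ordered by
# chromosome (string comparison), then start, then end; exit 1 otherwise.
is_sorted_bed() {
    awk -F'\t' '
        NR > 1 {
            cur = $1 ""   # force a string comparison, even for names like "1"
            if (cur < chrom) bad = 1
            else if (cur == chrom && $2 + 0 < start) bad = 1
            else if (cur == chrom && $2 + 0 == start && $3 + 0 < stop) bad = 1
            if (bad) exit
        }
        { chrom = $1 ""; start = $2 + 0; stop = $3 + 0 }
        END { exit bad + 0 }
    ' "$1"
}

# Made-up demo data, written to a temporary directory.
tmpdir=$(mktemp -d)
printf 'chr1\t100\t200\nchr1\t150\t250\nchr2\t50\t60\n' > "$tmpdir/sorted.bed"
printf 'chr1\t150\t250\nchr1\t100\t200\n' > "$tmpdir/unsorted.bed"

is_sorted_bed "$tmpdir/sorted.bed"   && echo "sorted.bed: in order"
is_sorted_bed "$tmpdir/unsorted.bed" || echo "unsorted.bed: out of order"
```

If the check fails, BEDOPS sort-bed can re-sort the file before it is passed on.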

Once in BED format, you can use gtf2bed and bedmap to filter data against GENCODE or other gene annotations, e.g.:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_21/gencode.v21.annotation.gtf.gz \
    | gunzip -c - \
    | gtf2bed - \
    | grep -w gene - \
    > gencode.v21.genes.bed

Then:

$ bedmap --echo --echo-map-id-uniq data.bed gencode.v21.genes.bed > data_with_overlapping_gencode_gene_names.bed

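The --echo-map-id-uniq operator appends, after a | delimiter, a semicolon-separated list of the unique IDs (the fourth column of gencode.v21.genes.bed, i.e. the gene identifiers) of all genes that overlap a data element by one or more bases. If no gene overlaps an element, nothing follows the | on that line.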
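
By default, bedmap separates the echoed element from the mapped IDs with | and separates multiple IDs with ;. To reduce such output to a plain list of distinct gene IDs, something like the following awk sketch could be used. The file name demo_map.bed and the IDs GENE_A and GENE_B are made up for illustration; real output would carry the identifiers from the annotation file.

```shell
# Made-up stand-in for bedmap output: BED fields, "|", then ";"-separated IDs.
tmpdir=$(mktemp -d)
printf 'chr1\t100\t200\tid-1\t0.500000|GENE_A;GENE_B\n' >  "$tmpdir/demo_map.bed"
printf 'chr1\t200\t300\tid-2\t0.000000|\n'              >> "$tmpdir/demo_map.bed"
printf 'chr1\t300\t400\tid-3\t1.200000|GENE_B\n'        >> "$tmpdir/demo_map.bed"

# Keep only lines with a non-empty ID list, split the list on ";",
# print one ID per line, and collapse duplicates with sort -u.
unique_genes=$(awk -F'|' '
    $2 != "" {
        n = split($2, ids, ";")
        for (i = 1; i <= n; i++) print ids[i]
    }
' "$tmpdir/demo_map.bed" | sort -u)

printf '%s\n' "$unique_genes"
```

Elements without any overlapping gene (the second line of the demo data) contribute nothing to the list.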

Updated 2.5 years ago by Ram 43k • written 9.2 years ago by Alex Reynolds 35k

Thanks for the fast response. Unfortunately, this is not working as I expected.

This is my wig file:

chr1    10451    10505    0
chr1    10505    10506    10.5
chr1    10506    10507    10.4
...

This is the BED file I get after using wig2bed (version 1.2.5b):

chr1    10450    10504    id-1    0.000000
chr1    10504    10505    id-2    10.500000
chr1    10505    10506    id-3    10.400000 
...

It all runs without errors, but the file I get at the end is this:

chr1    10450    10504    id-1    0.000000|
chr1    10504    10505    id-2    10.500000|
chr1    10505    10506    id-3    10.400000|
chr1    10506    10507    id-4    10.200000|
...

So am I correct in assuming that there are no overlaps? This seems a bit strange. Is there a way to test the results?

Updated 2.5 years ago by Ram 43k • written 9.2 years ago by tomasscheitel ▴ 30

One way to test the results is to echo an ad hoc region to bedmap directly. For example, choose a chromosome chrN and start and stop positions X and Y where you know there should be overlap with genes:

$ echo -e 'chrN\tX\tY' | bedmap --echo --echo-map-id-uniq - gencode.v21.genes.bed

Another option is to load your WIG or BED data and the GENCODE annotations into tracks in a genome browser (like UCSC). This can provide exploration tools and visual confirmation of what you observe on the command line.
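
A further command-line cross-check, independent of bedmap, is to test the overlap condition directly with awk. The sketch below assumes 0-based, half-open BED coordinates, so two intervals overlap when each one starts before the other ends. The function name overlapping_ids, the gene IDs, and the coordinates are all made up for illustration.

```shell
# Made-up gene annotations in BED form: chrom, start, end, ID.
tmpdir=$(mktemp -d)
printf 'chr1\t1000\t2000\tGENE_A\n' >  "$tmpdir/genes.bed"
printf 'chr1\t5000\t6000\tGENE_B\n' >> "$tmpdir/genes.bed"
printf 'chr2\t1000\t2000\tGENE_C\n' >> "$tmpdir/genes.bed"

# Hypothetical helper. Usage: overlapping_ids CHROM START END GENES_BED
# Prints the ID of every gene sharing at least one base with the query.
overlapping_ids() {
    awk -F'\t' -v c="$1" -v s="$2" -v e="$3" '
        $1 == c && $2 + 0 < e + 0 && $3 + 0 > s + 0 { print $4 }
    ' "$4"
}

# chr1:1500-5001 reaches into GENE_A and covers the first base of GENE_B.
hits=$(overlapping_ids chr1 1500 5001 "$tmpdir/genes.bed")
printf '%s\n' "$hits"

# chr1:2000-5000 only touches the boundaries of both genes, so no output.
overlapping_ids chr1 2000 5000 "$tmpdir/genes.bed"
```

If a region reports genes here but not in the bedmap output (or the reverse), the sort order and coordinate conventions of the input files are a reasonable first thing to check.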

Updated 2.5 years ago by Ram 43k • written 9.2 years ago by Alex Reynolds 35k