Extract gene names from bigWig information
9.2 years ago

Hi,

I have some bigWig files downloaded from the ENCODE project for specific ChIP-seq data for the human genome.

I would like to extract the genes that overlap the regions in these bigWig files.

I know I can convert the bigWig files into bedGraph using the UCSC tools, and I have also found bwtools on GitHub, which can also manipulate bigWig files (including conversion to BED).

But what do I do next?

How do I compare the resulting BED (or bedGraph) file with my annotation files?

Does anyone have an idea how this can be done?

Thanks in advance

Tomas

genome bigwig GenomicRanges • 4.6k views
9.2 years ago

Consider using bigWigToWig to convert bigWig to wig, and then use wig2bed to convert to a sorted BED:

$ bigWigToWig data.bigWig data.wig
$ wig2bed < data.wig > data.bed

Once in BED format, you can use gtf2bed and bedmap to filter data against GENCODE or other gene annotations, e.g.:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_21/gencode.v21.annotation.gtf.gz \
    | gunzip -c - \
    | gtf2bed - \
    | grep -w gene - \
    > gencode.v21.genes.bed

Then:

$ bedmap --echo --echo-map-id-uniq data.bed gencode.v21.genes.bed > data_with_overlapping_gencode_gene_names.bed

Using the --echo-map-id-uniq operator adds a list of gene names for any data element that overlaps a gene by one or more bases.
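
To make the output format concrete, here is a minimal sketch on toy data that mimics what --echo --echo-map-id-uniq reports. It uses only awk, so it runs without BEDOPS installed. The file names, coordinates and gene names are invented for illustration, and the naive all-against-all loop is only a stand-in for bedmap, not how bedmap works internally.

```shell
# Toy signal intervals (BED: chrom, start, end, id, score); values are invented.
printf 'chr1\t100\t200\tid-1\t5.0\nchr1\t500\t600\tid-2\t7.5\n' > toy_data.bed
# Toy gene annotation (BED: chrom, start, end, gene name); names are invented.
printf 'chr1\t150\t300\tGENE_A\nchr1\t180\t250\tGENE_B\nchr2\t500\t600\tGENE_C\n' > toy_genes.bed

# For each data interval, list the unique names of genes overlapping it by
# at least one base, appended after a "|" (empty when nothing overlaps).
toy_result=$(awk -F'\t' '
    FILENAME == ARGV[1] {
        n++
        chrom[n] = $1; start[n] = $2 + 0; end[n] = $3 + 0; name[n] = $4
        next
    }
    {
        hits = ""; split("", seen)
        for (i = 1; i <= n; i++) {
            # Half-open BED intervals overlap when each starts before the other ends.
            if (chrom[i] == $1 && start[i] < $3 + 0 && $2 + 0 < end[i] && !(name[i] in seen)) {
                seen[name[i]] = 1
                hits = (hits == "" ? name[i] : hits ";" name[i])
            }
        }
        print $0 "|" hits
    }' toy_genes.bed toy_data.bed)
printf '%s\n' "$toy_result"
```

In this toy case the first interval is followed by both overlapping gene names, while the second interval has nothing after the "|" because the only gene at those coordinates is on a different chromosome.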


Thanks for the fast response. Unfortunately, this is not working as I expected.

This is my wig file:

chr1    10451    10505    0
chr1    10505    10506    10.5
chr1    10506    10507    10.4
...

This is the BED file I get after using wig2bed (version 1.2.5b):

chr1    10450    10504    id-1    0.000000
chr1    10504    10505    id-2    10.500000
chr1    10505    10506    id-3    10.400000 
...

It all runs without errors, but the file I get at the end is this:

chr1    10450    10504    id-1    0.000000|
chr1    10504    10505    id-2    10.500000|
chr1    10505    10506    id-3    10.400000|
chr1    10506    10507    id-4    10.200000|
...

So am I correct in assuming that there are no overlaps? This seems a bit strange. Is there a way to test the results?


One way to test results is to echo ad hoc regions to bedmap directly, e.g. enter some chromosome chrN and start and stop positions X and Y where you know you should get overlap with genes:

$ echo -e 'chrN\tX\tY' | bedmap --echo --echo-map-id-uniq - gencode.v21.genes.bed

Another option is to load your WIG or BED data and the GENCODE annotations into tracks in a genome browser (like UCSC). This can provide exploration tools and visual confirmation of what you observe on the command line.
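
Before testing individual regions, a couple of quick checks on the inputs may help narrow things down. An empty annotation file, or chromosome names that differ between the two files (e.g. chr1 versus 1), would each leave every line with nothing after the "|". The sketch below is only one way to check this: check_bed_inputs is a hypothetical helper, not part of BEDOPS, and the file names are assumed to be the ones from the commands above.

```shell
# Hypothetical helper (not part of BEDOPS): report how many lines the
# annotation file has and which chromosome names both files share.
check_bed_inputs() {
    data_bed=$1
    genes_bed=$2
    # An empty annotation file means there is nothing to map against.
    gene_lines=$(wc -l < "$genes_bed" | tr -d ' ')
    # Chromosome names present in both files; an empty list means
    # no overlap is possible.
    shared=$(awk -F'\t' '
        FILENAME == ARGV[1] { in_genes[$1] = 1; next }
        ($1 in in_genes) && !printed[$1]++ { print $1 }
    ' "$genes_bed" "$data_bed" | sort | paste -sd ' ' -)
    echo "gene lines: $gene_lines"
    echo "shared chromosomes: $shared"
}

# File names as used in the commands above; adjust to your own paths.
if [ -f data.bed ] && [ -f gencode.v21.genes.bed ]; then
    check_bed_inputs data.bed gencode.v21.genes.bed
fi
```

If the gene line count is zero, the gtf2bed and grep step is the place to look. If no chromosome names are shared, the two files use different naming conventions.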
