Question: extract gene names from bigwig information
tomasscheitel (European Union) wrote, 4.7 years ago:

Hi,

I have some bigWig files downloaded from the ENCODE project, containing ChIP-seq data for the human genome.

I would like to extract the genes that overlap the regions in these bigWig files.

I know I can convert the bigWig files to bedGraph using the UCSC tools, and I have also found bwtools on GitHub, which can also manipulate bigWig files (and convert them to BED).

But what do I do next?

How do I compare the resulting BED (or bedGraph) file with my annotation files?

Does anyone have an idea how this can be done?

Thanks in advance,

Tomas

 

Alex Reynolds (Seattle, WA, USA) wrote, 4.7 years ago:

Consider using bigWigToWig to convert bigWig to wig, and then use wig2bed to convert to a sorted BED:

$ bigWigToWig data.bigWig data.wig
$ wig2bed < data.wig > data.bed

Once in BED format, you can use gtf2bed and bedmap to filter data against GENCODE or other gene annotations, e.g.:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_21/gencode.v21.annotation.gtf.gz \
    | gunzip -c - \
    | gtf2bed - \
    | grep -w gene - \
    > gencode.v21.genes.bed
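
As far as I know, gtf2bed fills the BED ID column (column 4) from the GTF gene_id attribute, so the file above carries gene identifiers rather than gene symbols. If symbols are preferred, one possible sketch is to copy the gene_name attribute into column 4 with awk. The two input rows below are made up for illustration (they are not copied from GENCODE), the toy file names are hypothetical, and the regex only assumes that gene_name "..." appears somewhere on the line:

```shell
# Toy stand-in for gencode.v21.genes.bed (tab-separated; rows are illustrative only).
printf 'chr1\t100\t500\tENSG_TOY_1\t.\t+\tHAVANA\tgene\t.\tgene_id "ENSG_TOY_1"; gene_name "GENE_A";\n'  > toy.genes.bed
printf 'chr1\t800\t900\tENSG_TOY_2\t.\t-\tHAVANA\tgene\t.\tgene_id "ENSG_TOY_2"; gene_name "GENE_B";\n' >> toy.genes.bed

# Replace column 4 with the gene_name value when the attribute is present;
# rows without a gene_name attribute are passed through unchanged.
awk 'BEGIN { FS = OFS = "\t" }
{
    if (match($0, /gene_name "[^"]+"/)) {
        # RSTART points at the attribute key; skip the 11 characters of: gene_name "
        $4 = substr($0, RSTART + 11, RLENGTH - 12)
    }
    print
}' toy.genes.bed > toy.gene_names.bed

# Show the coordinate and ID columns of the rewritten file.
cut -f1-4 toy.gene_names.bed
```

Only the ID column changes, so the coordinate sort order that bedmap relies on is preserved.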

Then:

$ bedmap --echo --echo-map-id-uniq data.bed gencode.v21.genes.bed > data_with_overlapping_gencode_gene_names.bed

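Using the --echo-map-id-uniq operator appends a semicolon-delimited list of the unique IDs (column 4 of gencode.v21.genes.bed, which identifies each gene) for any data element that overlaps a gene by one or more bases. Elements with no overlapping gene get an empty list after the | delimiter.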
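
Because bedmap separates the echoed element from the mapped IDs with a pipe character by default, the output can be post-filtered on that delimiter. A minimal awk sketch for keeping only the elements that picked up at least one ID, run here on made-up lines shaped like bedmap output (the IDs, values, and file names are placeholders):

```shell
# Made-up lines in the shape of `bedmap --echo --echo-map-id-uniq` output.
printf 'chr1\t10450\t10504\tid-1\t0.000000|\n'                      > toy.bedmap.out
printf 'chr1\t11900\t11950\tid-2\t3.500000|ENSG_TOY_1\n'            >> toy.bedmap.out
printf 'chr1\t14400\t14450\tid-3\t1.250000|ENSG_TOY_1;ENSG_TOY_2\n' >> toy.bedmap.out

# Keep only rows whose field after the "|" delimiter is non-empty.
awk -F'|' '$2 != ""' toy.bedmap.out > toy.mapped.out

# Count mapped rows against all rows as a quick sanity check.
mapped=$(wc -l < toy.mapped.out | tr -d ' ')
total=$(wc -l < toy.bedmap.out | tr -d ' ')
echo "mapped ${mapped} of ${total} elements"
```

Depending on the BEDOPS version, bedmap may also accept --skip-unmapped to suppress such rows directly; bedmap --help will confirm whether it is available.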


Thanks for the fast response. Unfortunately, this is not working as I expected.

This is my wig file:

chr1    10451    10505    0
chr1    10505    10506    10.5
chr1    10506    10507    10.4
...

This is the BED file I get after using wig2bed (version 1.2.5b):

chr1    10450    10504    id-1    0.000000
chr1    10504    10505    id-2    10.500000
chr1    10505    10506    id-3    10.400000 
...

It all runs without errors, but the file I get at the end is this:

chr1    10450    10504    id-1    0.000000|
chr1    10504    10505    id-2    10.500000|
chr1    10505    10506    id-3    10.400000|
chr1    10506    10507    id-4    10.200000|
...

So am I correct in assuming that there are no overlaps? This seems a bit strange. Is there a way to test the results?

Reply written 4.7 years ago by tomasscheitel

One way to test the results is to pipe an ad hoc region to bedmap directly: pick a chromosome chrN and start and stop positions X and Y where you know a gene should overlap:

$ echo -e 'chrN\tX\tY' | bedmap --echo --echo-map-id-uniq - gencode.v21.genes.bed

Another option is to load your WIG or BED data and the GENCODE annotations into tracks in a genome browser (like UCSC). This can provide exploration tools and visual confirmation of what you observe on the command line.
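
As a cross-check on the command line that does not depend on bedmap, the overlap test itself is simple enough to sketch in awk: two half-open BED intervals on the same chromosome share at least one base when each starts before the other ends. The annotation rows, the query regions, and the overlaps helper name are all invented for illustration:

```shell
# Toy annotation in BED order (chrom, start, end, id); rows are illustrative only.
printf 'chr1\t100\t500\tGENE_A\n'  > toy.anno.bed
printf 'chr1\t800\t900\tGENE_B\n' >> toy.anno.bed
printf 'chr2\t100\t500\tGENE_C\n' >> toy.anno.bed

# Print the IDs of annotation rows that share at least one base with the query.
# BED intervals are 0-based and half-open, hence the strict inequalities.
overlaps() {
    awk -v c="$1" -v s="$2" -v e="$3" 'BEGIN { FS = "\t" }
        $1 == c && $2 < e && s < $3 { print $4 }' toy.anno.bed
}

overlaps chr1 450 850    # spans the end of GENE_A and the start of GENE_B
overlaps chr1 500 800    # falls in the gap between them: prints nothing
```

Swapping toy.anno.bed for the first four columns of the real annotation file gives an independent answer to compare against what bedmap reports for the same region.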

Reply written 4.7 years ago by Alex Reynolds