Question

Bed File Of Reads To Wiggle File Of Numbers Of Features At Each Position

1

Entering edit mode

13.3 years ago

KCC ★ 4.1k

This is probably really simple, but how do you go from a BED file of genomic features to a wiggle file which contains the number of features at a given position on the chromosome? So, every position in the wiggle file gives the total number of features that overlap with that position in the chromosome.

wiggle bed • 4.4k views

ADD COMMENT • link updated 13.3 years ago by brentp 24k • written 13.3 years ago by KCC ★ 4.1k

score 4 · Answer 1 · 2012-04-06

4

Entering edit mode

13.3 years ago

brentp 24k

In additions to @David's solution, if your data is large, you can get directly to bigwig format by using

bedtools genomecov (genomeCoverageBed) and

then bedGraphToBigWig

ADD COMMENT • link 13.3 years ago by brentp 24k

0

Entering edit mode

By the way... is there also a possibility to distinguish between the strands? With my script it's possible, but absolutely not memory efficient for a whole genome, when working with NGS data... but with genomeCoverageBed it's not doable, is it?

ADD REPLY • link 13.3 years ago by David Langenberger 11k

0

Entering edit mode

David, genomeCoverageBed has a -strand option so you can specify to check reads only from a single strand. Even if it didn't, one could just use awk to filter to a specific strand, then pipe to bedtools.

ADD REPLY • link 13.3 years ago by brentp 24k

0

Entering edit mode

ah ok thx, didn't know that. But anyways, I always use a script, since I also want to normalize for multiple hits... I work with short RNAseq and there it happens a lot. And that is not possible with such tools.

ADD REPLY • link 13.3 years ago by David Langenberger 11k

score 3 · Answer 2 · 2012-04-06

I think the simplest way is to write a small perl-script...

Copy this to a file called something like bed2wig.pl:

use strict;
use warnings;

my $file = $ARGV[0];
my %hash = ();

open(FILE, "<$file") || die "cannot open $file\n";
while(<FILE>){
    chomp;
    my ($chr, $start, $end, $id, $score, $strand) = split(/\s+/,$_);
    for(my $i=$start; $i<=$end; $i++){
        $hash{$chr}{$i}++;
    }
}
close(FILE);

foreach my $chr (keys %hash){
    print "variableStep chrom=$chr\n";
    foreach my $i (sort {$a<=>$b} keys %{$hash{$chr}}){
        print "$i\t$hash{$chr}{$i}\n";
    }
}

Now call it like ./bed2wig.pl yourFile.bed > yourFile.wig

It's not very memory efficient for big files (because of the hash), but it should give you an idea of how to do it.