Question: Density Of 5' End Of Reads From Bam Averaged Over Bed File
0
gravatar for random5196648
6.4 years ago by
random51966480 wrote:

Hi guys,

I'm trying to plot the average density of 5' ends of reads from a BAM file, for regions from a BED file. I expect the density to be average over the middle point of BED files, and the 2 strands plotted separately.

So far it doesn't seem the standard BedTools, BEDOPS or other tools (ngs.plot) are doing what I need.

Anyone have an easy solution? Or even an R package I could look at the build something custom if needed?

Thanks!

bed bam chip-seq • 1.8k views
ADD COMMENTlink modified 6.4 years ago by Alex Reynolds28k • written 6.4 years ago by random51966480
4
gravatar for Alex Reynolds
6.4 years ago by
Alex Reynolds28k
Seattle, WA USA
Alex Reynolds28k wrote:

Process your bam file into a sorted BED file with your 5' ends, using BEDOPS bam2bed:

$ bam2bed < foo.bam \
    | awk '{ \
        if ($6 == "+") { \
            print $1"\t"$2"\t"($2+1)"\t"$4"\t"$5"\t"$6; \
        } \
        else { \
            print $1"\t"($3+1)"\t"($3+2)"\t"$4"\t"$5"\t"$6; \
        } \
    }' - \
    > sortedFivePrimeEnds.bed

(If you're using BEDOPS v2.1, then it is not necessary to post-process with BEDOPS sort-bed. If you're using an older installation, pipe to sort-bed before piping to awk.)

Now use BEDOPS bedmap to count 5' ends within regions. We assume unsortedRegions.bed is an unsorted, four-column BED file containing the coordinates of the region and an ID describing the region. In case there are more columns, I use cut to select the first four columns:

$ sort-bed unsortedRegions.bed | cut -f1-4 - > sortedRegions.bed
$ bedmap --echo --count --delim '\t' sortedRegions.bed sortedFivePrimeEnds.bed \
    > countsOfFivePrimeEndsOverRegions.bed

If your regions do not have ID values, use awk to add a placeholder ID value (e.g., id-123, etc.) in the sorted output from sort-bed:

$ sort-bed unsortedRegions.bed \
    | awk ' \
        BEGIN { idx = 0; } \
        { \
            print $1"\t"$2"\t"$3"\tidx-"idx; \
            idx++; \
        } \
    ' - \
    > sortedRegions.bed

In any case, as sortedRegions.bed is a four-column BED file, and because we use the --delim '\t' operator with bedmap, the result file called countsOfFivePrimeEndsOverRegions.bed will contain the number of ends in the fifth (score) column.

Now use awk to measure the average count over each region:

$ awk ' \
    { \
        regionStop = $3; \
        regionStart = $2; \
        regionWidth = regionStop - regionStart; \
        endCount = $5; \
        averageEnds = endCount / regionWidth; \
        print $1"\t"$2"\t"$3"\t"$4"\t"averageEnds; \
    }' countsOfFivePrimeEndsOverRegions.bed \
    > averageEndsOverRegions.bed

The average ends result is in the fifth (score) column of averageEndsOverRegions.bed.

Once you have an understanding of the process, it is possible to glue together most of these commands into a pipeline with standard UNIX pipes, which will improve speed greatly by removing the expense of creating intermediate files.

If you need to do strand-separation, change the initial bam2bed step appropriately:

$ bam2bed < foo.bam \
    | awk '{ \
        if ($6 == "+") { \
            print $1"\t"$2"\t"($2+1)"\t"$4"\t"$5"\t"$6; \
        } \
    }' - \
    > sortedForwardStrandFivePrimeEnds.bed

$ bam2bed < foo.bam \
    | awk '{ \
        if ($6 == "-") { \
            print $1"\t"($3+1)"\t"($3+2)"\t"$4"\t"$5"\t"$6; \
        } \
    }' - \
    > sortedReverseStrandFivePrimeEnds.bed

Then do the follow-up bedmap and awk processing steps on each file separately, as described.

ADD COMMENTlink modified 6.4 years ago • written 6.4 years ago by Alex Reynolds28k
1
gravatar for Pierre Lindenbaum
6.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum122k wrote:

use bed tools bamtobed to convert your bam to bed and pipe into awk to only keep the 3' position

 bamToBed -i reads.bam  | awk '{printf("%s\t%d\t%d\n",$1,int($3)-1,int($3) );}

to split by strand, use samtools view with option -f or -F and the sam FLAG =16 ( read reverse strand)

ADD COMMENTlink modified 6.4 years ago • written 6.4 years ago by Pierre Lindenbaum122k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 886 users visited in the last hour