Number Of Reads Over A Given Window And Over Given Regions Bam Files
1
1
Entering edit mode
11.8 years ago
dfernan ▴ 760

Hi,

I am trying to solve two problems that I recurrently have when working with BAM Files. I am mostly familiar with Python but other language solutions are welcome, specially if they are fast.

(1) readsoverwindow. I'd like to have a fast script that takes as an input a bam file and it outputs the number of reads for a given window across all the genome. I.e., the output should be a txt file for each chromosome containing the number of reads in each W kb window. Also when the data is chip-seq data I'd like to extend the regions and then have the counts.

I am looking for something of this kind: ./readsoverwindow.py chromsizes.bed reads.bam -w 1000 -e 164 --dir readsdir and it should output a chr1.txt,...,chrX.txt files with the number of reads for each chromosome for a 1000 bp window, reads extended 164 bp.

(2) readsoverbed. I'd like to compute the number of reads for each region defined in the bed file (total number, not average). Also, I'd like to be able to extend the reads by e number of base pairs.

I am looking for something of this kind: ./readsoverbed regions.bed reads.bam -e 164 --output regions.reads where the output is the total number of reads that map to a given region.

Please advise. Thank you very much!

bam python read • 6.3k views
ADD COMMENT
9
Entering edit mode
11.8 years ago
brentp 24k

Have you looked at bedtools? For (1), something like:

bedtools makewindows -g chromsizes.bed -w 10000 \
    | bedtools coverage -b - -abam input.bam

if you want to extend the reads, just do that with awk like this:

samtools view -F 4 input.bam  | awk 'BEGIN{FS=OFS="\t"}{ print $3, $4-168,$4 + 168 + length($10) }' > input.padded.bed

Then use that instead of input.bam in the first command above.

For 2, I think that bedtools coverage does exactly what you want.

ADD COMMENT
0
Entering edit mode

Hi thanks a lot! seems correct, just one more Q, how would I use awk to extend the reads from a bam file?

ADD REPLY
0
Entering edit mode

I added a note in the answer above about using awk.

ADD REPLY
0
Entering edit mode

wow, awesome! thx a lot brentp.

ADD REPLY
0
Entering edit mode

I know this is after 2.3 years but just want to document this approach to extend reads. It uses bedtools

bamToBed -i input.bam | slopBed -i - -g genome_file_of_chr_sizes -b 168 | bedToBam -i - -g genome_file_of_chr_sizes > output_extended.bam

Please let me know if there are overhead attached with this approach.

ADD REPLY

Login before adding your answer.

Traffic: 1679 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6