Hi,
I am trying to solve two problems that I keep running into when working with BAM files. I am mostly familiar with Python, but solutions in other languages are welcome, especially if they are fast.
(1) readsoverwindow. I'd like a fast script that takes a BAM file as input and outputs the number of reads in each fixed-size window across the whole genome. That is, the output should be one txt file per chromosome containing the number of reads in each window of width W. When the data is ChIP-seq data, I'd also like to extend the reads before counting.
I am looking for something of this kind: ./readsoverwindow.py chromsizes.bed reads.bam -w 1000 -e 164 --dir readsdir and it should output the files chr1.txt,...,chrX.txt, each with the number of reads per 1000 bp window for that chromosome, with reads extended by 164 bp.
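To make the counting rule in (1) concrete, here is a minimal pure-Python sketch. It is not a BAM reader: it assumes the reads have already been converted to (chrom, start, end, strand) tuples with 0-based, half-open coordinates, for example with a BAM library or a BAM-to-BED conversion. The function names and two choices are assumptions of mine, not part of the request: a read is extended at its 3' end (downstream on "+", upstream on "-") and clipped to the chromosome, and a read is counted once in every window it overlaps.

```python
def extend_read(start, end, strand, extend, chrom_len):
    """Extend a read by `extend` bp at its 3' end, clipped to the chromosome."""
    if strand == "-":
        start = max(start - extend, 0)
    else:
        end = min(end + extend, chrom_len)
    return start, end


def reads_over_window(reads, chrom_sizes, window=1000, extend=0):
    """Count reads per fixed-size window.

    reads       -- iterable of (chrom, start, end, strand), 0-based half-open
    chrom_sizes -- dict mapping chromosome name to its length in bp
    Returns a dict mapping chromosome name to a list of per-window counts.
    """
    # One counter per window; the last window of a chromosome may be shorter.
    counts = {
        chrom: [0] * ((size + window - 1) // window)
        for chrom, size in chrom_sizes.items()
    }
    for chrom, start, end, strand in reads:
        if chrom not in counts:
            continue  # read on a chromosome that is not in chrom_sizes
        start, end = extend_read(start, end, strand, extend, chrom_sizes[chrom])
        if end <= start:
            continue  # nothing left after clipping
        first = start // window
        last = min((end - 1) // window, len(counts[chrom]) - 1)
        for w in range(first, last + 1):
            counts[chrom][w] += 1
    return counts


if __name__ == "__main__":
    demo_reads = [
        ("chr1", 900, 950, "+"),
        ("chr1", 1000, 1050, "-"),
        ("chr1", 2990, 3000, "+"),
    ]
    result = reads_over_window(demo_reads, {"chr1": 3000}, window=1000, extend=100)
    for chrom, per_window in result.items():
        print(chrom, per_window)
```

Writing one txt file per chromosome is then just a loop over the returned dict. If a read should count only in the window containing its start or midpoint, replace the inner loop with a single increment.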
(2) readsoverbed. I'd like to compute the number of reads for each region defined in a BED file (the total number, not the average). Here too, I'd like to be able to extend the reads by e base pairs.
I am looking for something of this kind: ./readsoverbed regions.bed reads.bam -e 164 --output regions.reads where the output is the total number of reads that map to each region.
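The counting in (2) can be sketched the same way. Again this is only an illustration under stated assumptions: reads and regions are plain tuples with 0-based, half-open coordinates rather than parsed BAM and BED files, extension is by `extend` bp at the 3' end, and a read counts for a region if it overlaps it by at least 1 bp. The function name is hypothetical, and the scan is a simple one over all reads on the chromosome for each region, so for large files an indexed BAM or an interval tree would be the faster choice.

```python
def reads_over_bed(regions, reads, extend=0):
    """Count reads overlapping each region.

    regions -- iterable of (chrom, start, end), 0-based half-open
    reads   -- iterable of (chrom, start, end, strand), 0-based half-open
    Returns a list of (chrom, start, end, n_reads) in the order of `regions`.
    """
    # Group the (extended) reads by chromosome once, up front.
    by_chrom = {}
    for chrom, start, end, strand in reads:
        if strand == "-":
            start = max(start - extend, 0)  # 3' end of a minus-strand read is upstream
        else:
            end = end + extend
        by_chrom.setdefault(chrom, []).append((start, end))

    results = []
    for chrom, r_start, r_end in regions:
        # Two half-open intervals overlap when each starts before the other ends.
        n = sum(
            1
            for s, e in by_chrom.get(chrom, [])
            if s < r_end and e > r_start
        )
        results.append((chrom, r_start, r_end, n))
    return results


if __name__ == "__main__":
    demo_regions = [("chr1", 1000, 2000), ("chr2", 0, 500)]
    demo_reads = [
        ("chr1", 900, 950, "+"),
        ("chr1", 2000, 2050, "-"),
        ("chr1", 1500, 1550, "+"),
        ("chr2", 600, 650, "+"),
    ]
    for row in reads_over_bed(demo_regions, demo_reads, extend=164):
        print("\t".join(str(field) for field in row))
```

The result is the total number of overlapping reads per region, not an average, which matches what is asked for above.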
Please advise. Thank you very much!
Hi, thanks a lot! That seems correct. Just one more question: how would I use awk to extend the reads from a BAM file?
I added a note in the answer above about using awk.
wow, awesome! thx a lot brentp.
I know this comes 2.3 years later, but I just want to document this approach for extending reads. It uses bedtools.
Please let me know if there is any overhead associated with this approach.