has anyone created a script to make a regions file for CRISPressoWGS from the covered regions of a BAM file ?
It would be great if something is available, else I'll write one myself.
Background : this tool needs a regions file (extended, tab separated BED file with unique region IDs). Regions must be shorter than read length (or perhaps PE read length, so roughly fragment length, to be counted).
So I solved this myself, it wasn't too tricky after all. Why not reinvent the wheel eh ?
Sambamba bash script
#!/bin/bash
# Check window coverage on Wochenende sorted dup.bam output
# Actually run for each BAM file
for i in *dup.bam; do
input=$i
sec_input=${input%%.bam}
#sec_in_bam=${input%%.bam}
window=150
overlap=75
covMax=999999999
# Get coverage depth in windows
# x threads, Windows 100000, overlap 50000, -c minimum coverage.
# SLURM
$scheduler $path_sambamba depth window -t $THREADS --max-coverage=$covMax --window-size=$window --overlap $overlap -c 10 ${sec_input}.bam > ${sec_input}_cov_window.txt &
done
wait
Followup awk script
cat awk_crispressoWGS_regions.sh
#/bin/bash
# Feb 2022, make regions for https://github.com/pinellolab/CRISPResso2 wgs
#example target format tab separated
#chr6 51002798 51002820 R2 NA NA NA
awk -F "\t" '{print $1 "\t" $2 "\t" $3 "\tR" FNR "\tNA" "\tNA" "\tNA"}' yourBAMfilename.trm.ns.fix.s.dup_cov_window.txt > tmp.txt
# remove header on line 1
grep -v "#" tmp.txt > WGS_regions.txt