I am analyzing 96 samples sequenced using NextSeq PE 150 reads. The goal is to study variants and per exon coverage of 10 genes. I have obtained the bed file for all the exonic regions of the 10 genes from UCSC genome browser. I am using trim_galore -> bowtie2 -> samtools pipeline for calling variants. I am using hg38 human genome assembly for alignment.
I calculated per exon coverage using bedtools intersect command as follows:
bedtools coverage -abam Sample-1.sorted.bam -b sorted.intervals.bed -counts > PerExonCoverage.txt
This gives me an output file in following format:
Chromosome Start End Coverage chr1 161677476 161677553 6561 chr20 1895448 1895526 2377 chr20 1915099 1915455 12081
I want to know the coverage of each nucleotide position for all exons for e.g as follows:
chr1 161677476 13 chr1 161677477 120 chr1 161677478 130
and so on..
I have an intermediate mpileup file which I have used to extract coverage of all positions like this:
cut -f1-4 Sample-1.mpileup > Sample-1.depth
This gives me coverage for all positions of the reference genome and not for the positions of selected exons.
The above command also results in a file size of almost 60GB per sample which is very huge for the resources available to me.
Is there a way in which I can get the coverage of all positions but only for exons listed in BED using either mpileup or BAM file.
Help will be appreciated. Thanks in advance!!