I am trying to count the number of UMIs associated with reads that contain a particular SNP; that is, the number of distinct UMIs among reads that do contain the SNP and among reads that do not.
My first idea was to go through the processed BAM files and extract all reads overlapping a specific site:
samtools view -b -o output.bam input.bam "1:1000-1000"
Then convert output.bam to output.fastq so the reads are easier to parse.
I know there are packages such as UMI-tools that can append cell and molecular barcodes to each read's header line in a FASTQ file, and counting the unique UMIs under a specific cell barcode would then give me what I want. However, this feels too convoluted. Is there a simpler way to count the number of UMIs whose reads contain a specific SNP? pysam doesn't seem to have the functionality I'm looking for.
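One possible shortcut, sketched here under assumptions: pysam can read both the base each read carries at the site and the UMI tag directly from the BAM, which would avoid the FASTQ round trip. The sketch assumes Cell Ranger-style tags (CB for the corrected cell barcode, UB for the corrected UMI), a coordinate-sorted and indexed BAM, and a 1-based position as in the samtools region above. The function names `iter_site_records` and `count_umis_by_allele` are hypothetical, and `AlignmentFile.fetch`, `get_aligned_pairs`, `has_tag` and `get_tag` are the standard pysam calls it relies on.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple

# (cell barcode, UMI, base observed at the site)
Record = Tuple[str, str, str]


def iter_site_records(bam_path: str, contig: str, pos1: int,
                      cb_tag: str = "CB", umi_tag: str = "UB") -> Iterator[Record]:
    """Yield (cell barcode, UMI, base) for each read aligned over a 1-based position.

    Assumes Cell Ranger-style CB/UB tags and an indexed BAM. pysam is imported
    here so the counting function below can be used without it.
    """
    import pysam

    pos0 = pos1 - 1  # pysam coordinates are 0-based
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(contig, pos0, pos0 + 1):
            if read.is_unmapped or read.is_secondary or read.is_supplementary:
                continue
            if not (read.has_tag(cb_tag) and read.has_tag(umi_tag)):
                continue
            # matches_only=True omits deletions and spliced (N) gaps, so a
            # read that skips the site yields nothing
            for qpos, rpos in read.get_aligned_pairs(matches_only=True):
                if rpos == pos0:
                    yield (read.get_tag(cb_tag), read.get_tag(umi_tag),
                           read.query_sequence[qpos])
                    break


def count_umis_by_allele(records: Iterable[Record],
                         alt_base: str) -> Dict[str, Dict[str, int]]:
    """Per cell, count distinct UMIs whose reads show alt_base vs. any other base."""
    alt: Dict[str, Set[str]] = defaultdict(set)
    other: Dict[str, Set[str]] = defaultdict(set)
    for cell, umi, base in records:
        target = alt if base.upper() == alt_base.upper() else other
        target[cell].add(umi)
    cells = sorted(set(alt) | set(other))
    return {c: {"alt": len(alt[c]), "other": len(other[c])} for c in cells}
```

For example, `count_umis_by_allele(iter_site_records("input.bam", "1", 1000), alt_base="G")` would return a dict of the form {cell: {"alt": n, "other": m}}, where "G" is only a placeholder for the actual alternate allele. A UMI whose reads disagree at the site (e.g. because of sequencing errors) is counted on both sides here; collapsing each (cell, UMI) pair to its majority base first would be a stricter choice.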