I want to be able to extract, from BAM files, the reads that, for a given position, contain the variant (and conversely but separately, the ones that contain the reference). I've played around with various tools and searched the tubes, without finding any convenient way to do this. It doesn't seem like too odd a task to perform, so maybe my google fu is just failing me here.
Another option, for querying any ad hoc range (
$ echo -e "chrN\tX\tY" | bedops --element-of 1 <(bam2bed < reads.bam) - > overlapping_reads.bed
To query any given set of genomic ranges in
$ bedops --element-of 1 <(bam2bed < reads.bam) variants.bed > overlapping_reads.bed
bam2bed tool will preserve all fields, which may be useful.
If variants are in VCF, then a second bash subshell to call
vcf2bed can convert them to BED, e.g.:
$ bedops --element-of 1 <(bam2bed < reads.bam) <(vcf2bed < variants.vcf) > overlapping_reads.bed
For a specific variant, you can use bamql
For instance, if you want to extract all reads from a bam file :
which support the alternate 'A' in snp : chr17:29827429 G>A
bamql -f yourfile.bam 'chr(17) & nt_exact(29827429,A)' -o A.bam
which support the reference 'G' in snp : chr17:29827429 G>A
bamql -f yourfile.bam 'chr(17) & nt_exact(29827429,G)' -o B.bam
I guess you can loop over variant file and do this for each variant. You can also filter with mate information. See documentation Here
In the off chance that you really do need to have the full alignments in SAM format in separate files, then you can code something up in either python with pysam or C with HTSlib. In both cases you can use the pileup functions to just grab all alignments that overlap a given position (or set of them) and then iterate over the results. The pileup functions will tell you which base of each alignment overlaps a given position, so you can check what genotype it supports and treat it accordingly.
Edit: But really, if you think you want the full alignments in separate files then you should probably rethink what you're doing. The two answers posted before mine are vastly simpler and should suffice for most normal needs.