Question: Extract reads matching variants from VCF file
0
gravatar for sppearce
17 months ago by
sppearce10
United Kingdom
sppearce10 wrote:

I have a vcf file describing a set of SNPs, insertions and deletions. I would like to extract those reads from the bam file which correspond to these variants for some quality control.

For SNPs, I can loop through using samjs and produce one new bam file per SNP, but I don't know how to generalise this for indels (and I want those specific ones that match the vcf file, not just any indel at that base).

Is there any existing tool that will let me do this? Or a way to get samjs to do this.

snp vcf • 761 views
ADD COMMENTlink written 17 months ago by sppearce10
1

it's the same thing, you 'just' have to test for the cigar elements 'D/N' or 'I'.

ADD REPLYlink written 17 months ago by Pierre Lindenbaum131k

That presumably would get me any indel that exists covering the start position, rather than the specific indel in the vcf (I realise that is probably pretty niche).

ADD REPLYlink written 17 months ago by sppearce10
1

CIGAR string does not only contain the operations (D, I, etc) but also the length of the operation. By comparing the length with the indel size in VCF, maybe you would be able to count the reads that support the specific indel in question. I usually use pysam for this kind of CIGAR queries.

https://pysam.readthedocs.io/en/latest/

ADD REPLYlink written 17 months ago by Vitis2.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1901 users visited in the last hour