Question: Finding reads having more than one mutation in BAM file
gravatar for marki
2.7 years ago by
European Union
marki60 wrote:


I have a set of BAM files (pair-end reads) from which I want to identify reads having more than one mutation. I already have the mutation calls for the same data. I am thinking of the following steps to achieve this.

  1. Identify pairs of "nearby" mutations (based on a maximum distance, say 400-500).
  2. Construct a BED file from above containing chromosome, start and end information, where 'start' will be the first and 'end' will be the second of above pair.
  3. Find intersecting regions between above BED and BAM file to filter putative reads.
  4. Query each read for presence of mutation pair

I have two questions. Firstly, I am wondering if I have to somehow take the strand information into account while finding intersected regions (I don't have the strand information in the variant-calling file)? Secondly, is there an easier alternative that I may use for this analysis? Apologies if some of the info is not-correct/too-basic as I am newbie to this kind of analysis.



bam next-gen exome-sequencing R • 970 views
ADD COMMENTlink modified 2.7 years ago by WouterDeCoster38k • written 2.7 years ago by marki60

If you know a bit of python, its fairly simple with pysam which supports reading bam and vcf file.

For each SNP ( as you already have the calls), find the overlapping reads ( with pileup function ) and check if the base on read matched with the ref/atl allele, and push the read names to a list/dictionary. Then check if you encounter the same read again for another SNP. There could be other optimised ways.

Note: Be careful that both the read and its mate has the same name.

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by geek_y9.4k
gravatar for swbarnes2
2.7 years ago by
United States
swbarnes25.2k wrote:

A few alternate paths:

If you don't have to do this for a lot, you could just eyeball in IGV.

If that's not feasible, another way

1) pick a one allele at a variant site 2) identify the reads with the desired allele in one read 3) pick out the mates for each of those reads 4) realign 5) assess which other variant sites are now covered 6) asses the alleles present at those sites

Another way, if the variant frequency isn't too large:

1) make short reference sequences containing each possible permutation (i.e if there are 3 mutations together in your window, that's 8 possible sequences) 2) align to those short sequences 3) see which permutations have reads aligning to them.

ADD COMMENTlink written 2.7 years ago by swbarnes25.2k
gravatar for WouterDeCoster
2.7 years ago by
WouterDeCoster38k wrote:

I would have a look at the CIGAR string of your mapped reads and look for > 1 mismatches/insertions/deletions

ADD COMMENTlink written 2.7 years ago by WouterDeCoster38k

In a CIGAR string, you can't see mismatches, only indels. Plus, there will be a lot of one-off errors to sift through.

ADD REPLYlink written 2.7 years ago by swbarnes25.2k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 883 users visited in the last hour