Hi, I have been struggling to analyze chromosome conformation capture (3C) data for almost 6 months now. We have BAM files containing reads aligned to the reference genome, and we are using a custom Python script to extract reads that contain SNPs so that we can analyze the data in an allele-specific manner. I don't think the molecular biology of what we're doing matters here, because all of the issues are technical in nature.
Basically, at first we could not analyze half of the samples with the script: running it on those BAM files failed with the error message "IndexError: string index out of range".
We were able to fix this, so that all samples could be analyzed successfully, by adding a new test to the script (by test I mean a filter/requirement that a read must pass in order not to be thrown out).
The new test requires that the alignment length matches the inferred alignment length from the CIGAR string. This script was written completely by a collaborator, so we don't know if this test makes sense, or why it should allow us to analyze the problematic BAM files. However, I think the basic idea is just to make sure that the alignments are trustworthy.
The problem, however, is that for the majority of alignments the CIGAR string does not appear to contain enough information to infer the alignment length correctly. When this happens, the length inferred from the CIGAR string comes out as 125 bp, apparently as a default, because 125 bp is the read length we used in the experiment. Since the vast majority of reads are soft-clipped, fewer than 125 bp are included in the alignment, and because this disagrees with the length inferred from the CIGAR string, the reads get tossed.
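To make the length mismatch concrete, here is a minimal sketch in plain Python. It is not our collaborator's code, and the function names are made up for illustration. It assumes the SAM convention that the M, I, S, = and X operations consume read bases, so their lengths add up to the stored read length with soft clips included. If that sum is what the script treats as the "inferred" length, then 125 bp would simply be the expected value for any 125 bp read rather than a fallback default, and every soft-clipped read would fail an equality test against the aligned length.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_lengths(cigar):
    """Return (read_length, aligned_length, reference_span) for a CIGAR string.

    read_length    : read bases described by the CIGAR, soft clips included
    aligned_length : read bases inside the alignment, soft clips excluded
    reference_span : reference bases covered by the alignment
    """
    read_length = aligned_length = reference_span = 0
    for count, op in CIGAR_OP.findall(cigar):
        count = int(count)
        if op in "MIS=X":      # operations that consume read bases
            read_length += count
        if op in "MI=X":       # same, but without soft clips
            aligned_length += count
        if op in "MDN=X":      # operations that consume reference bases
            reference_span += count
    return read_length, aligned_length, reference_span


def passes_strict_test(cigar):
    """Hypothetical version of the new test: aligned length must equal
    the read length inferred from the CIGAR string."""
    read_length, aligned_length, _ = cigar_lengths(cigar)
    return aligned_length == read_length


def passes_consistency_test(cigar, seq_length):
    """Hypothetical alternative: only require that the CIGAR string agrees
    with the length of the stored sequence, so soft clips are allowed."""
    read_length, _, _ = cigar_lengths(cigar)
    return read_length == seq_length
```

With a soft-clipped 125 bp read such as `20S85M20S`, `cigar_lengths` gives a read length of 125 and an aligned length of 85, so `passes_strict_test` rejects it while `passes_consistency_test("20S85M20S", 125)` keeps it. Whether the second form would still protect against the IndexError is something I have not verified.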
When we remove this alignment length test from the script we recover all the reads that we have lost, but we go back to the original problem of not being able to analyze several of the files due to the IndexError message shown above. This leads me to believe that something is wrong with the reads and/or alignments in these samples, but I'm not sure exactly what to look for or how to do it.
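One place to start looking, sketched here under assumptions: I do not know where the script indexes into the read, so this guesses that it looks up the base at a SNP position. The helper names are hypothetical. The first function walks the CIGAR string to turn a reference position into an index into the stored sequence and returns None instead of raising when the position falls in a deletion or outside the alignment. The second flags records whose stored sequence cannot be indexed safely, for example when the sequence is missing ("*") or the read is hard-clipped, which can happen with secondary or supplementary alignments.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_index_at(ref_start, cigar, ref_pos):
    """Return the index into the stored sequence of the base aligned to
    ref_pos, or None if there is no such base.

    ref_start and ref_pos are 0-based reference coordinates. None means the
    position lies outside the alignment or inside a deletion/skip (D or N).
    """
    ref = ref_start  # next reference position to be consumed
    idx = 0          # next index into the stored sequence to be consumed
    for count, op in CIGAR_OP.findall(cigar):
        count = int(count)
        if op in "M=X":                      # consumes read and reference
            if ref <= ref_pos < ref + count:
                return idx + (ref_pos - ref)
            ref += count
            idx += count
        elif op in "IS":                     # consumes read only
            idx += count
        elif op in "DN":                     # consumes reference only
            if ref <= ref_pos < ref + count:
                return None
            ref += count
        # H and P consume neither read nor reference
    return None


def describe_problems(seq, cigar):
    """List reasons why indexing into seq could fail for this record."""
    ops = [(int(count), op) for count, op in CIGAR_OP.findall(cigar)]
    problems = []
    if seq in ("", "*"):
        problems.append("no sequence stored")
    else:
        expected = sum(count for count, op in ops if op in "MIS=X")
        if expected != len(seq):
            problems.append(
                f"CIGAR implies {expected} bases but sequence has {len(seq)}"
            )
    if any(op == "H" for _, op in ops):
        problems.append("hard-clipped")
    return problems
```

For example, `read_index_at(100, "20S85M20S", 100)` returns 20 because the first aligned base sits after 20 soft-clipped bases, whereas computing `ref_pos - ref_start` directly would give 0. Running something like `describe_problems` over the records of one of the failing BAM files might show whether they differ from the files that work.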
Any help would be greatly appreciated!