Question: Library To Parse Absolute Reference Positions Of Indels From A Bam Cigar String
1
gravatar for William
5.9 years ago by
William4.4k
Europe
William4.4k wrote:

Is there a library or piece of code that I can use to parse the absolute reference positions of large indels ( +100 bp ) from a cigar string like one shown below? I looked in Picard but I couldn't find it, maybe I missed it? It would be nice to have the code in java.

I could try to write something myself but there is a ton of tools that use bam files and they all must be able to parse a bam file and the cigar string. But I can't find a library or piece of code that I can reuse to parse absolute locations of indels from the cigar string.

The cigar string shown below is from the actual alignment of a bac contig with bwa-mem against a reference. I would like to use bac alignments like one to extract a truth set of large indels. To see how good my NGS data based SV calling is.

bac_contig1   2064    1       13884324        60      1330M4D814M4D2891M1D82M136I973M6I2277M1D3M5D2M4D8M4I12M124D5M30D5M53D6M84D4M156D4M12D11M206D10M435D5M16D6M27D3M72D6M19D4M73D5M254D3M103D5M29D7M898D23M6D202M24D37M2D1341M16I2454M3D1293M11I8M1D9M30D2M3D24M7D5M18D3M8D10M4D4M1D19M1I5M7D10M9D4M2D4M7D13M5I5M13D7M2D5M10I5M31D14M27D12M15D6M11D12M4D7M3D3M1I1M3D9M2D6M1D8M2I10M7D3M10D6M4I21M2I2M3D4M5I6M9I7M1D6M6D11M5D3M8D6M9D7M20D4M1I16M68D5M61D9M25D7M6D5M2D2M2I3M1D7M7D4M14D5M14D4M1I6M18D3M8D9M31D11M3D5M53D3M12D7M22D3M5D8M17D6M1I4M1I3M5D9M129D4M1I5M73D888M6I1589M2I1156M2I4702M1D649M2I2957M12I1730M1D3647M6D2438M1I1614M28D220M8I4424M882I5215M1D1867M1I457M9I3621M2I59M4D996M3D852M1I372M136921H
sv api picard bam • 2.4k views
ADD COMMENTlink modified 3.8 years ago by Biostar ♦♦ 20 • written 5.9 years ago by William4.4k
2
gravatar for William
5.9 years ago by
William4.4k
Europe
William4.4k wrote:

Turns out to be quite easy with the picard API. Just had to add some extra code.

 File inputBam = new File("/home/william/BACS.bam"); 

    SAMFileReader bamreader = new SAMFileReader(inputBam);

    for(SAMRecord samRecord : bamreader)
    {
        //skip alignment with low mapping qual
        if(samRecord.getMappingQuality() < 60 ){continue;}

        //get the start of the alignment
        String currentChrom = samRecord.getReferenceName();
        Integer currentReferencePos = samRecord.getAlignmentStart();

        for (CigarElement cigarElement :samRecord.getCigar().getCigarElements())
        {
            if(cigarElement.getOperator() == CigarOperator.D)
            {
                Integer cigarElementStart = currentReferencePos;
                Integer cigarElementLenght = cigarElement.getLength();
                Integer cigarElementEnd = currentReferencePos+cigarElementLenght;

                if(cigarElementLenght > 100 )
                {
                    System.out.println("deletion found from " + currentChrom +":"  + cigarElementStart + "-" + cigarElementEnd );
                }  
            }

            //add the lenght of this cigarElement to the current pos to go to the start of the next element. Only if the cigar element consumes reference bases 
            if(cigarElement.getOperator().consumesReferenceBases())
            {
                 currentReferencePos = currentReferencePos + cigarElement.getLength();
            } 
        }            
    }
ADD COMMENTlink written 5.9 years ago by William4.4k
2
gravatar for Andreas
5.9 years ago by
Andreas2.4k
Singapore
Andreas2.4k wrote:

You might want to check out Pysam's AlignedRead class. It has a very useful property called aligned_pairs, which will generate a list of aligned read and reference positions for your read based on its CIGAR string.

Andreas

ADD COMMENTlink modified 5.9 years ago • written 5.9 years ago by Andreas2.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1550 users visited in the last hour