Question: Going From CIGAR String In SAM To Genomic Coordinates?
9.4 years ago by
User 9996800 wrote:

How can I go from a CIGAR string, given in the SAM output format, to a set of start/end genomic coordinates for paired-end sequences? The SAM format gives the start coordinate but I need to find the end coordinate as well. Thanks.

parsing sam alignment cigar • 7.1k views
9.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:

Have a look at the Cigar.java source code in Picard. For example:

/**
 * @return The number of reference bases that the read covers, including padding.
 */
public int getPaddedReferenceLength() {
    int length = 0;
    for (final CigarElement element : cigarElements) {
        switch (element.getOperator()) {
            case M:
            case D:
            case N:
            case EQ:
            case X:
            case P:
                length += element.getLength();
        }
    }
    return length;
}

Once you have length, the position of the last base is: end = start + length - 1. Don't forget the minus one, since we're "counting fenceposts" (google it).
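
The length and end-coordinate arithmetic above can be sketched without Picard as follows. This is a minimal illustration, not Picard's implementation: the class and method names (`CigarCoordinates`, `referenceLength`, `endPosition`) are hypothetical, and it assumes a well-formed CIGAR string using the operators from the SAM specification. Unlike `getPaddedReferenceLength()` quoted above, it does not count `P`, since padding does not consume reference bases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Picard.
final class CigarCoordinates {
    // One CIGAR element: a length followed by a single operator character.
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Sums the lengths of operators that consume reference bases (M, D, N, =, X). */
    static int referenceLength(String cigar) {
        int length = 0;
        int consumed = 0; // index just past the last element parsed
        Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            if (m.start() != consumed) {
                throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
            }
            consumed = m.end();
            switch (m.group(2).charAt(0)) {
                case 'M':
                case 'D':
                case 'N':
                case '=':
                case 'X':
                    length += Integer.parseInt(m.group(1));
                    break;
                default:
                    // I, S, H and P do not advance along the reference.
                    break;
            }
        }
        if (consumed != cigar.length()) {
            throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
        }
        return length;
    }

    /** 1-based, inclusive end position given the 1-based SAM POS field. */
    static int endPosition(int start, String cigar) {
        return start + referenceLength(cigar) - 1;
    }
}
```

For example, `CigarCoordinates.endPosition(10, "5M")` gives 14: the read covers positions 10 through 14.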

written 9.3 years ago by Jonathan Manning640
9.3 years ago by
United States
Aaronquinlan11k wrote:

The bamToBed program in BEDTools will do this by creating BED entries for each BAM alignment, where the end coordinate reflects the CIGAR string. Moreover, if you want to create separate BED entries for "spliced" or "split" alignments (i.e., when there is an "N" CIGAR op present), use the -split option.


Just remember, BED entries are 0-indexed, half-open ranges (up to but not including the end position). A 5 bp read starting at (1-based) position 10 covers positions 10 through 14, so it would be "chr1 9 14" in BED.
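
The coordinate shift can be sketched as below. This is only an illustration of the arithmetic, not how bamToBed is implemented; `BedCoordinates` and `toBedLine` are hypothetical names, and the reference length is assumed to have already been derived from the CIGAR string.

```java
// Hypothetical helper for illustration; not part of BEDTools.
final class BedCoordinates {
    /**
     * Converts a 1-based SAM start and a reference length into the first
     * three BED columns (chrom, 0-based start, exclusive end).
     */
    static String toBedLine(String chrom, int samStart, int referenceLength) {
        int bedStart = samStart - 1;              // shift from 1-based to 0-based
        int bedEnd = bedStart + referenceLength;  // exclusive, so end - start == length
        return chrom + "\t" + bedStart + "\t" + bedEnd;
    }
}
```

With this convention the BED end minus the BED start always equals the number of reference bases covered, which is a quick sanity check on any converted interval.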

written 9.3 years ago by Jonathan Manning640

Hi

I am trying the -split option on my BAM file, but the output has only 6 fields, and this is the case for all the alignments. Can you suggest what is going wrong?

written 6.9 years ago by Varun Gupta1.1k