Write SAM file manually with pysam
Entering edit mode
10 months ago
mhasa006 ▴ 70

I am trying to write reads into a SAM file format. First I'll extract the read from BAM file, update the query name and then write it into a SAM file. I need to perform this on each read of the BAM file. How can I write it back into the SAM file? My assumption is the SAM file is just a tab-delimited file, so first I'll write a tab-delimited file and then append the header and finally convert the sam file into the BAM file. I'm using the pysam and doing the following

for read in READS:
        query_name = read.query_name + 'added_info'
        contig = read.reference_name
        query_sequence = read.query_sequence
        flag = read.flag
        reference_id = read.reference_id
        reference_start = read.reference_start
        mapping_quality = read.mapping_quality
        cigar = read.cigarstring
        next_reference_id = read.next_reference_id
        next_reference_start = read.next_reference_start
        template_length = read.template_length
        query_qualities = read.query_qualities
        query_quality_string = ''
        query_quality_string = query_quality_string.join([chr(ch + 33) for ch in query_qualities])
        tags = read.tags

        outf.write(query_name + '\t' + flag + '\t' + contig + '\t' + reference_start + '\t' +
        mapping_quality + '\t' + cigar + '\t' + next_reference_id + '\t' + next_reference_start + '\t' + 
        str(template_length) + '\t' + query_sequence + '\t' + query_quality_string + '\t' + tags)


  1. Is the order in the SAM file correct?
  2. I have converted the query_qualities back into quality string by converting it into ASCII. Is that correct?
  3. How can I convert the tags into proper tags in the SAM file format. Currently, it is in this format [('NM', 12), ('MD', '81T74'), ('AS', 134), ('XS', 20)] is it just some regular string processing?
bam python sam pysam alignment • 1.4k views
Entering edit mode
10 months ago

I think you are making things more complicated than necessary. You can do:

samfile = pysam.AlignmentFile("in.bam", "rb")
outfile = pysam.AlignmentFile("out.bam", "wb", template=samfile)

for read in samfile:
    read.query_name = read.query_name + '_add_info'

this will write in bam format but you can change wb to w for writing in sam. Also note that you can use read.to_string() for a string representation of the read that you can then parse using e.g. read.to_string().split('\t'), but if you use pysam you shouldn't need to do that.


Login before adding your answer.

Traffic: 2698 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6