Write SAM file manually with pysam
1
1
Entering edit mode
21 months ago
mhasa006 ▴ 70

I am trying to write reads into a SAM file format. First I'll extract the read from BAM file, update the query name and then write it into a SAM file. I need to perform this on each read of the BAM file. How can I write it back into the SAM file? My assumption is the SAM file is just a tab-delimited file, so first I'll write a tab-delimited file and then append the header and finally convert the sam file into the BAM file. I'm using the pysam and doing the following

for read in READS:
        query_name = read.query_name + 'added_info'
        contig = read.reference_name
        query_sequence = read.query_sequence
        flag = read.flag
        reference_id = read.reference_id
        reference_start = read.reference_start
        mapping_quality = read.mapping_quality
        cigar = read.cigarstring
        next_reference_id = read.next_reference_id
        next_reference_start = read.next_reference_start
        template_length = read.template_length
        query_qualities = read.query_qualities
        query_quality_string = ''
        query_quality_string = query_quality_string.join([chr(ch + 33) for ch in query_qualities])
        tags = read.tags

        outf.write(query_name + '\t' + flag + '\t' + contig + '\t' + reference_start + '\t' +
        mapping_quality + '\t' + cigar + '\t' + next_reference_id + '\t' + next_reference_start + '\t' + 
        str(template_length) + '\t' + query_sequence + '\t' + query_quality_string + '\t' + tags)

Issues:

  1. Is the order in the SAM file correct?
  2. I have converted the query_qualities back into quality string by converting it into ASCII. Is that correct?
  3. How can I convert the tags into proper tags in the SAM file format. Currently, it is in this format [('NM', 12), ('MD', '81T74'), ('AS', 134), ('XS', 20)] is it just some regular string processing?
bam python sam pysam alignment • 2.7k views
ADD COMMENT
2
Entering edit mode
21 months ago

I think you are making things more complicated than necessary. You can do:

samfile = pysam.AlignmentFile("in.bam", "rb")
outfile = pysam.AlignmentFile("out.bam", "wb", template=samfile)

for read in samfile:
    read.query_name = read.query_name + '_add_info'
    outfile.write(read)

this will write in bam format but you can change wb to w for writing in sam. Also note that you can use read.to_string() for a string representation of the read that you can then parse using e.g. read.to_string().split('\t'), but if you use pysam you shouldn't need to do that.

ADD COMMENT

Login before adding your answer.

Traffic: 1814 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6