I am trying to write reads into a SAM file format. First I'll extract the read from BAM file, update the query name and then write it into a SAM file. I need to perform this on each read of the BAM file. How can I write it back into the SAM file? My assumption is the SAM file is just a tab-delimited file, so first I'll write a tab-delimited file and then append the header and finally convert the sam file into the BAM file. I'm using the pysam and doing the following
for read in READS: query_name = read.query_name + 'added_info' contig = read.reference_name query_sequence = read.query_sequence flag = read.flag reference_id = read.reference_id reference_start = read.reference_start mapping_quality = read.mapping_quality cigar = read.cigarstring next_reference_id = read.next_reference_id next_reference_start = read.next_reference_start template_length = read.template_length query_qualities = read.query_qualities query_quality_string = '' query_quality_string = query_quality_string.join([chr(ch + 33) for ch in query_qualities]) tags = read.tags outf.write(query_name + '\t' + flag + '\t' + contig + '\t' + reference_start + '\t' + mapping_quality + '\t' + cigar + '\t' + next_reference_id + '\t' + next_reference_start + '\t' + str(template_length) + '\t' + query_sequence + '\t' + query_quality_string + '\t' + tags)
- Is the order in the SAM file correct?
- I have converted the query_qualities back into quality string by converting it into ASCII. Is that correct?
- How can I convert the
tagsinto proper tags in the SAM file format. Currently, it is in this format
[('NM', 12), ('MD', '81T74'), ('AS', 134), ('XS', 20)]is it just some regular string processing?