Question: Conversion Of Blat Output To Sam/Bam
9.9 years ago by
Bergen, Norway
Michael Dondrup48k wrote:

I have some short reads aligned using BLAT; the output is in tabular PSL format (including the sequences) for each alignment. Is it possible to convert the BLAT output to SAM/BAM format? I would think not, because PSL lacks some fields that SAM requires (mainly the CIGAR string), but please prove me wrong! Normally I would advise using a different tool (bwa, bowtie, lastz) that produces a SAM file directly, but what if that isn't an option (say, because you really want to use BLAT or you don't have the input)? Is there a way to do the conversion? I could code it in Perl and share it if someone has an idea how to do it.
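For what it's worth, the CIGAR string is not really missing from PSL: it can be derived from the blockSizes, qStarts and tStarts columns. Below is a minimal sketch of that idea, not a tested converter. The function names are made up for illustration, it assumes a nucleotide (untranslated) PSL line, and it reports target gaps as deletions (D) where a spliced alignment might prefer N.

```python
def parse_psl_list(field):
    """Turn a PSL comma-terminated list such as '15,30,' into [15, 30]."""
    return [int(x) for x in field.rstrip(",").split(",")]


def psl_to_cigar(q_size, block_sizes, q_starts, t_starts, clip="H"):
    """Build a CIGAR string from PSL block columns (illustrative sketch)."""
    ops = []
    # Query bases before the first block are unaligned: emit a clip.
    if q_starts[0] > 0:
        ops.append(f"{q_starts[0]}{clip}")
    for i, size in enumerate(block_sizes):
        if i > 0:
            prev_q_end = q_starts[i - 1] + block_sizes[i - 1]
            prev_t_end = t_starts[i - 1] + block_sizes[i - 1]
            q_gap = q_starts[i] - prev_q_end  # extra query bases: insertion
            t_gap = t_starts[i] - prev_t_end  # skipped target bases: deletion
            if q_gap > 0:
                ops.append(f"{q_gap}I")
            if t_gap > 0:
                ops.append(f"{t_gap}D")
        ops.append(f"{size}M")
    # Query bases after the last block are unaligned as well.
    tail = q_size - (q_starts[-1] + block_sizes[-1])
    if tail > 0:
        ops.append(f"{tail}{clip}")
    return "".join(ops)


# Made-up example: a 50 bp query aligned in two blocks with a 6 bp target gap.
example = psl_to_cigar(50, parse_psl_list("15,30,"),
                       parse_psl_list("2,17,"), parse_psl_list("100,121,"))
print(example)  # 2H15M6D30M3H
```

Passing clip="S" would give soft clips instead, which only makes sense if the full read sequence is written to the SEQ column.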

conversion format sam bam blat • 10k views
ADD COMMENTlink modified 3.9 years ago by Botond Sipos1.7k • written 9.9 years ago by Michael Dondrup48k
9.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:

(not tested): there is a script named in:

ADD COMMENTlink modified 15 months ago by _r_am31k • written 9.9 years ago by Pierre Lindenbaum131k

Yeah, thanks a lot. Tested, and that seems to work, even without the sequences included. I wonder what other nice things are hidden in samtools. By the way, the script has some options (-a, -b, -q, -r), but they seem to be ignored.

ADD REPLYlink written 9.9 years ago by Michael Dondrup48k

This script does not add a header to the SAM file. Here is how to add a header to your SAM file.
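One possible way to build such a header by hand is sketched below in Python. This is only an illustration, not necessarily the commands originally posted: the function name sam_header is hypothetical, and the reference names and lengths are assumed to be already known (for example from a chromosome sizes file).

```python
def sam_header(chrom_sizes, sort_order="unsorted"):
    """Return SAM header text: one @HD line plus one @SQ line per reference.

    chrom_sizes is an iterable of (name, length) pairs (assumed input).
    """
    lines = [f"@HD\tVN:1.6\tSO:{sort_order}"]
    for name, length in chrom_sizes:
        lines.append(f"@SQ\tSN:{name}\tLN:{length}")
    return "\n".join(lines) + "\n"


# Made-up reference names and lengths, just to show the layout.
print(sam_header([("chr1", 1000), ("chr2", 500)]), end="")
```

The header text would then be written in front of the headerless alignment lines before handing the file to samtools.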

ADD REPLYlink written 4.8 years ago by Prakki Rama2.4k

If you want some additions on top of that (for example, to view the result in IGV), I used the following one-liner:

blat $REF $QUERY $RESULT.pslx -out=pslx -noTrimA -extendThroughN && $RESULT.pslx | python $QUERY | samtools view -bST $REF - > $RESULT.bam && samtools sort $RESULT.bam $RESULT.sorted && samtools index $RESULT.sorted.bam

where this script is

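import sys

# Complement table for reverse-strand hits (Python 3: str.maketrans).
transtab = str.maketrans('ACTG', 'TGAC')

# Assumed usage: query FASTA as first argument, headerless SAM on stdin.
name2seq = {}
with open(sys.argv[1]) as fa:
    for line in fa:
        if line.startswith('>'):
            name = line[1:].split()[0]
            name2seq[name] = ''
        else:
            name2seq[name] += line.strip()

for line in sys.stdin:
    vals = line.rstrip("\n").split("\t")
    assert vals[0] in name2seq
    seq = name2seq[vals[0]]
    if vals[1] != '0':
        # Only forward (0) and reverse (16) flags are expected here.
        assert vals[1] == '16'
        seq = seq.translate(transtab)[::-1]  # reverse complement
    vals[9] = seq  # fill in the full read sequence (assumed intent)
    vals[5] = 'S'.join(vals[5].split('H'))  # hard clips become soft clips
    print("\t".join(vals))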
ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by joshkorn0
4.2 years ago by
jeltje.van.baren80 wrote:

This solution no longer works very well. I verified this by uploading the input psl and the output sam (converted to bam) to the UCSC genome browser.

Since this page comes up as the first hit on Google, here's a better solution using bedtools, which you can apt-get on Ubuntu


(or look around on that site for your operating system)

Get a list of chromosome sizes, for instance


Now run

pslToBed input.psl input.bed
bedtools bedtobam -bed12 -i input.bed -g hg38.chrom.sizes > output.bam
samtools view output.bam > output.sam

Using this, my input.psl matches my output.bam file when uploaded as a track.
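If no ready-made chromosome sizes file is at hand for your reference, the lengths can also be computed from the FASTA file itself. A rough sketch in Python follows; the function name fasta_lengths is made up, and the two-column name/length layout is assumed to be what the -g option expects.

```python
def fasta_lengths(lines):
    """Map each FASTA record name (first word of the header) to its length."""
    sizes = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


# Tiny made-up FASTA, passed as a list of lines instead of an open file.
demo = [">chrA some description", "ACGTAC", "GT", ">chrB", "AAAA"]
for name, length in fasta_lengths(demo).items():
    print(f"{name}\t{length}")
```

For a real reference you would pass an open file handle instead of the list and redirect the printed lines to a file.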

ADD COMMENTlink modified 15 months ago by _r_am31k • written 4.2 years ago by jeltje.van.baren80

This works well, but is there a way to include the sequences in the generated SAM, or does one have to do that manually?

ADD REPLYlink written 3.9 years ago by Botond Sipos1.7k

Did you check the samtools approach? I think it could have included the sequences.

ADD REPLYlink written 3.9 years ago by Michael Dondrup48k

Unfortunately it does not work: the sequences are not included. The SAM output is also invalid, as there are often dashes in the CIGAR string, like this: 2H-574M609H
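A quick way to spot such records is to check each CIGAR against the set of operations SAM allows. The snippet below is a small sketch of that check; the helper name is hypothetical, and it only tests the syntax (a run of number/operation pairs, or a lone *), not whether clips sit at the ends.

```python
import re

# A CIGAR is either '*' or one or more <length><op> pairs.
VALID_CIGAR = re.compile(r"^(\*|(\d+[MIDNSHP=X])+)$")


def is_valid_cigar(cigar):
    """Return True if the string is syntactically a SAM CIGAR."""
    return VALID_CIGAR.match(cigar) is not None


print(is_valid_cigar("2H-574M609H"))  # False
print(is_valid_cigar("2H574M609H"))   # True
```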

ADD REPLYlink written 3.9 years ago by Botond Sipos1.7k

I have tried a different approach: I converted the PSL to BAM using the bedtools method above and cooked up a script to augment the BAM file with the sequences from the original FASTA file. However, the CIGAR string is inconsistent with the sequence length, for example:

Read  0     
REF   1953    255     15M6N30M        *       0       0      

I guess this means that something went wrong with the PSL -> BED -> SAM conversion. Note that the CIGAR string has no hard clip operations in it. Does anybody have any clue about this?
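To make the mismatch concrete: the read length implied by a CIGAR is the sum of the operations that consume query bases (M, I, S, = and X). Here is a minimal sketch of that check in Python, with hypothetical function names.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # operations that use up bases of the read


def cigar_query_length(cigar):
    """Sum the lengths of all CIGAR operations that consume query bases."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar)
               if op in QUERY_CONSUMING)


def seq_matches_cigar(cigar, seq):
    """True if the SEQ length agrees with the CIGAR."""
    return cigar_query_length(cigar) == len(seq)


print(cigar_query_length("15M6N30M"))  # 45
```

So with 15M6N30M the SEQ column has to hold exactly 45 bases. A longer read with unaligned ends would need clip operations to account for the rest.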

ADD REPLYlink modified 15 months ago by _r_am31k • written 3.9 years ago by Botond Sipos1.7k

I am also interested in this. How did you populate your SAM with the sequences?

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by tiago2112871.2k
3.9 years ago by
Botond Sipos1.7k
United Kingdom
Botond Sipos1.7k wrote:

I wrote a conversion tool in Python: Uncle PSL: BLAT to SAM conversion in Python

ADD COMMENTlink written 3.9 years ago by Botond Sipos1.7k

Hi! We are trying to use your tool, but we are having problems even installing it. Could you provide help?

ADD REPLYlink written 3.6 years ago by bastianfromm0