Question: Conversion Of Blat Output To Sam/Bam
8
gravatar for Michael Dondrup
8.3 years ago by
Bergen, Norway
Michael Dondrup46k wrote:

I have some short reads aligned using BLAT, the output is in tabular psl format (including the sequences) for each alignment. Is it possible to convert the blat output to SAM/BAM format. Myself, I would think it is not because of the lack of some data fields in psl which is required for SAM format (mainly the CIGAR string), but please proove me wrong! Normally I would advise myself to use a different tool (bwa, bowtie, lastz) and align getting a SAM file, but what if that wasn't an option (say because you really want to use BLAT or you don't have the input) is there a way to do the conversion. I can possibly code that in perl and share it if someone had an idea how to do it.

conversion format sam bam blat • 8.8k views
ADD COMMENTlink modified 2.2 years ago by Botond Sipos1.7k • written 8.3 years ago by Michael Dondrup46k
10
gravatar for Pierre Lindenbaum
8.3 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

(not tested): there is a script named psl2sam.pl in :

samtools/misc/psl2sam.pl
ADD COMMENTlink written 8.3 years ago by Pierre Lindenbaum119k

Yeah, thanks a lot. Tested, that seems to work. Even without the sequences included. I wonder what more nice things are there hidden in samtools. btw, the script has some options (-a,-b, -q, -r) but they are seemingly ignored.

ADD REPLYlink written 8.3 years ago by Michael Dondrup46k

This script would not add the header to the sam file. Here is how to add header to your sam file.

ADD REPLYlink written 3.2 years ago by Prakki Rama2.2k

If you want some additions on top of that (for example to view in IGV) I just did the following oneliner:

REF=reffasta.fa
QUERY=queryfasta.fa
RESULT=outputbase
blat $REF $QUERY $RESULT.pslx -out=pslx -noTrimA -extendThroughN && psl2sam.pl $RESULT.pslx | python add_seq_info_to_psl2sam.py $QUERY | samtools view -bST $REF - > $RESULT.bam && samtools sort $RESULT.bam $RESULT.sorted && samtools index $RESULT.sorted.bam

where this script is add_seq_info_to_psl2sam.py:

import sys
from string import maketrans
sam=sys.stdin
transtab = maketrans('ACTG','TGAC')

name2seq = {}
fa=open(sys.argv[1])
for line in fa:
    name=line.strip()[1:]
    seq = fa.next().strip()
    name2seq[name]=seq
fa.close()

for line in sam:
    vals = line.split("\t")
    assert vals[0] in name2seq
    if vals[1]=='0':
        vals[-3]=name2seq[vals[0]]
    else:
        assert vals[1]=='16'
        vals[-3]=name2seq[vals[0]][::-1].translate(transtab)
    vals[5] = 'S'.join(vals[5].split('H'))
    sys.stdout.write("\t".join(vals))
ADD REPLYlink modified 16 months ago • written 16 months ago by joshkorn0
5
gravatar for jeltje.van.baren
2.6 years ago by
California
jeltje.van.baren80 wrote:

This psl2sam.pl solution no longer works very well. I verified this by uploading the input psl and the output sam (converted to bam) to the UCSC genome browser.

Since this page comes up as the first hit on Google, here's a better solution using bedtools, which you can apt-get on ubuntu

wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/pslToBed

(or look around on that site for your operating system)

Get a list of chromosome sizes, for instance

wget https://genome.ucsc.edu/goldenpath/help/hg38.chrom.sizes

Now run

pslToBed input.psl input.bed

bedtools bedtobam -bed12 -i input.bed -g hg38.chrom.sizes > output.bam

samtools view output.bam > output.sam

Using this, my input.psl matches my output.bam file when uploaded as a track.

ADD COMMENTlink written 2.6 years ago by jeltje.van.baren80

This works well, but is there a way to include the sequences in the generated SAM or one has to do that manually?

ADD REPLYlink written 2.3 years ago by Botond Sipos1.7k
1

Did check the samtools approach? I think it could have included the sequences.

ADD REPLYlink written 2.3 years ago by Michael Dondrup46k

It does not work unfortunately, the sequences are not included. Also the SAM output is invalid as there are often dashes in the CIGAR string like this: 2H-574M609H

ADD REPLYlink written 2.3 years ago by Botond Sipos1.7k

I have tried a different approach: I have converted the PSL to BAM using the bamtools method above and cooked up a script to augment the BAM file with the sequences from the original fasta file. However the CIGAR string is inconsistent with the sequence length, for example:

Read 0
REF 1953 255 15M6N30M * 0 0
ACGAGGGGGCGACCGGTGTTCAGCCCCCGTCGACTTGAGTCCGTTTGCAATGCGTCTATCGTTCACGGTCAAGATGTTCCACCTTGCTTCACTTGTGT CCTTTCGCGAAGGCGCCAAGTTCGGTGCCTAGCAGTGCATAACGTCGCCTATCGCTCTGGATTGTCTGGCCACAAGTATAGCCTGCTGTGGTGTTTATAG TTCGTAGTCCAATCTGTATCGTGCCGCTAAGTATAGCATGTCGCTGCCTGATGCAATAGTGCTTCGCACTCATTGGTAGGCGGCAACTAGTTGCCATAAGTT GCATGGTGGTGAGGTGGATACGATAAGTTGGGGACGGTGACTGACACTGTCGTCTAGTTGGGAGCAACTTATTGCCCGGAGTATAGATAAGACGTCGTG ATGCGCATGAGATCGCGGACTGTACTGTTCACTTTTGTAGTTTTCGACTGCTTGCGCCCAAAGGCTACTTATGTAGTATGGTATAGCATGTCTGCGATGCTTT GTTTGCTCCTGAGTGTGTTGACGATGGATGAAGTATTGATTGTTCAACGGGGAAGACAGTGGTGCCGCAATGAAAAGTTGAGGTGCAAATAGAGCGGGC CGGCATTCGTTTTCAAACGCCTTTTATCCGGTGCAGATTGCGTACGGATACACTAAATTTGTTTTTACTTTCGTACTAGCGTTACGGTGATAAACGATTTCTTT TCATGACACGAAACATAATAGTCTTGCCGTCCGATTGCCGATATTGTCTGTAATGGAGCAGCGATGATCAAGAGCAATCAACTTCCGGCTATGCAGTGTGTT TGGTAATCGACATCGCGCGCCTTAATCGCTTTTCATCGCGCGGAACAGAAGAAGACCTATTTTTGTAGTCGCATAAGCGCCGTGGAATGTGATTGCTTAGTT GCATCTACAAGGTGACTAGGTGGAAAAATCACTCGTGTTTCGGATGTGGTGACGTCTCGCGGTTATAATTTGTCTGTCTTAGTGGATGTCATGTCCTGGTTT AGGTGGAATCAAGTTAACTTTTTGTTGCCATGTCAGGTGCAAACCAACGACAGTGATGACATTGCGTCTAGACTATTGGAGCTAGTGGTATCTAAAATGGGAT CTCTGTGTGTTCCGGTCTTAGGTTTTAAATATTGGTCGCCAGGTGCAATAAAAACCGGATCACGATGAACATTGCCCCGCGACTAGTTTAGTAGTCTTTAAGA TGACGCTAGTGATGGAGAAAAACGCCCCCTCCCCACCCCCATCTTACTCTACTACCGCCCTCCATTCATATTTTTCTAACCCTCTCCATACTTACTCATCCATCC GCCCCCTACACACCTTACGTCCCTTACAACCTCCCTTTTCGAAAAATAACTGA *

I guess this means that something went wrong with the PSL -> BED -> SAM conversion. Note that the CIGAR string has no hard clip operations in it. Does anybody have any clue about this?

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by Botond Sipos1.7k

I am also interested in this, how did you populate your sam with the sequences?

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by tiago2112871.1k
1
gravatar for Botond Sipos
2.2 years ago by
Botond Sipos1.7k
United Kingdom
Botond Sipos1.7k wrote:

I wrote a conversion tool in Python: Uncle PSL: BLAT to SAM conversion in Python

ADD COMMENTlink written 2.2 years ago by Botond Sipos1.7k

Hi! We are trying to use your tool but we have problems installing it even. Could you provide help?

ADD REPLYlink written 2.0 years ago by bastianfromm0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1833 users visited in the last hour