Question: Parsing Blat Output Is Driving Me Crazy. Please Help Before I Break My Keyboard.
4.8 years ago, JayB wrote:

I am working on a project using command-line BLAT right now. I need to be able to take the output of a BLAT run, in any of the supported formats, and convert it into a format that can be fed back into a BLAT run. Eventually, my goal is to be able to iterate my BLAT runs. For reference, BLAT can output psl, pslx, maf, sim4, axt, blast-tab, and blast-text formats but takes as input only fasta, nib, and 2bit. I found a Biopython module called BlatIO (on github.com) that supports parsing .psl and .pslx files, and I attempted to convert this .psl output into FASTA format using my own code:

import sys
sys.path.insert(1, 'C:\\Python27\Lib\site-packages\Bio\BlatIO.py')
from Bio.AlignIO import BlatIO
from Bio import SearchIO
from Bio.SearchIO._model import QueryResult, Hit, HSP, HSPFragment

alignments = SearchIO.parse(input_file, 'blat-psl', pslx=True)
line1= QueryResult.id
line2= HSPFragment.query
print ('>', line1)
print (line2)

The output is not an ID and a sequence like I would expect though. Instead I get this:

('>', <property object at 0x029BC9F0>)
<property object at 0x029BC3C0>

I am open to all suggestions about how to get ANY of the BLAT output formats into ANY of the BLAT input formats, either by fixing the code I have started above or by some other method.

THANK YOU!

(PS: I have already done this project in BLAST, so please don't tell me to just use BLAST. I know that BLAST has different and in some ways better output formatting options, but I really need to use BLAT, not BLAST. PPS: I am aware of tools like those at usegalaxy.org that convert files; however, I really need code or a package to do this, preferably in Python or Perl, and not a web browser tool!)

Tags: fasta, python, blat

Your best friends for sorting out things like this are:

type(foo)  # tells you what kind of object foo is
dir(foo)   # lists the attributes and methods that foo has

In your case, when you print the object you get its string representation, which is not all that helpful (OK, it is atrocious).
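
As a rough illustration of why the question's code prints "property object", here is a minimal sketch using a toy class. Record is a hypothetical stand-in, not a Biopython class; the only assumption is that QueryResult.id and HSPFragment.query are defined as properties, which is what the printed output suggests.

```python
class Record(object):
    """Toy stand-in for a parsed result object (hypothetical, not Biopython)."""

    def __init__(self, identifier):
        self._id = identifier

    @property
    def id(self):
        # Only meaningful on an instance, where self._id exists.
        return self._id


# Accessing the attribute on the class returns the property object itself.
on_class = Record.id

# Accessing it on an instance runs the getter and returns the stored value.
on_instance = Record("query1").id

print(type(on_class))      # the property type
print(type(on_instance))   # the str type
# dir() on an instance lists what is available; filter out private names.
print([name for name in dir(Record("query1")) if not name.startswith("_")])
```

The same check on the real objects, for example type(QueryResult.id) compared with type(some_parsed_result.id), should show the difference between the class attribute and a value read from a parsed result.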

written 4.8 years ago by Istvan Albert

I don't know Biopython, but I think the problem is that line1 is a sort of 'class reference', not a result object instance. It seems intuitive that you need to loop over all alignments (at least in BioPerl you do) and then extract the data via accessor methods. So it doesn't look like your program could work at all (note that I know nothing about Python, so maybe there is some kind of weird magic).

I'd look for a class that writes FASTA files (something like SeqIO; Bio::SeqIO in BioPerl), pass it the alignment object, and see what happens.

written 4.8 years ago by Michael Dondrup

By the way, if I see correctly, BlatIO inherits from SearchIO, and the object returned by SearchIO.parse should have the same interface whatever the input format, so you just have to look for example code for the SearchIO class/interface and it should work. That is, provided the factory pattern of SearchIO.parse works as I assume.
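
The loop-then-access idea from these comments could be sketched roughly as follows. This is only a sketch under stated assumptions: the BLAT run wrote pslx output (which carries the aligned sequences), Biopython's SearchIO is installed, and the results follow its QueryResult, Hit, HSP, HSPFragment nesting. The function names and the FASTA header layout are made up for illustration.

```python
def fragments_to_fasta(qresults):
    """Yield one FASTA record (as text) per aligned query fragment.

    Works on any objects shaped like the SearchIO hierarchy; the header
    layout (query id, hit id, running number) is an arbitrary choice.
    """
    for qresult in qresults:                 # one result per query sequence
        count = 0
        for hit in qresult:                  # one hit per target sequence
            for hsp in hit:                  # one HSP per alignment
                for frag in hsp.fragments:   # one fragment per aligned block
                    if frag.query is None:   # no sequence stored, e.g. plain .psl
                        continue
                    count += 1
                    yield ">%s_%s_%d\n%s\n" % (
                        qresult.id, hit.id, count, frag.query.seq)


def psl_to_fasta(psl_path, fasta_path):
    """Hypothetical helper: convert a .pslx file to FASTA via Biopython."""
    # Imported here so the generator above has no external dependency.
    from Bio import SearchIO
    with open(fasta_path, "w") as out:
        results = SearchIO.parse(psl_path, "blat-psl", pslx=True)
        for record in fragments_to_fasta(results):
            out.write(record)
```

A call such as psl_to_fasta("round1.pslx", "round2.fa") (both file names are placeholders) would then produce a FASTA file that can be passed to the next BLAT run. Note that the attributes are read from the objects yielded by SearchIO.parse, not from the QueryResult or HSPFragment classes themselves.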

written 4.8 years ago by Michael Dondrup