Question: Efficiently run blat for long sequences
Jautis (United States) wrote, 10 months ago:

Hello, I'm using blat to align genes from one genome to another. This works well for short sequences (<10kb), but longer sequences have been running for more than a day with no sign of finishing. This seems especially true for sequences of 35kb or more, and some of the sequences are close to 200kb.

Does anybody have suggestions for increasing the efficiency? I've thought about running blat on 10kb intervals of the genes, but that would pose problems if some intervals fail to map or fail to map uniquely. I've pasted below the code that I'm currently using to run blat given the target genome and sequence, requiring that at least 90% of the sequence match, with at least 97% identity. Thanks!

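 # Length of the last sequence in sequence.fa (line lengths are summed per record)
 f=$(awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' sequence.fa | tail -1)
 # Minimum score: 90% of the sequence length
 a=$(( 9*f/10 ))
 # Align against the target genome, keeping hits with at least 97% identity
 blat target.2bit sequence.fa psl/sequence.psl -tileSize=15 -minScore=$a -minIdentity=97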
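
For reference, the 10kb-interval idea mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: `split_fasta_windows` is a hypothetical helper (not part of blat), the `name:start-end` header format is an arbitrary choice, and the optional overlap between windows is just one possible way to soften the problem of intervals that fail to map. It writes each window with its 1-based coordinates in the header, so hits on a window can be placed back on the full gene afterwards.

```shell
#!/usr/bin/env bash
# Split every record of a FASTA file into fixed-size windows.
# Usage: split_fasta_windows FASTA [WINDOW_SIZE] [STEP]
#   WINDOW_SIZE defaults to 10000; STEP defaults to WINDOW_SIZE (no overlap).
#   A STEP smaller than WINDOW_SIZE produces overlapping windows.
# Each window header is ">name:start-end" (1-based, inclusive coordinates).
split_fasta_windows() {
    local fasta=$1 win=${2:-10000} step=${3:-$win}
    awk -v win="$win" -v step="$step" '
        # Print all windows of the record currently held in name/seq.
        function emit(   start, stop, n) {
            n = length(seq)
            for (start = 1; start <= n; start += step) {
                stop = start + win - 1
                if (stop > n) stop = n
                printf(">%s:%d-%d\n%s\n", name, start, stop, substr(seq, start, win))
                if (stop == n) break   # the last window already reaches the end
            }
        }
        # Header line: flush the previous record, keep the first word as the name.
        /^>/ { if (name != "") emit(); name = substr($1, 2); seq = ""; next }
        # Sequence line: append to the current record.
        { seq = seq $0 }
        END { if (name != "") emit() }
    ' "$fasta"
}

# Example (not executed here): 10kb windows overlapping by 5kb, then blat as before.
#   split_fasta_windows sequence.fa 10000 5000 > windows.fa
#   blat target.2bit windows.fa psl/windows.psl -tileSize=15 -minIdentity=97
```

With this approach the `-minScore` cutoff would have to be chosen per window rather than per gene (for instance 9000 for a full 10kb window if the 90% rule is kept), and windows that map to several places, or to none, would still need to be resolved from their neighbours when the hits are stitched back together.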
Vitis (New York) wrote, 9 months ago:

Are you aligning spliced transcripts to genome assemblies, which requires opening large gaps (for introns)? If not, I'd suggest trying the minimap aligner, which has an option for dealing with substantially diverged sequences. If you are mapping spliced transcripts (which are typically quite small) to genome assemblies, I don't think BLAT would have a problem.
