Efficiently run blat for long sequences
4.1 years ago
Jautis ▴ 510

Hello, I'm using blat to align genes from one genome to another. This works well for small sequences (<10 kb), but longer sequences have been running for more than a day with no sign of finishing. This seems to be especially true for sequences of 35 kb or more, and some of the sequences are close to 200 kb.

Does anybody have suggestions for increasing the efficiency? I've thought about blat-ing 10 kb intervals of the genes, but that would pose problems if some intervals fail to map or fail to map uniquely. I've pasted below the code that I'm currently using to run blat on the target genome and sequence, requiring that at least 90% of the sequence matches, with at least 97% identity. Thanks!

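 # Length of the (last) sequence in sequence.fa
 f=$(awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' sequence.fa | tail -1)
 # minScore threshold: 90% of the sequence length (integer arithmetic)
 a=$(( 9*f/10 ))
 blat target.2bit sequence.fa psl/sequence.psl -tileSize=15 -minScore=$a -minIdentity=97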
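
One caveat with the length calculation above: because of the `tail -1`, only the last record's length is kept if sequence.fa ever holds more than one sequence. A minimal sketch of a per-record version, assuming a standard FASTA layout; the function name `fasta_min_scores` is made up for illustration and is not part of blat:

```shell
# Hypothetical helper: for each FASTA record print
#   name <TAB> length <TAB> minScore
# where minScore is 90% of the record length, rounded down
# (same arithmetic as a=$(( 9*f/10 )) in the snippet above).
fasta_min_scores() {
    awk '
        /^>/ {
            # A new header closes the previous record, if there was one.
            if (seen) print name "\t" len "\t" int(9 * len / 10)
            name = substr($1, 2)   # first word of the header, without ">"
            len = 0
            seen = 1
            next
        }
        { len += length($0) }      # accumulate sequence length line by line
        END {
            if (seen) print name "\t" len "\t" int(9 * len / 10)
        }
    ' "$1"
}
```

Running `fasta_min_scores sequence.fa` prints one line per record, and the third column could then be passed to `-minScore` for a blat run on that record alone.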
blat alignment sequence assembly • 842 views
4.0 years ago
Vitis ★ 2.5k

Are you aligning spliced transcripts to genome assemblies, which requires opening big gaps (for introns)? If not, I'd suggest trying the minimap2 aligner: https://github.com/lh3/minimap2. It has an option for dealing with substantially diverged sequences. If you are mapping spliced transcripts (which are typically quite small) to genome assemblies, I don't think BLAT would have a problem.
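
A rough sketch of what a minimap2 run could look like, under a few assumptions: minimap2 is installed, the target genome is available as FASTA (called target.fa here as a placeholder, since as far as I know minimap2 does not read .2bit), and the `asm20` preset is a reasonable choice for your level of divergence. The awk filter tries to mirror the 90% coverage and 97% identity cutoffs from the question using the standard PAF columns; `filter_paf` is a made-up helper name, and the identity it computes (matches over alignment block length, gaps included) is not exactly what blat reports.

```shell
# Hypothetical helper: keep PAF records where the aligned part of the query
# covers >= 90% of the query length and matches / block length >= 0.97.
# PAF columns used: 2 = query length, 3 and 4 = query start and end,
# 10 = number of matching bases, 11 = alignment block length.
filter_paf() {
    awk -F'\t' '($4 - $3) >= 0.90 * $2 && $11 > 0 && $10 / $11 >= 0.97'
}

# Only run the alignment if minimap2 and the placeholder input files exist.
# -c asks for base-level alignment so that columns 10 and 11 are exact;
# -x asm20 is a preset aimed at more diverged assembly-to-reference mapping.
if command -v minimap2 >/dev/null 2>&1 && [ -f target.fa ] && [ -f sequence.fa ]; then
    minimap2 -c -x asm20 target.fa sequence.fa | filter_paf > sequence.filtered.paf
fi
```

If the queries do turn out to be spliced transcripts, the `-x splice` preset would be the one to look at instead of `asm20`.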
