Question: Efficiently run blat for long sequences
16 days ago, Jautis (United States) wrote:

Hello, I'm using blat to align genes from one genome to another. This works well for short sequences (<10 kb), but longer sequences run for more than a day with no sign of finishing. This seems especially true for sequences of 35 kb or more, and some are close to 200 kb.

Does anybody have suggestions for increasing the efficiency? I've thought about running blat on 10 kb intervals of the genes, but that would pose problems if some intervals fail to map or fail to map uniquely. Below is the code I'm currently using to run blat given the target genome and sequence; it requires that at least 90% of the sequence matches, with at least 97% identity. Thanks!

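 # Length of the last record in sequence.fa (assumes one sequence per file)
 f=$(awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' sequence.fa | tail -1)
 # Minimum score: 90% of the sequence length
 a=$(( 9*f/10 ))
 blat target.2bit sequence.fa psl/sequence.psl -tileSize=15 -minScore=$a -minIdentity=97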
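
The awk one-liner above reports only the length of the last record, so it assumes one sequence per FASTA file. For a multi-record file, a minimal sketch of computing the 90% threshold for every record might look like the following. The function name `min_scores` is made up for illustration and is not part of blat.

```shell
# Print "name<TAB>length<TAB>minScore" for each FASTA record.
# minScore is 90% of the record length, truncated to an integer
# (same result as the shell arithmetic 9*f/10 used above).
# min_scores is a hypothetical helper name, not a blat utility.
min_scores() {
    awk '
        function emit() {
            if (name != "") printf "%s\t%d\t%d\n", name, len, int(9 * len / 10)
        }
        /^>/ { emit(); name = substr($1, 2); len = 0; next }  # new record: report the previous one
        { len += length($0) }                                 # accumulate sequence length
        END { emit() }                                        # report the final record
    ' "$@"
}
```

Running `min_scores sequence.fa` would then give one threshold per record, which could be passed to `-minScore` if each record is aligned separately.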
5 days ago, Vitis (New York) wrote:

Are you aligning spliced transcripts to genome assemblies, which requires opening large gaps (for introns)? If not, I'd suggest trying the minimap2 aligner: https://github.com/lh3/minimap2. It has an option for dealing with substantially diverged sequences. If you are mapping spliced transcripts (which are typically quite small) to genome assemblies, I don't think BLAT would have a problem.
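
As a rough sketch of how the 90% coverage and 97% identity requirements from the question could be applied to minimap2 output: minimap2 writes PAF by default, where column 2 is the query length, columns 3 and 4 are the query start and end, column 10 is the number of matching bases and column 11 is the alignment block length. The helper name `filter_paf`, the file names, and the choice of the `asm5` preset are assumptions for illustration; the right preset depends on how diverged the two genomes are.

```shell
# Keep PAF records that cover >= 90% of the query with >= 97% identity.
# Coverage = (query end - query start) / query length  (columns 4, 3, 2)
# Identity = matching bases / alignment block length   (columns 10, 11)
# filter_paf is a hypothetical helper name, not part of minimap2.
filter_paf() {
    awk -F'\t' -v min_cov=0.90 -v min_id=0.97 '
        $2 > 0 && $11 > 0 && ($4 - $3) / $2 >= min_cov && $10 / $11 >= min_id
    ' "$@"
}

# Assumed usage (minimap2 reads FASTA or its own index, not .2bit,
# so the target would first need converting, e.g. with twoBitToFa):
#   minimap2 -c -x asm5 target.fa sequence.fa | filter_paf > sequence.filtered.paf
```

The `-c` flag asks minimap2 for base-level alignment, so the match counts in columns 10 and 11 reflect the actual alignment rather than an estimate. For spliced transcripts, the `-x splice` preset would be the one to try instead.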
