I'm running BLAT with this command:
blat genome.fasta query.fasta blat_90.csv -minIdentity=90 -noHead
I then counted the hits per query and sorted them by count, and the result surprised me:
import pandas as pd
import sys

df_blat = pd.read_csv('blat_90.csv', index_col=False, sep='\t')
s = df_blat.qName.value_counts()
count = 1
for k in s.head(100).index.tolist():
    count += 1
    print(s[k], k)
The interesting thing is that the maximum number of hits per query is 672, and about 30 queries have exactly that count. After those, many queries have exactly 671 hits.
672 MITE_T_71978|chr3D|340455280|340455443|AT|175|F2975
672 MITE_T_97305|chr7B|194283260|194283371|AT|29|F4298
672 MITE_T_110543|chr5D|34445518|34445608|TA|94|F5023
672 MITE_T_72475|chr3D|503092467|503092630|AT|174|F2988
...
This makes me wonder whether there is a limit on the number of hits reported per query. I've rerun this script and the results are the same.
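One way to check for a plateau like this is to tabulate how many queries share each hit count. The following is only a minimal sketch, using the standard library rather than pandas. It assumes the output is headerless, tab-separated PSL (as produced with -noHead), where qName is the 10th column. The function names hits_per_query and count_histogram are made up for illustration.

```python
from collections import Counter

Q_NAME_COL = 9  # assumption: standard PSL layout, qName is the 10th column


def hits_per_query(lines):
    """Count alignment rows per query name in headerless, tab-separated PSL lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > Q_NAME_COL:  # skip blank or truncated lines
            counts[fields[Q_NAME_COL]] += 1
    return counts


def count_histogram(per_query):
    """Map each hit count to the number of queries that have exactly that count."""
    return Counter(per_query.values())


# Hypothetical usage on the file from the post:
# with open("blat_90.csv") as handle:
#     histogram = count_histogram(hits_per_query(handle))
# for n_hits, n_queries in sorted(histogram.items(), reverse=True)[:10]:
#     print(n_hits, n_queries)
```

If many queries pile up at one maximum value and the counts just below it, that suggests a cap rather than a coincidence.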
UPDATE: I've noticed that when I run a single query against each wheat chromosome in a separate thread, the maximum number of hits per search is 32.
This is what the results look like when I search a single sequence against one chromosome at a time:
wc -l b*
32 b1A.csv
32 b1B.csv
32 b1D.csv
32 b2A.csv
32 b2B.csv
...
So the maximum is 32 hits per search, and 32 x 21 chromosomes = 672, which matches the maximum per query in the whole-genome run.
There must be a limit in the code, but I still haven't been able to find it.
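The 32 x 21 decomposition can also be checked directly on the whole-genome output by counting hits per (query, target) pair. This is a hedged sketch, not a statement about how BLAT works internally. It assumes headerless PSL with qName in column 10 and tName in column 14, and that each chromosome is a single record in genome.fasta, so tName is the chromosome name. The function names are hypothetical.

```python
from collections import Counter

Q_NAME_COL = 9   # assumption: standard PSL layout, qName is the 10th column
T_NAME_COL = 13  # assumption: standard PSL layout, tName is the 14th column


def hits_per_query_and_target(lines):
    """Count alignment rows for every (query, target) pair in headerless PSL lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > T_NAME_COL:  # skip blank or truncated lines
            counts[(fields[Q_NAME_COL], fields[T_NAME_COL])] += 1
    return counts


def max_hits_per_target(lines):
    """Largest number of hits that any single query has on any single target."""
    counts = hits_per_query_and_target(lines)
    return max(counts.values(), default=0)


# Arithmetic from the post: 32 hits on each of 21 chromosomes gives 672 in total.
PER_CHROMOSOME_CAP = 32
N_CHROMOSOMES = 21
EXPECTED_TOTAL = PER_CHROMOSOME_CAP * N_CHROMOSOMES  # 672

# Hypothetical usage on the file from the post:
# with open("blat_90.csv") as handle:
#     print(max_hits_per_target(handle), EXPECTED_TOTAL)
```

If max_hits_per_target never exceeds 32 on the combined file, the 672 ceiling is consistent with a per-target limit rather than a per-query one.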