Question

Trouble getting BLAT to work

0

Entering edit mode

2.8 years ago

azhai ▴ 10

I've been trying to use BLAT to align some sequences to the axolotl genome. I have all the required files and have finally got BLAT to actually run (I'm running it on my own mac in terminal), and this is the error that I keep encountering:

Assertion failed: ((hit->tStart >> bucketShift) < bucketCount), function clumpHits, file genoFind.c, line 1806.

Anyone have any ideas of what could be the issue? Thanks for any help

Blat • 2.4k views

ADD COMMENT • link updated 2.8 years ago by GenoMax 141k • written 2.8 years ago by azhai ▴ 10

0

Entering edit mode

What parameters or options are you passing to blat? Perhaps provide the full command you are running.

ADD REPLY • link 2.8 years ago by Alex Reynolds 35k

0

Entering edit mode

I'm running it at a very basic level. I should say I don't have a background in CS, so that might be why I'm having issues. Here's the command:

./blat Genome.fa Input.fa output.psl Genome.fa is just the full axolotl genome, and Input.fa is just the sequences I'm trying to align.

ADD REPLY • link 2.8 years ago by azhai ▴ 10

0

Entering edit mode

What kind of sequences are in Input.fa? blat is meant to be used with sequences that are similar. It is not a tool meant to find distantly related sequences.

ADD REPLY • link 2.8 years ago by GenoMax 141k

0

Entering edit mode

They are a bunch of sequences varying in length from 200-1000 bp, so could the differences in length be the issue?

ADD REPLY • link 2.8 years ago by azhai ▴ 10

0

Entering edit mode

No that should not be an issue. Are your input sequences in multi-fasta format?

ADD REPLY • link 2.8 years ago by GenoMax 141k

0

Entering edit mode

I'm not sure what that means, it's a TRINITY transcriptome assembly of genes expressed during certain stages of axolotl development.

Edit: should've Googled multi fasta before posting, my bad. Yes, it basically is in multi-fasta format. Here's the type of thing each sequence in preceded by:

>TRINITY_DN20780_c0_g1_i1 len=639 path=[1367:0-311 1382:312-312 1381:313-313 1380:314-337 1379:338-462 1362:463-638] [-1, 1367, 1382, 1381, 1380, 1379, 1362, -2]

ADD REPLY • link 2.8 years ago by azhai ▴ 10

0

Entering edit mode

Can you take 3-4 sequences from your file put them in a new file and see if you still get that error using this new file? It is possible that all that cruft (extra stuff) in fasta header may be causing a problem. If the search works then there may be a specific sequence in your data that is causing that error. If it does not then you may need to shorten those headers. But one step at a time.

ADD REPLY • link 2.8 years ago by GenoMax 141k

0

Entering edit mode

I tried it with only one sequences and still got the same error. I then shortened the header so that it only contained the carrot >, still the same error. For the heck of it, I decided to remove the header entirely, and that gave the following error:

Couldn't open TCAGCTTTGTTCTCCGCTGCCATTTTCTCATATTGTGTACGGATTTCCGCGATCGCTACA , No such file or directory

That is the first line of the sequence in Input.fa, not even the entire sequence. This is really perplexing me.

ADD REPLY • link 2.8 years ago by azhai ▴ 10

0

Entering edit mode

Just ran a test

$ blat db_dna.fa db_dna.fa outpsl
Loaded 9939 letters in 5 sequences
Searched 9939 bases in 5 sequences

$ more outpsl 
psLayout version 3

match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap   strand  Q               Q       Q       Q       T               T       T       T       block   blockS
izes    qStarts  tStarts
        match   match           count   bases   count   bases           name            size    start   end     name            size    start   end     count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
3642    0       0       0       0       0       0       0       +       test1   3642    0       3642    test1   3642    0       3642    1       3642,   0,      0,
2507    0       0       0       0       0       0       0       +       test2   2507    0       2507    test2   2507    0       2507    1       2507,   0,      0,
967     0       0       0       0       0       0       0       +       test3   967     0       967     test3   967     0       967     1       967,    0,      0,
2200    0       0       0       0       0       0       0       +       test4   2200    0       2200    test4   2200    0       2200    1       2200,   0,      0,
623     0       0       0       0       0       0       0       +       test5   623     0       623     test5   623     0       623     1       623,    0,      0,

It should be just that simple.

Can you try running your test file with one sequence (with header) against itself? That would make sure your blat works properly.

Long headers are not a problem either.

$ more test.fa
>TRINITY_DN20780_c0_g1_i1 len=639 path=[1367:0-311 1382:312-312 1381:313-313 1380:314-337 1379:338-462 1362:463-638] [-1, 1367, 1382, 1381, 1380, 1379, 1362, -2]
GCCGTGTCCGAGCGCTGGCGGAGCACGGGCCGCCGCGGCGCGCGCCTACCCCCGCGCACACGTGCATGCA
GTGAGCCAGGTGCATCAGCTGCACCCGCCCTCGGGCTTCGCGCGCCTCCGCTGGAGCCCAGCTAGCCTCG
CGGCTCCGGGGCCGACCGGGTCGGAACTGCTGCGTCCCCATTCCGGACGCAGAGAGCGCGCGGGGCGCCG
GCGCCGCCGGTTAAAGTGCCGCGGGGCAGGGGCCGGGCCGCGGCCACCCGCTCCTCCCGCTCCGGTCCCG

$ blat test.fa db_dna.fa outpsl
Loaded 840 letters in 1 sequences
Searched 9939 bases in 5 sequences

$ more outpsl 
psLayout version 3

match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap   strand  Q               Q       Q       Q       T               T       T       T       block   blockS
izes    qStarts  tStarts
        match   match           count   bases   count   bases           name            size    start   end     name            size    start   end     count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
840     0       0       0       0       0       0       0       +       test1   3642    0       840     TRINITY_DN20780_c0_g1_i1        840     0       840     1    840,     0,      0,

ADD REPLY • link 2.8 years ago by GenoMax 141k

1

Entering edit mode

Yup that worked, so that leads me to believe that somethings wrong with the axolotl genome I'm using. For one thing, it is massive, I can't even open the fasta file that it's in because it's 28 gb, so I'm not sure what it looks like. Maybe I'll try converting it to a 2bit file, not sure how that would change things, but it's worth a shot.

ADD REPLY • link 2.8 years ago by azhai ▴ 10

0

Entering edit mode

Worth a try. Let us know how that goes.

ADD REPLY • link 2.8 years ago by GenoMax 141k

0

Entering edit mode

It did not work. Starting to feel like I'm running out of ideas, any further suggestions? Thanks for all the help so far! I really appreciate it.

ADD REPLY • link 2.8 years ago by azhai ▴ 10

score 1 · Answer 1 · 2021-07-08

1

Entering edit mode

2.8 years ago

GenoMax 141k

Use a proper NGS aligner. Start with minimap2. I would recommend bbmap as well. Make sure you have enough RAM available to create genome indexes.

ADD COMMENT • link 2.8 years ago by GenoMax 141k

0

Entering edit mode

minimap2 appears to have worked! Thank you so much for sticking with me, I appreciated the help!

ADD REPLY • link 2.8 years ago by azhai ▴ 10