We have assembled several contigs from Solexa reads, and now we are trying to map these contigs onto a related reference genome.
We used blat for this. The problem is that the blat hits are distributed broadly along the chromosomes. blat can link these hits into larger blocks when we use the PSL output format, but the links it makes are poor.
For example, a contig is only 10 kb long, but when it is mapped to the reference and linked by blat with default settings, it spans more than 500 kb.
Besides blat, are there better mapping tools?
We also tried MUMmer, but the problem with MUMmer is that it does not link the hit segments, so the result is hard to use.
I don't understand exactly what you mean by 'link these hits into larger blocks'. Did you specify the -maxIntron parameter? Try setting it to 0. However, if your contigs align only with these large gaps, then either you are assembling RNA-seq data and the gaps are real, or you may be seeing mis-assemblies. Or maybe your reference sequence is more distant than you thought?
Edit:
In light of the information you supplied, I would simply use blast. Blat was made primarily for highly similar sequences with large inserts, e.g. mapping ESTs to the genome. It may also be useful for aligning 454 reads to the genome, but it is not the tool of choice for sequences that diverged 5 million years ago.
Further, nothing suggests that you or blat 'linked' the local alignments in your example; the only link is that these alignments stem from the same query sequence. Assuming that there is an alignment spanning the whole range is not valid given the parameters of sequence identity and gaps, precisely because of the sequence differences between them. What this also tells you is that there was one large insert.
You could also experiment a bit with gap costs and different substitution costs, and try setting the -maxIntron parameter in blat higher, but possibly that wouldn't work either.
You could try CONTIGuator, which uses blastn to map the contigs to a reference genome (which can contain multiple replicons) and prepares a set of maps that are viewable with the ACT tool from the Sanger Institute (the upcoming release will also produce PDF maps).
Yes, our reference diverged from the sequenced genome about 5 million years ago.
Thus large gaps are normal. We would like to group these hits on the reference, so a linking step is necessary.
For example,
the blat output in blast8 format (selected columns):
query hit qstart qend hstart hend
contig1 chr1 1 4073 29669 33741
contig1 chr1 4086 5715 33764 35401
contig1 chr1 5764 11061 35451 40748
contig1 chr1 11073 12115 1213456 1214588
The hit on the fourth line is quite far away from the other three hits, but blat groups these four hits together. Thus, from the PSL output,
the contig seems to occupy 29669 - 1214588, although the contig is only 12 kb long.
I know this might be caused by genome rearrangements such as translocations or inversions, but the grouping method in blat seems to group all alignment blocks in consistent order together, without considering the distances between them.
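A distance-aware grouping of this kind can also be done as a post-processing step on the tabular hits, independent of the aligner. The following is only a minimal sketch, not a feature of blat: the names `Hit`, `group_hits` and `span` are made up for illustration, and the 10 kb `max_gap` threshold is an arbitrary assumption that would need tuning for your genomes. It sorts the hits by reference position and starts a new group whenever the gap on the reference to the previous group exceeds `max_gap`.

```python
from collections import namedtuple

# Hypothetical record for one hit: the columns shown in the example table.
Hit = namedtuple("Hit", "query ref qstart qend rstart rend")


def group_hits(hits, max_gap=10000):
    """Group hits of the same query/reference pair by reference distance.

    max_gap is an assumed threshold (in bp), not a blat parameter.
    """
    groups = []
    order = sorted(hits, key=lambda h: (h.query, h.ref, min(h.rstart, h.rend)))
    for hit in order:
        start = min(hit.rstart, hit.rend)
        last = groups[-1] if groups else None
        if (
            last
            and last[-1].query == hit.query
            and last[-1].ref == hit.ref
            # distance from the right-most end of the current group
            and start - max(max(h.rstart, h.rend) for h in last) <= max_gap
        ):
            last.append(hit)
        else:
            groups.append([hit])
    return groups


def span(group):
    """Return (min, max) reference coordinates covered by a group."""
    coords = [c for h in group for c in (h.rstart, h.rend)]
    return min(coords), max(coords)


# The four hits from the example above.
hits = [
    Hit("contig1", "chr1", 1, 4073, 29669, 33741),
    Hit("contig1", "chr1", 4086, 5715, 33764, 35401),
    Hit("contig1", "chr1", 5764, 11061, 35451, 40748),
    Hit("contig1", "chr1", 11073, 12115, 1213456, 1214588),
]

for group in group_hits(hits):
    print(group[0].query, group[0].ref, *span(group), len(group))
# contig1 chr1 29669 40748 3
# contig1 chr1 1213456 1214588 1
```

With this threshold the first three hits form one block of about 11 kb and the distant fourth hit is reported separately. The sketch ignores strand and query order, so it would not by itself detect inversions.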
This should be either a comment or an edit to your question. And by the way, why don't you use blast? It seems like the obvious choice to me here.
blast may have the reverse problem of splitting single hits into dozens of HSPs...