Question: Problems For Mapping Contigs Onto Reference
7.9 years ago by
Plantae380 wrote:

We have assembled several contigs from Solexa reads, and we are now trying to map these contigs onto a related reference genome. We used BLAT for this, but the problem is that the BLAT hits are distributed broadly along the chromosomes. BLAT can link these hits into larger blocks when we use the PSL output format, but the links it makes are poor. For example, a contig that is only 10 kb long spans more than 500 kb on the reference once its hits are linked by BLAT with default settings.

Besides BLAT, are there better mapping tools?

We also tried MUMmer, but the problem with MUMmer is that it does not link the hit segments, so the result is hard to use.
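The over-stretched links described above can be spotted directly in PSL output by comparing the span on the reference (tEnd - tStart) with the contig length (qSize). Below is a minimal sketch, assuming standard 21-column, tab-separated PSL lines without the header. The function names, the `max_ratio` threshold and the example record are made up for illustration and are not real data.

```python
def psl_span_ratio(psl_line):
    """Return (query_name, reference_span / query_size) for one PSL data line."""
    fields = psl_line.rstrip("\n").split("\t")
    if len(fields) < 21:
        raise ValueError("expected 21 tab-separated PSL columns")
    q_name = fields[9]         # qName
    q_size = int(fields[10])   # qSize: full length of the contig
    t_start = int(fields[15])  # tStart on the reference
    t_end = int(fields[16])    # tEnd on the reference
    return q_name, (t_end - t_start) / q_size


def is_overstretched(psl_line, max_ratio=2.0):
    """True if the alignment covers far more reference than the contig is long.

    max_ratio is an arbitrary illustrative threshold, not a BLAT setting.
    """
    _, ratio = psl_span_ratio(psl_line)
    return ratio > max_ratio


# Hypothetical record: a 10 kb contig whose blocks span 500 kb on chr1.
example = "\t".join([
    "9500", "400", "0", "0",      # matches, misMatches, repMatches, nCount
    "2", "100", "2", "490100",    # qNumInsert, qBaseInsert, tNumInsert, tBaseInsert
    "+", "contig1", "10000", "0", "10000",    # strand, qName, qSize, qStart, qEnd
    "chr1", "30000000", "100000", "600000",   # tName, tSize, tStart, tEnd
    "3", "4000,3000,2900,", "0,4050,7100,", "100000,104100,597100,",
])

name, ratio = psl_span_ratio(example)
print(name, ratio, is_overstretched(example))
```

Alignments flagged this way are the ones worth inspecting block by block before trusting the link.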

genome reference blat • 2.8k views
7.9 years ago by
Bergen, Norway
Michael Dondrup45k wrote:

You could try LASTZ.

I didn't understand exactly what you mean by 'link these hits into larger blocks'. But did you specify the -maxIntron parameter? Try setting it to 0. However, if your contigs can only be aligned with these large gaps, then either you are assembling RNA-seq data and the gaps are real, or you may be looking at mis-assemblies. Or maybe your reference sequence is more distant than you thought?


In light of the information you supplied, I would simply use BLAST. BLAT was made primarily for highly similar sequences with large inserts, e.g. mapping ESTs to the genome, and may also be useful for aligning 454 reads to the genome, but it is not the tool of choice for sequences that diverged 5 million years ago.

Further, nothing suggests that you or BLAT 'linked' the local alignments in your example; the only link is that these alignments stem from the same query sequence. Assuming that there is an alignment spanning the whole range is not valid given the parameters for sequence identity and gaps, precisely because of the sequence differences between the two genomes. What this also tells you is that there was one large insert. You could also experiment a bit with gap costs and substitution costs, and try setting the -maxIntron parameter in BLAT higher, but possibly that wouldn't work either.
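For anyone who wants to experiment with the intron-size setting, here is a small sketch of how the BLAT call could be assembled from Python. BLAT's usage message spells the option `-maxIntron=N`, and `-out=blast8` selects the tabular output mentioned in this thread. The file names and the value 20000 are placeholders chosen for illustration, not recommendations, and the helper function is hypothetical.

```python
import subprocess


def build_blat_command(reference_fa, contigs_fa, output_path,
                       max_intron=20000, out_format="blast8"):
    """Build the argument list for: blat database query [options] output.

    max_intron=20000 is an arbitrary example value; tune it to your data.
    """
    return [
        "blat",
        reference_fa,                # database (reference genome)
        contigs_fa,                  # query (assembled contigs)
        f"-maxIntron={max_intron}",  # intron-size limit discussed above
        f"-out={out_format}",        # e.g. psl or blast8
        output_path,
    ]


def run_blat(command):
    """Run BLAT; requires the blat binary on PATH (not called in this sketch)."""
    subprocess.run(command, check=True)


# Hypothetical file names, shown only to illustrate the resulting command line.
cmd = build_blat_command("reference.fa", "contigs.fa", "contigs_vs_ref.blast8")
print(" ".join(cmd))
```

Running the same contigs with a few different values and comparing how the blocks are joined is probably the quickest way to see whether the parameter helps at all.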

6.7 years ago by
United Kingdom
mgalactus720 wrote:

You could try CONTIGuator, which uses blastn to map the contigs to a reference genome (which can contain multiple replicons) and prepares a set of maps viewable with the ACT tool from the Sanger Institute (the upcoming release will also produce PDF maps).

7.9 years ago by
Plantae380 wrote:

Yes, our reference diverged from the sequenced genome about 5 million years ago, so large gaps are normal. We would like to group these hits on the reference, so a linking step is necessary. For example, here is the BLAT output using blast8:
query    hit   qstart  qend   hstart   hend
contig1  chr1  1       4073   29669    33741
contig1  chr1  4086    5715   33764    35401
contig1  chr1  5764    11061  35451    40748
contig1  chr1  11073   12115  1213456  1214588

The hit on the fourth line is quite far away from the other three, but BLAT groups these four hits together. As a result, in the PSL output the contig appears to occupy 29669-1214588 on the reference, although the contig is only 12 kb long. I know this might be caused by genome rearrangements such as translocations or inversions, but BLAT's grouping method seems to group all alignment blocks that are in consistent order, without considering the distances between them.
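A distance-aware grouping of the kind described here can be sketched in a few lines. This is only an illustration, not how BLAT works internally: it assumes the simplified six-column layout shown in the table (real blast8 output has twelve columns), and `max_gap` is a made-up threshold for the largest reference gap allowed inside one group.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "query target qstart qend tstart tend")

TABLE = """\
query hit qstart qend hstart hend
contig1 chr1 1 4073 29669 33741
contig1 chr1 4086 5715 33764 35401
contig1 chr1 5764 11061 35451 40748
contig1 chr1 11073 12115 1213456 1214588
"""


def parse_hits(text):
    """Parse the simplified six-column table, skipping the header line."""
    hits = []
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts or parts[0] == "query":
            continue
        hits.append(Hit(parts[0], parts[1], *map(int, parts[2:6])))
    return hits


def group_hits(hits, max_gap=10_000):
    """Group hits of the same contig/chromosome pair whose reference gap is <= max_gap."""
    groups = []
    group_end = None
    ordered = sorted(hits, key=lambda h: (h.query, h.target, min(h.tstart, h.tend)))
    for hit in ordered:
        start, end = sorted((hit.tstart, hit.tend))  # also handles minus-strand hits
        same_pair = bool(groups) and (
            (groups[-1][0].query, groups[-1][0].target) == (hit.query, hit.target)
        )
        if same_pair and start - group_end <= max_gap:
            groups[-1].append(hit)
            group_end = max(group_end, end)
        else:
            groups.append([hit])
            group_end = end
    return groups


def group_span(group):
    """Reference interval covered by one group of hits."""
    coords = [c for h in group for c in (h.tstart, h.tend)]
    return min(coords), max(coords)


groups = group_hits(parse_hits(TABLE))
for g in groups:
    lo, hi = group_span(g)
    print(g[0].query, g[0].target, f"{lo}-{hi}", f"({len(g)} hits)")
```

With a 10 kb threshold the first three hits stay together and the distant fourth hit forms its own group, which is the behaviour I was hoping to get from the linking step.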


This should be either a comment or an edit of your question. And btw. why don't you use blast? It's sort of obvious to me to use blast here.

written 7.9 years ago by Michael Dondrup45k

blast may have the reverse problem of splitting single hits into dozens of hsps...

written 7.4 years ago by Yannick Wurm2.3k
10 months ago by
deepti1rao20 wrote:

I have a similar issue and I would like to have all the four alignments that you mentioned in one big block of alignment, wherein gaps could be shown.

How can I do that??

I do not want to use blast, as I want the genomic coordinates of my hits.
