Question: Continuous alignment of contigs with reference using BLAST/BLAT
12 months ago, deepti1rao wrote:

Why don't my de novo assembled contigs have continuous Ensembl BLAT alignments with a closely related reference genome? For example, a certain stretch of my contig aligns with 100% identity, then a new alignment begins after 10 bases (of the subject) with, say, 99% identity, followed by another one with 97% identity. These hits all point to the same locus of the reference. How can I have the entire stretch represented in one alignment, with gaps inserted where required, even if the identity goes down? That would help me compare the entire contig with the reference at once.

blat contigs alignment • 468 views
modified 12 months ago by h.mon • written 12 months ago by deepti1rao

You will have to experiment with the alignment parameters to get exactly that (and it may still not work the way you want it to).

Perhaps a program like lastz, which was designed for chromosome-length sequences, would work better, since BLAST is always going to attempt local alignments.
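
To illustrate what stitching several local hits into one continuous representation involves, here is a minimal Python sketch. It is not how BLAT, BLAST or lastz work internally; the `Hit` tuple, the `chain_hits` function and the `max_gap` threshold are hypothetical names chosen for illustration, and the coordinates are made up. It greedily groups same-strand hits that follow each other in the same order on both contig and reference, with only small gaps between them.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One local alignment block (hypothetical; 0-based, half-open coordinates)."""
    q_start: int  # start on the contig (query)
    q_end: int    # end on the contig
    s_start: int  # start on the reference (subject)
    s_end: int    # end on the reference


def chain_hits(hits: List[Hit], max_gap: int = 50) -> List[List[Hit]]:
    """Greedily group hits that are collinear and close on both sequences."""
    chains: List[List[Hit]] = []
    for hit in sorted(hits, key=lambda h: (h.q_start, h.s_start)):
        if chains:
            last = chains[-1][-1]
            q_gap = hit.q_start - last.q_end
            s_gap = hit.s_start - last.s_end
            # Extend the current chain only if this hit follows the previous
            # one on both sequences and neither gap exceeds max_gap.
            if 0 <= q_gap <= max_gap and 0 <= s_gap <= max_gap:
                chains[-1].append(hit)
                continue
        chains.append([hit])
    return chains


def chain_span(chain: List[Hit]) -> Hit:
    """Return the outer contig/reference span covered by a chain."""
    return Hit(chain[0].q_start, chain[-1].q_end,
               chain[0].s_start, chain[-1].s_end)


if __name__ == "__main__":
    # Made-up hits: three neighbouring blocks plus one distant block.
    example = [
        Hit(0, 1000, 5000, 6000),
        Hit(1010, 1500, 6010, 6500),
        Hit(1505, 2000, 6512, 7000),
        Hit(3000, 3500, 90000, 90500),
    ]
    for chain in chain_hits(example):
        print(len(chain), "hit(s):", chain_span(chain))
```

Reverse-strand hits and slightly overlapping hits would need extra handling, which this sketch leaves out.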

modified 12 months ago • written 12 months ago by genomax

If the two genomes are so similar, you may have more luck using BWA or minimap2.

written 12 months ago by h.mon

Doesn't BWA clip off unmatched ends and do a sort of local alignment?

written 12 months ago by deepti1rao

Indeed, it would probably soft-clip the ends, but I guess it would do a reasonably good job of representing the internal alignment as one continuous stretch - provided there aren't any inversions or translocations. However, I think you are wrong: forcing an alignment where one is unlikely to exist will probably hinder the comparison between your contigs and the reference, not facilitate it.
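
To get a feel for how much of a contig is actually lost to soft-clipping, one could inspect the CIGAR strings in the SAM output. The following is a minimal sketch using only the Python standard library; the function names `clipped_ends` and `aligned_fraction` and the example CIGAR string are made up for illustration.

```python
import re
from typing import Tuple

# One CIGAR operation: a length followed by an operation letter (SAM format).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_ends(cigar: str) -> Tuple[int, int]:
    """Return (leading, trailing) soft-clipped base counts of a CIGAR string."""
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    # Hard clips (H) can only sit at the outermost ends; ignore them here.
    core = [o for o in ops if o[1] != "H"]
    lead = core[0][0] if core and core[0][1] == "S" else 0
    trail = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return lead, trail


def aligned_fraction(cigar: str) -> float:
    """Fraction of the stored query bases that are not soft-clipped."""
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    # M, I, S, = and X consume query bases; D, N, H and P do not.
    query_len = sum(n for n, op in ops if op in "MIS=X")
    clipped = sum(n for n, op in ops if op == "S")
    return (query_len - clipped) / query_len if query_len else 0.0


if __name__ == "__main__":
    cigar = "120S4000M15I2000M300S"  # made-up example
    print(clipped_ends(cigar), round(aligned_fraction(cigar), 3))
```

Hard-clipped bases are not stored in the record, so for supplementary alignments the fraction refers only to the bases that are present.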

What are the sizes of the contigs and the reference genome? How many contigs are in your assembly, and how many chromosomes in the reference genome?

written 12 months ago by h.mon

I do not intend to do a forced alignment. Also, I do not want to lose information by having my contigs soft-clipped. I want to compare the contigs as a whole with the reference. That way, I may be able to annotate the draft genome and locate misassemblies/structural variants that can be studied further by experimenting with gene constructs (carrying variants) in the living system. I have 45k+ contigs, with maximum and minimum lengths of 185,793 bp and 500 bp, and the reference has 14 chromosomes.

written 12 months ago by deepti1rao
12 months ago, h.mon (Brazil) wrote:

QUAST is a tool designed to do what you want. It does so using local alignment, though - which, in my view, is the appropriate approach to the problem.

If you are adamant about using global alignment, you could try your luck with LAST and its -T 1 option:

-T NUMBER

Type of alignment: 0 means "local alignment" and 1 means "overlap alignment". Local alignments can end anywhere in the middle or at the ends of the sequences. Overlap alignments must extend to the left until they hit the end of a sequence (either query or reference), and to the right until they hit the end of a sequence.

Warning: it's often a bad idea to use -T1. This setting does not change the maximum score drop allowed inside alignments, so if an alignment cannot be extended to the end of a sequence without exceeding this drop, it will be discarded.
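
As a rough illustration of what "overlap alignment" means, here is a textbook dynamic-programming sketch in Python. It is not LAST's implementation (LAST works from seeds and applies the score-drop limit mentioned above), and the function name and the scoring values are assumptions made for the example. The alignment may start anywhere along the beginning of either sequence at no cost, but it must run until it reaches the end of one of them.

```python
def overlap_alignment_score(a: str, b: str, match: int = 1,
                            mismatch: int = -1, gap: int = -2) -> int:
    """Best score of an alignment that starts at the beginning of a or b
    and ends at the end of a or b (end gaps outside that span are free)."""
    n, m = len(a), len(b)
    # Row 0: starting at the beginning of a, anywhere along b, costs nothing.
    prev = [0] * (m + 1)
    best = 0  # the empty alignment at a corner is always allowed
    for i in range(1, n + 1):
        # Column 0: starting at the beginning of b, anywhere along a, is free.
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        # Last column: the alignment has reached the end of b.
        best = max(best, cur[m])
        prev = cur
    # Last row: the alignment has reached the end of a.
    return max(best, max(prev))


if __name__ == "__main__":
    # The end of the first sequence overlaps the start of the second.
    print(overlap_alignment_score("TTTTACGT", "ACGTGGGG"))
    # A short sequence contained entirely within a longer one.
    print(overlap_alignment_score("ACGT", "GGACGTGG"))
```

Unlike a local alignment, the score here includes every mismatch and gap between the start point and the sequence end, which is why a divergent contig end can drag the whole alignment down.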

written 12 months ago by h.mon