Continuous alignment of contigs with reference using BLAST/BLAT
3.5 years ago
deepti1rao ▴ 40

Why don't my de novo assembled contigs have continuous Ensembl BLAT alignments with a closely related reference genome? For example, a certain stretch of my contig aligns with 100% identity, then a new alignment begins 10 bases further along the subject with, say, 99% identity, followed by another with 97% identity. These hits all point to the same locus of the reference. How can I get the entire stretch represented in one alignment, with gaps inserted where required, even if the identity goes down? That would let me compare the entire contig with the reference at once.
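To make the request concrete, here is a minimal sketch of merging adjacent same-locus hits into one span. Everything in it is an assumption for illustration: the `Hit` tuple, the `chain_hits` and `summarize` helpers, the 0-based half-open coordinates, the `max_gap` threshold and the example hits are all hypothetical, and this is not what BLAT or any other aligner does internally.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One local hit, 0-based half-open coordinates (hypothetical layout)."""
    q_start: int     # start on the contig (query)
    q_end: int       # end on the contig
    s_start: int     # start on the reference (subject)
    s_end: int       # end on the reference
    identity: float  # percent identity of this hit


def chain_hits(hits, max_gap=50):
    """Greedily chain same-strand hits that follow each other on both
    sequences with a gap of 0..max_gap bases; anything else starts a new chain."""
    chains = []
    for hit in sorted(hits, key=lambda h: h.s_start):
        if chains:
            last = chains[-1][-1]
            q_gap = hit.q_start - last.q_end
            s_gap = hit.s_start - last.s_end
            if 0 <= q_gap <= max_gap and 0 <= s_gap <= max_gap:
                chains[-1].append(hit)
                continue
        chains.append([hit])
    return chains


def summarize(chain):
    """Report the span covered by a chain and its length-weighted identity."""
    aligned = sum(h.q_end - h.q_start for h in chain)
    matches = sum((h.q_end - h.q_start) * h.identity / 100 for h in chain)
    return {
        "q_span": (chain[0].q_start, chain[-1].q_end),
        "s_span": (chain[0].s_start, chain[-1].s_end),
        "identity": 100 * matches / aligned,
    }


# Made-up hits mimicking the pattern described: three hits at one locus,
# 10 bases apart, plus one unrelated hit far away.
example_hits = [
    Hit(0, 1000, 5000, 6000, 100.0),
    Hit(1010, 2010, 6010, 7010, 99.0),
    Hit(2020, 3020, 7020, 8020, 97.0),
    Hit(4000, 4500, 90000, 90500, 98.0),
]
example_chains = chain_hits(example_hits)
for chain in example_chains:
    print(summarize(chain))
```

The unaligned bases between chained hits are not counted in the identity here, so the figure is optimistic compared with a true gapped alignment over the same span.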

Blat Contigs Alignment • 1.1k views

You will have to experiment with the alignment parameters to get exactly that (and it may still not work the way you want).

A program like lastz, which was designed for chromosome-length sequences, may work better, since BLAST will always attempt local alignments.


If the two genomes are so similar, you may have more luck using BWA or minimap2.


Doesn't BWA clip off unmatched ends and do a sort of local alignment?


Indeed, it would probably soft-clip the ends, but I guess it would do a reasonably good job of representing the internal alignment as one continuous stretch, provided there aren't any inversions or translocations. However, I think you are wrong: forcing an alignment where there is unlikely to be one will probably hinder the comparison between your contigs and the reference, not facilitate it.
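To see how much of a contig actually gets soft-clipped, you can read it off the CIGAR string of each SAM record. The following is a minimal sketch; the `summarize_cigar` helper and the example CIGAR are made up for illustration, and only the CIGAR operation letters come from the SAM format itself.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize_cigar(cigar):
    """Return soft-clip lengths and aligned spans for one CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) sit outside soft clips, so drop them before looking at the ends.
    ops = [(n, op) for n, op in ops if op != "H"]
    soft_left = ops[0][0] if ops and ops[0][1] == "S" else 0
    soft_right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    # M, I, = and X consume query bases inside the alignment.
    query_aligned = sum(n for n, op in ops if op in "MI=X")
    # M, D, N, = and X consume reference bases.
    ref_span = sum(n for n, op in ops if op in "MDN=X")
    return {
        "soft_left": soft_left,
        "soft_right": soft_right,
        "query_aligned": query_aligned,
        "ref_span": ref_span,
    }


# Hypothetical record: 25 bases clipped on the left, 10 on the right,
# with a 2-base deletion inside an otherwise continuous alignment.
example = summarize_cigar("25S100M2D50M10S")
print(example)
```

A contig with large `soft_left` or `soft_right` values is one where the ends did not align, which is itself informative when looking for misassemblies.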

What are the sizes of the contigs and the reference genome? How many contigs are in your assembly, and how many chromosomes in the reference genome?


I do not intend to do a forced alignment. Also, I do not want to lose information by having my contigs soft-clipped. I want to compare the contigs as a whole with the reference. That way, I may be able to annotate the draft genome and locate misassemblies/structural variants that can be studied further by experimenting with gene constructs (carrying variants) in the living system. I have 45k+ contigs, with maximum and minimum lengths of 185,793 bp and 500 bp, and the reference has 14 chromosomes.

3.5 years ago
h.mon 33k

QUAST is a tool designed to do what you want. It does so using local alignment, though - which, in my view, is the appropriate approach to the problem.

If you are adamant about using global alignment, you can try your luck with LAST and its -T 1 option:

-T NUMBER

Type of alignment: 0 means "local alignment" and 1 means "overlap alignment". Local alignments can end anywhere in the middle or at the ends of the sequences. Overlap alignments must extend to the left until they hit the end of a sequence (either query or reference), and to the right until they hit the end of a sequence.

Warning: it's often a bad idea to use -T1. This setting does not change the maximum score drop allowed inside alignments, so if an alignment cannot be extended to the end of a sequence without exceeding this drop, it will be discarded.
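The difference between the two alignment types can be illustrated with a toy dynamic-programming sketch. This is not LAST's algorithm: the `alignment_score` function, the +1/-1/-2 scoring and the example sequences are assumptions made for illustration, and the maximum score drop mentioned in the warning is not modelled at all.

```python
def alignment_score(query, ref, mode="local", match=1, mismatch=-1, gap=-2):
    """Best alignment score under a toy linear-gap scheme.

    mode="local":   the alignment may start and end anywhere.
    mode="overlap": the alignment must start at the beginning of one of the
                    sequences and end at the end of one of them.
    """
    if mode not in ("local", "overlap"):
        raise ValueError("mode must be 'local' or 'overlap'")
    rows, cols = len(query) + 1, len(ref) + 1
    # Row 0 and column 0 stay at 0 in both modes: starting at a sequence start is free.
    prev = [0] * cols
    best_anywhere = 0  # best cell anywhere in the matrix (local)
    best_at_edge = 0   # best cell in the last row or last column (overlap)
    for i in range(1, rows):
        cur = [0] * cols
        for j in range(1, cols):
            pair = match if query[i - 1] == ref[j - 1] else mismatch
            score = max(prev[j - 1] + pair, prev[j] + gap, cur[j - 1] + gap)
            if mode == "local":
                score = max(score, 0)  # local alignments may restart from scratch
            cur[j] = score
        best_anywhere = max(best_anywhere, max(cur))
        best_at_edge = max(best_at_edge, cur[-1])  # ends at the end of ref
        prev = cur
    best_at_edge = max(best_at_edge, max(prev))    # ends at the end of query
    return best_anywhere if mode == "local" else best_at_edge


# Made-up sequences: an identical 6-base core with flanks that cannot match.
contig = "GGACAACAGG"
reference = "TTACAACATT"
local_score = alignment_score(contig, reference, mode="local")
overlap_score = alignment_score(contig, reference, mode="overlap")
print(local_score, overlap_score)
```

The local score only counts the matching core, while the overlap score has to pay for the mismatching flanks. That is the trade-off behind the warning: forcing the alignment to the sequence ends lowers the score wherever the ends genuinely differ.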