Mapping a small number of contigs to a reference subsequence
1
0
Entering edit mode
15 months ago

Hello,

Is there a method or a tool to align/map a small number of contigs (obtained with Canu) to a subsequence extracted from a reference genome ? For example,in a similar way to reads to reference assembly (using bwa mem, bowtie, minimap, etc), I want to align/assemble 6 contigs that span a region of interest to the reference sequence and to visualize the mapping (using Tablet, IGV, UGENE, etc.) with the overlap regions and gaps.

Or, is it possible to assemble "de novo" only a small number of contigs (that cover a specific region) into a bigger scaffold and align the sequence obtained to a reference subsequence ?

Thank you !

align genome mapping assembly contigs • 2.1k views
ADD COMMENT
2
Entering edit mode

Simply align using minimap2. Are the contigs larger or smaller than the sub-sequence. Depending on size you could do the search by swapping the two i.e. using reference as query.

ADD REPLY
4
Entering edit mode
15 months ago
cmdcolin ★ 3.8k

disclaimer: I'm a JBrowse 2 developer

I think @GenoMax 's advice is good, you can align the contigs vs the reference as is. Then, you could use JBrowse 2 (https://jbrowse.org/jb2) to visualize those alignments. Minimap2 for example will output a PAF file, and JBrowse 2 can visualize PAF files in a "synteny style" view

Here is a basic set of command line steps you can use for setting this up

npm install -g @jbrowse/cli
jbrowse create newdir # folder containing jbrowse 2 html,js,css etc

# samtools faidx creates fasta index
samtools faidx ref.fa
samtools faidx contigs.fa 

# add-assembly will copy your ref fast to the newdir
jbrowse add-assembly ref.fa --out newdir --load copy
jbrowse add-assembly contigs.fa --out newdir --load copy

# run alignment, the -c adds CIGAR strings creating base level alignment
minimap2 -c ref.fa contigs.fa > comparison.paf

# add the paf file as a "synteny track" in jbrowse, the assemblyNames is flipped in order that the minimap2 command uses
jbrowse add-track comparison.paf --assemblyNames contigs,ref --out newdir --load copy

# load gene annotations perhaps for the reference genome
jbrowse add-track ref.gff --out newdir --load copy --assembly ref
# may not have them but can load annotations predicted on the contigs also
jbrowse add-track contigs.gff --out newdir --load copy --assembly contigs

# start a simple http server in the newdir folder
cd newdir
npx serve
# open up http://localhost:3000 in your web browser

If you don't want a web server, you can also use "JBrowse Desktop" which is just a installer you can run (https://jbrowse.org/jb2/download/) but the CLI with the web server is easier to illustrate in a short number of steps

Then you can get dotplots or 'synteny style' views like this https://jbrowse.org/jb2/docs/user_guides/linear_synteny_view/

ADD COMMENT
0
Entering edit mode

Thank you for your responses. I found JBrowse very user frendly and useful.

I tried the set of command line but the output visualization is not really what I expected. I think minimap2 is not really able to align the contigs properly. In JBrowse, I used dotplots and the synteny style, but all I can view is one contig at a time vs Reference. It would be great if I could use the synteny style to see how the contigs are mapped to the ref but with all contigs ordered properly and showed in a single window, so I can see the contiguity and the gap regions. None of the contigs is larger than reference, but it is a highly repetitive genomic region. Does minimap2 has some defaults parameters that could affect this straight-forward alignment and that I should change ?

Can I generate a consensus sequence from the paf file ? And eventually compare the consensus to reference ?

Thanks again !

ADD REPLY
0
Entering edit mode

None of the contigs is larger than reference, but it is a highly repetitive genomic region. Does minimap2 has some defaults parameters that could affect this straight-forward alignment

Then this is not a straightforward alignment. What is the actual size of the reference here? You could try to use a program like lastz that is meant to do alignments of chromosomes or very large fragments. You can also try changing minimap2 defaults.

ADD REPLY
0
Entering edit mode

My reference subsequence is 168,880n length. Five contigs are relatively small (~ 20-30.000n) while the sixth one is bigger (151,343n). Thanks for your advices !

ADD REPLY
1
Entering edit mode

indeed, I am not an expert in tuning alignments. I think lastz can output PAF though, so it could be used in place of minimap2 if the results are better

In JBrowse, I used dotplots and the synteny style, but all I can view is one contig at a time vs Reference

in the latest version, both dotplots and synteny style views have a "Show all regions" option in the vertical "..." menu, which should show multiple chromosomes at a time (picture attached)

enter image description here

ADD REPLY

Login before adding your answer.

Traffic: 2863 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6