Question: Assembly plasmid with tandem repeats of the same insert
8 weeks ago by
David180 wrote:

Hello, I got data from a lab: a plasmid that was recently sequenced with nanopore. The plasmid is roughly 8 kb and contains 40 copies of the same insert. I have the sequences of the plasmid and the insert.

I would like to check that the plasmid really contains the number of inserts it is supposed to (roughly 40 copies). What approach would you recommend to assemble the sequencing data for this plasmid and count the repeats?


sequencing assembly • 136 views
8 weeks ago by
hugo.avila160 wrote:

For the assembly, use Flye; it seems to be the best tool for plasmids:

flye --nano-raw your_long_reads.fastq --out-dir out_nano.fasta --threads 4

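You can adjust the --threads parameter to match your computer's resources. Note that --out-dir names a directory: Flye writes the assembled contigs to assembly.fasta inside it.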

As for counting the tandem repeats, you already have both references:

"I have the sequences of the plasmid and the insert."

I think you can merge them into a single reference FASTA file (plas+repeats+mids) with a text editor or a script (if you need help with this, please provide a sample dataset and the desired result). Once you have a reference, you can map your assembly to it with an aligner of your choice. For example, with MUMmer (dnadiff):

dnadiff -p my_output your_built_reference.fasta out_nano.fasta/assembly.fasta

Check my_output.report to see the statistics of the alignment. Ideally you would see an "AvgIdentity" of 100.00, but because of natural mutations and error noise in the nanopore reads, your assembly will probably have some SNPs or other minor alterations, so check the overall structure. For visualization, try this site to view your file: click on "interactive dot plot". You should see a straight diagonal line in the plot (like this). I think this will do.
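If you go the script route for building the merged reference (plas+repeats+mids), a minimal Python sketch could look like the one below. It assumes you know the 0-based position in the backbone where the insert array sits; the names build_reference and to_fasta and the toy sequences are made up for illustration and are not from any library.

```python
def build_reference(backbone: str, insert: str, copies: int, site: int) -> str:
    """Place `copies` tandem copies of `insert` at 0-based position `site` of `backbone`."""
    if copies < 0:
        raise ValueError("copies must be non-negative")
    if not 0 <= site <= len(backbone):
        raise ValueError("site must lie within the backbone")
    return backbone[:site] + insert * copies + backbone[site:]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Toy example (hypothetical sequences): 3 copies of "GC" inserted at position 4.
reference = build_reference("AAAATTTT", "GC", copies=3, site=4)
print(to_fasta("plasmid_3_copies", reference), end="")
```

With the real data you would read the backbone and insert from your own FASTA files, set copies to 40, and write the result to the file you then pass to dnadiff. Building a few references with different copy numbers (say 38 to 42) and comparing the alignments is one way to see which fits best.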


Great, Hugo!!! That's exactly the approach I was following. I was assembling the reads with an assembler (wtdbg2), but I will give Flye a try. Then aligning reads with mummer to the reference.

Really helpful!!!! Thanks

written 8 weeks ago by David180

You're welcome! :)

Then aligning reads with mummer

I do not recommend mapping reads with MUMmer. It is better suited to large assembled sequences such as contigs and chromosomes, and I have never tried to use it for long reads. If you want to map reads, you don't need to assemble first: just map the reads to the reference you built with minimap2. But I think assembling first, so that you work with a continuous consensus sequence, will correct some of the noise in the nanopore reads and give a better result.

modified 8 weeks ago • written 8 weeks ago by hugo.avila160

Sorry, I meant aligning the repeat (25 nt) to the assembled genome and visualizing the plot, or extracting the coordinates with MUMmer to get the number of mapped repeats.

Agreed, for aligning reads I would use minimap2.

By the way, the first run of Flye did not work: "0 disjointigs assembled", although the coverage, number of reads and N50/N90 are good.
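As a quick sanity check alongside the alignment, the idea of counting the 25 nt repeat in an assembled sequence can be sketched in plain Python. This is only an illustration under a strong assumption: it counts exact copies, so any copy carrying a nanopore error or a real mutation is missed and the result is a lower bound. The function names and toy sequence are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_unit(seq: str, unit: str) -> int:
    """Number of non-overlapping exact occurrences of `unit` in `seq`."""
    return seq.upper().count(unit.upper())


def longest_tandem_run(seq: str, unit: str) -> int:
    """Largest number of back-to-back exact copies of `unit` in `seq`."""
    pattern = re.compile("(?:" + re.escape(unit.upper()) + ")+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)


def count_both_strands(seq: str, unit: str) -> dict:
    """Exact-copy counts of `unit` on each strand, plus the longest tandem run."""
    rc = revcomp(unit)
    return {
        "forward": count_unit(seq, unit),
        "reverse": count_unit(seq, rc),
        "longest_run": max(longest_tandem_run(seq, unit), longest_tandem_run(seq, rc)),
    }


# Toy example: 3 forward copies of "AAC" and 2 reverse-complement copies ("GTT").
print(count_both_strands("GAACAACAACGGTTGTTG", "AAC"))
```

If the longest run is well below the expected 40, that does not by itself mean copies are missing: a single error inside the array splits the run, which is why the dot plot or MUMmer coordinates remain the more robust check.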

written 8 weeks ago by David180

Hmm, it seems that although Flye is good at finding plasmids in a full genome assembly, it did not do well when the dataset is made up only of plasmid reads. Let's try some variant calling instead; I think this will do. Did you make a reference like I said before (plas+repeats+mids)? If you did:

Map the reads:

minimap2 --MD -a reference.fasta your_reads.fastq > map_file.sam

Some file conversion:

samtools view -bS map_file.sam > map_file.bam
samtools sort -o map_file.sorted.bam map_file.bam
samtools index map_file.sorted.bam

From here you will need Sniffles (the command below uses the Sniffles 1.x options):

sniffles -m map_file.sorted.bam -v  variants.vcf

Now you can visualize the .vcf with Artemis or IGV, or parse it with Biopython to do a more efficient job.
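For the parsing route, here is a minimal sketch in plain Python (no extra library) that lists the structural variants in a VCF and converts each length into a number of repeat units. It assumes the caller writes SVTYPE and SVLEN into the INFO column, and that the repeat unit is 25 nt as mentioned above; the function names and the example record are hypothetical.

```python
def parse_info(info: str) -> dict:
    """Turn a VCF INFO string ("A=1;B=2;FLAG") into a dict; flags map to True."""
    parsed = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def structural_variants(vcf_lines):
    """Yield chrom, pos, SVTYPE and SVLEN for every record line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])
        svlen = info.get("SVLEN")
        try:
            svlen = int(svlen) if isinstance(svlen, str) else None
        except ValueError:
            svlen = None
        yield {"chrom": fields[0], "pos": int(fields[1]),
               "svtype": info.get("SVTYPE"), "svlen": svlen}


def unit_change(sv: dict, unit_len: int = 25):
    """Repeat units gained (INS/DUP) or lost (DEL) relative to the reference."""
    if sv["svlen"] is None:
        return None
    units = abs(sv["svlen"]) / unit_len
    if sv["svtype"] == "DEL":
        return -units
    if sv["svtype"] in ("INS", "DUP"):
        return units
    return None


# Made-up record: a 50 bp deletion, i.e. two 25 nt units fewer than the reference.
example = ["##fileformat=VCFv4.2",
           "ref\t1200\t1\tN\t<DEL>\t.\tPASS\tPRECISE;SVTYPE=DEL;SVLEN=-50;END=1250"]
for sv in structural_variants(example):
    print(sv, unit_change(sv))
```

On real data you would pass an open file handle (for example open("variants.vcf")) instead of the example list. A deletion or insertion inside the repeat array whose length is a multiple of 25 would suggest the plasmid carries fewer or more copies than the reference you built.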

modified 8 weeks ago • written 8 weeks ago by hugo.avila160