Assembling a plasmid with tandem repeats of the same insert
6 months ago
David

Hello, I received data from a lab: a plasmid that was recently sequenced with Nanopore. The plasmid is roughly 8 kb and contains 40 copies of the same insert. I have the sequences of the plasmid and the insert.

I would like to check that the plasmid really contains the expected number of inserts (roughly 40 copies). What approach would you recommend for assembling the NGS data of this plasmid and counting the repeats?

Thanks

Tags: assembly, sequencing
6 months ago
hugo.avila

For the assembly, use Flye; it seems to be one of the best tools for plasmids:

flye --nano-raw your_long_reads.fastq --out-dir out_nano.fasta --threads 4


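You can adjust the --threads parameter to match your computer's resources. Note that --out-dir takes a directory name, not a file: Flye writes the final assembly to assembly.fasta inside that directory.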
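Once Flye has run, a quick first check is the length of the assembled contig, since each missing or extra copy changes it by one repeat unit. Below is a minimal Python sketch (not part of the original answer), assuming the assembly is a single contig and that you know the backbone length outside the repeat array. BACKBONE_LEN, UNIT_LEN and the helper names are hypothetical placeholders; the values 7000 bp and 25 nt simply match an ~8 kb plasmid carrying 40 copies of a 25 nt insert, so substitute the numbers from your own construct.

```python
import sys


def parse_fasta(lines):
    """Parse FASTA text (any iterable of lines) into {record_id: sequence}."""
    records = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Record id = first word of the header line.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def estimate_copies(contig_len, backbone_len, unit_len):
    """Rough copy number: extra length beyond the backbone divided by the unit length."""
    return (contig_len - backbone_len) / unit_len


# Hypothetical values: backbone = plasmid length without the repeat array,
# unit = one insert plus any spacer between copies.
BACKBONE_LEN = 7000
UNIT_LEN = 25

if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python estimate_copies.py out_nano.fasta/assembly.fasta
    with open(sys.argv[1]) as handle:
        for name, seq in parse_fasta(handle).items():
            copies = estimate_copies(len(seq), BACKBONE_LEN, UNIT_LEN)
            print(f"{name}\t{len(seq)} bp\t~{copies:.1f} copies")
```

Treat the result as a rough estimate: Nanopore indel errors, and assemblers collapsing or expanding tandem arrays, can both shift the contig length, so the alignment against the reference is still worth doing.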

As for counting the tandem repeats, since you have both reference sequences:

I have the sequences of the plasmid and the insert.

I think you can merge them into a single reference FASTA file (plas+repeats+mids), i.e. the full expected construct, with a text editor or a script (if you need help with this, please provide a sample dataset and the desired result). Once you have that reference, you can align your assembly to it with an aligner of your choice, for example MUMmer (dnadiff):

dnadiff -p my_output your_built_reference.fasta out_nano.fasta/assembly.fasta


Check my_output.report for the alignment statistics. Ideally you want an AvgIdentity of 100.00, but because of natural mutations and the error noise in Nanopore reads, your assembly will probably contain some SNPs or other minor differences, so focus on the overall structure. For a visualization, try this site: load your my_output.delta file and click "interactive dot plot". You should see a single straight diagonal line (like this); if the number of repeats differs from the reference, you would expect the diagonal to break or shift around the repeat array. I think this will do.
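For the merging step (plas+repeats+mids), a small script saves hand-editing 40 copies. This is a minimal Python sketch, assuming the construct is laid out as a left backbone segment, then the insert repeated N times with the same spacer ("mid") between consecutive copies, then the right backbone segment. That layout, the function names and the placeholder sequences are all assumptions to adapt to your actual plasmid map.

```python
import sys


def build_reference(left_flank, insert, spacer, copies, right_flank):
    """Expected construct: left flank + insert, spacer, insert, ..., insert + right flank.

    Assumed layout (hypothetical): `spacer` sits between consecutive copies only,
    not after the last copy.
    """
    array = spacer.join([insert] * copies)
    return left_flank + array + right_flank


def to_fasta(name, seq, width=60):
    """Format one record as FASTA text, wrapping the sequence every `width` bases."""
    lines = [">" + name]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder sequences (hypothetical): replace with your real backbone halves,
    # 25 nt insert and spacer. Redirect stdout to the reference FASTA given to dnadiff.
    ref = build_reference("ACGTACGTAA", "GATTACAGATTACAGATTACAGATT", "", 40, "TTGCATGCAA")
    sys.stdout.write(to_fasta("expected_plasmid_40x", ref))
```

If your plasmid is circular, where you split the backbone into left and right flanks is arbitrary; dnadiff and the dot plot will simply show the break point at that coordinate.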

Great, Hugo! That's exactly what I was doing: assembling the reads with an assembler (wtdbg2), though I will give Flye a try, and then aligning the reads to the reference with MUMmer.

I do not recommend mapping reads with MUMmer; it is better suited to large assembled sequences such as contigs and chromosomes, and I have never tried it with long reads. If you want to map reads, you don't need to assemble at all: just map the reads to the built reference with minimap2. But I think assembling first, so that you work with a continuous consensus sequence, will correct some of the noise in the Nanopore reads and give a better result.

Sorry, I meant aligning the repeat (25 nt) to the assembled genome and visualizing the plot, or extracting the coordinates with MUMmer to count the mapped repeats.
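Because the repeat is only 25 nt, a direct string scan of the assembled contig may be simpler than alignment here (nucmer's default minimum cluster length, -c, may be longer than the repeat, so it can miss hits unless lowered). The sketch below is a swapped-in approach, not MUMmer: a minimal Python scan that counts non-overlapping occurrences of the repeat on either strand, tolerates a few mismatches, and wraps around the end of the contig because the plasmid is circular. The mismatch threshold and function names are assumptions; the greedy left-to-right scan can also report spurious hits if the repeat is low-complexity, so check the positions it returns.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_repeat_copies(contig, motif, max_mismatches=1, circular=True):
    """Return start positions of non-overlapping motif hits on either strand."""
    contig, motif = contig.upper(), motif.upper()
    k = len(motif)
    # Circular plasmid: append the first k-1 bases so a copy spanning the
    # contig's arbitrary start/end coordinate is still found.
    seq = contig + contig[:k - 1] if circular else contig
    targets = {motif, revcomp(motif)}
    hits, i = [], 0
    while i <= len(seq) - k:
        window = seq[i:i + k]
        if any(hamming(window, t) <= max_mismatches for t in targets):
            hits.append(i)
            i += k  # jump past this copy so overlapping windows are not recounted
        else:
            i += 1
    return hits


if __name__ == "__main__":
    # Toy example: a 5 nt "repeat" present three times between flanks.
    print(len(find_repeat_copies("TTAACCGAACCGAACCGTT", "AACCG")))  # → 3
```

On the real data you would load the contig from assembly.fasta and pass the 25 nt insert as `motif`; a count near 40 would support the expected design.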

Agreed, for aligning reads I would use minimap2.

By the way, the first Flye run did not work ("0 disjointigs assembled"), although the coverage, number of reads and N50/N90 all look good.

Hmm, it seems that although Flye is good at finding plasmids in a full genome assembly, it did not do well on a dataset made only of plasmid reads. Let's try variant calling instead; I think this will do. Did you build a reference as I described before (plas+repeats+mids)? If you did:

minimap2 -ax map-ont --MD reference.fasta your_reads.fastq > map_file.sam


Then some file conversion (SAM to BAM, sort, index):

samtools view -bS map_file.sam > map_file.bam
samtools sort -o map_file.sorted.bam map_file.bam
samtools index map_file.sorted.bam


From here you will need Sniffles (the command below uses the Sniffles 1 options; Sniffles 2 uses --input and --vcf instead):

sniffles -m map_file.sorted.bam -v variants.vcf


Now you can visualize the VCF in Artemis or IGV, or parse it with a script (for example in Python) for a more systematic analysis.
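If you parse the VCF yourself, the standard SVTYPE and SVLEN INFO keys are enough to translate each insertion or deletion called against the 40-copy reference into a change in repeat copies (|SVLEN| divided by the repeat-unit length). A minimal standard-library Python sketch; the 25 nt unit length and the toy records used to exercise it are illustrative assumptions, not real Sniffles output, so check the INFO definitions in your own VCF header.

```python
import sys


def parse_info(info_field):
    """Split a VCF INFO column into a dict; flag entries (no '=') map to True."""
    info = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        elif item:
            info[item] = True
    return info


def repeat_unit_changes(vcf_lines, unit_len):
    """Yield (chrom, pos, svtype, svlen, copy_change) for INS/DEL records."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, info = fields[0], int(fields[1]), parse_info(fields[7])
        svtype = info.get("SVTYPE")
        if svtype not in ("INS", "DEL") or "SVLEN" not in info:
            continue
        # SVLEN sign conventions vary between tools, so use |SVLEN| plus the SV type.
        svlen = abs(int(str(info["SVLEN"]).split(",")[0]))
        sign = 1 if svtype == "INS" else -1
        yield chrom, pos, svtype, svlen, sign * svlen / unit_len


UNIT_LEN = 25  # hypothetical: length of one repeat unit (insert + any spacer)

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:  # e.g. variants.vcf from the sniffles command
        for record in repeat_unit_changes(handle, UNIT_LEN):
            print(*record, sep="\t")
```

For example, a deletion with |SVLEN| = 250 and a 25 nt unit corresponds to 10 fewer copies than the reference at that site; no INS/DEL call over the array would be consistent with the expected 40 copies.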