Question

Prokka and plasmidspades unexpectedly reveal no plasmid

0

Entering edit mode

4.9 years ago

claping ▴ 20

I have two similar genomes (one "known", one "unknown"). The "known" genome has already been determined to contain a plasmid. Both Prokka and plasmidspades have unexpectedly shown no results for plasmids in either genomes. In the case of plasmidspades, I ran the following commands:

plasmidspades.py - 9,21,33,55,77,127 --careful -1 read1.fastq -2 read2.fastq
plasmidspades.py - 9,21,33,55,77,127 -1 read1.fastq -2 read2.fastq

In both cases, there was no contigs.fasta in the main output directory, the assembly_graph.fastg in the main output directory was empty, the warnings.log in the main output directory said "No putative plasmid contigs found!", and inside the directories (such as K127), the final_contigs.fasta was empty. Unless I should check elsewhere in the Prokka outputs, I interpret this to mean no plasmids were found.

So, I then downloaded the plasmid sequence fasta file on GenBank (that had already been discovered for the "known" genome). Using BWA, I aligned the raw reads from the genomes to this plasmid reference fasta file. In the "unknown" genome, I see 180 reads mapping to the plasmid reference fasta file. I believe this is evidence that the "unknown" genome contains a plasmid (even if undetected by Prokka and plasmidspades).

*My question is - *

What approach should I take to solidify the evidence for the presence/absence of plasmid(s) in this "unknown" genome? If there is a plasmid present, how can I determine its sequence with confidence? (Side note: For the "unknown" genome, I have the raw reads, the assembled contigs from running SPAdes, and the reordered contigs from running MAUVE against the "known" genome as a reference).

prokka plasmidspades spades plasmid • 2.2k views

ADD COMMENT • link updated 4.0 years ago by appiahv ▴ 30 • written 4.9 years ago by claping ▴ 20

1

Entering edit mode

I see 180 reads mapping to the plasmid reference fasta file. I believe this is evidence that the "unknown" genome contains a plasmid (even if undetected by Prokka and plasmidspades).

I would not consider 180 reads solid evidence for presence of plasmid (what kind of data do you have BTW, short or long reads?). How many total reads are in this dataset? Have you checked the alignments to see if they look solid (not many mismatches) and which region of the known plasmid the reads are aligning to?

ADD REPLY • link 4.9 years ago by GenoMax 146k

0

Entering edit mode

Total reads are about 480,610. Reads are about 150 base pairs. I do not have much experience at all with this subject and am unsure how to even examine which region of the known plasmid the reads are aligning to. I simply ran code I found online as follows: bwa index plasmid.fasta; bwa mem plasmid.fasta mySample_r1.fastq mySample_r2.fastq > mySample.sam; samtools view -c mySample.sam (total number of reads); samtools view -cF4 mySample.sam (total number of reads in my sample that mapped to plasmid). This is where I am stuck, trying to visualize how these 180 reads align to the plasmid.

ADD REPLY • link 4.9 years ago by claping ▴ 20

1

Entering edit mode

You can use IGV to visualize your aligned data. You will need to create a custom genome with your plasmid reference. Follow the directions for custom genomes here. You will need to sort and index your alignment using samtools.

ADD REPLY • link 4.9 years ago by GenoMax 146k

0

Entering edit mode

180 map to the plasmid? That's barely nothing (I assume Illumina reads since you used spades), you won't be able to assemble the plasmid with it. How is the genome assembly? Can you compare it to the known genome? If it contains the same plasmid you will be able to find it in the assembly regardless of the assembly method.

ADD REPLY • link 4.9 years ago by Asaf 10k

0

Entering edit mode

I compared the genome assembly to the "known" genome using MAUVE. They were quite similar (only about 20 LCBs at the default settings). In terms of your last sentence, how can I find the plasmid in the assembly? What tools should I use? This may be a simple question but I am new to this type of analysis.

ADD REPLY • link 4.9 years ago by claping ▴ 20

0

Entering edit mode

I meant by comparing the assembly with the reference genome, do you see a match between the assembly and the plasmid?

ADD REPLY • link 4.9 years ago by Asaf 10k

0

Entering edit mode

Align with reference genome
Store the unaligned/unmapped reads in a separate file
Look at the stats and see if good chunk of reads remain unaligned
Map the unaligned reads to plasmid sequence

or

Index the plasmid sequence with bowtie2 or bwa
Index the reference genome with bowtie2 or bwa
Use fastqscreen to check for aligned reads with reference and plasmid

Fastqscreen allows screening against 100000 reads as default from input read files. @ clapin

ADD REPLY • link 4.9 years ago by cpad0112 21k

0

Entering edit mode

How was the DNA for the sequencing run prepped?

I agree with the others, 180 reads is not that much (since no doubt at least a few of those reads will be junk) and certainly not enough to get an assembly.

ADD REPLY • link 4.0 years ago by Joe 21k

score 0 · Answer 1 · 2020-10-03

0

Entering edit mode

4.0 years ago

appiahv ▴ 30

I hope I am not late. I use ragtag (https://github.com/malonge/RagTag) to extract the plasmid sequence

ADD COMMENT • link 4.0 years ago by appiahv ▴ 30