Question: Prokka and plasmidspades unexpectedly reveal no plasmid
0
gravatar for claping
5 weeks ago by
claping0
claping0 wrote:

I have two similar genomes (one "known", one "unknown"). The "known" genome has already been determined to contain a plasmid. Both Prokka and plasmidspades have unexpectedly shown no results for plasmids in either genomes. In the case of plasmidspades, I ran the following commands:

plasmidspades.py - 9,21,33,55,77,127 --careful -1 read1.fastq -2 read2.fastq
plasmidspades.py - 9,21,33,55,77,127 -1 read1.fastq -2 read2.fastq

In both cases, there was no contigs.fasta in the main output directory, the assembly_graph.fastg in the main output directory was empty, the warnings.log in the main output directory said "No putative plasmid contigs found!", and inside the directories (such as K127), the final_contigs.fasta was empty. Unless I should check elsewhere in the Prokka outputs, I interpret this to mean no plasmids were found.

So, I then downloaded the plasmid sequence fasta file on GenBank (that had already been discovered for the "known" genome). Using BWA, I aligned the raw reads from the genomes to this plasmid reference fasta file. In the "unknown" genome, I see 180 reads mapping to the plasmid reference fasta file. I believe this is evidence that the "unknown" genome contains a plasmid (even if undetected by Prokka and plasmidspades).

*My question is - *

What approach should I take to solidify the evidence for the presence/absence of plasmid(s) in this "unknown" genome? If there is a plasmid present, how can I determine its sequence with confidence? (Side note: For the "unknown" genome, I have the raw reads, the assembled contigs from running SPAdes, and the reordered contigs from running MAUVE against the "known" genome as a reference).

ADD COMMENTlink written 5 weeks ago by claping0
1

I see 180 reads mapping to the plasmid reference fasta file. I believe this is evidence that the "unknown" genome contains a plasmid (even if undetected by Prokka and plasmidspades).

I would not consider 180 reads solid evidence for presence of plasmid (what kind of data do you have BTW, short or long reads?). How many total reads are in this dataset? Have you checked the alignments to see if they look solid (not many mismatches) and which region of the known plasmid the reads are aligning to?

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by genomax75k

Total reads are about 480,610. Reads are about 150 base pairs. I do not have much experience at all with this subject and am unsure how to even examine which region of the known plasmid the reads are aligning to. I simply ran code I found online as follows: bwa index plasmid.fasta; bwa mem plasmid.fasta mySample_r1.fastq mySample_r2.fastq > mySample.sam; samtools view -c mySample.sam (total number of reads); samtools view -cF4 mySample.sam (total number of reads in my sample that mapped to plasmid). This is where I am stuck, trying to visualize how these 180 reads align to the plasmid.

ADD REPLYlink written 5 weeks ago by claping0
1

You can use IGV to visualize your aligned data. You will need to create a custom genome with your plasmid reference. Follow the directions for custom genomes here. You will need to sort and index your alignment using samtools.

ADD REPLYlink written 5 weeks ago by genomax75k

180 map to the plasmid? That's barely nothing (I assume Illumina reads since you used spades), you won't be able to assemble the plasmid with it. How is the genome assembly? Can you compare it to the known genome? If it contains the same plasmid you will be able to find it in the assembly regardless of the assembly method.

ADD REPLYlink written 5 weeks ago by Asaf6.5k

I compared the genome assembly to the "known" genome using MAUVE. They were quite similar (only about 20 LCBs at the default settings). In terms of your last sentence, how can I find the plasmid in the assembly? What tools should I use? This may be a simple question but I am new to this type of analysis.

ADD REPLYlink written 5 weeks ago by claping0

I meant by comparing the assembly with the reference genome, do you see a match between the assembly and the plasmid?

ADD REPLYlink written 5 weeks ago by Asaf6.5k
  1. Align with reference genome
  2. Store the unaligned/unmapped reads in a separate file
  3. Look at the stats and see if good chunk of reads remain unaligned
  4. Map the unaligned reads to plasmid sequence

or

  1. Index the plasmid sequence with bowtie2 or bwa
  2. Index the reference genome with bowtie2 or bwa
  3. Use fastqscreen to check for aligned reads with reference and plasmid

    Fastqscreen allows screening against 100000 reads as default from input read files. @ clapin

ADD REPLYlink written 5 weeks ago by cpad011212k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 644 users visited in the last hour