Question: de novo assembly of circular plasmid
kspata (Chicago) wrote, 6 months ago:

Hi All,

I have a circular plasmid sample that was sequenced on a MiSeq (PE 300). I performed a de novo assembly using SPAdes. The contig I obtained is 11526 bp long; the reference sequence for this plasmid is 12089 bp long. When I BLAST the reference sequence against the de novo assembled contig, the hit has Plus/Minus strand orientation:

[Image: BLAST result of the reference against the assembled contig]

I reverse-complemented the reference and aligned it again, but it still aligns in the same orientation. I then loaded the assembly into Sequencher. I had to split the assembly into two fragments, and this introduced a long gap in the alignment from 10480 bp to 10999 bp, as shown in the Sequencher output:
[Image: Sequencher alignment]

  1. How can I get a complete assembly of the genome from the PE 300 reads of a circular plasmid?
  2. Is there a way to renumber the bases in the de novo assembled sequence so that it aligns correctly without introducing a gap?
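Because the plasmid is circular, the assembler typically breaks the circle at an arbitrary point, and a Plus/Minus BLAST hit means the contig is on the opposite strand to the reference. A minimal Python sketch of the renumbering in question 2, assuming both sequences are already loaded as plain DNA strings: reverse-complement the contig if needed, then rotate it so it starts at the same base as the reference. The helper names (`reverse_complement`, `rotate`, `anchor_to_reference`) and the 30 bp anchor length are illustrative choices, not part of any tool.

```python
# Minimal sketch: reorient and renumber a circular contig so it starts at the
# same base as the reference. Assumes plain DNA strings already read from FASTA.

# Translation table for complementing bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def rotate(seq: str, start: int) -> str:
    """Rotate a circular sequence so the base at 0-based index `start` comes first."""
    start %= len(seq)
    return seq[start:] + seq[:start]


def anchor_to_reference(contig: str, reference: str, k: int = 30) -> str:
    """Return the contig reoriented and rotated to begin with the reference's first k bases.

    Both strands of the contig are searched. Each strand is extended by its own
    first k-1 bases so that an anchor spanning the contig's arbitrary start/end
    junction is still found. Raises ValueError if the anchor is not present
    (for example, if it falls in a deleted region or carries a variant).
    """
    anchor = reference[:k]
    for candidate in (contig, reverse_complement(contig)):
        pos = (candidate + candidate[: k - 1]).find(anchor)
        if pos != -1:
            return rotate(candidate, pos)
    raise ValueError("reference start not found on either strand; try another anchor")


# Toy example: a reverse-complemented, rotated copy maps back onto the reference.
ref = "ACGTTGCAAGGCT"
contig = rotate(reverse_complement(ref), 5)
print(anchor_to_reference(contig, ref, k=6))  # ACGTTGCAAGGCT
```

Rotation only removes the artificial break at the contig's arbitrary start. If the reads genuinely lack a stretch of the reference, as the answer in this thread suggests, that difference will still appear as a gap after renumbering.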
piet (planet earth) wrote, 6 months ago:

The gap indicates that your putative plasmid contig differs from pTPK_AAV2 by a deletion of about 500 nt. Such deletions and other recombination events are common in plasmids. You should annotate pTPK_AAV2 and check whether loss of the gap region is plausible. To verify the gap, map your reads to pTPK_AAV2: you should then see a sharp drop in coverage between positions 10450 and 11000.


Hi Piet, thank you for responding. I performed a resequencing analysis: I mapped the reads to the reference with BWA and called variants. Apart from a CG insertion at position 6354 and two insertions at positions 10384 (22 bp) and 10349 (4 bp), the consensus sequence shows no other variants. I also checked the depth at these positions; the average per-base coverage is 8212X, which I assume is high.

So does that mean the assembler did not generate a complete assembly? If so, which assembler should I use? Would assembling from merged reads work? Also, is it possible that this region was not assembled because it contains repeats? How can I check this region for repeats (which tool or strategy should I use)?
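One quick way to look for exact repeats is to count k-mers that occur more than once on either strand, so that both direct and inverted repeats show up. A minimal Python sketch, assuming the sequence is an uppercase DNA string; `repeated_kmers`, `repeat_intervals` and the default k = 25 are hypothetical choices. It only finds exact repeats and does not scan the wrap-around junction of the circle; a self-alignment (for example, BLAST of the sequence against itself) would also catch diverged repeats.

```python
# Minimal sketch: find exact repeats (direct and inverted) via shared k-mers.
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def repeated_kmers(seq: str, k: int = 25) -> dict:
    """Map each canonical k-mer seen more than once to its (position, strand) hits.

    The canonical form is the lexicographically smaller of the k-mer and its
    reverse complement. Two hits with different strand labels are inverted
    copies of each other; identical labels mean a direct repeat. Positions are
    0-based on `seq`; the circular junction is not scanned.
    """
    hits = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        canonical = min(kmer, reverse_complement(kmer))
        hits[canonical].append((i, "+" if kmer == canonical else "-"))
    return {kmer: pos for kmer, pos in hits.items() if len(pos) > 1}


def repeat_intervals(seq: str, k: int = 25) -> list:
    """Merge all positions covered by repeated k-mers into 0-based half-open intervals."""
    starts = sorted({i for hits in repeated_kmers(seq, k).values() for i, _ in hits})
    intervals = []
    for p in starts:
        if intervals and p <= intervals[-1][1]:
            intervals[-1][1] = max(intervals[-1][1], p + k)  # overlapping window
        else:
            intervals.append([p, p + k])
    return [tuple(iv) for iv in intervals]
```

For example, running `repeat_intervals(reference)` on the whole plasmid (with `reference` a hypothetical string holding the pTPK_AAV2 sequence) and looking for intervals that overlap positions 10400 to 11000 would show whether that region shares exact sequence with other parts of the plasmid.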

Additionally, you mentioned annotating the plasmid; how can I do that?

Please excuse the long series of questions, but this is a challenging problem that I have not faced before and am trying to troubleshoot.

Thanks!!!

Reply by kspata, 6 months ago.

"the average per base coverage"

I did not mean the average coverage. Please check whether there is any sharp drop in coverage along the sequence, apart from at the ends. Map the reads to pTPK_AAV2, then inspect the BAM file visually with Tablet, and also run bedtools genomecov on the BAM file:

bedtools genomecov -bga -ibam myreads_on_pTPK_AAV2.bam

Is there any sharp drop in coverage, especially between positions 10400 and 11000? Is there any region with excessive coverage?
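To scan the `bedtools genomecov -bga` output for such drops instead of reading it line by line, a minimal Python sketch could parse the bedGraph records (chrom, 0-based start, end, depth) and merge consecutive intervals whose depth falls below a fraction of the median per-base depth. The 10% threshold, the 50 bp minimum length and the function names are assumptions to adjust, not bedtools settings.

```python
# Minimal sketch: report low-coverage stretches from `bedtools genomecov -bga` output.
import statistics
from typing import Iterable, List, Tuple


def parse_bedgraph(lines: Iterable[str]) -> List[Tuple[str, int, int, float]]:
    """Parse bedGraph lines (chrom, 0-based start, end, depth), skipping headers."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith(("track", "#")):
            continue
        chrom, start, end, depth = line.split()[:4]
        records.append((chrom, int(start), int(end), float(depth)))
    return records


def low_coverage(records, fraction: float = 0.1, min_len: int = 50):
    """Merge adjacent intervals with depth < fraction * median per-base depth.

    Returns (chrom, start, end) tuples in 0-based, half-open coordinates,
    keeping only runs of at least `min_len` bases.
    """
    # Expand to per-base depths so long intervals weigh more in the median.
    per_base = [depth for _, start, end, depth in records for _ in range(end - start)]
    threshold = fraction * statistics.median(per_base)
    runs = []
    for chrom, start, end, depth in records:
        if depth >= threshold:
            continue
        if runs and runs[-1][0] == chrom and runs[-1][2] == start:
            runs[-1][2] = end  # extend the current low-coverage run
        else:
            runs.append([chrom, start, end])
    return [tuple(r) for r in runs if r[2] - r[1] >= min_len]
```

Assuming the bedtools output was redirected to a file (the name `coverage.bedgraph` here is hypothetical), `low_coverage(parse_bedgraph(open("coverage.bedgraph")))` lists candidate drops. Calls at the very start or end are expected, because coverage falls off at the ends when a circular plasmid is mapped to a linear reference.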

Reply by piet, 6 months ago.

"annotate the plasmid, how can I do that?"

If you do not yet have experience with annotation software, I would recommend annotating manually as an exercise. This short plasmid presumably carries 10 to 15 genes, so manual annotation is feasible.

First determine the open reading frames. Then cut out the sequence of each open reading frame and BLAST it against the NCBI 'nr' nucleotide database. Finally, write the results into a GFF3 file. GFF3 is a simple tab-separated text format with one line per feature (for example, one per gene).
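The first and last of these steps can be sketched in Python: find ATG-to-stop open reading frames on both strands (standard genetic code, no wrap-around across the circle's origin) and write them as GFF3 lines with 1-based inclusive coordinates. The 100-codon minimum, the source label `hypothetical_orf_finder` and the `orfN` IDs are assumptions for illustration; the BLAST step in between still has to be done separately, and each feature can then be renamed according to its hits.

```python
# Minimal sketch: find ORFs on both strands and write them as GFF3.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list:
    """Return (start, end, strand) ORFs in 1-based inclusive forward coordinates.

    Each ORF runs from the first ATG after the previous in-frame stop (or the
    frame start) to the next in-frame stop, stop included; min_codons counts
    the stop codon. ORFs spanning the circular origin are not detected.
    """
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i : i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((start + 1, i + 3, "+"))
                        else:
                            # index j on the reverse strand is n-1-j on the forward strand
                            orfs.append((n - i - 2, n - start, "-"))
                    start = None
    return sorted(orfs)


def to_gff3(seqid: str, orfs: list) -> str:
    """Format ORFs as GFF3 (9 tab-separated columns, one line per feature)."""
    lines = ["##gff-version 3"]
    for k, (start, end, strand) in enumerate(orfs, 1):
        cols = [seqid, "hypothetical_orf_finder", "ORF", str(start), str(end),
                ".", strand, ".", f"ID=orf{k}"]
        lines.append("\t".join(cols))
    return "\n".join(lines)
```

For example, `print(to_gff3("pTPK_AAV2", find_orfs(reference)))`, where `reference` is a hypothetical variable holding the plasmid sequence as an uppercase string, prints one GFF3 line per candidate ORF.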

Reply by piet, 6 months ago.