Question: What should be an ideal coverage of an assembled putative eukaryotic Plasmid?
3.0 years ago by
jigarnt30
Canada
jigarnt30 wrote:

Hi All,

I have sequenced a putative plasmid on an Illumina HiSeq using a 125 bp paired-end library. The putative plasmid runs at around 3 kb on the gel, and I got around 10 contigs of roughly that size when I assembled the reads in SPAdes. If it is a plasmid, as I think it is, I am bound to get very high coverage. In that case, what could I do next to find out whether it is a plasmid or not?

plasmid spades sequence assembly • 1.2k views
modified 3.0 years ago by Chris Fields2.1k • written 3.0 years ago by jigarnt30

Have you compared the 10 contigs to each other to see how similar they are and whether they could be collapsed into a smaller set? They may be related to each other. Was the data generated from isolated "putative" plasmid DNA, or did the sample contain other DNA? Do eukaryotic plasmids have an identifiable origin of replication that you could look for (just thinking out loud)?

modified 3.0 years ago • written 3.0 years ago by genomax64k

Hi Genomax2,

I gel-extracted my putative plasmid from the genomic DNA, so there should be no question of contamination. The 10 contigs I got range in size from 2.5 kb to 6.6 kb, with coverage ranging from 2 to 66. I ran a nucleotide BLAST and I am getting hits to E. coli plasmids for most of my contigs. As I don't know what the ideal coverage of a plasmid should be, I am unsure which contig to select. Do prokaryotic and eukaryotic plasmids have a similar origin of replication?

written 3.0 years ago by jigarnt30

Have you blasted the contigs against each other? That would be one way to judge their similarity. You could also use Mauve and try to align them to each other.
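As a quick first look before running BLAST or Mauve, the contigs could be compared by shared k-mer content. This is an alignment-free stand-in, not BLAST and not what Mauve does; the function names and the default k=21 are illustrative assumptions. A minimal sketch:

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq, k=21):
    """Return the set of strand-independent (canonical) k-mers in seq."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" in kmer:
            continue  # skip ambiguous bases
        # Keep the lexicographically smaller of the k-mer and its reverse
        # complement so both strands map to the same key.
        kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def jaccard(a, b):
    """Jaccard index of two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pairwise_similarity(contigs, k=21):
    """contigs: dict of name -> sequence. Returns {(name1, name2): jaccard}."""
    sets = {name: canonical_kmers(seq, k) for name, seq in contigs.items()}
    return {
        (x, y): jaccard(sets[x], sets[y])
        for x, y in combinations(sorted(sets), 2)
    }
```

Pairs with a high score are candidates for being the same sequence (possibly in opposite orientation or as a rotation of a circular molecule); a proper alignment is still needed to confirm.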

In any case you probably have coverage that is much deeper than necessary to do this assembly. Try the options suggested by @Chris below.

modified 3.0 years ago • written 3.0 years ago by genomax64k
3.0 years ago by
Chris Fields2.1k
University of Illinois Urbana-Champaign
Chris Fields2.1k wrote:

SPAdes should indicate the coverage of the scaffolds generated in the scaffold name. You can also determine this by mapping the reads back and assessing average coverage for the scaffolds.
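SPAdes contig and scaffold names follow the pattern `NODE_<id>_length_<bp>_cov_<value>`, where the value is k-mer coverage for the largest k used, not read coverage. A minimal sketch for pulling those fields out of the FASTA headers (the function names and the example header are made up for illustration):

```python
import re

# Matches headers such as ">NODE_1_length_6600_cov_66.5"
_SPADES_HEADER = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def parse_spades_header(header):
    """Return (node_id, length_bp, kmer_coverage) from a SPAdes FASTA header."""
    match = _SPADES_HEADER.search(header)
    if match is None:
        raise ValueError(f"not a SPAdes-style header: {header!r}")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def contig_coverages(fasta_lines):
    """Yield (node_id, length_bp, kmer_coverage) for each header line."""
    for line in fasta_lines:
        if line.startswith(">"):
            yield parse_spades_header(line)
```

Usage would be something like `list(contig_coverages(open("scaffolds.fasta")))`, assuming the default SPAdes output file name.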

If you have very high coverage (if this is the only data in a HiSeq lane and the size is ~3 kb, it's likely extremely high unless you have other low-coverage stuff in there such as genomic sequence), it may be worth either simply downsampling the data directly or filtering out sequences with low-abundance k-mers followed by normalization (khmer can do this), then retrying the assembly. Sometimes it helps. We did this for plant chloroplast genomes with low-coverage WGS data, and it worked a charm.
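For the plain downsampling option, the key point is to keep R1 and R2 in sync so that pairs stay intact. A minimal sketch, assuming uncompressed four-line FASTQ records in matching order in both files; the function names, the `fraction` parameter and the fixed seed are illustrative, and this is not khmer's abundance filtering or normalization:

```python
import random
from itertools import islice


def fastq_records(handle):
    """Yield FASTQ records as 4-tuples of lines from a file-like object."""
    handle = iter(handle)
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1_handle, r2_handle, out1, out2, fraction, seed=0):
    """Write a random subset of read pairs; returns the number of pairs kept.

    Each pair is kept independently with probability `fraction`, so the
    number kept is approximately fraction * total, not exact.
    """
    rng = random.Random(seed)
    kept = 0
    for rec1, rec2 in zip(fastq_records(r1_handle), fastq_records(r2_handle)):
        if rng.random() < fraction:
            out1.writelines(rec1)
            out2.writelines(rec2)
            kept += 1
    return kept
```

Dedicated tools do the same job faster on large files; the sketch is only meant to show the logic.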

written 3.0 years ago by Chris Fields2.1k

Hi Chris,

It was the only data in my HiSeq lane, so I want to know how much counts as extremely high coverage. The coverage of my contigs ranges from a high of 66 to a low of 2. I set my k-mer values to -k 21,33,55,77, which I assume are the defaults? What values would you recommend?

written 3.0 years ago by jigarnt30

I generally suggest around 100-200x max so 60x is fine, but the coverage you mention doesn't make much sense in the context of how much a typical HiSeq run yields (~400M reads). Is this low-pass WGS sequencing? By my (back of the napkin) calculation you'd have ~15 million-fold coverage with a simple plasmid; ~45-50Gb of data from a typical HiSeq paired-end lane for a 3,000nt genome.
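The back-of-the-napkin estimate is just total sequenced bases divided by genome size. A small sketch using the figures quoted above (45-50 Gb per lane, a 3,000 nt target); the function name is illustrative:

```python
def fold_coverage(total_bases, genome_size):
    """Expected average fold coverage: sequenced bases / genome size."""
    return total_bases / genome_size


GENOME_SIZE = 3_000  # nt, the putative plasmid

# Range of data from a typical HiSeq paired-end lane, as quoted above.
lane_low = fold_coverage(45e9, GENOME_SIZE)   # 15 million-fold
lane_high = fold_coverage(50e9, GENOME_SIZE)  # about 16.7 million-fold

# Cross-check: ~400M reads of 125 bp is 50 Gb.
lane_bases = 400e6 * 125

print(f"{lane_low:,.0f}x to {lane_high:,.0f}x")
```

This assumes every read in the lane comes from the 3 kb molecule, which is exactly the assumption the observed 2-66x contig coverage calls into question.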

Re: k-mer distribution, I mean using a tool like khmer or Jellyfish to generate a kmer distribution graph. khmer can also filter the data you have based on abundance.
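To make the idea of a k-mer distribution concrete: count how often each k-mer occurs in the reads, then tabulate how many distinct k-mers occur once, twice, and so on. The sketch below is a toy in-memory version for small inputs, not how khmer or Jellyfish are implemented; the function name and k=21 are illustrative, and k-mers are counted as they appear on the read strand rather than in canonical form.

```python
from collections import Counter


def kmer_abundance_histogram(reads, k=21):
    """Return a Counter mapping abundance -> number of distinct k-mers.

    reads: iterable of read sequences (strings).
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    # Collapse per-k-mer counts into a histogram of abundances.
    return Counter(counts.values())
```

In a plot of this histogram, k-mers at abundance 1-2 are mostly sequencing errors, while a peak at high abundance would correspond to a deeply covered molecule such as a plasmid.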

written 3.0 years ago by Chris Fields2.1k

Hi Chris,

I do not know whether it is low-pass WGS or not. I received 2 files (R1 & R2) of 700 MB each for my plasmid. So, as you said, it could be low-pass NGS, since the file size differs drastically from what you described.

written 3.0 years ago by jigarnt30

Your sample may have been multiplexed with others to save you money. If you ran FastQC, how many reads did it report, and how many cycles of sequencing did you do?

written 3.0 years ago by genomax64k

Hi Genomax2,

I outsourced my sequencing, so I do not really know about the sequencing cycles, but I received ~4 million reads.

written 3.0 years ago by jigarnt30

Is that 4 mil total (R1+R2) or each? How long (bp) are the individual original reads?

written 3.0 years ago by genomax64k

Individual reads are 125 bp long. Each file contains 2.3 million reads, so R1 & R2 combined would be 4.6 million.
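With these numbers the expected coverage can be worked out directly, along with the fraction of reads that would be kept to bring it down to a more typical assembly depth. A minimal sketch, assuming (hypothetically) that all 4.6 million reads derive from a single ~3 kb molecule; the function names and the 100x target are illustrative choices:

```python
def reads_coverage(n_reads, read_length, genome_size):
    """Expected fold coverage from read count, read length and genome size."""
    return n_reads * read_length / genome_size


def downsample_fraction(current_coverage, target_coverage):
    """Fraction of reads to keep to reach target_coverage (capped at 1.0)."""
    return min(1.0, target_coverage / current_coverage)


coverage = reads_coverage(4_600_000, 125, 3_000)  # roughly 191,667x
fraction = downsample_fraction(coverage, 100)     # roughly 0.0005

print(f"expected coverage: {coverage:,.0f}x, keep fraction: {fraction:.6f}")
```

A gap of this size between the expected depth and the reported contig coverage of 2-66 suggests that most reads do not come from a single 3 kb molecule.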

Also, I tried aligning the contigs to the reference genome, and some of them align. How could I find out the identity of the putative plasmid?

written 3.0 years ago by jigarnt30

You have a reference genome for the plasmid? If you do then clearly the ones that are aligning must be the correct contigs.

written 3.0 years ago by genomax64k

Hi,

That was the reference genome of the organism from which I gel-extracted the plasmid. When I BLASTed my contigs, I got matches to my organism and also to an E. coli plasmid. I wonder why it matched an E. coli plasmid (a prokaryotic plasmid). The identity of my putative plasmid still remains unknown.

written 3.0 years ago by jigarnt30

Hi,

Is there any possibility that a dsRNA could resist degradation by RNase and get sequenced on an Illumina platform with a paired-end library meant for DNA?

The reason I am asking is the possibility that there is no plasmid and that we instead ended up sequencing a dsRNA (mycovirus).

written 3.0 years ago by jigarnt30