Question: Number of contigs is too big
8 weeks ago, Bioinfo20 (Morocco) wrote:

Hello, I hope you're doing fine. I have a question, please: I have Illumina MiSeq data for a bacterial sample, and when I performed the assembly I got too many contigs in the results (7165). What could be the causes of this, and how can I improve the assembly?

Thank you

written 8 weeks ago by Bioinfo20

Please provide additional information: the expected size of the genome, the size/type of the dataset, and the methods used for the assembly. This is important so that you do not get suggestions for things you may have already tried.

written 8 weeks ago by genomax84k

The expected genome size is 19146100 bp and the total assembly length I got is 6534152 bp (-65.87%). Here is the command I used:

shovill --outdir Assembly_Miseq_Default --R1 R1_001.fastq --R2 _R2_001.fastq
modified 8 weeks ago by RamRS27k • written 8 weeks ago by Bioinfo20

How many reads went into this assembly and what was their length? In other words, what was the total size (in base pairs) of the data you used for this assembly?

written 8 weeks ago by genomax84k

There are 2842610 reads in file 1 and 2842610 reads in file 2, and their length is 250 bp.

written 8 weeks ago by Bioinfo20

So you seem to have ~75x coverage in terms of bases sequenced (if my calculation is right). In theory this should be enough coverage to get a reasonably good assembly.
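The ~75x estimate can be reproduced from the numbers given in this thread. The sketch below is only a back-of-the-envelope check: the function names `estimate_coverage` and `recovered_pct` are made up for illustration, and it assumes every read is the full 250 bp (trimmed reads would lower the figure).

```shell
# Rough sequencing depth: reads_per_file * number_of_files * read_length / genome_size.
# Assumes all reads have the same length.
estimate_coverage() {
  LC_ALL=C awk -v n="$1" -v files="$2" -v len="$3" -v genome="$4" \
    'BEGIN { printf "%.1f\n", n * files * len / genome }'
}

# Percentage of the expected genome size that the assembly actually covers.
recovered_pct() {
  LC_ALL=C awk -v asm="$1" -v genome="$2" \
    'BEGIN { printf "%.2f\n", 100 * asm / genome }'
}

# Values reported in this thread.
estimate_coverage 2842610 2 250 19146100   # depth in x
recovered_pct 6534152 19146100             # percent of expected size assembled
```

With the thread's values this gives roughly 74x depth and about 34% of the expected size recovered, which matches the -65.87% shortfall reported above.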

There is always the possibility that a) your libraries are not that good (a lot of technical duplication or over-amplification), or b) you are not running the assembly correctly or have not used an assembler that works well for this data. Since those possibilities are hard to assess in a forum like this, you are going to have to work on them yourself.
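One rough way to look into possibility a) is to count how many reads are exact sequence duplicates of an earlier read. This is only a sketch: `exact_dup_rate` is a hypothetical helper name, it assumes an uncompressed FASTQ with exactly four lines per record, and it holds every distinct sequence in memory, so it can be heavy on a file with millions of 250 bp reads. A dedicated QC tool will give a more reliable picture.

```shell
# Fraction of reads whose sequence exactly matches an earlier read in the file.
# Assumes plain (uncompressed) FASTQ with 4 lines per record.
exact_dup_rate() {
  LC_ALL=C awk '
    NR % 4 == 2 {                 # line 2 of each record is the sequence
      total++
      if (seen[$0]++) dup++       # already seen at least once before
    }
    END {
      if (total > 0) printf "%.3f\n", dup / total
      else print "NA"
    }' "$1"
}

# Example (file name taken from the command in this thread):
# exact_dup_rate R1_001.fastq
```

A high value would be consistent with over-amplification, but some exact duplicates are expected by chance, so treat the number as a hint and not as proof.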

If there are related genomes available in GenBank, you can try a reference-assisted genome assembly to see if that helps.

modified 8 weeks ago • written 8 weeks ago by genomax84k

i performed assembly

What software did you use? What were the parameters you passed? Is there a log file? Were there any warnings or errors? How do you know the software was run accurately?

i got too many contigs in results

How do you know there are too many contigs? How many did you expect? How do these contigs differ in composition/length from the ones you expected?

Once you look for the answers to the above questions, you'll probably find a good lead to the solution, if not the solution itself.
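To put numbers on the contig questions above, a small summary of the assembly FASTA can help. The sketch below is a minimal example, not a replacement for a proper assembly QC tool: `contig_stats` is a hypothetical helper name, and the path in the usage comment assumes the assembler wrote its contigs to a file called `contigs.fa` inside the output directory.

```shell
# Print contig count, total length and N50 for a FASTA file.
contig_stats() {
  # First pass: emit one length per contig (sequences may span several lines).
  LC_ALL=C awk '
    /^>/ { if (len) print len; len = 0; next }
    { len += length($0) }
    END { if (len) print len }' "$1" |
  sort -rn |
  # Second pass: lengths arrive longest first; N50 is the length of the contig
  # at which the running sum first reaches half of the total.
  LC_ALL=C awk '
    { lens[NR] = $1; total += $1 }
    END {
      half = total / 2
      for (i = 1; i <= NR; i++) {
        run += lens[i]
        if (run >= half) { n50 = lens[i]; break }
      }
      printf "contigs=%d total_bp=%d n50=%d\n", NR, total, n50
    }'
}

# Example (assumed output location):
# contig_stats Assembly_Miseq_Default/contigs.fa
```

Running the same function on two assemblies of the same organism gives a like-for-like comparison of contig count, total length and N50.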

written 8 weeks ago by RamRS27k

Hello

I used shovill for the assembly and ran this command:

shovill --outdir wDil_Assembly_Miseq_Default --R1 R1_001.fastq --R2 _R2_001.fastq

I also tried a specific k-mer size:

shovill --kmer 127 --outdir wDil_Assembly_Miseq_Default --R1 R1_001.fastq --R2 _R2_001.fastq

I also assembled HiSeq data from the same bacterium and got better results (1664 contigs), which is what makes me think the result from the MiSeq data is abnormal.

modified 8 weeks ago by RamRS27k • written 8 weeks ago by Bioinfo20