Number of contigs is too big
4.1 years ago
Bioinfo ▴ 20

Hello, I hope you're doing fine. I have a question, please: I have Illumina MiSeq data for a bacterial sample. When I performed the assembly I got too many contigs in the results (7165). What might be causing this, and how can I improve the assembly?

Thank you

assembly alignment sequencing sequence • 1.2k views

Please provide additional information: the expected size of the genome, the size/type of the dataset, and the methods used for the assembly. This matters so that you do not get suggestions for things you may have already tried.


The expected genome size is 19146100 bp and the total assembly length I got is 6534152 bp (-65.87%). Here is the command I used:

shovill --outdir Assembly_Miseq_Default --R1 R1_001.fastq --R2 _R2_001.fastq

How many reads went into this assembly and what was their length? In other words, what was the total size (in base pairs) of the data you used for this assembly?


There are 2842610 reads in file 1 and 2842610 reads in file 2, and the read length is 250 bp.


So you seem to have ~75x coverage in terms of bases sequenced (if my calculation is right). In theory this should be enough coverage to get a reasonably good assembly.
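
For anyone who wants to check the estimate, here is a minimal sketch of the arithmetic using only the numbers posted in this thread (the variable names are mine, and it assumes every read is the full 250 bp):

```shell
# Rough sequencing depth = total bases sequenced / expected genome size.
# The values come from the thread; the variable names are illustrative.
reads_per_file=2842610   # reads in each of the two paired files
read_length=250          # assumes all reads are full length
genome_size=19146100     # expected genome size in bp

total_bases=$(( 2 * reads_per_file * read_length ))

# awk does the floating-point division that shell arithmetic lacks.
coverage=$(LC_ALL=C awk -v b="$total_bases" -v g="$genome_size" \
    'BEGIN { printf "%.1f", b / g }')

echo "total bases: $total_bases"
echo "approx. coverage: ${coverage}x"
```

This comes out at roughly 74x, in line with the ~75x figure above. Trimming and quality filtering would lower the real number.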

There is always the possibility that a) your libraries are not that good (a lot of technical duplication, over-amplification) or b) you are not running the assembly correctly or have not used an assembler suited to the data. Since those possibilities are hard to assess in a forum like this, you are going to have to work on those yourself.
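
As a first, crude look at point a), one could count how many reads in a file are exact sequence copies of an earlier read. The sketch below is only an approximation under stated assumptions: it needs uncompressed, four-line-per-record FASTQ, it looks at one file at a time rather than at read pairs, and the function name `fastq_dup_fraction` is made up for this example. Dedicated QC tools report duplication more carefully.

```shell
# Estimate the fraction of reads whose sequence exactly matches an earlier read.
# A FASTQ record is 4 lines; the sequence is line 2 of each record.
# Reads from the file given as $1, or from standard input if no file is given.
fastq_dup_fraction() {
    LC_ALL=C awk '
        NR % 4 == 2 { total++; if (seen[$0]++) dup++ }
        END {
            if (total)
                printf "%d reads, %d duplicates (%.1f%%)\n", total, dup + 0, 100 * dup / total
        }' "${1:--}"
}

# Example (file name taken from the thread), on the first 100000 reads only
# to keep memory use small:
#   head -n 400000 R1_001.fastq | fastq_dup_fraction
```

A high percentage here would point towards library problems, while a low one does not rule them out.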

If there are related genomes available in GenBank you can try to do a reference assisted genome assembly to see if that helps.


i performed assembly

What software did you use? What were the parameters you passed? Is there a log file? Were there any warnings or errors? How do you know the software was run accurately?

i got too many contigs in results

How do you know they are too many contigs? How many did you expect? How are these contigs different in composition/length from the ones you expected?

Once you look for the answers to the above questions, you'll probably find a good lead to the solution, if not the solution itself.
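
To put numbers on the contig questions above, a small summary of the assembly FASTA (contig count, total length, longest contig, N50) is a reasonable starting point. This is a minimal sketch with standard tools; the function name `contig_stats` is made up, and the file path in the usage comment assumes the assembler wrote a `contigs.fa` into the output directory, so adjust it to whatever your run produced.

```shell
# Summarise an assembly FASTA: contig count, total length, longest contig, N50.
# Handles wrapped or unwrapped sequence lines.
contig_stats() {
    # Step 1: print one length per contig.
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    # Step 2: sort lengths from longest to shortest.
    sort -rn |
    # Step 3: N50 = length of the contig at which the running sum
    # first reaches half of the total assembly length.
    awk '{ lens[NR] = $1; total += $1 }
         END {
             if (NR == 0) { print "no contigs found"; exit }
             half = total / 2
             for (i = 1; i <= NR; i++) {
                 run += lens[i]
                 if (run >= half) { n50 = lens[i]; break }
             }
             printf "contigs=%d total_bp=%d longest=%d N50=%d\n", NR, total, lens[1], n50
         }'
}

# Example (path is an assumption, see above):
#   contig_stats Assembly_Miseq_Default/contigs.fa
```

Comparing these figures between two assemblies of the same organism says more than the raw contig count alone, since thousands of very short contigs can inflate the count without changing much else.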


Hello

I used shovill for the assembly and ran this command:

shovill --outdir wDil_Assembly_Miseq_Default --R1 R1_001.fastq --R2 _R2_001.fastq

I also tried a specific k-mer size:

shovill --kmer 127 --outdir wDil_Assembly_Miseq_Default --R1 R1_001.fastq --R2 _R2_001.fastq

I also assembled HiSeq data from the same bacterium and got better results (1664 contigs), which is what makes me think the result from the MiSeq data is abnormal.
