Question

How to contiguate a MiSeq assembly for Burkholderia

0

4.9 years ago

Morello Salesman • 0

I have Burkholderia pseudomallei sequence data (MiSeq paired-end) and I want to perform comparative genomics against reference genomes (BPk96243, MSHR1435 and others). I have already run a SPAdes assembly, but it produced many nodes (700+). I can use ABACAS to order and align the contigs against a reference genome, but Burkholderia has two chromosomes, which confuses me. What should I do to end up with FASTA files for the two chromosomes (similar to the uploaded ones)? Your help is highly appreciated. Thank you.

Assembly alignment • 1.4k views

ADD COMMENT • link updated 4.9 years ago by andres.firrincieli 3.9k • written 4.9 years ago by Morello Salesman • 0

1

Since you know it has 2 chromosomes, you could map the reads against each reference chromosome separately and then assemble them individually. Depending on how close your reference genomes are, you may want to do so with quite relaxed alignment parameters.

ADD REPLY • link 4.9 years ago by Joe 22k

0

Any recommendations for that? Also for further analysis, should I use two files (like chr1, chr2)?

ADD REPLY • link 4.9 years ago by Morello Salesman • 0

0

Depends on the objective/task. You could have a chromosome in each file, you could have them both in the same file, or you could artificially concatenate them with some NNNs. Depends what you need to do.
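The three layouts mentioned here (one chromosome per file, both in one file, or joined with an N spacer) can be sketched in a few lines of Python. This is only an illustration: the helper names (`read_fasta`, `write_fasta`, `split_per_record`, `join_with_spacer`), the output file naming and the spacer length of 100 are assumptions made for the example, not something prescribed in this thread.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(path, records, width=60):
    """Write (header, sequence) tuples as FASTA, wrapping sequence lines."""
    with open(path, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n")
            for start in range(0, len(seq), width):
                out.write(seq[start:start + width] + "\n")


def split_per_record(records, out_dir):
    """Layout 1: one file per record, named after the first word of the header."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for header, seq in records:
        path = out_dir / f"{header.split()[0]}.fasta"
        write_fasta(path, [(header, seq)])
        paths.append(path)
    return paths


def join_with_spacer(records, name="joined", spacer_len=100):
    """Layout 3: a single artificial record, sequences separated by runs of N."""
    spacer = "N" * spacer_len
    return (name, spacer.join(seq for _, seq in records))
```

Layout 2 (both chromosomes in one file) is simply `write_fasta("both.fasta", records)`. Which layout is appropriate depends on what the downstream tool expects as input.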

For the mapping, you'll just need your favourite aligner (BWA/bowtie2 etc) and samtools.
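As a rough sketch of the per-chromosome mapping idea, the Python helper below only builds the shell command lines for BWA and samtools; it does not run them. The file names, the hypothetical function name `mapping_commands`, and the "relaxed" values (`-k 15 -B 2`, i.e. a shorter minimum seed and a lower mismatch penalty than the BWA-MEM defaults of 19 and 4) are illustrative assumptions and have not been tuned for Burkholderia.

```python
import shlex


def mapping_commands(reference, reads_1, reads_2, prefix, threads=4, relaxed=False):
    """Return shell command strings that map paired reads to one reference
    chromosome and extract the mapped pairs as FASTQ for a separate assembly."""
    bwa_mem = ["bwa", "mem", "-t", str(threads)]
    if relaxed:
        # Illustrative, untuned values: shorter seeds, cheaper mismatches.
        bwa_mem += ["-k", "15", "-B", "2"]
    bwa_mem += [reference, reads_1, reads_2]

    bam = f"{prefix}.namesorted.bam"
    # Name-sort so that mates stay adjacent for paired FASTQ extraction.
    sort = ["samtools", "sort", "-n", "-@", str(threads), "-o", bam, "-"]
    # -F 2308 drops unmapped (4), secondary (256) and supplementary (2048) records.
    fastq = [
        "samtools", "fastq", "-F", "2308",
        "-1", f"{prefix}_R1.fastq", "-2", f"{prefix}_R2.fastq",
        "-0", "/dev/null", "-s", "/dev/null", "-n", bam,
    ]
    return [
        shlex.join(["bwa", "index", reference]),
        shlex.join(bwa_mem) + " | " + shlex.join(sort),
        shlex.join(fastq),
    ]


if __name__ == "__main__":
    # Hypothetical local file names for the two reference chromosomes.
    for ref, prefix in [("BX571965.fasta", "chr1"), ("BX571966.fasta", "chr2")]:
        for command in mapping_commands(ref, "reads_R1.fastq", "reads_R2.fastq",
                                        prefix, relaxed=True):
            print(command)
```

Each resulting read set could then be assembled on its own. Note that reads from any sequence shared by both chromosomes would be recruited into both sets, and reads with no counterpart in the reference are lost, so it is worth inspecting what remains unmapped.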

ADD REPLY • link 4.9 years ago by Joe 22k

0

I'm interested in comparative genomics (against clinical reference genomes, looking at essential and virulence genes). The BPk96243 genome that I'm interested in has both of its chromosomes (BX571965, BX571966) uploaded as separate entries, so I want to produce something like that (I know this won't be a complete genome, but still).

ADD REPLY • link 4.9 years ago by Morello Salesman • 0

0

"Comparative genomics" is not a task in this context; I'm talking about specifics.

If you only have short-read data, it's highly unlikely that you will get a complete, contiguous, closed assembly no matter what you do.

ADD REPLY • link 4.9 years ago by Joe 22k

0

I want to have chromosomes in each file (I do not want to join the chromosomes together).

ADD REPLY • link 4.9 years ago by Morello Salesman • 0

score 0 · Answer 1 · 2020-08-11

0

4.9 years ago

andres.firrincieli 3.9k

Hi,

Before using contiguate I would do a quality check of your data. For a bacterium with an expected genome size of approximately 7.0 Mbp, 700+ nodes is a lot. Did you filter out short contigs, e.g. < 500 bp? If so, then you might have contamination or your libraries are of poor quality.
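To put numbers on "a lot of nodes", a small Python sketch like the following could report contig count, total length and N50 before and after a length cutoff. The function names and the 500 bp default are assumptions for illustration; dedicated tools report the same statistics in more detail.

```python
def contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file."""
    lengths, current = [], None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def summarize(lengths, min_len=500):
    """Summary statistics for contigs of at least min_len bp."""
    kept = [n for n in lengths if n >= min_len]
    return {
        "contigs": len(kept),
        "total_bp": sum(kept),
        "n50": n50(kept),
        "longest": max(kept, default=0),
    }
```

For example, `summarize(contig_lengths("contigs.fasta"))` (with a hypothetical file name) gives a quick view of whether the total length is anywhere near the expected genome size once short contigs are dropped.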

To check for the presence of potential contaminants, use CheckM, or look at the GC profile of your reads with FastQC.
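As a crude first look that does not replace CheckM or FastQC, the GC content of each contig can be compared against the value expected for the species. The sketch below assumes an expected GC fraction of roughly 0.68 for B. pseudomallei (please verify against your reference) and uses an arbitrary tolerance; both numbers and the function names are assumptions for illustration.

```python
def gc_fraction(seq):
    """GC fraction of a sequence, ignoring N and other ambiguous bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


def flag_gc_outliers(contigs, expected_gc=0.68, tolerance=0.10):
    """Return names of contigs whose GC fraction differs from expected_gc
    by more than tolerance. `contigs` maps contig name to sequence."""
    return [
        name
        for name, seq in contigs.items()
        if abs(gc_fraction(seq) - expected_gc) > tolerance
    ]


if __name__ == "__main__":
    # Toy sequences, not real data.
    toy = {"contig_1": "GGGCCCGGAT", "contig_2": "ATATATATGC"}
    print(flag_gc_outliers(toy, tolerance=0.15))
```

Flagged contigs are only candidates for a closer look (for instance a BLAST search); short contigs and genuinely atypical regions can deviate from the genome-wide GC without being contamination.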

ADD COMMENT • link 4.9 years ago by andres.firrincieli 3.9k

0

I got 500 nodes after removing short contigs <500 bp.

ADD REPLY • link 4.9 years ago by Morello Salesman • 0

0

Did you QC your reads before assembly too?

ADD REPLY • link 4.9 years ago by Joe 22k

0

I QC'd my reads, then I used Trim Galore for quality and adapter trimming.

ADD REPLY • link 4.9 years ago by Morello Salesman • 0

0

Could you check the quality of the assembly with CheckM? Just to be sure that you do not have any contamination.

ADD REPLY • link 4.9 years ago by andres.firrincieli 3.9k