How to contiguate a MiSeq assembly for Burkholderia
3.7 years ago

I have Burkholderia pseudomallei sequence data (MiSeq paired-end) and I want to perform comparative genomics against reference genomes (BPk96243, MSHR1435 and others). I already ran a SPAdes assembly, but it produced many nodes (700+). I can use ABACAS to order/align the contigs against a reference genome, but Burkholderia has 2 chromosomes, which confuses me. What should I do so that I end up with FASTA files for the two chromosomes (similar to the uploaded ones)? Your help is highly appreciated. Thank you.

Assembly alignment • 1.0k views

Since you know it has 2 chromosomes, you could map the reads against each reference chromosome separately and then assemble them individually. Depending on how close your reference genomes are, you may want to do so with quite relaxed alignment parameters.


Any recommendations for that? Also for further analysis, should I use two files (like chr1, chr2)?


Depends on the objective/task. You could have a chromosome in each file, you could have them both in the same file, or you could artificially concatenate them with some NNNs. Depends what you need to do.
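To make the three layouts concrete, here is a minimal Python sketch. The record names (`chr1`, `chr2`), the output file names and the spacer length are illustrative assumptions, not anything fixed by the tools mentioned in this thread.

```python
from typing import Dict


def wrap(seq: str, width: int = 60) -> str:
    """Break a sequence into fixed-width lines, as is conventional in FASTA."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


def per_chromosome_fasta(chromosomes: Dict[str, str]) -> Dict[str, str]:
    """Option 1: one FASTA text per chromosome, keyed by a (hypothetical) file name."""
    return {f"{name}.fasta": f">{name}\n{wrap(seq)}\n"
            for name, seq in chromosomes.items()}


def combined_fasta(chromosomes: Dict[str, str]) -> str:
    """Option 2: both chromosomes as separate records in one multi-FASTA."""
    return "".join(f">{name}\n{wrap(seq)}\n" for name, seq in chromosomes.items())


def concatenated_fasta(chromosomes: Dict[str, str],
                       record_id: str = "concatenated",
                       spacer_len: int = 100) -> str:
    """Option 3: a single artificial record, chromosomes joined by a run of Ns."""
    spacer = "N" * spacer_len  # spacer length is an arbitrary choice here
    return f">{record_id}\n{wrap(spacer.join(chromosomes.values()))}\n"
```

Option 3 changes coordinates for everything after the first chromosome, so it probably only makes sense when a downstream tool insists on a single sequence.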

For the mapping, you'll just need your favourite aligner (BWA/bowtie2 etc) and samtools.
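As a rough sketch of that mapping step, the Python helper below only builds the shell commands for one reference chromosome; it does not run them. It assumes BWA-MEM and samtools are the chosen tools, and all file names are placeholders. Lowering the `bwa mem` mismatch penalty (`-B`) is just one possible way to relax the alignment; the right settings depend on how divergent the reference is.

```python
import shlex
from typing import List, Optional


def mapping_commands(chrom_fasta: str, reads_1: str, reads_2: str, prefix: str,
                     threads: int = 4,
                     mismatch_penalty: Optional[int] = None) -> List[str]:
    """Build shell commands that map paired reads to ONE reference chromosome
    and write the mapped reads back out as FASTQ for a separate assembly."""
    ref = shlex.quote(chrom_fasta)
    r1, r2 = shlex.quote(reads_1), shlex.quote(reads_2)
    bam = shlex.quote(f"{prefix}.mapped.bam")
    out_1 = shlex.quote(f"{prefix}_R1.fastq")
    out_2 = shlex.quote(f"{prefix}_R2.fastq")
    out_s = shlex.quote(f"{prefix}_single.fastq")

    bwa_opts = f"-t {threads}"
    if mismatch_penalty is not None:
        # Assumption: a lower mismatch penalty as a simple "relaxed" setting.
        bwa_opts += f" -B {mismatch_penalty}"

    return [
        # 1. Index the single-chromosome reference.
        f"bwa index {ref}",
        # 2. Map, drop unmapped reads (-F 4), name-sort so mates stay adjacent.
        f"bwa mem {bwa_opts} {ref} {r1} {r2} "
        f"| samtools view -b -F 4 - | samtools sort -n -o {bam} -",
        # 3. Convert back to FASTQ; reads whose mate was dropped go to the -s file.
        f"samtools fastq -1 {out_1} -2 {out_2} -0 /dev/null -s {out_s} {bam}",
    ]
```

Calling it once per chromosome FASTA (for example with prefixes `chr1` and `chr2`) gives two read sets that can be assembled independently. Reads from regions shared by both chromosomes will map to both, so some overlap between the two sets is expected.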


I'm interested in comparative genomics (against clinical reference genomes, for essential and virulence genes). The BPk96243 genome that I'm interested in has both of its chromosomes (BX571965, BX571966) uploaded as separate entries. So I want to make something like that (I know this won't be a complete genome, but still).


"Comparative genomics" is not a specific task in this context; I'm asking about the concrete analyses you plan to run.

If you only have short-read data, it's highly unlikely that you will get a complete, contiguous, closed assembly no matter what you do.


I want to have chromosomes in each file (I do not want to join the chromosomes together).

3.7 years ago

Hi,

Before using contiguate I would do a quality check of your data. For a bacterium with an expected genome size of approximately 7.0 Mbp, 700+ nodes is a lot. Did you filter out short contigs, e.g. < 500 bp? If so, then you might have contamination, or your libraries might be of poor quality.
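A quick way to see what the length filter does to the assembly is sketched below in plain Python. It assumes the contigs are already loaded into a dict of name to sequence; the 500 bp cut-off is the one mentioned above, and the function names are only illustrative.

```python
from typing import Dict, List


def filter_contigs(contigs: Dict[str, str], min_len: int = 500) -> Dict[str, str]:
    """Keep only contigs that are at least min_len bases long."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_len}


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def summarize(contigs: Dict[str, str]) -> Dict[str, int]:
    """Contig count, total size and N50 for a set of contigs."""
    lengths = [len(seq) for seq in contigs.values()]
    return {"contigs": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}
```

Comparing `summarize(contigs)` with `summarize(filter_contigs(contigs))` shows how much of the total size sits in short contigs, and whether the filtered total is anywhere near the expected genome size.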

To check for the presence of potential contaminants, use CheckM, or look at the GC profile of your reads with FastQC.
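For a rough first look at the assembly itself, per-contig GC content can be computed in a few lines of Python. This is only a sketch and not a substitute for CheckM or FastQC; the expected GC value and the tolerance are assumptions that should be taken from the reference genome being compared against.

```python
from typing import Dict, List


def gc_fraction(seq: str) -> float:
    """GC fraction over unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_outliers(contigs: Dict[str, str], expected_gc: float,
                tolerance: float = 0.10) -> List[str]:
    """Names of contigs whose GC fraction differs from expected_gc by more
    than tolerance. Both thresholds are assumptions to tune per organism."""
    return sorted(name for name, seq in contigs.items()
                  if abs(gc_fraction(seq) - expected_gc) > tolerance)
```

Contigs flagged this way are only candidates for a closer look (for example a BLAST search), since genuine genomic islands can also differ in GC content from the rest of the genome.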


I got 500 nodes after removing short contigs <500 bp.


Did you QC your reads before assembly too?


I QC'd my reads, then used Trim Galore for quality and adapter trimming.


Could you check the quality of the assembly with CheckM? Just to be sure that you do not have any contamination.
