Question: How to contiguate MiSeq assembly for Burkholderia
5 weeks ago by
Morello Salesman0 wrote:

I have a Burkholderia pseudomallei sequence (MiSeq paired-end) and I want to perform comparative genomics against reference genomes (BPk96243, MSHR1435 and others). I have already assembled it with SPAdes, but the assembly has many nodes (700+). I can use ABACAS to order and align the contigs against a reference genome, but Burkholderia has 2 chromosomes, which confuses me. What should I do to get FASTA files for the two chromosomes (similar to the uploaded ones)? Your help is highly appreciated. Thank you.

alignment assembly • 170 views
ADD COMMENTlink modified 5 weeks ago by andres.firrincieli820 • written 5 weeks ago by Morello Salesman0

Since you know it has 2 chromosomes, you could map the reads against each reference chromosome separately and then assemble them individually. Depending on how close your reference genomes are, you may want to do so with quite relaxed alignment parameters.

ADD REPLYlink written 5 weeks ago by Joe17k

Any recommendations for that? Also for further analysis, should I use two files (like chr1, chr2)?

ADD REPLYlink written 5 weeks ago by Morello Salesman0

Depends on the objective/task. You could have a chromosome in each file, you could have them both in the same file, or you could artificially concatenate them with some NNNs. Depends what you need to do.
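As a rough illustration of those layout options, here is a minimal Python sketch. It assumes you already have one sequence per chromosome; the function names and the spacer length are made up for the example, not a standard.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Keep only the first word of the header as the record name.
            name = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def join_with_spacer(seqs: Iterable[str], spacer_len: int = 100) -> str:
    """Concatenate sequences with a run of N between them (length is arbitrary)."""
    return ("N" * spacer_len).join(seqs)


def format_fasta(name: str, seq: str, width: int = 60) -> str:
    """Return one FASTA record as text, wrapped at `width` characters."""
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{name}"] + body) + "\n"
```

To get one file per chromosome, write `format_fasta(name, seq)` to a separate file for each record. To get a single joined sequence, pass the sequences to `join_with_spacer` first.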

For the mapping, you'll just need your favourite aligner (BWA/bowtie2 etc) and samtools.
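A minimal sketch of what that could look like when driven from Python, assuming `bwa` and `samtools` are installed and on your PATH. The file names in the usage note and the thread count are placeholders. No relaxed alignment parameters are set here, so add whichever options suit your data.

```python
import subprocess
from typing import List


def mapping_commands(reference: str, reads_1: str, reads_2: str,
                     out_bam: str, threads: int = 4) -> List[List[str]]:
    """Build the commands to map paired-end reads to one reference chromosome."""
    return [
        ["bwa", "index", reference],
        ["bwa", "mem", "-t", str(threads), reference, reads_1, reads_2],
        # "-" tells samtools sort to read the alignments from stdin.
        ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"],
        ["samtools", "index", out_bam],
    ]


def run_mapping(reference: str, reads_1: str, reads_2: str,
                out_bam: str, threads: int = 4) -> None:
    """Run the commands, piping bwa mem output straight into samtools sort."""
    index_cmd, align_cmd, sort_cmd, index_bam_cmd = mapping_commands(
        reference, reads_1, reads_2, out_bam, threads)
    subprocess.run(index_cmd, check=True)
    align = subprocess.Popen(align_cmd, stdout=subprocess.PIPE)
    subprocess.run(sort_cmd, stdin=align.stdout, check=True)
    align.stdout.close()
    if align.wait() != 0:
        raise subprocess.CalledProcessError(align.returncode, align_cmd)
    subprocess.run(index_bam_cmd, check=True)
```

You would call `run_mapping` once per reference chromosome, e.g. `run_mapping("chr1.fasta", "R1.fastq.gz", "R2.fastq.gz", "chr1.bam")` and then again for the second chromosome.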

ADD REPLYlink written 5 weeks ago by Joe17k

I'm interested in comparative genomics (against clinical reference genomes, looking at essential and virulence genes). The BPk96243 genome that I'm interested in has both of its chromosomes (BX571965, BX571966) uploaded as separate entries. So I want to make something like that (I know this won't be a complete genome, but still).

ADD REPLYlink written 5 weeks ago by Morello Salesman0

"Comparative genomics" is not a task in this context, I'm talking about specifics.

If you only have short-read data, it's highly unlikely that you will get a complete, contiguous, closed assembly no matter what you do.

ADD REPLYlink written 5 weeks ago by Joe17k

I want to have each chromosome in its own file (I do not want to join the chromosomes together).

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Morello Salesman0
5 weeks ago by
andres.firrincieli820 wrote:

Hi,

Before using contiguate I would do a quality check of your data. For a bacterium with an expected genome size of approximately 7.0 Mbp, 700+ nodes is a lot. Did you filter out short contigs, e.g. < 500 bp? If so, then you might have contamination, or your libraries might be of poor quality.
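For a quick look at how fragmented the assembly is, something like the sketch below may help. It takes a list of contig lengths, applies the 500 bp cutoff mentioned above, and reports N50. The function names are just for illustration.

```python
from typing import Iterable, List


def filter_contigs(lengths: Iterable[int], min_len: int = 500) -> List[int]:
    """Keep only contigs that are at least `min_len` bp long."""
    return [n for n in lengths if n >= min_len]


def n50(lengths: Iterable[int]) -> int:
    """Length of the contig at which the running total, taken from the
    longest contig downwards, reaches half of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for n in ordered:
        running += n
        if running >= half:
            return n
    return 0  # empty input
```

Comparing the contig count, total length and N50 before and after filtering gives a first impression of whether the short contigs are just noise or a sign of a bigger problem.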

To check for the presence of potential contaminants, use CheckM, or look at the GC profile of your reads with FastQC.
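If you want a quick GC check on the contigs themselves, a rough sketch could look like this. It is not a replacement for CheckM or FastQC. The expected GC value and the tolerance are assumptions you have to supply yourself (B. pseudomallei is GC-rich, so something around 0.68 is a plausible starting point).

```python
from typing import Dict, List


def gc_fraction(seq: str) -> float:
    """GC fraction of a sequence, ignoring anything that is not A/C/G/T."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def flag_gc_outliers(contigs: Dict[str, str], expected_gc: float,
                     tolerance: float = 0.10) -> List[str]:
    """Names of contigs whose GC fraction differs from `expected_gc`
    by more than `tolerance` (both values are user-chosen assumptions)."""
    return [name for name, seq in contigs.items()
            if abs(gc_fraction(seq) - expected_gc) > tolerance]
```

Contigs flagged this way are only candidates for a closer look. Plasmids or horizontally acquired regions can have a different GC content without being contamination.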

ADD COMMENTlink modified 5 weeks ago • written 5 weeks ago by andres.firrincieli820

I got 500 nodes after removing short contigs <500 bp.

ADD REPLYlink written 5 weeks ago by Morello Salesman0

Did you QC your reads before assembly too?

ADD REPLYlink written 5 weeks ago by Joe17k

I QC'd my reads, then used Trim Galore for quality and adapter trimming.

ADD REPLYlink written 5 weeks ago by Morello Salesman0

Could you check the quality of the assembly with CheckM? Just to be sure that you do not have any contamination.

ADD REPLYlink written 5 weeks ago by andres.firrincieli820