Parasite genome assembly
1
0
Entering edit mode
12 weeks ago

Hi All,

(1) I have been working on a parasite genome assembly using the BWA tool. I used the following commands to run the assembly (paired-end Illumina short reads).

module load bwa/0.7.15

bwa mem -t 1 -M -R "@RG\tID:reads\tSM:AA_genome" reference_genome.fasta AA_genome1.fasta.gz AA_genome2.fasta.gz > AA_genome_aln-pe.sam

(2) I got the output file AA_genome_aln-pe.sam, which is around 50 GB. I then tried to convert the sorted alignment file to FASTA format using:

samtools bam2fq AA_genome.srt.bam | seqtk seq -A > AA_genome_assembly.fa 

However, the final output I got is 20 GB, whereas my expected assembly size was approximately 50 MB. How can I get the final assembly at the expected size? Is there still something I am missing in the analysis?

Thank you

parasite assembly BWA genome • 400 views
12 weeks ago
GenoMax 111k

Is there still something I am missing in the analysis?

bwa is an NGS read aligner, not a genome assembler. If you want to assemble the data, you are using the wrong program. You should be using a de novo assembler such as SOAPdenovo or SPAdes, starting from your sequence data. (Do you only have FASTA-format data, or did you convert the FASTQ files?)

If you are aligning to a reference genome (which seems to be the case above) then the size of aligned data file has nothing to do with the size of the genome/assembly. That size is simply reflective of alignments found for your reads against the reference.
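To make that concrete: samtools bam2fq followed by seqtk seq -A only writes the reads back out as FASTA, so the 20 GB file is read data, not an assembly. Below is a rough back-of-the-envelope sketch using only the two figures quoted in the question. It assumes every byte of the file is a base, which overestimates the depth because headers and newlines are counted too.

```shell
# Rough estimate only: every byte of the read dump is treated as one base,
# so the result is an upper bound on the real sequencing depth.
reads_bytes=20000000000   # ~20 GB FASTA written by bam2fq | seqtk (from the question)
genome_bp=50000000        # ~50 Mb expected genome size (from the question)

coverage=$(( reads_bytes / genome_bp ))
echo "approximate read depth: ${coverage}x"
```

A file hundreds of times larger than the genome is therefore what you would expect from a dump of the reads themselves.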

You can generate a consensus sequence using the bwa aligned data file (generated consensus should be close in size to your reference). This thread will help with that: Generating consensus sequence from bam file
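One common way to do this is with samtools and bcftools. The sketch below is not necessarily the method from the linked thread. The function name make_consensus and the output prefix are hypothetical, and the input file names in the usage line are taken from the question. Positions with no called variant keep the reference base, so treat the result as a starting point rather than a finished assembly.

```shell
# Hypothetical helper: arguments are the reference FASTA, the bwa mem SAM
# output, and a prefix for the files this function writes.
make_consensus() {
    ref=$1
    sam=$2
    out=$3

    # Coordinate-sort and index the alignments
    samtools sort -o "$out.srt.bam" "$sam"
    samtools index "$out.srt.bam"

    # Call variants against the reference, writing a compressed VCF
    bcftools mpileup -f "$ref" "$out.srt.bam" | bcftools call -mv -Oz -o "$out.calls.vcf.gz"
    bcftools index "$out.calls.vcf.gz"

    # Apply the called variants to the reference to produce a consensus FASTA
    bcftools consensus -f "$ref" "$out.calls.vcf.gz" > "$out.consensus.fa"
}

# Example call (uncomment to run):
# make_consensus reference_genome.fasta AA_genome_aln-pe.sam AA_genome
```

The resulting AA_genome.consensus.fa should be close in size to the reference, as noted above.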


Thank you so much. I was totally on a different track. Is there any eukaryotic parasite-specific assembler available for Illumina short reads?


Have a look at the SPAdes assembler to get started. There are plenty of others, however; see e.g. Wikipedia: https://en.wikipedia.org/wiki/De_novo_sequence_assemblers


+1 for SPAdes suggestion. With a 50 Mb genome this would be a good place to start.
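For reference, a minimal sketch of a paired-end SPAdes run. The function name assemble_spades, the SPADES variable, the thread count, and the file names in the usage line are all assumptions for illustration. If your reads really are FASTA rather than FASTQ, SPAdes cannot run its read error correction step, so you would likely need to add --only-assembler.

```shell
# Executable to call; override SPADES if spades.py is not on your PATH.
SPADES=${SPADES:-spades.py}

# Hypothetical wrapper: forward reads, reverse reads, output directory.
assemble_spades() {
    r1=$1
    r2=$2
    outdir=$3

    # -1/-2 give the two paired-end files, -t the thread count (8 is arbitrary),
    # -o the directory where SPAdes writes its results
    "$SPADES" -1 "$r1" -2 "$r2" -t 8 -o "$outdir"
}

# Example call (uncomment to run; read file names are placeholders):
# assemble_spades AA_genome_R1.fastq.gz AA_genome_R2.fastq.gz AA_genome_spades
```

The assembled sequences end up in contigs.fasta and scaffolds.fasta inside the output directory, and those are the files whose size you would compare against the expected ~50 Mb.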
