Question: Cannot align reads to plasmid
14 months ago by
David150
David150 wrote:

Hi, I have sequenced a bacterial genome for which I have a reference genome (98% similarity).

I used bwa to map the reads to the reference genome: bwa mem reference.fa reads.R1.fq.gz reads.R2.fq.gz

I'm failing to recover the plasmid, although I know it's there. I ran an assembly with megahit and aligned the contigs to the plasmid, and I recover 88% of the plasmid.

What I don't understand is why the reads do not map to the plasmid. Output of samtools flagstat PLASMID.sorted.bam:

1435694 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (0.00% : N/A)
1435694 + 0 paired in sequencing
717847 + 0 read1
717847 + 0 read2
0 + 0 properly paired (0.00% : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

If I check the reads after the genome assembly, I get pretty good mapping:

1122036 + 0 in total (QC-passed reads + QC-failed reads) 
0 + 0 secondary
574 + 0 supplementary
0 + 0 duplicates 
1116358 + 0 mapped (99.49% : N/A)
1121462 + 0 paired in sequencing 
560767 + 0 read1 
560695 + 0 read2 
1108556 + 0 properly paired (98.85% : N/A)
1110902 + 0 with itself and mate mapped
4882 + 0 singletons (0.44% : N/A)
1598 + 0 with mate mapped to a different chr
1210 + 0 with mate mapped to a different chr (mapQ>=5)

Any idea why I'm missing the plasmid when aligning clean reads directly to the plasmid?

sequencing bwa • 493 views
modified 14 months ago by h.mon24k • written 14 months ago by David150

Can you try using bbsplit.sh from the BBMap suite, giving it the plasmid and genome sequences at the same time, to bin the reads? You have not said what length your reads are, or whether they are trimmed/cleaned of adapters. Pay attention to the settings for reads that multi-map (across and within the genomes provided).
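A binning run along these lines might look like the sketch below. The file names (genome.fa, plasmid.fa, the bin_ prefix, refstats.txt) are placeholders that do not come from this thread, and the ambiguous/ambiguous2 values are only one possible choice, so check the bbsplit.sh help text of your installed BBMap version before relying on them. The function only prints the command so it can be inspected before running.

```shell
# Print (not run) a bbsplit.sh command that bins paired reads between
# a chromosome reference and a plasmid reference supplied together.
# All file names here are hypothetical placeholders.
bbsplit_cmd() {
  genome=$1; plasmid=$2; r1=$3; r2=$4
  # ambiguous  : reads with several equally good locations
  # ambiguous2 : reads that map equally well to more than one reference
  printf 'bbsplit.sh in1=%s in2=%s ref=%s,%s basename=bin_%%_#.fq.gz ambiguous=best ambiguous2=split refstats=refstats.txt\n' \
    "$r1" "$r2" "$genome" "$plasmid"
}

bbsplit_cmd genome.fa plasmid.fa reads.R1.fq.gz reads.R2.fq.gz
```

In basename, % is replaced by the reference name and # by the read number of the pair. With ambiguous2=split, reads that cannot be assigned to a single reference should end up in separate files rather than being silently given to one of them.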

written 14 months ago by genomax65k

Thanks for your response, genomax. It's an Illumina 2x250 bp run on a single bacterial genome. It turns out that the average insert size is 300, which is not that good, but I have quality-trimmed all sequences and removed adapters and the phiX genome.

Here is the output from bbsplit:

#name                             %unambiguousReads  unambiguousMB  %ambiguousReads  ambiguousMB  unambiguousReads  ambiguousReads
Reference_genome_without_plasmid  99.36129           226.48699      0.00018          0.00026      1121942           2
plasmid                           0.00000            0.00000        0.00018          0.00026      0                 2

The idea behind sequencing this specific strain is that it is phenotypically different from the reference, so the goal is to look at the genome and find out whether there is a genomic event that might explain this phenotypic difference.

written 14 months ago by David150
14 months ago by
h.mon24k
Brazil
h.mon24k wrote:

You are failing to report some fundamental information: what is the similarity between the plasmid and the assembled contigs? How are you aligning the contigs to the plasmid? What is the coverage of the contigs mapping to the plasmid? Is it different from the contigs mapping to the bacterial genome?

Maybe the similarity is too low to map short reads to the plasmid with bwa, but high enough to align the contigs to the plasmid with whatever software you used (blast?).
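One way to get at the coverage question is to map the reads against the chromosome and the plasmid together and compare the approximate depth per reference sequence. The sketch below assumes a sorted, indexed BAM (combined.sorted.bam is a placeholder name, not a file from this thread) and uses the 250 bp read length mentioned above. Depth is approximated as mapped reads times read length divided by sequence length, which will overestimate somewhat for trimmed reads.

```shell
# Approximate per-sequence depth from `samtools idxstats` output on stdin.
# idxstats columns: name, sequence length, mapped reads, unmapped reads.
# Argument 1: assumed read length in bp.
contig_depth() {
  read_len=$1
  # Skip the "*" line (length 0), which holds reads with no reference.
  awk -v rl="$read_len" -F '\t' \
    '$2 > 0 { printf "%s\t%.1f\n", $1, $3 * rl / $2 }'
}

# Intended use (hypothetical BAM name):
#   samtools idxstats combined.sorted.bam | contig_depth 250
```

A plasmid that is really present would normally show a depth at least comparable to the chromosome; a depth near zero on the plasmid sequence alongside good chromosome depth would point to sequence divergence or absence rather than a mapping artifact.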

written 14 months ago by h.mon24k