Question: Pilon polish draft assembly, but it didn't work
0
gravatar for chenyufanxin
25 days ago by
chenyufanxin0 wrote:

Hi, I have just assemble a human genome using canu. And I mapped the Illumina data to the draft assembly using bwa mem and I got a bam file called bwa_sort.bam. Then I used pilon to polish the draft assembly , but I found the final result didn't make a change in the original sequence at all. Here is the command I'm running:

  java -Xmx60G -jar pilon-1.22.jar \
 --genome test.fasta \
 --bam $in_dir/bwa_sort.bam \
 --fix "bases" \
 --threads 15 \
 --output test \
 --outdir $out_dir

I even tried to polish single contig, but I got the same result. Here is the log of pilon:

Pilon version 1.22 Wed Mar 15 16:38:30 2017 -0400
Genome: test.fasta
Fixing snps, indels
Input genome size: 1022840
Processing tig00000488|arrow:1-779472
Processing tig00000361|arrow:1-243368
tig00000361|arrow:1-243368 log:
unpaired /home/xiatian/polish/Illumina_correct_pacbio/2_samtools_output/bwa_sort.bam: coverage 0
Total Reads: 199348, Coverage: 0, minDepth: 5
Confirmed 0 of 243368 bases (0.00%)
Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
Finished processing tig00000361|arrow:1-243368
tig00000488|arrow:1-779472 log:
unpaired /home/xiatian/polish/Illumina_correct_pacbio/2_samtools_output/bwa_sort.bam: coverage 0
Total Reads: 765929, Coverage: 0, minDepth: 5
Confirmed 0 of 779472 bases (0.00%)
Corrected 0 snps; 0 ambiguous bases; corrected 0 small insertions totaling 0 bases, 0 small deletions totaling 0 bases
Finished processing tig00000488|arrow:1-779472
Writing updated tig00000361|arrow|pilon to /home/xiatian/polish/Illumina_correct_pacbio/4_pilon_output/test.fasta
Writing updated tig00000488|arrow|pilon to /home/xiatian/polish/Illumina_correct_pacbio/4_pilon_output/test.fasta
Mean unpaired coverage: 0
Mean total coverage: 0

It shows thar coverage is always 0, but I can't figure out why this happened. Can you help me ?

sequencing assembly genome • 205 views
ADD COMMENTlink modified 20 days ago • written 25 days ago by chenyufanxin0

Hi, did you checked your bam file ? You can run "samtools flagstat" on it to check the number of reads mapped to the reference.

ADD REPLYlink written 25 days ago by Corentin100

I have just used haplotypecaller to call variants in the contig tig00000488|arrow , and the result contains many variants. It means that there are a lot of reads mapped to the contig indeed, opposite to the coverage 0 showed in pilon log. So , I think my bam file is ok. And I will also run "samtools flagstat" on it to check the number of reads mapped to the reference then.

ADD REPLYlink written 24 days ago by chenyufanxin0

I run "samtools flagstat" on my bam file just now. The output is as below:

2542264173 + 0 in total (QC-passed reads + QC-failed reads)
9669786 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
2535108804 + 0 mapped (99.72% : N/A)
2532594387 + 0 paired in sequencing
1264739823 + 0 read1
1267854564 + 0 read2
0 + 0 properly paired (0.00% : N/A)
2522453600 + 0 with itself and mate mapped
2985418 + 0 singletons (0.12% : N/A)
162547448 + 0 with mate mapped to a different chr
28767110 + 0 with mate mapped to a different chr (mapQ>=5)
ADD REPLYlink written 22 days ago by chenyufanxin0
0
gravatar for chenyufanxin
20 days ago by
chenyufanxin0 wrote:

I think I have found the reason. I changed the parameter of bwa when mapping Illumina data to the draft assembly and this time pilon worked.

The original command I used to map:

bwa mem -t 15  -M -P -R '@RG\tID:normal\tSM:normal\tLB:normalLib\tPU:runname\tCN:GenePlus\tPL:illumina' ren.arrow.fasta ${in_dir}/HAP1_264_DHG15556-S_1.fq.gz ${in_dir}/HAP1_264_DHG15556-S_2.fq.gz > ${out_dir}/bwa.sam

The new command I used to map:

bwa mem -t 15 -R '@RG\tID:normal\tSM:normal\tLB:normalLib\tPU:runname\tCN:GenePlus\tPL:illumina' ren.arrow.fasta ${in_dir}/HAP1_264_DHG15556-S_1.fq.gz ${in_dir}/HAP1_264_DHG15556-S_2.fq.gz > ${out_dir}/bwa.sam

You can see that I removed the parameter -M and -P from the command. M means marking shorter split hits as secondary (for Picard compatibility), and P means in the paired-end mode, performing SW to rescue missing hits only but do not try to find hits that fit a proper pair.

Though I have solved the problem, but I can't figure out why this happened. Any one can help me ?

ADD COMMENTlink modified 20 days ago • written 20 days ago by chenyufanxin0

If you have time you can try to map with the -M option only, and then with the -P option only, to see which one of the two (or the two together) is causing the issue.

I remember that for pilon the bams need to be "sorted in coordinate order and indexed" (https://github.com/broadinstitute/pilon/wiki/Requirements-&-Usage), but I guess pilon would just throw an error instead of giving 0 coverage for unsorted bams.

ADD REPLYlink written 19 days ago by Corentin100

Thank you for your reply. Following your advice , I map with the -M option only, and then with the -P option only, and I found that -P is causing the issue.

ADD REPLYlink written 15 days ago by chenyufanxin0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1431 users visited in the last hour