I am using samtools/bcftools to generate a 'de novo' consensus assembly for use with rc454 to clean my sequencing data. The outputted consensus sequence is identical to the reference (see the BLAST alignment graphic at the bottom of the post) till it is truncated at ~5000 of ~7000 bp.
I don't understand why it's happening or what to do about it.
samtools mpileupto generate .vcf
bcftools consensusto (try to) generate a consensus assembly.
According to this guy, the consensus function is buggy. Does anyone else have similar issues? Any ideas of what could be going wrong or a viable workaround? I have to process ~150 files w/ 10,000-60,000 reads of lengths 450-720 bp.
Details of commands, options, and outputs
### command to generate Variant calls samtools mpileup -vf mnv1.fa -r mnv1:5473-5922 APL000000034.34.sorted.bam > APL000000034.34.amp2.vcf [fai_load] build FASTA index. [mpileup] 1 samples in 1 input files <mpileup> Set max per-file depth to 8000
### consensus command and error message bcftools consensus -f mnv1.fa APL000000034.34.amp2.vcf > scratch.fa Symbolic alleles other than <DEL> are currently not supported: <X> at mnv1:5474
### first three lines of purported "consensus" sequence head -3 scratch.fa >mnv1 gi|113478394|ref|NC_008311.1| Murine norovirus 1, complete genome GTGAAATGAGGATGGCAACGCCATCTTCTGCGCCCTCTGTGCGCAACACAGAGAAACGCA AAAACAAGAAGGCTTCGTCTAAAGCTAGTGTCTCCTTTGGAGCACCTAGCCCCCTCTCTT
### first three lines of reference sequence head -3 mnv1.fa >mnv1 gi|113478394|ref|NC_008311.1| Murine norovirus 1, complete genome GTGAAATGAGGATGGCAACGCCATCTTCTGCGCCCTCTGTGCGCAACACAGAGAAACGCAAAAACAAGAA GGCTTCGTCTAAAGCTAGTGTCTCCTTTGGAGCACCTAGCCCCCTCTCTTCGGAGAGCGAAGACGAAATT