My project is assembly NGS reads to some long contings for downstream analysis.
I use samtools and bcftools to generate consensus sequence with reads bam file, reference genome and VCF file:
samtools mpileup -vf reference.fasta reads.bam | bcftools call -m -O z - > reads.vcf.gz bcftools index reads.vcf.gz bcftools consensus -f reference.fasta reads.vcf.gz -o consensus.fast
However, I found out the gap regions were filled with reference sequence and I want to replace these gap sequences with "N",
Reference: AGTCTGG TTAA GGCTGC
Reads bam: . . T . . . A . . . A . .
Consensus: AGTTTGA TTAA GGCAGC
I want to replace "TTAA" as "NNNN" and let consensus sequence output as "AGTTTGA NNNN GGCAGC".
How can I do this?
David Wang, Nation Taiwan Ocean University