I have been trying to generate a consensus sequence with samtools/bcftools for a couple of days now and I'm stuck.
First my test data:
Marvin$ cat reference.fa
Marvin$ cat reads.fastq
I have copied parts of the reference to generate these reads. They span a contig across region
0-130 in the reference.
Goal: Reconstruct the consensus sequence of that region (length 130).
bwa index reference.fa #index the reference
bwa mem reference.fa reads.fastq > aln.sam #map the reads
samtools view -bh aln.sam > aln.bam #convert sam to bam
samtools sort aln.bam aln_sorted #sort the bam, output = aln_sorted.bam
samtools index aln_sorted.bam #don't remember which downstream program needed this
bedtools merge -i aln_sorted.bam #shows me the contigs in the mapping. only 1 present at myseq:0-130
Then I create the vcf file:
samtools view -h aln_sorted.bam "myseq:0-130" | samtools mpileup -v - > my_vcf_file.bgz
Now I index that file with tabix, because earlier I've had an error message which told me that the index was missing:
tabix -p vcf my_vcf_file.bgz
And now I try to finally get the consensus:
bcftools consensus -i my_vcf_file.bgz #option -i to get iupac codes at ambiguous sites
But it doesn't work, it throws the Usage page at me.
And it says:
"Usage: bcftools consensus [OPTIONS] <file.vcf>"
Acutally all it should need is the input file (which I have provided), because OPTIONS are optional right?
However if I try it with the parameter -f, I get:
Marvin$ bcftools consensus -i -f reference.fa my_vcf_file.bgz
The fasta sequence does not match the REF allele at myseq:1:
.vcf: [N] <- (ALT)
I don't understand this error and I am not sure if I am moving into the right direction, I find no way out of this maze, please help ...