Question

How to get contigs.fa from BAM or VCF files

0

5.7 years ago

pgcudahy • 0

An outside lab took some M. tuberculosis clinical isolates and sequenced them on an Illumina MiSeq. They then ran the reads through their pipeline to map them to H37Rv. They've shared their BAM and VCF files with me, and I'd like to annotate the genomes with prokka. Prokka expects a contigs.fa input. Is there a way to get from either a BAM or a VCF to contigs in FASTA format?

alignment • 1.3k views

updated 5.7 years ago by Joe 21k • written 5.7 years ago by pgcudahy • 0

0

@jrj.healey describes one way of doing the analysis below.

If you want to start with the VCF calls you have in hand, you could generate a new consensus for each strain using the procedure described on this page, then move on to the next step.

5.7 years ago by GenoMax 141k

0

If you have the reference FASTA and the VCF, you can use bcftools consensus to create a FASTA.

5.7 years ago by cpad0112 21k

score 1 · Answer 1 · 2018-07-31

Replacing this answer after a little discussion on the actual question from OP

To generate a contigs.fasta, one would typically assemble the reads. This is what prokka assumes you've done: it takes as input the contigs.fa file produced by SPAdes or another assembler.

To get from the data on hand back to reads you can assemble, use samtools fastq to write the reads stored in the BAM back out as FASTQ:

samtools fastq -1 R1.fastq -2 R2.fastq bamfile.bam
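In case it helps, here is a hedged sketch of the whole route from BAM to annotation. The samtools documentation asks for name-collated input to samtools fastq, so if the lab's BAM is coordinate-sorted it is probably safer to run samtools collate first. The sample name isolate1, the output file and directory names, and the bare SPAdes and prokka invocations are my assumptions, not anything taken from the lab's pipeline, and a reasonably recent samtools is assumed. The function only prints the commands so you can check them before running anything.

```shell
# Sketch only: prints the commands for BAM -> FASTQ -> assembly -> annotation.
# Arguments: $1 = input BAM, $2 = sample name (both placeholders).
pipeline_commands() {
    local bam=$1 sample=$2
    cat <<EOF
samtools collate -u -O ${bam} | samtools fastq -1 ${sample}_R1.fastq -2 ${sample}_R2.fastq -0 /dev/null -s /dev/null -n -
spades.py -1 ${sample}_R1.fastq -2 ${sample}_R2.fastq -o ${sample}_spades
prokka --outdir ${sample}_prokka --prefix ${sample} ${sample}_spades/contigs.fasta
EOF
}

# Review the commands first.
pipeline_commands bamfile.bam isolate1
# When they look right for your data, run them:
# pipeline_commands bamfile.bam isolate1 | bash
```

Bear in mind that a BAM only holds the reads the lab chose to keep, so anything they filtered out cannot be recovered this way.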

For incorporating the variant information (which it wasn't clear to me in the first instance that you wanted to use), you can follow @genomax's suggestion of using bcftools consensus.
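As a rough sketch of the consensus route: bcftools consensus needs the same reference the lab mapped against, plus a compressed and indexed VCF. The file names H37Rv.fasta and calls.vcf and the sample name isolate1 are placeholders of mine. The function only prints the commands so they can be checked before being run.

```shell
# Sketch only: prints the commands for reference + VCF -> consensus FASTA.
# Arguments: $1 = reference FASTA, $2 = VCF, $3 = sample name (all placeholders).
consensus_commands() {
    local ref=$1 vcf=$2 sample=$3
    cat <<EOF
bcftools view -O z -o ${sample}.vcf.gz ${vcf}
bcftools index ${sample}.vcf.gz
bcftools consensus -f ${ref} ${sample}.vcf.gz > ${sample}_consensus.fa
EOF
}

# Review the commands first.
consensus_commands H37Rv.fasta calls.vcf isolate1
# When they look right for your data, run them:
# consensus_commands H37Rv.fasta calls.vcf isolate1 | bash
```

This applies the called variants to the reference, so positions without a call keep the H37Rv base; the result is a reference-based consensus rather than a de novo assembly. If the VCF holds several samples, bcftools consensus has a -s option to select one.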