How to get contigs.fa from BAM or VCF files
5.7 years ago
pgcudahy • 0

An outside lab sequenced some M. tuberculosis clinical isolates on an Illumina MiSeq, then ran the reads through their pipeline to map them to H37Rv. They've shared their BAM and VCF files with me, and I'd like to annotate the genomes with Prokka. Prokka expects a contigs.fa input. Is there a way to get from either a BAM or a VCF to contigs in FASTA format?

alignment • 1.3k views

@jrj.healey describes one way of doing the analysis below.

If you want to start with the VCF calls you have in hand, you could generate a new consensus for each strain using the procedure described on this page, then go on to the next step.


If you have the reference file and a VCF, you can use the bcftools consensus command to create a FASTA.
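
A rough sketch of what that could look like, in case it helps. The file names (H37Rv.fasta, sample.vcf, sample_consensus.fasta) and the make_consensus and run helpers are placeholders I made up, not anything from the lab's pipeline. bcftools consensus needs a compressed, indexed VCF, so the sketch compresses and indexes it first. Set DRY_RUN=1 to print the commands without running them.

```shell
#!/bin/sh
# Sketch only: apply VCF calls to the reference to get a per-sample consensus.
# All file names passed in are hypothetical placeholders.

# Print each command; execute it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# make_consensus REF.fasta CALLS.vcf OUT.fasta
make_consensus() {
    ref=$1
    vcf=$2
    out=$3
    # Write a bgzip-compressed copy of the VCF, as bcftools consensus requires
    run bcftools view -Oz -o "$vcf.gz" "$vcf"
    # Index the compressed VCF (-f overwrites an existing index)
    run bcftools index -f "$vcf.gz"
    # Substitute the called variants into the reference sequence
    run bcftools consensus -f "$ref" -o "$out" "$vcf.gz"
}

# Example call (uncomment to use):
# make_consensus H37Rv.fasta sample.vcf sample_consensus.fasta
```

Two caveats: for a multi-sample VCF you would need to pick a sample with -s, and positions with no coverage come out as reference sequence unless you mask them (the -m option takes a file of regions to mask). The result is a reference-shaped genome with the variants applied, not a de novo assembly.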

5.7 years ago
Joe 21k

Replacing this answer after a little discussion on the actual question from OP

To generate a contigs.fasta, one would typically assemble the reads de novo. This is what Prokka assumes you have done: it takes the contigs file produced by an assembler such as SPAdes.

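To get from the data on hand back to reads that can be assembled, samtools fastq will re-export the reads stored in the BAM as FASTQ. Paired-end output expects mates to be adjacent, so collate a coordinate-sorted BAM by name first. Note that this only recovers reads the BAM still contains; if the lab dropped unmapped reads, those are gone.

samtools collate -u -O bamfile.bam | samtools fastq -1 R1.fastq -2 R2.fastq -0 /dev/null -s /dev/null -n -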
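
With R1.fastq and R2.fastq in hand, the assembly and annotation steps might look roughly like the sketch below. The sample name, the output directory names, and the assemble_and_annotate and run helpers are placeholders I invented. The options are left at defaults and are not tuned for M. tuberculosis. Set DRY_RUN=1 to print the commands without running them.

```shell
#!/bin/sh
# Sketch only: assemble extracted reads with SPAdes, then annotate with Prokka.
# Sample and directory names are hypothetical placeholders.

# Print each command; execute it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# assemble_and_annotate R1.fastq R2.fastq SAMPLE_NAME
assemble_and_annotate() {
    r1=$1
    r2=$2
    sample=$3
    # De novo assembly; SPAdes writes contigs.fasta into its output directory
    run spades.py -1 "$r1" -2 "$r2" -o "${sample}_spades"
    # Annotate the assembled contigs
    run prokka --outdir "${sample}_prokka" --prefix "$sample" \
        --genus Mycobacterium --species tuberculosis \
        "${sample}_spades/contigs.fasta"
}

# Example call (uncomment to use):
# assemble_and_annotate R1.fastq R2.fastq sample1
```

Prokka will refuse to write into an output directory that already exists unless you pass --force, so pick a fresh name per sample.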

To incorporate the variant information instead (it wasn't clear to me at first that you wanted to use it), you can follow @genomax's suggestion of using bcftools consensus.
