Question: Create a VCF File from a Multiple Sequence Alignment
Whetting (Bethesda, MD) wrote, 7.0 years ago:

Dear Biostars,
I have a question about generating VCF (Variant Call Format) files.
Does anyone know of a tool that would turn a multiple sequence alignment (containing a reference and several variant sequences) into a VCF file?

I have a multiple sequence alignment of several cloned papillomavirus genomes. We know that the sequence of each individual genome is correct; that is, all variations between the reference and the other sequences represent naturally occurring SNPs (not sequencing errors). I would like to extract the SNPs (and indels) from this alignment and write them to a VCF file. I hope this clarifies the problem! Thanks again.
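A minimal Python sketch of the first part of this task, finding the alignment columns where any clone differs from the reference. It assumes the alignment is in FASTA format, that the reference is the first record, and that gaps are written as `-`; none of these is stated in the question, and the function names are hypothetical.

```python
"""Find alignment columns where any sequence differs from the reference.

Assumptions (not from the question): FASTA-formatted MSA, the FIRST record
is the reference, and gaps are written as '-'.
"""


def read_fasta_alignment(text):
    """Parse FASTA text into an ordered list of (name, aligned_sequence)."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).upper()))
    # An alignment must contain at least one sequence, all of the same length.
    if len({len(seq) for _, seq in records}) != 1:
        raise ValueError("expected at least one sequence, all of equal (aligned) length")
    return records


def variable_columns(records):
    """Return 0-based column indices where any sequence differs from the first one."""
    ref = records[0][1]
    return [
        col
        for col in range(len(ref))
        if any(seq[col] != ref[col] for _, seq in records[1:])
    ]


if __name__ == "__main__":
    msa = ">ref\nACGTACGT\n>clone1\nACGAACGT\n>clone2\nAC--ACGT\n"
    print(variable_columns(read_fasta_alignment(msa)))  # [2, 3]
```

These are alignment columns, not genome coordinates; columns that are gaps in the reference still have to be handled before anything is written as VCF positions.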

Tags: vcf, alignment, variant, snp
(last modified 4.3 years ago by Giovanni M Dall'Olio)

Duplicate of "Getting A Vcf File From A Fasta Alignment".

(comment by Pierre Lindenbaum, 4.3 years ago)

This was asked 2.6 years ago. :p

(comment by geek_y, 4.3 years ago)

Search the website for "SNP calling".

(comment by Giovanni M Dall'Olio, 7.0 years ago)

SNP calling is a bit different from what I am looking for. Calling implies a threshold that a difference must pass before it is considered a SNP, and it returns a confidence level for each identified SNP. The sequences I am using are confirmed variants, i.e. I know that each variation is real. I would "simply" like to create a VCF file containing all differences between the sequences.

(comment by Whetting, 7.0 years ago)
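VCF does not require a confidence value, so known differences can be written directly. A hypothetical Python sketch that writes already-known SNPs with QUAL and FILTER set to `.` (the VCF missing-value marker). It assumes haploid genomes (one allele per sample in the GT field), positions that are already 1-based reference coordinates, and a made-up site format of `(pos, ref_base, {sample: base})`.

```python
"""Write known SNP sites to VCF with no quality or filter values.

Hypothetical example. Assumptions: positions are already 1-based reference
coordinates, genomes are haploid (one GT allele per sample), and only
A/C/G/T bases count as alleles.
"""

BASES = set("ACGT")


def snp_vcf_lines(chrom, sites, samples):
    """Yield VCF lines; `sites` is a list of (pos, ref_base, {sample: base})."""
    yield "##fileformat=VCFv4.2"
    yield f"##contig=<ID={chrom}>"
    yield '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
    yield "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", *samples])
    for pos, ref, calls in sites:
        # Every observed non-reference base becomes an ALT allele: no threshold.
        alts = sorted({b for b in calls.values() if b in BASES and b != ref})
        if not alts:
            continue  # no sample differs from the reference here
        alleles = [ref, *alts]  # GT index 0 = REF, 1..n = ALT alleles
        # Gaps, Ns, or samples absent from `calls` get "." (missing genotype).
        gts = [str(alleles.index(calls[s])) if calls.get(s) in alleles else "." for s in samples]
        # QUAL and FILTER are "." (missing) because nothing was statistically called.
        yield "\t".join([chrom, str(pos), ".", ref, ",".join(alts), ".", ".", ".", "GT", *gts])


if __name__ == "__main__":
    sites = [(4, "T", {"clone1": "A", "clone2": "T"}), (10, "A", {"clone1": "G", "clone2": "C"})]
    print("\n".join(snp_vcf_lines("ref", sites, ["clone1", "clone2"])))
```

A site where the clones carry different non-reference bases becomes one multi-allelic record (`ALT` of `C,G`), and each clone's GT points at its own allele.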

Did you find a tool that does this? Also, do you mean that a multiple sequence alignment of assembled sequences (assuming the assembly is correct) does not need to go through a variant-calling step? What about alignment errors?

(comment by Felix Francis, 4.3 years ago)

What format is your data in? We need more information to understand what you are trying to do. What and Why = Best answer.

(comment by Zev.Kronenberg, 7.0 years ago)
Answer: Zev.Kronenberg (United States) wrote, 6.9 years ago:

I am not aware of a program that does what you want. However, here are the steps I would take:

  1. Import your MSA (multiple sequence alignment) into a program that can output variant sites only. PAUP* can do this.
  2. Output the matrix and map each alignment (gene) position to its genomic position.
  3. Write a script that converts these data to VCF format.
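Steps 2 and 3 can be sketched in Python roughly as follows. This is a hypothetical illustration, not PAUP* output: it works directly on an alignment held in memory, assumes the first sequence is the reference and gaps are `-`, and treats each clone as haploid. Reference positions are counted by skipping gap columns in the reference, and indels are padded with the preceding reference base, as the VCF specification requires.

```python
"""Convert a reference-first MSA into VCF records, including indels.

Hypothetical helpers. Assumptions: the first sequence is the reference,
gaps are '-', genomes are haploid, and the caller supplies the contig name.
"""


def sample_events(ref_aln, smp_aln):
    """Return (pos, REF, ALT) tuples in 1-based reference coordinates."""
    # Columns that are gaps in both sequences only exist because of OTHER sequences.
    cols = [(r, s) for r, s in zip(ref_aln, smp_aln) if (r, s) != ("-", "-")]
    events, ref_pos, anchor, i = [], 0, None, 0
    while i < len(cols):
        r, s = cols[i]
        if r != "-" and s != "-":  # aligned bases: a SNP if they differ
            ref_pos += 1
            if r != s:
                events.append((ref_pos, r, s))
            anchor = r  # last reference base seen, used to pad indels
            i += 1
            continue
        j = i
        if s == "-":  # deletion: reference bases missing from the sample
            while j < len(cols) and cols[j][1] == "-":
                j += 1
            deleted = "".join(c[0] for c in cols[i:j])
            if anchor is not None:  # pad with the preceding REF base
                events.append((ref_pos, anchor + deleted, anchor))
            ref_pos += len(deleted)
            anchor = deleted[-1]
        else:  # r == "-": insertion of sample bases absent from the reference
            while j < len(cols) and cols[j][0] == "-":
                j += 1
            inserted = "".join(c[1] for c in cols[i:j])
            if anchor is not None:
                events.append((ref_pos, anchor, anchor + inserted))
        # Indels at the very start (anchor is None) are skipped in this sketch;
        # VCF would pad them with the following base instead.
        i = j
    return events


def msa_to_vcf(chrom, records):
    """records: list of (name, aligned_seq); the first one is the reference."""
    (_, ref_aln), samples = records[0], records[1:]
    per_sample = {name: set(sample_events(ref_aln, seq)) for name, seq in samples}
    all_events = sorted(set().union(*per_sample.values()))
    lines = [
        "##fileformat=VCFv4.2",
        f"##contig=<ID={chrom}>",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
                  + [name for name, _ in samples]),
    ]
    for pos, ref, alt in all_events:
        # One biallelic record per distinct event; haploid GT 1 = carries it.
        # Simplification: a sample with a different overlapping event still gets 0.
        gts = ["1" if (pos, ref, alt) in per_sample[name] else "0" for name, _ in samples]
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", ".", "GT"] + gts))
    return lines


if __name__ == "__main__":
    msa = [("ref", "ACGT-ACGT"), ("clone1", "ACGTTACGT"), ("clone2", "AC-T-ACCT")]
    print("\n".join(msa_to_vcf("ref", msa)))
```

Because gap placement in an MSA is somewhat arbitrary, the resulting indels may not be left-aligned; running the output through `bcftools norm -f reference.fa` (the reference filename here is hypothetical) is one way to normalize them.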