Question: Create Vcf File From A Multiple Sequence Alignments
6.3 years ago by Whetting (1.5k), Bethesda, MD, wrote:

Dear Biostars,
I have a question about creating VCF (Variant Call Format) files.
Does anyone know of a tool that would let me turn a multiple sequence alignment (containing a reference and several variants) into a VCF file?
Thanks!

EDIT:
I have a multiple sequence alignment of several cloned papillomaviruses. We know that the sequence of each individual genome is correct, i.e. all variations between the reference and these additional sequences represent naturally occurring SNPs (and not sequencing errors). I would like to extract the SNPs (and indels) from this alignment and create a VCF file. I hope this clarifies the problem! Thanks again.

vcf alignment variant snp • 6.4k views
modified 3.6 years ago by Giovanni M Dall'Olio (26k) • written 6.3 years ago by Whetting (1.5k)

duplicate of Getting A Vcf File From A Fasta Alignment

written 3.6 years ago by Pierre Lindenbaum (112k)

This was asked 2.6 years ago. :p

written 3.6 years ago by geek_y (8.7k)

search the website for "SNP calling"

written 6.3 years ago by Giovanni M Dall'Olio (26k)

SNP calling is a little different from what I am looking for. Calling implies that a certain threshold must be met before something is considered a SNP, and it returns a level of confidence for each identified SNP. The sequences I am using are confirmed variants, i.e. I know that each variation is real. I would like to "simply" create a VCF file containing all differences between the sequences.

written 6.3 years ago by Whetting (1.5k)

Did you find a tool that does this? Also, do you mean that a multiple sequence alignment of assembled sequences (assuming the assemblies are correct) does not have to go through a "variant calling" step? What about alignment errors?

written 3.6 years ago by Felix Francis (450)

What format is your data in? We need more information to understand what you are trying to do. What and Why = Best answer.

written 6.3 years ago by Zev.Kronenberg (11k)
6.3 years ago by Zev.Kronenberg (11k), United States, wrote:

There isn't a program I am aware of that does what you want. However, here are the steps I would take:

  1. Import your MSA (multiple sequence alignment) into a program that can output only the variant sites. PAUP* can do this.
  2. Output the matrix and map each alignment (gene) position to its genomic position in the reference.
  3. Write a script that converts these data to VCF format.
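
For what it's worth, here is a minimal sketch of what such a script could look like in Python. It is only an illustration under several assumptions that are not from the thread: the alignment is an aligned FASTA file, the first record is the reference, every sequence is haploid, and the function names (`read_aligned_fasta`, `msa_to_vcf`) are made up for this example. It finds the variant columns itself instead of going through PAUP*, it maps alignment columns to ungapped reference coordinates, and it writes SNPs only. Indels are skipped because VCF requires them to be anchored on the preceding base, which needs extra logic.

```python
import io


def read_aligned_fasta(handle):
    """Return a list of (name, sequence) pairs from an aligned FASTA handle."""
    records, name, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).upper()))
    return records


def msa_to_vcf(records, chrom=None):
    """Build VCF text for SNPs, treating the first record as the reference."""
    (ref_name, ref_seq), samples = records[0], records[1:]
    if any(len(seq) != len(ref_seq) for _, seq in samples):
        raise ValueError("all aligned sequences must have the same length")
    chrom = chrom or ref_name
    header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(header + [name for name, _ in samples]),
    ]
    ref_pos = 0  # 1-based coordinate in the ungapped reference
    for col, ref_base in enumerate(ref_seq):
        if ref_base == "-":
            continue  # insertion relative to the reference: skipped in this sketch
        ref_pos += 1
        if ref_base not in "ACGT":
            continue  # ambiguous reference base: no call
        alts, genotypes = [], []
        for _, seq in samples:
            base = seq[col]
            if base == ref_base:
                genotypes.append("0")
            elif base in "ACGT":
                if base not in alts:
                    alts.append(base)
                genotypes.append(str(alts.index(base) + 1))
            else:
                genotypes.append(".")  # gap or ambiguous base: missing genotype
        if alts:
            fields = [chrom, str(ref_pos), ".", ref_base, ",".join(alts), ".", ".", ".", "GT"]
            lines.append("\t".join(fields + genotypes))
    return "\n".join(lines) + "\n"


# Tiny made-up alignment, only to show the call pattern.
DEMO = ">ref\nACG-TAC\n>s1\nACGATAC\n>s2\nATG-TCC\n"
print(msa_to_vcf(read_aligned_fasta(io.StringIO(DEMO))), end="")
```

For a real alignment you would open the file and pass the handle to `read_aligned_fasta`. Note that the output is only as good as the alignment: a misaligned region will show up as a run of spurious SNPs.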
written 6.3 years ago by Zev.Kronenberg (11k)