Question: VCF Locations In Consensus Sequence
Nupur G (30) wrote:

I have a VCF file created by running GATK on reads aligned against a reference genome. Each variant in the VCF file has a position, which is its location on the reference genome. Sample lines include:

NC_002516.2 92915 . T A 1941.76 PASS AC=2;AF... GT:AD:DP:GQ:PL 1/1:0,80:81:99:1975,240,0
NC_002516.2 192617 . GA G 2562.66 PASS AC=2;AF=... GT:AD:DP:GQ:PL 1/1:0,64:64:99:2605,193,0

I also have a consensus sequence created by vcftools, which starts off as:

>NC_002516.2
TTTAAAGAGACCGGCGATTCTAGTGAAATCGAACGGGCAGGTCAATTTCCAACCAGCGAT

What I need, though, is the variant location on the consensus sequence. If '92915' is the first variant in the VCF file, then it has the same location on the reference and on the consensus. However, subsequent variants include indels, which shift locations on the consensus forward or backward. So I need a tool to calculate each variant's location on the consensus.

(And then I will need to get annotation data for that region.) Any idea how to get the variant locations on the consensus?

Actually, VCFtools is also giving an error, so I need to find another utility to create the consensus sequence. Much appreciated.

modified 6.0 years ago by Jorge Amigo (11k) • written 7.3 years ago by Nupur G (30)

You mean because of insertions or deletions? The other two comments look like they were thinking about SNPs only, correct? Only alleles of different lengths can shift the start location; if there are none, it doesn't matter. To calculate the shifts caused by indels, first determine which allele was chosen for the consensus sequence at each site, reference or non-reference. If the non-reference allele was chosen, shift each location to the right of that site by length(non-ref) - length(ref). I would write a little R script for that.
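The shift rule described here can be sketched in a few lines. This is only an illustrative sketch, written in Python rather than R. The function name `consensus_positions` and the `applied` flag are hypothetical. It assumes a single contig, non-overlapping variants with one ALT allele each, and that you already know which sites had the non-reference allele written into the consensus (that choice depends on the tool that built it).

```python
def consensus_positions(variants):
    """Map VCF positions (1-based, on the reference) to consensus positions.

    variants: iterable of (pos, ref, alt, applied) tuples for ONE contig.
      applied is True when the ALT allele was written into the consensus
      (an assumption supplied by the caller, not read from the VCF here).
    Returns a list of (reference_pos, consensus_pos) sorted by position.
    """
    offset = 0  # cumulative length change from applied variants upstream
    mapped = []
    for pos, ref, alt, applied in sorted(variants, key=lambda v: v[0]):
        # A variant's own start is shifted only by variants to its left.
        mapped.append((pos, pos + offset))
        if applied:
            # Insertions give a positive shift, deletions a negative one,
            # and SNPs (equal lengths) give zero.
            offset += len(alt) - len(ref)
    return mapped


if __name__ == "__main__":
    example = [
        (92915, "T", "A", True),     # SNP from the question: no shift
        (192617, "GA", "G", True),   # 1-bp deletion from the question
        (200000, "C", "CTT", True),  # hypothetical 2-bp insertion
        (250000, "G", "T", True),    # hypothetical downstream SNP
    ]
    for ref_pos, cons_pos in consensus_positions(example):
        print(ref_pos, cons_pos)
```

With the two records from the question, both variants keep their reference positions, because the deletion at 192617 only affects sites to its right. The two hypothetical downstream sites land at 199999 and 250001.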

written 7.3 years ago by Michael Dondrup (46k)

Thanks for the input. Yes, I think I will write my own script. I just thought there might be an existing tool for this, as it is probably a common enough task.

written 7.3 years ago by Nupur G (30)

Could be, but I don't know of any, sorry.

written 7.3 years ago by Michael Dondrup (46k)

It's not clear to me what you are looking for. Can you edit your question to include sample inputs and the expected output?

written 7.3 years ago by Rm (7.9k)

Presumably, the consensus sequence is simply the most likely nucleotide at each location of the reference. In that case, your variants will be the locations where the consensus does not match the reference, and the locations given in the VCF file will match the locations in the consensus. If this does not make sense, please edit your question to provide more detail.

written 7.3 years ago by Sean Davis (25k)

You're right - but this is true only for SNPs. Indels will cause shifting.

written 7.3 years ago by Nupur G (30)
Jorge Amigo (11k) wrote:

This question remains unanswered, but creating a FASTA sequence from a VCF variants file has already been covered in several places, such as "New Fasta Sequence From Reference Fasta And Variant Calls File?" or "Introducing Known Mutations (From A Vcf) Into A Fasta File".

modified 6.0 years ago • written 6.0 years ago by Jorge Amigo (11k)