Question: Vcf-consensus does not replace the variants of the VCF in the output Fasta file
0
gravatar for evovaldo
2.9 years ago by
evovaldo0
evovaldo0 wrote:

Hello, I'm trying to get a consensus fasta file using a Reference.fa and a Vcf.gz.

To do that I first do:

To zip the VCF

bgzip -c file.vcf > file.vcf.gz

To tabix the VCF

tabix -p vcf file.vcf.gz

After I Use:

cat file.fa | vcf-consensus vcf.gz > out.fa

However the out.fa does not have the variants from the VCF and is the same as the reference. I'm searching in the forum and do what the people has answered such as:

1) Make sure the *vcf.gz file was zipped using bgzip. You could unzip them first and zip them again with bgzip from tabix

2) Make sure the chromosome IDs are same in VCF and fasta. Sometimes, it's "chr01" in one file, but "chr1" or "1" in another one.

My fasta header is:

">19"

My VCF is:

CHROM POS ID REF ALT

19 45393826 rs568341751 G C
19 45393827 rs530756795 T C
19 45393837 rs375258934 G A
19 45393877 rs570535613 G A

What could be wrong?????

I appreciate any help.

Best regards,

sequence software error genome • 1.3k views
ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by evovaldo0

Just in case, there are not genotypes in your vcf ?

ADD REPLYlink written 2.9 years ago by microfuge1.3k
0
gravatar for evovaldo
2.9 years ago by
evovaldo0
evovaldo0 wrote:

This is part of the VCf file which I created using ensembl for a specific chr19 region. There is not genotypes on it.

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG01191

19 45393826 rs568341751 G C 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=2;DP=20113;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45393827 rs530756795 T C 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=20026;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45393837 rs375258934 G A 100 PASS AA=.|||;AC=0;AF=0.000399361;AFR_AF=0.0015;AMR_AF=0;AN=2;DP=19380;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45393877 rs570535613 G A 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=2;DP=16770;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.001;VT=SNP GT 0|0

19 45393886 rs373618045 T C 100 PASS AA=.|||;AC=0;AF=0.000599042;AFR_AF=0;AMR_AF=0;AN=2;DP=16554;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.0031;VT=SNP GT 0|0

19 45393958 rs558926735 C T 100 PASS AA=.|||;AC=0;AF=0.00179712;AFR_AF=0;AMR_AF=0;AN=2;DP=13554;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.0092;VT=SNP GT 0|0

19 45393992 rs184809057 G A 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=2;DP=14726;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394006 rs535065384 G A 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=16773;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394085 rs149731538 T G 100 PASS AA=.|||;AC=0;AF=0.000399361;AFR_AF=0.0015;AMR_AF=0;AN=2;DP=17596;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394111 rs575818345 C G 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=16243;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394125 rs544801238 G T 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=15936;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394126 rs116973259 G C 100 PASS AA=.|||;AC=0;AF=0.00159744;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=15888;EAS_AF=0.0069;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394159 rs190710529 C A 100 PASS AA=.|||;AC=0;AF=0.000399361;AFR_AF=0;AMR_AF=0;AN=2;DP=14841;EAS_AF=0.002;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

ADD COMMENTlink written 2.9 years ago by evovaldo0

Thanks! I vaguely remember that in the script the replacement of nucleotide of the reference needs the genotype. Probably as all the genotypes in your vcf seem to be 0|0 there is no replacement. Depending on what you want could you try replacing the 0/0 with a 1/1 or a 0/1 and see if the replacement happens ?

ADD REPLYlink written 2.9 years ago by microfuge1.3k
0
gravatar for evovaldo
2.9 years ago by
evovaldo0
evovaldo0 wrote:

I think was wrong. There is a genotype GT 0|0 for all variants.

ADD COMMENTlink written 2.9 years ago by evovaldo0
0
gravatar for evovaldo
2.9 years ago by
evovaldo0
evovaldo0 wrote:

Tanks to you.....I'll try to do that.

ADD COMMENTlink written 2.9 years ago by evovaldo0
0
gravatar for evovaldo
2.9 years ago by
evovaldo0
evovaldo0 wrote:

Hi microfuge, I tried to replace randomly some of the genotypes with 0/1 or 1/1 because I don't have GT:AD:DP:GQ:PL information probably because I obtained the vcf from the 1000 genomes data slice. However, no replacement is observed.

ADD COMMENTlink written 2.9 years ago by evovaldo0

So sorry that it did not work. If not too much time is wasted then can you try giving a -s <sample_name> which in your case should be -s HG01191. Also a vcf-validator run on your vcf. vcf-validator input.vcf.gz could indicate potential problems with the VCF. On an unrelated note, you seem to using 'add answer' rather than 'add comment' while replying.

ADD REPLYlink written 2.9 years ago by microfuge1.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1973 users visited in the last hour