Question: Inaccurate/Undesired results from bcftools version 1.1 convert --tsv2vcf command
Taylor Griswold (United States), 4.8 years ago:

Hey,

I am working with a tab-separated file of SNPs (MUMmer output) and want to convert it to Variant Call Format (VCF). I am using bcftools version 1.1 with the subcommand convert --tsv2vcf. The command runs without errors, and the VCF header and the conversion statistics are displayed correctly. However, the VCF output contains no records; ideally there should be one record for every line of the tab-separated file. All of the rows are being skipped (output below). What am I doing wrong, and how can I fix this so that every line of the input TSV file is included? Nothing in the output indicates why the rows are skipped.

Below are a portion of the input text file, the command I executed, and the output it produced. Any help would be appreciated.

Thanks, Taylor

 

Input File (TSV):

C       4875    scaffold5-3     .
C       12221   scaffold5-3     .
G       17413   scaffold5-3     .
C       17422   scaffold5-3     .

Command Used: 

bcftools convert -c AA,POS,CHROM,ID  -f ../OAntigen_NAg_3528-08.fasta --tsv2vcf tempFile.txt  -O v -s OAntigen_3566-08_v2.fasta

Output Example:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=3528-08_OAntigen_prev_NODE_11_&_NODE_49_Jul_18,length=103905>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  OAntigen_3566-08_v2.fasta
Rows total:     377
Rows skipped:   377
Missing GTs:    0
Hom RR:         0
Het RA:         83
Hom AA:         0
Het AA:         294
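
One detail stands out in the posted output: the only ##contig line in the header, which appears to come from the -f reference, is 3528-08_OAntigen_prev_NODE_11_&_NODE_49_Jul_18, while the CHROM column of the TSV contains scaffold5-3. A plausible explanation for every row being skipped (an inference from this output, not confirmed anywhere in the thread) is that rows whose CHROM value does not name a sequence in the reference are dropped. The Python sketch below checks for that mismatch. It assumes a samtools-style .fai index exists next to the reference FASTA (run samtools faidx on the FASTA if it does not); the helper names fai_contigs and missing_chroms are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def fai_contigs(fai_lines: Iterable[str]) -> Set[str]:
    """Return the sequence names listed in a samtools-style .fai index (first column)."""
    return {line.split("\t", 1)[0] for line in fai_lines if line.strip()}


def missing_chroms(tsv_lines: Iterable[str], chrom_col: int, contigs: Set[str]) -> Counter:
    """Count TSV rows whose CHROM value (0-based column chrom_col) is not a reference contig.

    Columns are split on any whitespace, so tab- or space-separated input both work.
    """
    missing: Counter = Counter()
    for line in tsv_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        name = line.split()[chrom_col]
        if name not in contigs:
            missing[name] += 1
    return missing
```

With -c AA,POS,CHROM,ID, CHROM is the third column, so chrom_col is 2: open the reference's .fai and tempFile.txt, then call missing_chroms(tsv_handle, 2, fai_contigs(fai_handle)). A non-empty result means the CHROM values would need renaming to match the reference names, or the reference that actually contains scaffold5-3 would need to be passed with -f.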

Reply from marzia.rizvi, 4.3 years ago:

Hey Taylor, were you able to sort this out? I'm facing the same problem when converting 23andMe files.

Answer from Lee Katz (Atlanta, GA), 4.8 years ago:

I'm hoping someone has a bcftools answer, but in the meantime I wrote a script that might help anyone else who needs to do this.

https://github.com/lskatz/lskScripts/blob/master/mummerToVcf.pl
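
For orientation only, here is a minimal Python sketch of the substitution-only part of such a conversion; it is not the linked script. It assumes tab-delimited "show-snps -T" output in which column 1 is P1, columns 2 and 3 are the reference and query bases, and the second-to-last column is the reference sequence name (check this against your MUMmer version). Indel rows are skipped, and every record is given the haploid genotype "1", which assumes a single query assembly carrying the ALT allele. The function names vcf_header and snps_to_vcf_lines are hypothetical.

```python
from typing import Iterable, Iterator, List


def vcf_header(sample: str) -> List[str]:
    """Minimal VCF 4.2 header with a single GT FORMAT field and one sample column."""
    return [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                   "FILTER", "INFO", "FORMAT", sample]),
    ]


def snps_to_vcf_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF records for substitution rows of tab-delimited show-snps output.

    ASSUMED layout: column 1 is P1, columns 2 and 3 are the reference and
    query bases, and the second-to-last column is the reference sequence name.
    """
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if not cols[0].isdigit():
            continue  # banner, column header or blank line
        pos, ref_base, alt_base = cols[0], cols[1], cols[2]
        if ref_base == "." or alt_base == ".":
            continue  # indels need extra reference context; not handled here
        ref_name = cols[-2]
        # GT "1": assumes one haploid query assembly carrying the ALT allele.
        yield "\t".join([ref_name, pos, ".", ref_base, alt_base,
                         ".", "PASS", ".", "GT", "1"])
```

Printing vcf_header("sample") followed by snps_to_vcf_lines(open("out.snps")) gives a small VCF; the file name is a placeholder. The header has no ##contig lines, and some tools warn when those are missing, so they may need to be added from the reference .fai.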

Answer from liangjiao.xue (United States), 3.1 years ago:

This is a late response, but I think it is worth posting because I spent hours resolving this problem.
Originally, I thought converting MUMmer show-snps output to VCF would be easy. However, getting a correct result is not that simple.
Some traps:
1) You need to check the reference sequence to rebuild insertions and deletions.
Instead of reading the original reference FASTA file, I used "show-snps -x 1", so that the surrounding nucleotides are also reported.
2) For insertions, if the query sequence is mapped to the reference in reverse orientation, the inserted query nucleotides are reported in reverse order.
So they need to be concatenated in reverse order.
3) The coordinates of insertions and deletions.
For insertions, the coordinate reported by show-snps is that of the nucleotide before the insertion. It should be kept the same in the VCF file.
For deletions, the coordinates reported by show-snps are those of the deleted nucleotides. The VCF coordinate should be: first_position_of_deletion_block - 1.
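
The three traps above can be sketched in Python as follows. This is not the linked MUMmerSNPs2VCF.py, only a minimal illustration under stated assumptions: show-snps rows have already been parsed into (reference name, P1, reference base, query base) tuples with "." on the gap side, ref_seq is the reference sequence as a plain string with 1-based positions (read from the FASTA, or rebuilt from the "-x" context), and the caller knows whether the query is reverse-mapped. The function names are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[str, int, str, str]  # (ref_name, P1, ref_base, query_base); "." marks the gap side


def group_indels(rows: List[Row]) -> List[Tuple[str, List[Row]]]:
    """Merge consecutive single-base show-snps rows into INS / DEL / SNP blocks.

    An insertion repeats the same P1 for every inserted base; a deletion
    advances P1 by one for every deleted base.
    """
    blocks: List[Tuple[str, List[Row]]] = []
    for row in rows:
        name, p1, ref_base, qry_base = row
        kind = "INS" if ref_base == "." else "DEL" if qry_base == "." else "SNP"
        if blocks and kind != "SNP":
            prev_kind, prev_rows = blocks[-1]
            prev_name, prev_p1 = prev_rows[-1][0], prev_rows[-1][1]
            step = 0 if kind == "INS" else 1
            if prev_kind == kind and prev_name == name and p1 == prev_p1 + step:
                prev_rows.append(row)
                continue
        blocks.append((kind, [row]))
    return blocks


def insertion_to_vcf(ref_seq: str, rows: List[Row], reverse: bool = False) -> Tuple[int, str, str]:
    """VCF (POS, REF, ALT) for an insertion block; POS stays at the base before the insertion."""
    bases = [r[3] for r in rows]
    if reverse:
        bases.reverse()  # reverse-mapped query: inserted bases are reported in reverse order
    pos = rows[0][1]
    anchor = ref_seq[pos - 1]  # 1-based position into the plain sequence string
    return pos, anchor, anchor + "".join(bases)


def deletion_to_vcf(ref_seq: str, rows: List[Row]) -> Tuple[int, str, str]:
    """VCF (POS, REF, ALT) for a deletion block; POS is the first deleted base - 1."""
    first = rows[0][1]
    if first == 1:
        raise ValueError("deletion at sequence start needs the following base; not handled")
    pos = first - 1
    anchor = ref_seq[pos - 1]  # base preceding the deleted block
    return pos, anchor + "".join(r[2] for r in rows), anchor
```

For example, with reference ACGTACGT, two inserted bases reported after position 3 become POS 3, REF G, ALT G plus the inserted bases, and deleted bases at positions 4 and 5 become POS 3, REF GTA, ALT G. The sketch ignores deletions at the very start of a sequence, where VCF uses the following base as padding instead.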

Here is my Python code to fix these problems:
https://github.com/liangjiaoxue/PythonNGSTools/blob/master/MUMmerSNPs2VCF.py
