From .txt to .vcf
2.0 years ago
am29 ▴ 30

Hi,

I have a .txt file that contains only chromosomes and base positions. I need to convert it to a "real" .vcf file, with REF and ALT alleles and all the other columns, so that it is suitable for other analyses, specifically GATK's SelectVariants. Does anyone know how to do this? Is it even possible?

The file looks like this: Chr1:12345678

annotation vcf • 593 views
2.0 years ago

Generate a sites-only VCF without ALT alleles, using N as a placeholder REF:

awk -F ':' 'BEGIN{printf("##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n");} {printf("%s\t%s\t.\tN\t.\t.\t.\t.\n",$1,$2);}' your.file.txt > tmp.vcf

Then add the contig lines from the FASTA index and set REF from the reference:

bcftools reheader -f ref.fasta.fai tmp.vcf | bcftools norm --fasta-ref ref.fasta --check-ref s > new.vcf
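If awk is not convenient, the first step can be sketched in plain Python. This is only a rough equivalent of the awk one-liner, not part of the original answer: the function name `positions_to_vcf` is made up for illustration, and the input is assumed to be one `chrom:pos` entry per line, as in the question.

```python
# Minimal sketch: build sites-only VCF text from "chrom:pos" strings.
# REF is the placeholder N and ALT is missing ("."), as in the awk command.
VCF_HEADER = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)


def positions_to_vcf(lines):
    """Return VCF text for an iterable of 'chrom:pos' strings."""
    records = []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue  # skip blank lines
        chrom, pos = entry.rsplit(":", 1)  # split on the last colon
        records.append((chrom, int(pos)))  # int() rejects malformed positions
    body = "".join(f"{c}\t{p}\t.\tN\t.\t.\t.\t.\n" for c, p in records)
    return VCF_HEADER + body


if __name__ == "__main__":
    print(positions_to_vcf(["Chr1:12345678"]), end="")
```

The output still has the placeholder REF, so it would go through the same bcftools step as tmp.vcf.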
2.0 years ago

You can create a VCF file programmatically with libraries such as

https://brentp.github.io/cyvcf2/writing.html

or

https://pyvcf.readthedocs.io/en/latest/index.html

but it would be a fairly complex task in my opinion.
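To give an idea of what the programmatic route involves, here is a standard-library sketch that looks up the REF base for each position. It deliberately does not use the cyvcf2 or PyVCF APIs; `read_fasta` and `vcf_record` are hypothetical helpers, the toy contig is invented, and loading a whole FASTA into memory like this is only reasonable for small references.

```python
def read_fasta(text):
    """Parse FASTA text into {sequence_name: sequence}; small inputs only."""
    sequences, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]  # name is the first word after '>'
            sequences[name] = []
        elif line and name is not None:
            sequences[name].append(line.upper())
    return {k: "".join(v) for k, v in sequences.items()}


def vcf_record(reference, chrom, pos):
    """Build one sites-only VCF line; REF comes from the reference (POS is 1-based)."""
    seq = reference[chrom]
    if not 1 <= pos <= len(seq):
        raise ValueError(f"{chrom}:{pos} is outside the reference")
    return f"{chrom}\t{pos}\t.\t{seq[pos - 1]}\t.\t.\t.\t."


# Toy reference, invented for the example.
toy = read_fasta(">Chr1 toy contig\nACGT\nTTGA\n")
print(vcf_record(toy, "Chr1", 6))
```

A real script would also need the header and contig lines, and ALT alleles still have to come from somewhere, since the input file does not contain them.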

A workaround could be to simulate reads from a mutated genome that has your variants applied, then call variants on that.
