How can I extract alleles and genotypes from a VCF file using Python or another language when the file is 23 GB? I was able to write a script that gets the alleles, but it struggles with a VCF file this large. What other ways are there to get allele and genotype information?
Extracting genotype information using R.
library(vcfR)
vcf <- read.vcfR(vcf_file, verbose = FALSE)
gt <- extract.gt(vcf, element = 'GT', as.numeric = TRUE)
For Python, take a look at the following article.
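As a rough illustration of the Python route for a file this size, here is a minimal sketch that streams the VCF one line at a time instead of loading it into memory. It uses only the standard library; the function names `open_vcf` and `iter_genotypes` are made up for this example, and it assumes a tab-separated VCF with a `#CHROM` header line and a `GT` key in the FORMAT column.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as a text stream."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def iter_genotypes(handle):
    """Yield (chrom, pos, ref, alt, {sample: GT}) one record at a time."""
    samples = []
    for line in handle:
        if line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # record has no genotype columns
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue  # no genotype called for this record
        gt_index = keys.index("GT")
        gts = {}
        for name, column in zip(samples, fields[9:]):
            parts = column.split(":")
            gts[name] = parts[gt_index] if gt_index < len(parts) else "."
        yield fields[0], int(fields[1]), fields[3], fields[4], gts
```

Typical use would be `with open_vcf("input.vcf.gz") as handle:` followed by a loop over `iter_genotypes(handle)`, writing each record out as you go so that memory use stays flat regardless of file size.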
Genotypes can also be extracted with SnpSift.jar (part of snpEff) using the following command.
java -jar ../snpEff/SnpSift.jar extractFields annotated.vcf CHROM POS REF ALT "GEN[*].GT" > output.tsv
See bcftools query.
With bcftools query you can print any information you like. So in your case, e.g.:
$ bcftools query -f '%CHROM %POS %REF %ALT [ %GT]\n' input.vcf
The output looks now like this:
chr1 10177 ACC ACCC 0/1
chr1 10327 T C 0/0
chr1 10352 TAC TAAC 1/1
chr1 12783 G A 1/1
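If you want the actual alleles rather than the 0/1 indices, the GT field can be decoded against REF and ALT. The sketch below does that in Python for lines shaped like the output above; `gt_to_alleles` and `parse_query_line` are hypothetical helper names, and the code assumes whitespace-separated columns with one or more GT values after ALT. (bcftools query also has a `%TGT` tag that prints translated genotypes directly, which may be simpler.)

```python
import re


def gt_to_alleles(ref, alt, gt):
    """Translate a GT string such as '0/1' into allele sequences."""
    alleles = [ref] + alt.split(",")  # index 0 is REF, 1.. are the ALT alleles
    sep = "|" if "|" in gt else "/"   # keep phased vs unphased separator
    called = []
    for idx in re.split(r"[/|]", gt):
        # '.' marks a missing call and is passed through unchanged
        called.append("." if idx == "." else alleles[int(idx)])
    return sep.join(called)


def parse_query_line(line):
    """Split one 'CHROM POS REF ALT GT [GT ...]' line and decode each GT."""
    chrom, pos, ref, alt, *gts = line.split()
    return chrom, int(pos), ref, alt, [gt_to_alleles(ref, alt, gt) for gt in gts]
```

Reading the bcftools output line by line (for example from a pipe) and passing each line to `parse_query_line` keeps memory use small even for very large inputs.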