How to extract alleles and genotypes from a VCF file using Python or another language for 23 GB files? I am able to write a script to get the alleles, but it is difficult for large VCF files. What other ways are there to get allele and genotype information?
See bcftools query.
EDIT: With bcftools query you can print any information you like. So in your case, e.g.:
$ bcftools query -f '%CHROM %POS %REF %ALT [ %GT]\n' input.vcf
The output now looks like this:
chr1 10177 ACC ACCC 0/1
chr1 10327 T C 0/0
chr1 10352 TAC TAAC 1/1
chr1 12783 G A 1/1
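If the extracted text then needs further processing in Python, it can be read one line at a time so that the 23 GB file never has to sit in memory. A minimal sketch, assuming the whitespace-separated layout shown above (CHROM, POS, REF, ALT, then one GT per sample); the function name count_genotypes is just an illustration, not part of bcftools:

```python
from collections import Counter


def count_genotypes(lines):
    """Tally genotype strings from `bcftools query` output, one line at a time.

    Assumes each line is: CHROM POS REF ALT GT [GT ...], separated by whitespace.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or incomplete lines
        counts.update(fields[4:])  # every column after ALT is one sample's GT
    return counts


# The four example lines from the output above
example = [
    "chr1 10177 ACC ACCC 0/1",
    "chr1 10327 T C 0/0",
    "chr1 10352 TAC TAAC 1/1",
    "chr1 12783 G A 1/1",
]
print(count_genotypes(example))
```

To use it on the real file, pipe the bcftools query output into a script that passes sys.stdin to count_genotypes, since any iterable of lines works.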
fin swimmer
Hello Ram,
if an "answer" is only meant to be a full copy-and-paste solution, then my post is indeed more of a comment. But I thought that naming the tool with its subcommand and linking to the good manual is answer enough.
I have now extended my post to a full answer :)
cpad was faster than me, right. I didn't see his answer as I hadn't reloaded the page.
fin swimmer
Extracting genotype information using R.
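library(vcfR)
# vcf_file is the path to the VCF; read.vcfR() loads the whole file into memory
vcf <- read.vcfR(vcf_file, verbose = FALSE)
# GT values such as "0/1" are not numbers, so keep them as character strings
gt <- extract.gt(vcf, element = "GT", as.numeric = FALSE)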
For Python, take a look at the following article:
http://alimanfoo.github.io/2017/06/14/read-vcf.html
Genotypes can also be extracted with SnpSift.jar from snpEff using the following command:
java -jar ../snpEff/SnpSift.jar extractFields annotated.vcf CHROM POS REF ALT "GEN[*].GT" > output.tsv
It doesn't look like vcfR does a streaming read, so I would not recommend it: building an in-memory object of an entire VCF file is not a great idea. A better strategy is to use closer-to-bare-metal tools such as bcftools to extract the information, then use R or Python to compute on the extracted information.
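For comparison, the same streaming idea can be sketched in plain Python with only the standard library, reading the VCF record by record instead of building one large object. This is an illustration under assumptions, not a replacement for bcftools: the helper names open_vcf and iter_genotypes are made up here, and the parser assumes a well-formed tab-separated VCF with a #CHROM header line and a FORMAT column.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF as a text stream."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_genotypes(lines):
    """Yield (chrom, pos, ref, alt, {sample: GT}) one record at a time."""
    samples = []
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this line
        chrom, pos, _id, ref, alt = fields[:5]
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        gts = {}
        for name, col in zip(samples, fields[9:]):
            parts = col.split(":")
            gts[name] = parts[gt_idx] if gt_idx < len(parts) else "."
        yield chrom, int(pos), ref, alt, gts


# Small in-memory demonstration; sample names S1 and S2 are invented
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n",
    "chr1\t10177\t.\tACC\tACCC\t.\tPASS\t.\tGT:DP\t0/1:12\t1/1:8\n",
]
for record in iter_genotypes(demo):
    print(record)
```

On a real file the loop would be written as "with open_vcf(path) as handle: for record in iter_genotypes(handle): ...", so only one line is held in memory at a time. Expect this to be considerably slower than bcftools on a 23 GB file.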
Try bcftools query.
How about VCFtools?
Why is this a tool post? A question about tools should be a question-type post, not a tool-type post.
What have you tried?
May help the user (AWK ideas): A: How to get sample names and genotype for SNP in multi-sample VCF file
Actually, I have a Python script that can parse a VCF, in fact: Filtering VCF with python
Why have you replied to my comment, Kevin?
Did not want to create yet another 4th and independent comment
You can take a look at these two scripts written in Python to split a VCF and select what you want: A: VCF file help and C: parsing vcf file