Parse VCF file
3
0
Entering edit mode
9.2 years ago
aj123 ▴ 120

Hello,

I'm trying to parse a VCF file and filter it based on 1000Gp1_AF < 0.05. Any ideas on how to go about this? Thank you!

sequencing VCF Perl Python Linux • 8.4k views
ADD COMMENT
0
Entering edit mode

You may wanna be more careful with the terminology. Parsing is pretty specific process and you're only looking to filter VCF. You've elicited what you wish to do very clearly, so you're good at stating requirements, but the tech community can be a bit touchy about jargon.

ADD REPLY
1
Entering edit mode
9.2 years ago

It's almost always better to use existing tools than to write parser on your own. Take a look here at section ALLELE FILTERING.

ADD COMMENT
0
Entering edit mode

I think OP didn't actually meant parse as in semantic parsing. OP might have used it as a synonym for processing.

ADD REPLY
0
Entering edit mode
9.2 years ago
Tariq Daouda ▴ 220

Hi,

I wrote a python parser that you can find here.

Here is the doc on how to use it.

Cheers

ADD COMMENT
0
Entering edit mode
9.2 years ago

Using my tool https://github.com/lindenb/jvarkit/wiki/VCFFilterJS:

$ curl -s "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5.20130502.sites.vcf.gz" | gunzip -c | java -jar ~/src/jvarkit-git/dist/vcffilterjs.jar -e 'variant.hasAttribute("AF") && variant.getAlternateAlleles().size()==1 && variant.getAttributeAsDouble("AF",1.0)<0.05' 
(...)
1    10235    .    T    TA    100    PASS    AA=|||unknown(NO_COVERAGE);AC=6;AF=0.00119808;AFR_AF=0;AMR_AF=0.0014;AN=5008;DP=78015;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.0051
1    10505    .    A    T    100    PASS    AA=.|||;AC=1;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=5008;DP=9632;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0
1    10506    .    C    G    100    PASS    AA=.|||;AC=1;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=5008;DP=9676;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0
1    10511    .    G    A    100    PASS    AA=.|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0.0014;AN=5008;DP=9869;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0
1    10539    .    C    A    100    PASS    AA=.|||;AC=3;AF=0.000599042;AFR_AF=0;AMR_AF=0.0014;AN=5008;DP=9203;EAS_AF=0;EUR_AF=0.001;NS=2504;SAS_AF=0.001
1    10542    .    C    T    100    PASS    AA=.|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=5008;DP=9007;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0
1    10579    .    C    A    100    PASS    AA=.|||;AC=1;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=5008;DP=5502;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0
1    10642    .    G    A    100    PASS    AA=.|||;AC=21;AF=0.00419329;AFR_AF=0.0129;AMR_AF=0.0014;AN=5008;DP=1360;EAS_AF=0.003;EUR_AF=0;NS=2504;SAS_AF=0
1    11063    .    T    G    100    PASS    AA=.|||;AC=15;AF=0.00299521;AFR_AF=0.0106;AMR_AF=0.0014;AN=5008;DP=2834;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0
1    13011    .    T    G    100    PASS    AA=t|||;AC=3;AF=0.000599042;AFR_AF=0.0023;AMR_AF=0;AN=5008;DP=35822;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0
ADD COMMENT
0
Entering edit mode

Why biallelic only?

ADD REPLY
2
Entering edit mode

Yes, I'm lazy. The real script would look like:

function accept(v)
{
if(!v.hasAttribute("AF")) return false;
if(v.getAlternateAlleles().size()==1) return v.getAttributeAsDouble("AF",1.0)<0.05
for(var af: v.getAttribute("AF")) if(af < 0.05) return true;
return false;
}

accept(variant)
ADD REPLY
1
Entering edit mode

Well, you're working for free so you can afford to be lazy :)

ADD REPLY
0
Entering edit mode

Thank you very much! Could you please explain this so that I can understand it and use this as a template to write my own script? also, is this in python?

ADD REPLY
0
Entering edit mode

As described on the doc page, my program is a java program. It takes as input a javascript program and inject in this script a variable called 'variant' which is an instance of https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/variant/variantcontext/VariantContext.html. The small problem here is that there can be more than one AF attribute if there is more than one ALT allele.

ADD REPLY
0
Entering edit mode

Thanks Pierre! Will take a look and post my script up sometime to check.

ADD REPLY
0
Entering edit mode

Why re-invent the wheel/write your own script? And why attempt to, when you have difficulties differentiating between JS and Python? Believe me, you're better off using existing tools.

ADD REPLY

Login before adding your answer.

Traffic: 1874 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6