How to filter a vcf file based on genotype of first two samples
3
0
Entering edit mode
7 months ago
kmkdesilva ▴ 90

Hi everyone,

I have this vcf file which I want to filter and generate a second vcf file. I want to filter based on the genotype of fist two samples.

I am looking for variant where sample1 is 0/0 when sample2 is 1/1 and sample1 is 1/1 when sample2 is 0/0.

Following is how the rows of the vcf file looks like. Both sample1 and sample2 are 0/0 in the first example variant.

chr1    376878  .       A       C       17830.7 .       .       GT:AD:DP:GQ:PGT:PID:PL:PS       0/0:16,0:16:45:.:.:0,45,675:.   0/0:13,0:13:39:.:.:0,3
9,504:. 1/1:0,46:46:99:.:.:1855,138,0:. 1/1:0,46:46:99:.:.:1785,138,0:. 1/1:0,63:63:99:.:.:1876,189,0:. 1|1:0,61:61:99:1|1:376878_A_C:2423,183,0:37687
8       1|1:0,19:19:66:1|1:376878_A_C:914,66,0:376878   1/1:0,9:9:27:.:.:335,27,0:.     1/1:0,26:26:78:.:.:801,78,0:.   1/1:0,61:61:99:.:.:2812,202,0:
.       1|1:0,24:24:72:1|1:376878_A_C:1052,72,0:376878
chr1    376881  .       T       C       6133.93 .       .       GT:AD:DP:GQ:PGT:PID:PL:PS       0/0:16,0:16:45:.:.:0,45,675:.   0/0:14,0:14:42:.:.:0,4
2,538:. 0/0:36,0:36:99:.:.:0,99,1453:.  0/0:39,0:39:99:.:.:0,108,1620:. 0/0:53,0:53:99:.:.:0,105,1575:. 1|1:0,47:47:99:1|1:376878_A_C:2314,171,0:37687
8       1|1:0,22:22:69:1|1:376878_A_C:951,69,0:376878   0/0:9,0:9:24:.:.:0,24,360:.     0/0:25,0:25:66:.:.:0,66,990:.   1/1:0,67:67:99:.:.:1941,204,0:
.       1|1:0,22:22:66:1|1:376878_A_C:990,66,0:376878

Please tell me if anyone knows how to do this. Thank you

vcf genotype filter • 848 views
ADD COMMENT
4
Entering edit mode
7 months ago

You can include any desired combination of conditions using the appropriate bcftools' expression. In your case, this combination that looks for refref+altalt or altalt+refref on first and second sample respectively would do the job:

bcftools view -i 'GT[0]="RR" && GT[1]="AA" || GT[0]="AA" && GT[1]="RR"' test.vcf

GT[0] and GT[1] refer to the first and the second sample respectively. You can change this if needed.

ADD COMMENT
0
Entering edit mode

Thank you. But my vcf file is not compressed with bgzip. Is there a similar command in vcftools that I can use?

ADD REPLY
0
Entering edit mode

I tested this command with an uncompressed test.vcf file and it worked flawlessly.

ADD REPLY
0
Entering edit mode

You are right. It worked fine. I compressed the file using bcftools and then your command. It worked well too. Thank you again. bcftools view file.vcf -Oz -o file.vcf.gz bcftools index file.vcf.gz

Can you please suggest a tutorial where I can learn usage of basic bioinfo tools like this?

ADD REPLY
0
Entering edit mode

I'm afraid I haven't come across any tutorial to learn about bcftools. All my knowledge, as with most of other NGS software, comes from solving my daily needs by inspecting its documentation and good old trial & error.

ADD REPLY
2
Entering edit mode
7 months ago

A simple perl one-liner would do:

perl -lane 'print if $F[0] =~ /^#/
or $F[9] =~ /^0.0/ and $F[10] =~ /^1.1/
or $F[9] =~ /^1.1/ and $F[10] =~ /^0.0/
' file.vcf

$F[9] and $F[10] are the columns of the first and second samples respectively. You can change this if needed, plus you can remove the $F[0] =~ /^#/ if headers are unneeded in the output.

ADD COMMENT
0
Entering edit mode

Thanks. Can't I use bcftools or vcftools for this?

ADD REPLY
1
Entering edit mode

You can indeed. I'll post it as a new answer, as I really think that should be the way to go.

ADD REPLY
1
Entering edit mode
7 months ago

using http://lindenb.github.io/jvarkit/VcfFilterJdk.html :

java -jar dist/vcffilterjdk.jar -e 'return (variant.getGenotype(0).isHomVar() && variant.getGenotype(1).isHomRef())  ||  (variant.getGenotype(1).isHomVar() && variant.getGenotype(0).isHomRef());' input.vcf
ADD COMMENT
0
Entering edit mode

Thanks. I am working on hpc. I got following error while trying to compile vcffilterjdk.jar. Hos do I fix it?

ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.

Please set the JAVA_HOME variable in your environment to match the location of your Java installation.

ADD REPLY
0
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 2274 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6