How to filter a vcf file based on genotype of first two samples
3
0
Entering edit mode
6 weeks ago
kmkdesilva ▴ 90

Hi everyone,

I have this vcf file which I want to filter and generate a second vcf file. I want to filter based on the genotype of fist two samples.

I am looking for variant where sample1 is 0/0 when sample2 is 1/1 and sample1 is 1/1 when sample2 is 0/0.

Following is how the rows of the vcf file looks like. Both sample1 and sample2 are 0/0 in the first example variant.

chr1    376878  .       A       C       17830.7 .       .       GT:AD:DP:GQ:PGT:PID:PL:PS       0/0:16,0:16:45:.:.:0,45,675:.   0/0:13,0:13:39:.:.:0,3
9,504:. 1/1:0,46:46:99:.:.:1855,138,0:. 1/1:0,46:46:99:.:.:1785,138,0:. 1/1:0,63:63:99:.:.:1876,189,0:. 1|1:0,61:61:99:1|1:376878_A_C:2423,183,0:37687
8       1|1:0,19:19:66:1|1:376878_A_C:914,66,0:376878   1/1:0,9:9:27:.:.:335,27,0:.     1/1:0,26:26:78:.:.:801,78,0:.   1/1:0,61:61:99:.:.:2812,202,0:
.       1|1:0,24:24:72:1|1:376878_A_C:1052,72,0:376878
chr1    376881  .       T       C       6133.93 .       .       GT:AD:DP:GQ:PGT:PID:PL:PS       0/0:16,0:16:45:.:.:0,45,675:.   0/0:14,0:14:42:.:.:0,4
2,538:. 0/0:36,0:36:99:.:.:0,99,1453:.  0/0:39,0:39:99:.:.:0,108,1620:. 0/0:53,0:53:99:.:.:0,105,1575:. 1|1:0,47:47:99:1|1:376878_A_C:2314,171,0:37687
8       1|1:0,22:22:69:1|1:376878_A_C:951,69,0:376878   0/0:9,0:9:24:.:.:0,24,360:.     0/0:25,0:25:66:.:.:0,66,990:.   1/1:0,67:67:99:.:.:1941,204,0:
.       1|1:0,22:22:66:1|1:376878_A_C:990,66,0:376878


Please tell me if anyone knows how to do this. Thank you

vcf genotype filter • 524 views
2
Entering edit mode
5 weeks ago

You can include any desired combination of conditions using the appropriate bcftools' expression. In your case, this combination that looks for refref+altalt or altalt+refref on first and second sample respectively would do the job:

bcftools view -i 'GT[0]="RR" && GT[1]="AA" || GT[0]="AA" && GT[1]="RR"' test.vcf


GT[0] and GT[1] refer to the first and the second sample respectively. You can change this if needed.

0
Entering edit mode

Thank you. But my vcf file is not compressed with bgzip. Is there a similar command in vcftools that I can use?

0
Entering edit mode

I tested this command with an uncompressed test.vcf file and it worked flawlessly.

0
Entering edit mode

You are right. It worked fine. I compressed the file using bcftools and then your command. It worked well too. Thank you again. bcftools view file.vcf -Oz -o file.vcf.gz bcftools index file.vcf.gz

Can you please suggest a tutorial where I can learn usage of basic bioinfo tools like this?

0
Entering edit mode

I'm afraid I haven't come across any tutorial to learn about bcftools. All my knowledge, as with most of other NGS software, comes from solving my daily needs by inspecting its documentation and good old trial & error.

0
Entering edit mode
6 weeks ago
java -jar dist/vcffilterjdk.jar -e 'return (variant.getGenotype(0).isHomVar() && variant.getGenotype(1).isHomRef())  ||  (variant.getGenotype(1).isHomVar() && variant.getGenotype(0).isHomRef());' input.vcf

0
Entering edit mode

Thanks. I am working on hpc. I got following error while trying to compile vcffilterjdk.jar. Hos do I fix it?

ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.

0
Entering edit mode
0
Entering edit mode
6 weeks ago

A simple perl one-liner would do:

perl -lane 'print if $F[0] =~ /^#/ or$F[9] =~ /^0.0/ and $F[10] =~ /^1.1/ or$F[9] =~ /^1.1/ and $F[10] =~ /^0.0/ ' file.vcf  $F[9] and $F[10] are the columns of the first and second samples respectively. You can change this if needed, plus you can remove the $F[0] =~ /^#/ if headers are unneeded in the output.

0
Entering edit mode

Thanks. Can't I use bcftools or vcftools for this?

1
Entering edit mode

You can indeed. I'll post it as a new answer, as I really think that should be the way to go.