How to remove variants that are not on chromosomes 1-22 from a VCF file
2.1 years ago
jmukisa90 ▴ 30

Hi there, I am running the GATK pipeline to go from FASTQ files to a VCF. How do I remove variants that are not on chromosomes 1-22 (as given in the CHROM column) from the VCF file obtained after the GATK variant recalibration step? An excerpt of the VCF is shown below. Any help is highly appreciated.

    GL000192.1      547218  .       C       T       93.26   PASS    AC=1;AF=0.011;AN=90;BaseQRankSum=-7.510e-01;DP=241;ExcessHet=3.0103;FS=0.000;GQ_MEAN=85.00;InbreedingCoeff=-0.0535;MLEAC=1;MLEAF=0.011;MQ=56.22;MQRankSum=0.589;NCC=0;NEGATIVE_TRAIN_SITE;QD=10.36;ReadPosRankSum=0.210;SOR=0.368;VQSLOD=-1.994e+00;culprit=MQRankSum     GT:AD:DP:GQ:PL  0/0:6,0:6:18:0,18,155   0/1:4,5:9:85:109,0,85   0/0:8,0:8:24:0,24,189   0/0:8,0:8:24:0,24,201   0/0:9,0:9:21:0,21,315   0/0:8,0:8:24:0,24,213        0/0:9,0:9:24:0,24,360   0/0:7,0:7:21:0,21,183   0/0:4,0:4:12:0,12,108   0/0:8,0:8:24:0,24,229   0/0:3,0:3:9:0,9,66      0/0:9,0:9:27:0,27,253        0/0:4,0:4:12:0,12,111   0/0:1,0:1:3:0,3,27      0/0:7,0:7:18:0,18,270   0/0:3,0:3:9:0,9,78      0/0:4,0:4:12:0,12,87    0/0:15,0:15:42:0,42,630      0/0:5,0:5:15:0,15,135   0/0:8,0:8:24:0,24,198   0/0:4,0:4:12:0,12,116   0/0:6,0:6:0:0,0,109     0/0:5,0:5:12:0,12,180   0/0:6,0:6:18:0,18,160   0/0:5,0:5:15:0,15,133        0/0:2,0:2:6:0,6,49      0/0:4,0:4:12:0,12,110   ./.:0,0:0:.:0,0,0       0/0:4,0:4:12:0,12,103   0/0:3,0:3:9:0,9,69      0/0:5,0:5:15:0,15,135        ./.:0,0:0:.:0,0,0       0/0:4,0:4:12:0,12,99    0/0:4,0:4:12:0,12,123   0/0:5,0:5:15:0,15,142   0/0:5,0:5:15:0,15,122   0/0:4,0:4:12:0,12,1130/0:2,0:2:6:0,6,49      0/0:4,0:4:12:0,12,124   0/0:3,0:3:9:0,9,84      0/0:6,0:6:18:0,18,150   0/0:8,0:8:18:0,18,270   0/0:1,0:1:3:0,3,29      0/0:3,0:3:9:0,9,66   0/0:4,0:4:12:0,12,105   0/0:5,0:5:15:0,15,114   0/0:4,0:4:0:0,0,49
GL000192.1      547235  .       C       G       244.16  PASS    AC=3;AF=0.034;AN=88;BaseQRankSum=-6.740e-01;DP=216;ExcessHet=0.0792;FS=0.000;GQ_MEAN=24.00;InbreedingCoeff=0.2294;MLEAC=3;MLEAF=0.034;MQ=54.98;MQRankSum=1.15;NCC=0;QD=20.35;ReadPosRankSum=-1.150e+00;SOR=0.495;VQSLOD=-2.486e+00;culprit=MQRankSum GT:AD:DP:GQ:PL       0/0:4,0:4:12:0,12,104   0/0:8,0:8:24:0,24,214   0/0:8,0:8:24:0,24,189   1/1:0,8:8:24:210,24,0   0/0:8,0:8:24:0,24,251   0/0:7,0:7:18:0,18,2700/0:7,0:7:18:0,18,270   0/0:7,0:7:21:0,21,183   0/0:4,0:4:0:0,0,49      0/0:7,0:7:21:0,21,209   0/0:2,0:2:6:0,6,46      0/0:10,0:10:30:0,30,282 0/0:5,0:5:15:0,15,134        ./.:0,0:0:.:0,0,0       0/0:7,0:7:21:0,21,217   0/0:3,0:3:9:0,9,78      0/0:4,0:4:12:0,12,86    0/0:13,0:13:39:0,39,371 0/0:5,0:5:15:0,15,148
Chromosome removal • 740 views
2.1 years ago
    bcftools view --regions-file bed_file_containing_the_only_chromosomes_you_want.bed indexed.vcf.gz

or

    bcftools view in.vcf | awk -F '\t' '($0 ~ /^#/ || $1 ~ /^(chr)?[0-9XY]+$/)'
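
Note that the pattern `[0-9XY]+` above also keeps X and Y, and would match any all-digit contig name. If only chromosomes 1-22 are wanted, a stricter pattern is one option. This is a minimal sketch, assuming the contigs are named `1`..`22` or `chr1`..`chr22`; the function name `filter_autosomes` is made up for illustration:

```shell
# Keep header lines plus records whose CHROM is 1-22 (with or without "chr").
# filter_autosomes is a hypothetical helper name; it reads VCF text on stdin.
filter_autosomes() {
    awk -F '\t' '/^#/ || $1 ~ /^(chr)?([1-9]|1[0-9]|2[0-2])$/'
}

# Usage (assumes bcftools is installed and in.vcf exists):
#   bcftools view in.vcf | filter_autosomes > autosomes.vcf
```

Records on X, Y, MT and unplaced contigs such as GL000192.1 are dropped. The `##contig` header lines for those contigs are left in place.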

or

    bcftools view indexed.vcf.gz `cut -f 1 /path/to/ref.fa.fai | grep -E '^(chr)?[0-9XY]+$'`
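
The BED file needed by the `--regions-file` approach can be derived from the same `.fai` index. This is a rough sketch, assuming a `samtools faidx` style index (contig name in column 1, length in column 2) and contigs named `1`..`22` or `chr1`..`chr22`; `fai_to_autosome_bed` is a hypothetical helper name:

```shell
# Write one BED line (name, 0, length) per autosome listed in a .fai index.
# fai_to_autosome_bed is a hypothetical helper; $1 is the path to the .fai file.
fai_to_autosome_bed() {
    awk -F '\t' -v OFS='\t' \
        '$1 ~ /^(chr)?([1-9]|1[0-9]|2[0-2])$/ { print $1, 0, $2 }' "$1"
}

# Usage (paths are placeholders; assumes the VCF is bgzipped and indexed):
#   fai_to_autosome_bed /path/to/ref.fa.fai > autosomes.bed
#   bcftools view --regions-file autosomes.bed indexed.vcf.gz > autosomes.vcf
```

Region-based filtering needs an index on the input, for example one created with `bcftools index indexed.vcf.gz`.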

or

....


Thanks, Pierre, your code sorted me out well. Thanks.
