Question: Remove variants from VCF by INFO tag
4.8 years ago by
bioroma.spb50
Saint-Petersburg, Russian Empire
bioroma.spb50 wrote:

Hello everyone,

I have a whole folder of VCFs generated by GATK CombineVariants. I want to remove variants (entire rows) containing the strings ":R" or ":F" (but not ":FR") in the INFO column. What is the best way to do this?

awk gatk vcf • 1.6k views
4.8 years ago by
Matt Miossec350
UK/Oxford/Wellcome Centre for human genetics
Matt Miossec350 wrote:

You could use a simple AWK command (assuming the INFO column is the 8th column):

`awk '$8!~/:R/ && $8!~/:F[^R]*$/' FILE.vcf > FILE_updated.vcf`

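This removes every line whose INFO column contains :R or :F (unless the :F is part of :FR). One caveat: because `[^R]*$` has to reach the end of the field, a :F that is followed later in the same field by any R is not caught.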
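A slightly more defensive variant, offered as a sketch rather than a drop-in replacement: it splits on tabs explicitly (assuming a standard tab-delimited VCF), always keeps header lines (those starting with #), and uses `:F([^R]|$)` so that a :F is caught even when an R appears later in the INFO field. The records and the file names `demo.vcf` and `demo_updated.vcf` are made up for illustration.

```shell
# Build a tiny tab-separated demo file (hypothetical records, not real data).
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\tDP=10;set=a:FR\n' >> demo.vcf
printf 'chr1\t200\t.\tC\tT\t50\tPASS\tDP=10;set=a:F\n' >> demo.vcf
printf 'chr1\t300\t.\tG\tA\t50\tPASS\tset=a:F;AR=1\n' >> demo.vcf
printf 'chr1\t400\t.\tT\tC\t50\tPASS\tDP=10;set=a:R\n' >> demo.vcf

# Keep header lines; keep records whose INFO (column 8) contains neither ":R"
# nor a ":F" that is followed by a non-R character or by the end of the field.
awk -F'\t' '/^#/ || ($8 !~ /:R/ && $8 !~ /:F([^R]|$)/)' demo.vcf > demo_updated.vcf
```

On this demo input only the two header lines and the record at position 100 survive. The record at position 300 is the case that the `[^R]*$` form lets through, because an R appears after the :F.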

If you want to do it for all your VCF files:

`for file in *.vcf; do awk '$8!~/:R/ && $8!~/:F[^R]*$/' "$file" > "${file%.vcf}_updated.vcf"; done`

Iterates through all the .vcf files in your current directory.
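Spelled out in full, the loop might look like the sketch below. It globs directly instead of parsing `ls` (which breaks on file names containing spaces), quotes the variables, and skips files that already end in `_updated.vcf` so that a second run does not filter its own output. The function name `filter_vcfs` is just a label chosen here; the filter itself is the awk command from above.

```shell
# filter_vcfs DIR: hypothetical helper; filters every .vcf file in DIR (default: .)
filter_vcfs() {
    dir=${1:-.}
    for file in "$dir"/*.vcf; do
        [ -e "$file" ] || continue        # the glob matched nothing
        case "$file" in
            *_updated.vcf) continue ;;    # output of an earlier run, leave it alone
        esac
        # Same filter as above: drop rows whose 8th column matches either pattern.
        awk '$8!~/:R/ && $8!~/:F[^R]*$/' "$file" > "${file%.vcf}_updated.vcf"
    done
}
```

Calling `filter_vcfs .` processes the current directory; passing another path processes that directory instead.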


Thank you! Problem solved.

Reply written 4.8 years ago by bioroma.spb50

UPD: I've run into another problem: the command you wrote leaves rows with :F at the end of the column. Do you have any idea why?

Reply written 4.8 years ago by bioroma.spb50

I've updated the AWK command to take that into account. awk interprets '[^R]' as "any single character that isn't R", so ':F[^R]' needs a character after ':F'. If ':F' is at the end of the field there is no such character, so the pattern does not match and the line is not excluded. I have fixed this by writing '[^R]*$' instead. The asterisk stands for "0 or more" and the dollar sign stands for the end of the field. The pattern therefore matches ':F' at the end of the field, as well as ':F' followed by anything that contains no 'R' up to the end of the field, while ':FR' is left alone.
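To see how the patterns differ, one can run them against a few hand-made INFO strings (hypothetical values, chosen only to exercise each case). The third pattern, `:F([^R]|$)`, is not part of the command above; it is an alternative that reads as ":F followed by a non-R character or by the end of the field". The helper name `classify` is likewise just a label for this sketch.

```shell
# classify: hypothetical helper; prints, for each input line, whether each
# of the three patterns matches it.
classify() {
    awk '{
        a = ($0 ~ /:F[^R]/)     ? "match" : "no"   # needs a character after :F
        b = ($0 ~ /:F[^R]*$/)   ? "match" : "no"   # :F, then only non-R up to the end
        c = ($0 ~ /:F([^R]|$)/) ? "match" : "no"   # :F, then a non-R character or the end
        print $0, a, b, c
    }'
}

printf '%s\n' 'set=a:FR' 'set=a:F' 'set=a:F;DP=3' 'set=a:F;AR=3' | classify
```

None of the patterns matches `set=a:FR`. A trailing `:F` is missed only by the first pattern, and `set=a:F;AR=3` is missed only by the second, because the R in `AR` stops `[^R]*` before it reaches the end of the field.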

Reply written 4.8 years ago by Matt Miossec350

Thank you again! Now everything works well.

Reply written 4.8 years ago by bioroma.spb50