How to sort a VCF by an INFO column score
20 months ago
juntkym ▴ 20

Hi there. Is there a good way to sort a VCF file by a value in the INFO column (say, the INFO/CADD score)? Thanks.

20 months ago
Jeremy ▴ 910

You can use the following:

bcftools query -f '%CHROM %POS %REF %ALT %INFO/CADD\n' yourFile.vcf | sort -k5,5 -g -r > sorted_by_cadd.txt

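Use the -f option to choose the columns you want to keep, then sort on the column containing the CADD score (field 5 in this example). The -g option sorts numerically (general numeric, so values in scientific notation are handled) and the -r option sorts in descending order. Note that the result is a plain table of the selected columns, not a VCF. You can alter the command to suit your needs.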
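To illustrate what the sort options do, here is a small sketch on made-up data (the labels and scores are invented for the demonstration). It restricts the sort key to a single field with -k2,2 and combines the g and r modifiers, so a score written in scientific notation still lands in the right place:

```shell
# Four made-up "label score" rows; 1e2 equals 100.
sorted=$(printf '%s\n' 'a 2.5' 'b 10' 'c 1e2' 'd 0.3' | sort -k2,2gr)

# Print the rows from highest to lowest score.
printf '%s\n' "$sorted"
```

With plain -n instead of -g, the value 1e2 would be read as 1, which is why general numeric sort is the safer choice for scores that may use exponent notation.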


Thank you very much for your quick reply! Sorry, I was not clear: I want to keep all the other column values in the output, and keep the output in VCF format. Your answer inspired me to write the command below, but if there is a cleaner way, please let me know. Anyway, you helped me a lot. Thanks again!

cat <(grep '^#' myFile.vcf) \
    <(bcftools query -f '%INFO/CADD\t%CHROM\t%POS\t%ID\t%REF\t%ALT\t%QUAL\t%FILTER\t%INFO\t%FORMAT\n' myFile.vcf | sort -k1,1 -g -r | cut -f2-)
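The same "prefix the score, sort, strip the prefix" idea can also be sketched with standard tools only, which keeps every original column, including the sample columns. This is a minimal sketch rather than a definitive implementation. It assumes an uncompressed, tab-delimited VCF with INFO in column 8 and one numeric value per key. The function name sort_vcf_by_info is hypothetical, and records lacking the key are given the sentinel -inf, which GNU sort -g places last in descending order.

```shell
# Hypothetical helper: sort VCF records by a numeric INFO key, descending.
# Usage: sort_vcf_by_info in.vcf CADD
sort_vcf_by_info() {
    vcf="$1"
    key="$2"
    # 1. Header lines pass through unchanged.
    grep '^#' "$vcf"
    # 2. Prefix each record with the key's value, sort on it, drop the prefix.
    grep -v '^#' "$vcf" |
        awk -F'\t' -v key="$key" '{
            val = "-inf"                  # sentinel for records without the key
            n = split($8, kv, ";")        # INFO is the 8th VCF column
            for (i = 1; i <= n; i++) {
                if (index(kv[i], key "=") == 1) {
                    val = substr(kv[i], length(key) + 2)
                }
            }
            print val "\t" $0
        }' |
        sort -s -t "$(printf '\t')" -k1,1gr |   # -s keeps input order for ties
        cut -f2-
}
```

For example, `sort_vcf_by_info myFile.vcf CADD > sorted_by_cadd.vcf`. If a record carries several comma-separated values for the key, only the first is used for ordering, because general numeric sort reads just the leading number.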

Why do you want to sort your VCF? This seems like an unusual thing to do; you almost always want it sorted by genomic coordinate.


Yes, I agree it's unusual. The sorted VCF will be both inspected manually by medical geneticists looking for causal variants of rare diseases and fed to my colleagues' automatic variant interpretation program.


If someone else is looking at it, it is perhaps better to use bcftools query to pull out the relevant values and then sort those, rather than sorting the whole VCF. But I don't know what your colleagues' program does, so that may not work with it.

20 months ago

I wrote https://lindenb.github.io/jvarkit/SortVcfOnInfo.html

$ curl  "https://raw.github.com/arq5x/gemini/master/test/test4.vep.snpeff.vcf" |\
   java -jar dist/sortvcfoninfo.jar -F BaseQRankSum | grep -vE "^#" 

chr10   1142208 .   T   C   3404.30 .   AC=8;AF=1.00;AN=8;
chr10   135336656   .   G   A   38.34   .   AC=4;AF=1.00;AN=4;
chr10   52004315    .   T   C   40.11   .   AC=4;AF=1.00;AN=4;
chr10   52497529    .   G   C   33.61   .   AC=4;AF=1.00;AN=4;
chr10   126678092   .   G   A   89.08   .   AC=1;AF=0.13;AN=8;BaseQRankSum=-3.120;
chr16   72057435    .   C   T   572.98  .   AC=1;AF=0.13;AN=8;BaseQRankSum=-2.270;
chr10   48003992    .   C   T   1047.87 .   AC=4;AF=0.50;AN=8;BaseQRankSum=-0.053;
chr10   135210791   .   T   C   65.41   .   AC=4;AF=0.50;AN=8;BaseQRankSum=2.054;
chr10   135369532   .   T   C   122.62  .   AC=2;AF=0.25;AN=8;BaseQRankSum=2.118;

Wow, this is exactly what I was looking for. Many thanks!
