Question: Sort VCF File by Position?
Niell wrote (19 months ago):

Previously, I split out a vcf file by chromosome, and for my project, I have combined the X and XY vcf files into a single one. After changing the "XY" chromosome designation to "X" via:

awk 'BEGIN {FS = OFS = "\t"} $1 == "XY" {$1 = "X"} {print}' Genome_newX.vcf > Genome_newX2.vcf

I'm running into the issue of sorting this new "Genome_newX2.vcf" by position. The idea is that I'll subsequently run the vcf through GenotypeHarmonizer.

Are there any suggestions on how to do this easily? I'm brand new to this style of work, and I'd love some direction on where to read up on it as well. Thank you!

ATpoint wrote (19 months ago):

Use this:

cat in.vcf | awk '$1 ~ /^#/ {print $0;next} {print $0 | "sort -k1,1 -k2,2n"}' > out_sorted.vcf

It takes a VCF and prints the sorted file including the header.
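
The same idea can also be written as two explicit steps, which makes the header handling easier to see. This is a minimal sketch rather than the exact command above: `sort_vcf` is a hypothetical helper name, the demo records are made up, and it assumes a tab-delimited, uncompressed VCF.

```shell
# Hypothetical helper: print the header lines unchanged, then the records
# sorted by chromosome (plain text order) and position (numeric).
sort_vcf() {
    grep '^#' "$1"
    grep -v '^#' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}

# Tiny made-up VCF, for illustration only.
demo=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n' > "$demo"
printf 'X\t300\t.\tA\tG\nX\t20\t.\tC\tT\nX\t100\t.\tG\tA\n' >> "$demo"

sort_vcf "$demo"
```

Unlike the one-liner, this reads the file twice, so it needs a real file rather than a pipe as input.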


Excellent, this solved the problem. I really appreciate it!

written 19 months ago by Niell
sort -k1,1 -k2,2n

This works well in your case, as you seem to have just one chromosome. For sorting a VCF file in general, I prefer this:

sort -k1,1V -k2,2n my.vcf

This makes sure that your chromosomes are sorted correctly. Without the 'V', "2" sorts after "19", for example.
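
To see the difference between the two orderings, here is a small sketch on made-up chromosome names. It assumes GNU sort, since the V key modifier is a GNU coreutils extension and may not exist in other sort implementations.

```shell
# Made-up chromosome names, for illustration only.
chroms=$(printf '%s\n' 19 2 X 10 1)

# Plain text ordering compares character by character,
# so "19" is placed before "2".
printf '%s\n' "$chroms" | LC_ALL=C sort -k1,1

echo "---"

# Version ("natural") ordering compares runs of digits as numbers,
# so "2" is placed before "10" and "19".
printf '%s\n' "$chroms" | LC_ALL=C sort -k1,1V
```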

fin swimmer

written 19 months ago by finswimmer

I do not recommend using natural sorting on genomic data. Most other tools, e.g. samtools (for sorting BAM files), do not support it by default. If you ever run operations that require sorted input on two or more files, such as bedtools intersect with the -sorted option, differing sort orders can cause conflicts.
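
One way to spot such a mismatch before running a tool that expects sorted input is to compare the order in which chromosomes first appear in each file. A rough sketch, where `chrom_order` is a hypothetical helper and the two files are made-up examples:

```shell
# Hypothetical helper: list chromosomes in order of first appearance,
# skipping header lines. Assumes tab-delimited input that is already
# grouped by chromosome.
chrom_order() {
    grep -v '^#' "$1" | cut -f1 | uniq
}

# Two made-up files holding the same records in different orders.
a=$(mktemp)
b=$(mktemp)
printf '1\t5\n2\t7\n10\t3\n' > "$a"   # natural order: 1, 2, 10
printf '1\t5\n10\t3\n2\t7\n' > "$b"   # text order:    1, 10, 2

if [ "$(chrom_order "$a")" = "$(chrom_order "$b")" ]; then
    echo "same chromosome order"
else
    echo "different chromosome order"
fi
```

This only compares chromosome order; it does not check that positions within each chromosome are sorted.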

modified 19 months ago • written 19 months ago by ATpoint

Hello ATpoint,

Funny, this is exactly the reason why I use natural sorting. :) The (human) data I've worked with was always sorted this way, and I ran into problems if a part of the analysis pipeline wasn't.

fin swimmer

written 19 months ago by finswimmer

Too bad vcf-sort is garbage, and the -c flag doesn't work even with the newest version.

written 5 months ago by jon.klonowski