Question: Sort VCF File by Position?
Niell wrote (2.5 years ago):

Previously, I split out a vcf file by chromosome, and for my project, I have combined the X and XY vcf files into a single one. After changing the "XY" chromosome designation to "X" via:

awk 'BEGIN {FS = OFS = "\t"} $1 == "XY" {$1 = "X"} {print}' Genome_newX.vcf > Genome_newX2.vcf

I'm having trouble sorting this new "Genome_newX2.vcf" by position. The plan is to run the sorted VCF through GenotypeHarmonizer afterwards.

Are there any suggestions on how to do this easily? I'm brand new to this style of work, and I'd love some direction on where to read up on it as well. Thank you!

ATpoint wrote (2.5 years ago):

Use this:

awk '$1 ~ /^#/ {print $0; next} {print $0 | "sort -k1,1 -k2,2n"}' in.vcf > out_sorted.vcf

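It takes a VCF and prints the sorted file including the header: lines starting with "#" are printed first and unchanged, and all other lines are piped to sort, which orders them by chromosome (column 1) and then numerically by position (column 2).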
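To see the idea on something small, here is a rough sketch on a made-up VCF. It is not the command above but a two-pass variant of the same approach (header first, then sorted records), so it does not rely on how awk handles output to a pipe. The function name sort_vcf, the file names, and the records are all invented for illustration.

```shell
# Hypothetical helper (name is made up): header lines first, then the
# records sorted by column 1 as text and by column 2 numerically.
sort_vcf() {
    {
        grep '^#' "$1"
        grep -v '^#' "$1" | sort -k1,1 -k2,2n
    }
}

# Toy input with invented records, deliberately out of order.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nX\t300\t.\tA\tG\nX\t20\t.\tC\tT\nX\t1000\t.\tG\tA\n' > toy.vcf

sort_vcf toy.vcf > toy_sorted.vcf

# Show chromosome and position only; header lines without tabs print whole.
cut -f1,2 toy_sorted.vcf
```

The two header lines should stay on top, followed by the X records at positions 20, 300 and 1000.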


Excellent, this solved the problem. I really appreciate it!

Reply written 2.5 years ago by Niell
sort -k1,1 -k2,2n

This works well in your case, as you seem to have just one chromosome. For sorting a VCF file with several chromosomes I prefer this:

sort -k1,1V -k2,2n my.vcf

This makes sure that your chromosomes are sorted in natural (numeric) order. Without the 'V', "2" comes after "19", for example.

fin swimmer
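A quick way to see the difference between the two orderings, using only a handful of chromosome names (the list is made up, and the V modifier assumes GNU sort):

```shell
# Chromosome names only, to isolate the effect of the V modifier.
chroms='19
2
X
10
1'

# Plain text comparison on column 1: "19" sorts before "2".
lexical=$(printf '%s\n' "$chroms" | LC_ALL=C sort -k1,1 | tr '\n' ' ')

# Version ("natural") sort on column 1: digit runs compare by value.
natural=$(printf '%s\n' "$chroms" | LC_ALL=C sort -k1,1V | tr '\n' ' ')

echo "lexical: $lexical"
echo "natural: $natural"
```

The lexical order comes out as 1 10 19 2 X, the natural order as 1 2 10 19 X.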

Reply written 2.5 years ago by finswimmer

I do not recommend using natural sorting on genomic data. Most other tools, e.g. samtools (for sorting BAM files), do not support it by default. If you ever run operations such as intersections with bedtools on two or more files that must be sorted, differing sort orders can cause conflicts, e.g. with bedtools intersect and its -sorted option.
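If you are unsure which order a given file is in, sort can check it without rewriting anything via its -c option. A minimal sketch, assuming plain lexicographic chromosome order is the one you want to enforce (the helper name is_lex_sorted and the toy files are made up):

```shell
# Hypothetical helper: succeeds if the records of a VCF are in plain
# lexicographic chromosome order, then ascending numeric position.
is_lex_sorted() {
    # sort -c only checks the order and reports via its exit status.
    grep -v '^#' "$1" | LC_ALL=C sort -c -k1,1 -k2,2n 2>/dev/null
}

# Two toy files with the same invented records in different orders.
printf '#CHROM\tPOS\n10\t5\n2\t7\n2\t90\n' > lex.vcf
printf '#CHROM\tPOS\n2\t7\n2\t90\n10\t5\n' > nat.vcf

is_lex_sorted lex.vcf && echo "lex.vcf: lexicographic order"
is_lex_sorted nat.vcf || echo "nat.vcf: not in lexicographic order"
```

Running the same check on every input before a step that requires sorted files is a cheap way to catch mixed sort orders early.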

Reply written 2.5 years ago by ATpoint

Too bad vcf-sort is garbage and the -c flag doesn't work even with the newest version.

Reply written 16 months ago by jon.klonowski

Hello ATpoint,

funny. This is exactly the same reason why I use natural sorting. :) The (human) data I've worked with was always sorted this way, and I ran into problems if a part of the analysis pipeline wasn't.

fin swimmer

Reply written 2.5 years ago by finswimmer

Would there be an equivalent for a BCF? bcftools view | [...] code? Or why not use bcftools sort -Oz output.bcf -o output_sort.vcf.gz?

Reply written 6 months ago by beausoleilmo

bcftools sort is absolutely the right way and the way I would go today :)
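For reference, a minimal sketch of that route, wrapped in a function so it can be reused. The function name and the file names in the example call are made up; it assumes a bcftools version that includes the sort subcommand is on the PATH. As far as I know, bcftools sort orders records by the contig order in the header rather than by the text of the chromosome name.

```shell
# Hypothetical wrapper; the function name is made up.
# Requires bcftools (with its sort subcommand) on PATH.
sort_and_index() {
    in=$1    # unsorted VCF or BCF
    out=$2   # sorted, bgzip-compressed VCF, e.g. output_sort.vcf.gz
    # -Oz writes bgzip-compressed VCF; -o names the output file.
    bcftools sort -Oz -o "$out" "$in" &&
    # A tabix (.tbi) index lets downstream tools query by region.
    bcftools index -t "$out"
}

# Example call (not run here):
# sort_and_index output.bcf output_sort.vcf.gz
```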

Reply written 6 months ago by finswimmer