Question: sorting a multi-sample (genotype) vcf file
9 months ago by
nagarsaggi10 wrote:

I have a freebayes-genotyped multi-sample VCF file. I want to sort the sample names alphabetically to make post-variant-calling analysis a bit easier. I tried Picard SortVcf, which works fine on a small file but failed on a large file (~4 Gb). If you could suggest ways to sort the samples in a large multi-sample file without distorting the variant information, it would be a great help.

9 months ago by
finswimmer13k wrote:

Hey, try this:

$ bcftools query -l input.vcf | sort > samples.txt
$ bcftools view -S samples.txt input.vcf > output.vcf

If not already done, I would also suggest using BCF instead of VCF or vcf.gz. This really improves speed when working with bcftools on large datasets.

fin swimmer
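
To see what the two commands above do to the sample columns, here is a minimal sketch on a toy file. It is only an illustration, not how bcftools is implemented: it uses plain awk, and the file names (toy.vcf, toy.sorted.vcf), the sample names (zeta, alpha, mid) and the genotypes are all made up. The point is that only columns 10 onward are permuted, while the nine fixed columns of every variant line pass through unchanged. For real data, and especially for large files, the bcftools commands above remain the better choice.

```shell
# Build a toy two-variant VCF whose samples are deliberately out of order
# (hypothetical data, for illustration only).
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tzeta\talpha\tmid\n' >> toy.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/0\n' >> toy.vcf
printf 'chr1\t200\t.\tC\tT\t60\tPASS\t.\tGT\t0/0\t0/1\t1/1\n' >> toy.vcf

# Reorder the sample columns (10 onward) alphabetically by sample name.
# Columns 1-9 (CHROM ... FORMAT) are copied through untouched.
awk 'BEGIN { FS = OFS = "\t" }
/^##/ { print; next }                     # meta lines pass through as-is
/^#CHROM/ {
    n = NF - 9                            # number of samples
    for (i = 1; i <= n; i++) { name[i] = $(i + 9); col[i] = i + 9 }
    # Insertion sort on sample names, carrying the original column index along.
    for (i = 2; i <= n; i++) {
        kn = name[i]; kc = col[i]; j = i - 1
        # Appending "" forces a string comparison even for numeric-looking names.
        while (j >= 1 && (name[j] "") > (kn "")) {
            name[j + 1] = name[j]; col[j + 1] = col[j]; j--
        }
        name[j + 1] = kn; col[j + 1] = kc
    }
}
{
    # Header and variant lines alike: fixed fields first, then samples in sorted order.
    line = $1
    for (i = 2; i <= 9; i++) line = line OFS $i
    for (i = 1; i <= n; i++) line = line OFS $(col[i])
    print line
}' toy.vcf > toy.sorted.vcf

# Show the new sample order and the reordered genotypes.
grep -v '^##' toy.sorted.vcf | cut -f10-
```

Each genotype stays attached to its own sample: the column is moved as a whole, so no per-variant information is altered.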


It worked perfectly! Thanks

written 9 months ago by nagarsaggi10

I spent a little bit too much time trying to figure out how to do this just to come here and find this simple solution. Thanks!

written 3 months ago by curious430