Question: sorting a multi-sample (genotype) vcf file
5 days ago by
nagarsaggi10
nagarsaggi10 wrote:

I have a multi-sample VCF file genotyped with freebayes. I want to sort the sample names into alphabetical order to make post-variant-calling analysis easier. I tried Picard SortVcf, which works fine on a small file but failed on a large one (~4 GB). If you could suggest a way to sort a large multi-sample file without distorting the variant information, it would be a great help.

5 days ago by
finswimmer13k
Germany
finswimmer13k wrote:

Hey, try this:

$ bcftools query -l input.vcf | sort > samples.txt
$ bcftools view -S samples.txt input.vcf > output.vcf

If you are not doing so already, I would also suggest using BCF instead of VCF or vcf.gz. This really improves speed when working with bcftools on large datasets.
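
For what it's worth, here is a rough sketch that combines both suggestions: it reorders the samples, writes BCF directly, and then checks the resulting sample order. The function names `reorder_samples` and `names_sorted` and the file names are made up for illustration, and the indexing step assumes the records are already coordinate-sorted.

```shell
# Sketch only: function names and file names are placeholders.

# Reorder the samples alphabetically and write compressed BCF.
reorder_samples() {
    src="$1"
    dst="$2"
    # LC_ALL=C gives a byte-wise sort order that is the same on every machine
    bcftools query -l "$src" | LC_ALL=C sort > samples.txt
    # -S keeps the listed samples in the order given; -O b writes compressed BCF
    bcftools view -S samples.txt -O b -o "$dst" "$src"
    # Indexing assumes the records are coordinate-sorted
    bcftools index "$dst"
}

# Read sample names on stdin; exit status 0 means they are in sorted order.
names_sorted() {
    LC_ALL=C sort -c 2>/dev/null
}

# Usage (requires bcftools on PATH):
#   reorder_samples input.vcf output.bcf
#   bcftools query -l output.bcf | names_sorted && echo "samples are sorted"
```

Using the same `LC_ALL=C` setting for both the sort and the check avoids surprises when the locale orders upper- and lower-case names differently.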

fin swimmer


It worked perfectly! Thanks

written 3 days ago by nagarsaggi10