Vcf file sorting
8 weeks ago
Lukas • 0

I got a VCF file from my instructor. It is VEP-annotated, with over 50 options separated by ||. I noticed that the VCF is not arranged into the appropriate columns, so I decided to sort it.

I used this code to sort my VCF file by position:

grep "^#" input.vcf > output.vcf
grep -v "^#" input.vcf | sort -k1,1V -k2,2g >> output.vcf


However, after running it I expected the data in output.vcf to be sorted into columns. Instead, the data for each variant is still shifted.

Am I doing something wrong? Is there a different way to arrange a VCF into columns?

Vcf • 442 views

Guys, I am so sorry if this question is inappropriate, but I only started using Linux half a year ago.

8 weeks ago
Ram 34k

Rather than reinvent the wheel, use existing tools - bcftools sort, for example. Also, see this thread: https://bioinformatics.stackexchange.com/questions/6826/sort-vcf-by-contig-and-position-within-contig

Maybe -k2,2n would work better than -k2,2g in your solution.
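
A minimal sketch of the sort-based approach with the -k2,2n change, assuming GNU sort (for the V version-sort key) and a tiny made-up input.vcf that is created here only so the example is self-contained. If bcftools is installed, bcftools sort input.vcf -o output.vcf should do the same job without the manual header handling.

```shell
# Build a tiny, made-up VCF (hypothetical records, for illustration only).
printf '##fileformat=VCFv4.2\n' > input.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> input.vcf
printf 'chr2\t500\t.\tA\tG\t.\tPASS\t.\n' >> input.vcf
printf 'chr10\t100\t.\tC\tT\t.\tPASS\t.\n' >> input.vcf
printf 'chr2\t90\t.\tG\tA\t.\tPASS\t.\n' >> input.vcf

# Copy the header lines first, unchanged.
grep '^#' input.vcf > output.vcf

# Sort the records: version sort on the contig (chr2 before chr10),
# then plain numeric sort on POS. Fields are split on tabs only.
grep -v '^#' input.vcf | sort -t "$(printf '\t')" -k1,1V -k2,2n >> output.vcf

# Show contig and position of the sorted records.
grep -v '^#' output.vcf | cut -f1,2
```

Note that sorting only changes the order of the rows. The fields inside each row stay tab-separated exactly as they were, so a sorted file will not look any more "aligned" in a pager than the unsorted one.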


I really don't know why, but my VCF only looks sorted when I open it with nano. If I use less, more, or bcftools view, it is still shifted.


Can you show us the exact commands you're using? Are you sure that the file sort order is weird in the less/more case and it's not happening because of display issues where the tabs don't always line up?
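
One way to check whether this is only a display issue is to count the tab-separated fields on every line. The sketch below uses a made-up demo.vcf (hypothetical records); if every line reports the same number of fields, the columns are intact and only the on-screen alignment is ragged.

```shell
# Made-up example: field widths differ a lot, so the tabs will not line up in a pager.
printf '#CHROM\tPOS\tID\tREF\tALT\n' > demo.vcf
printf 'chr1\t12345\trs1\tA\tG\n' >> demo.vcf
printf 'chr1\t7\t.\tACGTACGTACGT\tA\n' >> demo.vcf

# Print the number of tab-separated fields per line, then collapse duplicates.
# A single distinct value means every row has the same columns.
awk -F'\t' '{ print NF }' demo.vcf | sort -u
```

For viewing, less -S stops long lines from wrapping, which usually makes wide VCF records much easier to read.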


It didn't even cross my mind. I guess that may be the case. However, I thought that sort would order the records to my liking and arrange them into columns when I redirect the output into a new VCF file. My reasoning was that if the VCF were sorted properly, I would even get the samples' GT values in separate columns. But bcftools query still returns the sample GTs as one chunk of information. I would really appreciate any hints, because I am stuck on my main goal: separating the sample GTs into columns.


bcftools query is your friend when you want a table of comma/tab delimited values from a VCF file. You may also want to look into adding a column command to your pipe so it's easier to eyeball. See this post: How to Use Biostars Part-3: Formatting Text and Using GitHub Gists where I describe how to use column to make things easier to look at.
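
A rough sketch of what that could look like, assuming bcftools is installed and using a made-up two-sample samples.vcf (the file name, sample names S1 and S2, and records are hypothetical). The [ ... ] part of the format string loops over samples, so each sample's GT lands in its own tab-separated column.

```shell
# Made-up two-sample VCF, for illustration only.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=chr1>\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n'
  printf 'chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\n'
  printf 'chr1\t250\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/1\n'
} > samples.vcf

if command -v bcftools >/dev/null 2>&1; then
  # One row per variant, one GT column per sample.
  bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' samples.vcf > gt_table.tsv
  cat gt_table.tsv
else
  echo "bcftools not found; skipping the query step" >&2
fi
```

Piping the resulting table through column -t pads the fields so they line up on screen, which helps when eyeballing the genotypes.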
