Question: Extracting Columns From VCF File Using VCFtools/Perl
bioinfo (720), New Zealand, wrote 6.9 years ago:

What's the easiest way to extract two or more columns at once from my VCF SNP file and export them in CSV format from the command line in Linux? Are there any commands in vcftools, or do I have to use Perl?

perl vcftools snp • 11k views
written 6.9 years ago by bioinfo (720)

You may want to specify whether you're trying to extract two or more samples or simply arbitrary columns. It looks like matted's answer applies to the former, and Damian Kao's and Rm's apply to the latter.

written 6.9 years ago by dfornika (1000)
Damian Kao (15k), USA, wrote 6.9 years ago:

This Linux command writes columns 1, 2, and 3 of your file to a CSV file. Change 1,2,3 to the (1-based) numbers of the columns you want to extract. You can list as many columns as you want.

cut -f 1,2,3 fileName.vcf | sed 's/[\t]/,/g' > cols.csv
modified 6.9 years ago • written 6.9 years ago by Damian Kao (15k)
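One thing worth knowing about the `cut` approach: by default `cut` passes through, unchanged, any line that contains no tab, so the `##` meta lines at the top of a VCF usually end up in the CSV. The following is a minimal sketch of one way around that, not part of the original answer. The file `example.vcf` and its contents are made up for illustration, and `tr` is used in place of `sed` for the tab-to-comma step.

```shell
# Build a tiny example VCF (hypothetical data) so the sketch is self-contained.
printf '##fileformat=VCFv4.1\n' > example.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> example.vcf
printf 'chr1\t100\trs1\tA\tG\t50\tPASS\tAN=2;DP=87\n' >> example.vcf
printf 'chr1\t200\t.\tC\tT\t30\tPASS\tAN=2;DP=12\n' >> example.vcf

# Drop the "##" meta lines, keep columns 1-3, and turn tabs into commas.
grep -v '^##' example.vcf | cut -f 1,2,3 | tr '\t' ',' > cols.csv
cat cols.csv
```

The `#CHROM` header line is kept on purpose so the CSV has column names; use `grep -v '^#'` instead if you want data rows only.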

I tried the command, but I'm not sure why it fails:

cut -f 2,6,9 file.vcf | sed 's/[\t]/,/g' > out.csv
sed: -e expression #1, char 1: unknown command: `\ufffd'

By the way, what can I do with the INFO column (column 8 in the standard VCF layout) if I want just DP from the list of fields in it? (The INFO column contains e.g. AN=2;DP=87;Dels=0.00..)

written 6.9 years ago by bioinfo (720)
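For the DP part of this comment, here is a hedged sketch of one possible approach with awk. It was not posted in the thread. It assumes INFO is column 8 (as in the standard VCF column order), that entries are separated by `;`, and that the file is uncompressed. The file name `example_info.vcf`, its contents, and the `NA` placeholder are made up for illustration.

```shell
# Build a tiny example VCF (hypothetical data); the second record has no DP.
printf '##fileformat=VCFv4.1\n' > example_info.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> example_info.vcf
printf 'chr1\t100\trs1\tA\tG\t50\tPASS\tAN=2;DP=87;Dels=0.00\n' >> example_info.vcf
printf 'chr1\t200\t.\tC\tT\t30\tPASS\tAN=2;Dels=0.00\n' >> example_info.vcf

# Print POS, QUAL and the DP value taken from INFO (column 8) as CSV.
awk 'BEGIN { FS = "\t"; OFS = "," }
     /^#/ { next }                       # skip meta and header lines
     {
       dp = "NA"                         # placeholder when DP is absent
       n = split($8, fields, ";")        # INFO entries are ";"-separated
       for (i = 1; i <= n; i++)
         if (fields[i] ~ /^DP=/) dp = substr(fields[i], 4)
       print $2, $6, dp
     }' example_info.vcf > dp.csv
cat dp.csv
```

As for the `sed` error: an `unknown command` complaint at `char 1` with an unprintable character often means the quotes around the sed script were pasted as typographic quotes rather than plain ASCII `'`. Retyping the command by hand is worth trying.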
matted (7.0k), Boston, United States, wrote 6.9 years ago:

For completeness, here's a vcftools solution:

vcf-subset -c name1,name3,name5 in.vcf.gz | tr "\t" "," > out.csv
written 6.9 years ago by matted (7.0k)

vcf-subset -c POS,QUAL my.vcf.gz | tr "\t" "," > out.csv

It only gives me the header line, with the two column names POS and QUAL added at the end:

#CHROM,POS,ID,REF,ALT,QUAL,FILTER,INFO,FORMAT,POS,QUAL

By the way, I only want the data in those columns, along with their headers, and nothing else.

modified 5.8 years ago by Neilfws (48k) • written 6.9 years ago by bioinfo (720)
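As dfornika's comment suggests, `vcf-subset -c` appears to be meant for selecting sample columns, which would explain why passing POS and QUAL returns only a header. If the goal is to pick fixed columns by their header names, the awk sketch below is one possible approach. It is not from the thread. The `want` variable, the file `example_names.vcf`, and its contents are hypothetical, and the sketch assumes every requested name is present in the `#CHROM` header line.

```shell
# Build a tiny example VCF (hypothetical data).
printf '##fileformat=VCFv4.1\n' > example_names.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> example_names.vcf
printf 'chr1\t100\trs1\tA\tG\t50\tPASS\tAN=2;DP=87\n' >> example_names.vcf
printf 'chr1\t200\t.\tC\tT\t30\tPASS\tAN=2;DP=12\n' >> example_names.vcf

# Select columns by header name; "want" is a comma-separated list of names.
awk -v want="POS,QUAL" '
  BEGIN { FS = "\t"; OFS = ","; n = split(want, names, ",") }
  /^##/ { next }                           # skip meta lines
  /^#CHROM/ {                              # header: map column name -> index
    sub(/^#/, "", $1)
    for (i = 1; i <= NF; i++) col[$i] = i
  }
  {                                        # header and data rows alike
    out = $(col[names[1]])
    for (j = 2; j <= n; j++) out = out OFS $(col[names[j]])
    print out
  }' example_names.vcf > by_name.csv
cat by_name.csv
```

For a gzipped input such as my.vcf.gz, the same awk program can read from `zcat my.vcf.gz |` instead of a file argument.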
Rm (7.9k), Danville, PA, wrote 6.9 years ago:

Using awk:

awk 'BEGIN {OFS ="," ; FS = "\t"};{print $1, $2, $4}' input.vcf > output.csv

In Perl:

perl -F"\t" -lane '$, = ","; print $F[0], $F[1] , $F[3]' input.vcf > output.csv
modified 6.9 years ago • written 6.9 years ago by Rm (7.9k)