What's the easiest way to extract 2 or more columns simultaneously from my VCF SNP file and export them in CSV format from the command line in Linux? Are there any commands in vcftools, or do I have to use Perl?
This Linux command writes columns 1, 2, and 3 of your file to a CSV file. Change 1,2,3 to the (1-based) numbers of the columns you want to extract; you can list as many columns as you want.
cut -f 1,2,3 fileName.vcf | sed 's/[\t]/,/g' > cols.csv
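One thing the one-liner above does not handle: VCF files usually start with `##` meta-information lines, and `cut` passes lines without a tab through unchanged, so they end up in the CSV. A minimal sketch of one way to drop them first, assuming a small made-up file (`demo.vcf` and `demo_cols.csv` are hypothetical names, not from the question):

```shell
# Build a tiny demo VCF (made-up data) so the sketch is self-contained.
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\trs1\tA\tG\nchr1\t200\trs2\tC\tT\n' > demo.vcf

# Drop the "##" meta-information lines, keep columns 1, 2 and 4,
# then turn the tab separators into commas.
grep -v '^##' demo.vcf | cut -f 1,2,4 | tr '\t' ',' > demo_cols.csv

cat demo_cols.csv
```

The `#CHROM` header line is kept here so the CSV has column names; use `grep -v '^#'` instead if you want data rows only. `tr` is used for the tab-to-comma step because it avoids differences in how `sed` implementations treat `\t`.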
I tried the command, but I'm not sure why I'm getting this error:
cut -f 2,6,9 file.vcf | sed 's/[\t]/,/g' > out.csv
sed: -e expression #1, char 1: unknown command: `\ufffd'
By the way, what can I do with the INFO column (column 9) if I want to get just DP from the list of things in that column? (The INFO column contains e.g. AN=2;DP=87;Dels=0.00..)
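For the DP question, one possible approach is to split the INFO field on `;` in awk and keep the entry that starts with `DP=`. This is only a sketch on made-up data: the file names and the `col` variable are assumptions for illustration. In the VCF specification INFO is the 8th column, so `col=8` is used here; set `col=9` if INFO really is column 9 in your file.

```shell
# Tiny demo file (made-up data) with INFO in column 8.
printf 'chr1\t100\t.\tA\tG\t50\tPASS\tAN=2;DP=87;Dels=0.00\nchr1\t200\t.\tC\tT\t60\tPASS\tAN=2;DP=45;Dels=0.00\n' > demo_info.vcf

# For every non-header line, split the INFO field on ";" and
# pick out the value of the DP= entry ("NA" if there is none).
awk -v col=8 'BEGIN { FS = "\t"; OFS = "," }
!/^#/ {
    dp = "NA"
    n = split($col, kv, ";")
    for (i = 1; i <= n; i++)
        if (kv[i] ~ /^DP=/) dp = substr(kv[i], 4)
    print $1, $2, dp
}' demo_info.vcf > demo_dp.csv

cat demo_dp.csv
```

This prints the first two columns plus the DP value as CSV; change `$1, $2` to whichever columns you want alongside DP.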
Using awk:
awk 'BEGIN {OFS ="," ; FS = "\t"};{print $1, $2, $4}' input.vcf > output.csv
In perl:
perl -F"\t" -lane '$, = ","; print $F[0], $F[1] , $F[3]' input.vcf > output.csv
You may want to specify if you're trying to extract 2 or more samples or simply arbitrary columns. It looks like matted's answer applies to the former, and Dk and Rm's apply to the latter.