Question: Convert a .csv file to a .vcf file
Jamie Watson wrote (8 weeks ago):

I have CSV files from the following link:

https://human.genome.dating/download/index

However, they use GRCh37 (hg19) human coordinates, and I am using hg38. So I need some way to lift these CSV files over to hg38 coordinates. I was thinking of converting the CSV files to VCF files and then using Picard's LiftoverVcf tool:

https://gatk.broadinstitute.org/hc/en-us/articles/360037060932-LiftoverVcf-Picard-

However, this tool only accepts VCF files as input. So what can I do here? Any insights would be appreciated.

Tags: snp, genome
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (8 weeks ago):

Something like this?

wget -O - -q "https://human.genome.dating/bulk/atlas.chr1.csv.gz" |\
gunzip -c |\
tr -d ' ' |\
awk -F, '/^#/ {next;}
  /^VariantID/ {printf("##fileformat=VCFv4.2\n"); split($0,header); for(i=5;i<=NF;i++) printf("##INFO=<ID=%s,Number=1,Type=String,Description=\"%s\">\n",$i,$i); printf("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"); next;}
  {printf("%s\t%s\t%s\t%s\t%s\t.\t.\t",$2,$3,$1,$5,$6); for(i=5;i<=NF;i++) printf("%s=%s;",header[i],$i); printf("\n");}'

I ran the command, but it gives me the following error:

gzip: stdin: unexpected end of file

Reply by Jamie Watson (8 weeks ago)

Remove the -q option from wget so that it reports what went wrong. Are you working behind a proxy?

Reply by Pierre Lindenbaum (8 weeks ago)
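"gzip: stdin: unexpected end of file" means gunzip received an incomplete stream, for example because the download failed or was cut short; with -q, wget's own message about the failure is hidden. A minimal sketch, assuming you are happy to save the archive to disk first: download without -q, then check the archive with gzip -t before converting it. The helper names fetch_atlas and gz_is_complete are hypothetical.

```shell
#!/bin/sh
# Sketch only: fetch_atlas and gz_is_complete are hypothetical helper names.

# Succeeds only if the whole gzip stream can be decompressed. A truncated
# file fails here with the same "unexpected end of file" that gunzip
# printed in the pipeline.
gz_is_complete() {
    gzip -t "$1" 2>/dev/null
}

# Download without -q so wget prints its own error messages (HTTP status,
# proxy or connection problems); wget exits non-zero when the download fails.
fetch_atlas() {
    url="$1"
    out="$2"
    wget -O "$out" "$url" || return 1
    gz_is_complete "$out" || { echo "$out is truncated or not gzip" >&2; return 1; }
}
```

For example, fetch_atlas "https://human.genome.dating/bulk/atlas.chr1.csv.gz" atlas.chr1.csv.gz && gunzip -c atlas.chr1.csv.gz, followed by the rest of the pipeline. Keeping the file on disk also means a failed conversion does not require downloading it again.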

Hi again, yes, I got it to work. However, when I try to lift over with Picard (GATK), it says 'The provided VCF file is malformed at approximately line number 26: 10539 is not a valid start position in the VCF format, for input source'. Could you explain why I am getting this error? I have used the same tool for liftover before, and it worked.

Reply by Jamie Watson (8 weeks ago)

Works on my machine. Did you forget to add the tr -d ' '?

Reply by Pierre Lindenbaum (8 weeks ago)
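The "10539 is not a valid start position" error fits with a missing tr -d ' ' if the CSV has a space after each comma. That layout is an assumption here, suggested by the tr step in the pipeline. With awk -F, alone, the space stays inside every field, so the position column is written as " 10539" rather than as a number. The toy line below is made up for illustration and is not copied from the atlas file.

```shell
#!/bin/sh
# Toy CSV line with a space after each comma (made up, not from the atlas).
line='var1, 1, 10539, C, A'

# Splitting on "," alone keeps the leading space in every field.
with_space=$(printf '%s\n' "$line" | awk -F, '{ print "[" $3 "]" }')

# Deleting the spaces first, as the tr -d step does, leaves a bare number.
no_space=$(printf '%s\n' "$line" | tr -d ' ' | awk -F, '{ print "[" $3 "]" }')

# Alternative: let awk absorb the spaces with a regex field separator.
regex_fs=$(printf '%s\n' "$line" | awk -F' *, *' '{ print "[" $3 "]" }')

echo "$with_space"   # [ 10539]
echo "$no_space"     # [10539]
echo "$regex_fs"     # [10539]
```

The regex separator only trims spaces next to commas, whereas tr -d ' ' also deletes any spaces inside values.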

Yes, I didn't add that. I'll try again with tr -d ' '. Thanks.

Reply by Jamie Watson (8 weeks ago)

The script still doesn't work. It gives the following error:

Exception in thread "main" java.lang.IllegalStateException: Key found in VariantContext field INFO at 1:10539 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default.

Looking at the awk command, you do write the ##INFO header lines, so why is the Picard tool raising this exception?

Reply by Jamie Watson (8 weeks ago)
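The error message has nothing between "Key" and "found", so the undeclared key is probably an empty string rather than a space. One way to check is to list every key used in the INFO column that has no matching ##INFO header line. The sketch below does that with awk on an uncompressed VCF; undeclared_info_keys is a hypothetical helper name, and an empty key is printed as [].

```shell
#!/bin/sh
# Sketch: print each INFO key used in the data lines of a VCF that has no
# matching ##INFO header line. undeclared_info_keys is a hypothetical name;
# the input must be an uncompressed VCF file.
undeclared_info_keys() {
    awk -F'\t' '
        /^##INFO=<ID=/ {                  # header line: remember the declared ID
            id = $0
            sub(/^##INFO=<ID=/, "", id)
            sub(/,.*/, "", id)
            declared[id] = 1
            next
        }
        /^#/ { next }                     # any other header line
        $8 != "." {                       # data line with a non-empty INFO column
            n = split($8, entry, ";")
            for (i = 1; i <= n; i++) {
                key = entry[i]
                sub(/=.*/, "", key)       # key = text before the first "="
                if (!(key in declared) && !(key in reported)) {
                    reported[key] = 1
                    printf("undeclared key [%s] first seen on line %d\n", key, NR)
                }
            }
        }' "$1"
}
```

For example, undeclared_info_keys atlas.chr1.vcf on the converted file (the file name is illustrative). Each offending key is reported once, with the line where it first appears.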

So the error says that a key is missing from the header, and that key appears to be blank, yet the INFO column does not contain any spaces. I am not sure what is wrong with the awk script.

Reply by Jamie Watson (8 weeks ago)
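One likely explanation, not confirmed in the thread: the awk one-liner prints ";" after every key=value pair, including the last one, so every INFO column ends in ";". Splitting that column on ";" yields one extra, empty entry, which a strict parser can read as a key with no name, matching the blank key in the error. Below is a hedged revision of the same conversion that joins entries with ";" only between them, skips empty values and unnamed columns, and writes "." for an empty INFO column. csv_to_vcf is a hypothetical name, and the column numbers are copied unchanged from the one-liner, so check them against the header line of your CSV (especially REF and ALT).

```shell
#!/bin/sh
# Sketch: the same CSV-to-VCF conversion as the one-liner, but INFO entries
# are joined with ";" only between entries, so no trailing ";" (and hence
# no empty key) is written. csv_to_vcf is a hypothetical name; the column
# numbers are copied from the one-liner and must be checked against the
# header line of your CSV.
csv_to_vcf() {
    tr -d ' \r' |                          # drop spaces and any Windows CR
    awk -F, '
        /^#/ || NF == 0 { next }           # skip comment and blank lines
        /^VariantID/ {                     # CSV header line -> VCF header
            print "##fileformat=VCFv4.2"
            for (i = 1; i <= NF; i++) header[i] = $i
            for (i = 5; i <= NF; i++)
                printf("##INFO=<ID=%s,Number=1,Type=String,Description=\"%s\">\n", $i, $i)
            printf("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
            next
        }
        {
            info = ""
            for (i = 5; i <= NF; i++) {
                if ($i == "" || header[i] == "") continue   # no empty values or keys
                info = info (info == "" ? "" : ";") header[i] "=" $i
            }
            if (info == "") info = "."     # VCF writes an empty INFO as "."
            printf("%s\t%s\t%s\t%s\t%s\t.\t.\t%s\n", $2, $3, $1, $5, $6, info)
        }'
}
```

Usage: wget -O - "https://human.genome.dating/bulk/atlas.chr1.csv.gz" | gunzip -c | csv_to_vcf > atlas.chr1.vcf (the output file name is illustrative). Building the INFO string before printing it is what makes it possible to place separators only between entries and to fall back to "." when a row has no values.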