Convert TSV file to VCF
15 months ago
Eliza ▴ 30

Hi,

I have a TSV file of SNPs with their CADD scores, downloaded from the CADD website. The file looks like this:

##CADD GRCh38-v1.6 (c) University of Washington, Hudson-Alpha Institute for Biotechnology and Berlin Institute of Health 2013-2020. All rights reserved.
#Chrom  Pos Ref Alt RawScore    PHRED
1   13116   T   G   -0.184119   0.553
1   13118   A   G   0.249405    3.697
1   16682   G   A   1.498900    15.73
1   900161  C   G   0.250372    3.709
1   902288  G   A   0.154766    2.625
1   980460  G   A   0.378618    5.188
1   1362903 G   C   -0.072717   0.945
1   1414714 A   G   0.595507    7.469
1   1420704 C   T   0.533685    6.852
1   1560103 C   T   0.003631    1.358
1   1600156 C   G   1.424234    15.24
1   1608229 C   T   -0.138069   0.691
1   1648140 G   C   -0.003037   1.316
1   1650001 C   T   -0.366057   0.226
1   1650007 C   T   0.049118    1.673
1   1666342 A   G   0.351431    4.881
1   1670036 C   A   -0.453113   0.149
1   1846582 A   G   0.237802    3.561
1   1848109 G   C   0.045210    1.644
1   1854321 A   G   0.213451    3.278
1   1870210 G   A   1.248068    14.00
1   1888369 A   G   0.445696    5.927
1   1902466 A   C   0.261213    3.836
1   1902566 G   A   -0.009076   1.280

I would like to convert it to a VCF file. I tried this command on Ubuntu:

 awk -F "\t" '{print "CHROM"$1"\t"$POS"\t"$REF"\t"$ALT"\t"$RawScore"\t"$PHRED"}' GRCh38-v1.6_1e1bfdf83583b30a108d7c9b6ad51134.tsv > df_1_50k.

But it didn't produce a file in the correct format. I would be happy to know where my mistake is and how to fix it. Thank you :)

CADD vcf • 1.7k views

Please do not paste screenshots of plain-text content; it is counterproductive. You can copy and paste the content directly here (using the code formatting option), or use a GitHub Gist if the content exceeds the length allowed here.


Look into bcftools convert --tsv2vcf. You'll need to experiment with it through a few trials and tweaks, but you should be able to do better than with awk. It also looks like you are not yet familiar with the VCF format; please read the VCF specification.


I know that there should be other columns such as ID, QUAL, and INFO, but this is the file that the CADD website returned. Also, unfortunately, bcftools doesn't work on my PC :(


> bcftools don't work on my PC

Find out why and fix it; get it working. bcftools is very well tested and will address failure scenarios that you can't think of.

> I know that there should be other columns such as ID, QUAL, and INFO .... but this is the file that the CADD website returned

If you need these fields with legitimate values downstream, you cannot generate a proper VCF file from this data. If you just need the fields to be present, you can fill in placeholder values. ID can be . throughout, QUAL can be . as well (I think), and INFO can hold a single entry (called COMMENT, say, with some text explaining why it exists). It might be possible to leave INFO as . altogether. Explore more.
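To make the placeholder idea concrete, here is a minimal awk sketch, not a tested pipeline. On the original command: awk has no named columns, so `$POS` is the field whose number is stored in the unset variable `POS`, which evaluates to `$0` (the whole line), and the trailing `"}` opens a string that is never closed. The sketch assumes the input is tab-separated with the six CADD columns in the order shown in the question. The function name `cadd_tsv_to_vcf` and the INFO keys `CADD_RAW` and `CADD_PHRED` are made up for illustration. All eight fixed VCF columns are written, with `.` for ID, QUAL, and FILTER.

```shell
# Hypothetical helper: convert a CADD TSV (Chrom, Pos, Ref, Alt, RawScore, PHRED)
# into a minimal, sites-only VCF written to stdout.
cadd_tsv_to_vcf() {
  awk -F '\t' -v OFS='\t' '
    BEGIN {
      # Meta lines; the INFO keys are illustrative names, not a CADD convention
      print "##fileformat=VCFv4.2"
      print "##INFO=<ID=CADD_RAW,Number=1,Type=Float,Description=\"CADD raw score\">"
      print "##INFO=<ID=CADD_PHRED,Number=1,Type=Float,Description=\"CADD PHRED-scaled score\">"
      print "#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"
    }
    /^#/   { next }   # skip the CADD comment and header lines
    NF < 6 { next }   # skip blank or incomplete lines
    {
      # Columns are addressed by number: $1=Chrom $2=Pos $3=Ref $4=Alt $5=RawScore $6=PHRED
      print $1, $2, ".", $3, $4, ".", ".", "CADD_RAW=" $5 ";CADD_PHRED=" $6
    }
  ' "$1"
}
```

Usage would be something like `cadd_tsv_to_vcf input.tsv > output.vcf`. If your reference uses `chr`-prefixed contig names, replacing `$1` with `"chr" $1` in the last rule should do it. Downstream tools may still want `##contig` lines, which this sketch does not write.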


I needed to convert a TSV to a VCF recently and made a post about it. I also used awk instead of bcftools convert for the conversion, but then used bcftools to confirm that the output VCF was correctly formatted, as Ram suggests.
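If bcftools is not available yet, a rough structural check can be done in awk in the meantime. This is a swapped-in stand-in, not what bcftools does: it only looks at data lines and checks that there are at least eight tab-separated columns, that POS is an integer, and that REF contains only A, C, G, T, or N. The function name `vcf_quick_check` is hypothetical. Running the file through bcftools (for example `bcftools view`) remains the stronger check.

```shell
# Hypothetical helper: returns non-zero if any data line looks structurally wrong.
vcf_quick_check() {
  awk -F '\t' '
    BEGIN { bad = 0 }
    /^#/ { next }   # meta and header lines are not checked here
    NF < 8 {
      printf "line %d: expected at least 8 columns, found %d\n", NR, NF
      bad = 1
      next
    }
    $2 !~ /^[0-9]+$/ {
      printf "line %d: POS is not an integer: %s\n", NR, $2
      bad = 1
    }
    $4 !~ /^[ACGTNacgtn]+$/ {
      printf "line %d: unexpected REF allele: %s\n", NR, $4
      bad = 1
    }
    END { exit bad }   # exit status 0 means no problems were found
  ' "$1"
}
```

For example, `vcf_quick_check output.vcf && echo "looks OK"` prints the message only when no problem lines were reported.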
