How to select information from a long annotation and put it into columns?
4 months ago
markgodek

I've got a file that looks like this:

[1]CHROM    [2]POS  [3]REF  [4]ALT  [5]AF   [6]GNOMAD_AF    [7]FUNCOTATION  [8]DP   [9]AC   [10]AN  [11]HITS622847:GT   [12]HITS622847:AD   [13]HITS622849:GT   [14]HITS622849:AD   [15]HITS622851:GT   [16]HITS622851:AD   [17]HITS622853:GT   [18]HITS622853:AD   [19]HITS622855:GT   [20]HITS622855:AD   [21]HITS622856:GT   [22]HITS622856:AD   [23]HITS622858:GT   [24]HITS622858:AD   [25]HITS622860:GT   [26]HITS622860:AD   [27]HITS622862:GT   [28]HITS622862:AD   [29]HITS622864:GT   [30]HITS622864:AD   [31]HITS622866:GT   [32]HITS622866:AD   [33]HITS622868:GT   [34]HITS622868:AD   [35]HITS622870:GT   [36]HITS622870:AD   [37]HITS622872:GT   [38]HITS622872:AD   [39]HITS622875:GT   [40]HITS622875:AD   [41]HITS622877:GT   [42]HITS622877:AD
1   715348  T   G   1   1   [RP11-206L10.2|hg19|chr1|715348|715348|FIVE_PRIME_FLANK||SNP|T|T|G|g.chr1:715348T>G|ENST00000428504.1|-||||||0.4937655860349127|GTGGAACCCTTTCTCTACAAA||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||true|false|0_%2C_1|false|false|0|false|false|false|LOC100288069:100288069|true|false|false|true|true|false|false|false|false|false|false|false|false|false|false|false|false|false|true|false|3131984|715348|true|false|0|true|0|false|0.000137372_%2C_0.999863|false|false|false|SNV|true|0x050100020005040136000100|1|false|103|rs3131984|]    8   4   4   ./. 1,0 ./. 0,0 ./. 1,0 ./. 1,0 ./. 1,0 ./. 0,0 1/1 0,2 ./. 0,0 ./. 0,0 ./. 0,0 ./. 0,0 ./. 0,0 ./. 0,0 1/1 0,2 ./. 0,0 ./. 0,0

I was using

awk 'BEGIN { OFS = "\t" } { gsub("\\|.*\\||\\[|\\]" ,"", $7); print $0 }'

to cut the FUNCOTATION field down to "RP11-206L10.2", but I realized I also need to pull out the 6th value of each variant's annotation, and my regex powers just aren't there yet. Basically, I want to trim the leading and trailing brackets and keep only the string before the first bar and the string between the 5th and 6th bars.
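The command above keeps only the gene name because `.*` is greedy: `\|.*\|` matches from the first bar all the way to the last one, so every value in between, including the one between the 5th and 6th bars, is deleted in a single match. A quick check on a shortened, made-up annotation string (the values are placeholders, not real Funcotator output):

```shell
# Same gsub regex as above, applied to a short placeholder annotation.
# "\\|.*\\|" greedily spans from the first "|" to the last "|",
# and "\\[" / "\\]" remove the surrounding brackets.
trimmed=$(echo '[GENE|b|c|d|e|CLASS|g|]' |
    awk '{ gsub("\\|.*\\||\\[|\\]", ""); print }')
echo "$trimmed"    # prints GENE; CLASS was removed along with everything else
```

So a single delete-style regex cannot keep a value from the middle; the value has to be captured or the string split into pieces instead.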

Any help getting entries like this into the format below would be appreciated:

1   715348  T   G   1   1   RP11-206L10.2   FIVE_PRIME_FLANK   8   4   4   ./. 1,0 ./. 0,0 ./. 1,0 ./. 1,0 ./. 1,0 ./. 0,0 1/1 0,2 ./. 0,0 ./. 0,0 ./. 0,0 ./. 0,0 ./. 0,0 ./. 0,0 1/1 0,2 ./. 0,0 ./. 0,0
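One way to get there without a more elaborate regex is to strip the brackets and then let awk's `split()` break the annotation on `|`: element 1 is the text before the first bar and element 6 is the text between the 5th and 6th bars. This is a minimal sketch, assuming whitespace-separated columns, no spaces inside the FUNCOTATION field (true for the example line), and a single annotation per record (records carrying several alleles or transcripts would need extra handling). `trim_funcotation`, `GENE` and `CLASSIFICATION` are hypothetical names:

```shell
# Hypothetical helper: replaces column 7 (the bracketed FUNCOTATION string)
# with two tab-separated columns: the value before the 1st "|" and the value
# between the 5th and 6th "|". Assumes no spaces inside column 7 and one
# annotation per record.
trim_funcotation() {
    awk 'BEGIN { OFS = "\t" }
    $1 ~ /CHROM$/ {
        # Header line: GENE and CLASSIFICATION are placeholder column names
        $7 = "GENE" OFS "CLASSIFICATION"
        print
        next
    }
    {
        f = $7
        sub(/^\[/, "", f)        # drop the leading "["
        sub(/]$/, "", f)         # drop the trailing "]"
        split(f, parts, "|")     # a single-character separator is taken literally
        # Assigning a tab-containing string to $7 turns one column into two;
        # printing rebuilds the whole line with tabs (OFS)
        $7 = parts[1] OFS parts[6]
        print
    }' "$@"
}
```

Usage, with a placeholder file name: `trim_funcotation funcotated.tsv > trimmed.tsv`. Because the header gets two names in place of `[7]FUNCOTATION`, header and data rows keep the same number of columns.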
Tags: SNP, VCF, FUNCOTATION
