How to select information from long annotation and put into columns?
3.4 years ago
markgodek

I've got a file that looks like this:

[1]CHROM    [2]POS  [3]REF  [4]ALT  [5]AF   [6]GNOMAD_AF    [7]FUNCOTATION  [8]DP   [9]AC   [10]AN  [11]HITS622847:GT   [12]HITS622847:AD   [13]HITS622849:GT   [14]HITS622849:AD   [15]HITS622851:GT   [16]HITS622851:AD   [17]HITS622853:GT   [18]HITS622853:AD   [19]HITS622855:GT   [20]HITS622855:AD   [21]HITS622856:GT   [22]HITS622856:AD   [23]HITS622858:GT   [24]HITS622858:AD   [25]HITS622860:GT   [26]HITS622860:AD   [27]HITS622862:GT   [28]HITS622862:AD   [29]HITS622864:GT   [30]HITS622864:AD   [31]HITS622866:GT   [32]HITS622866:AD   [33]HITS622868:GT   [34]HITS622868:AD   [35]HITS622870:GT   [36]HITS622870:AD   [37]HITS622872:GT   [38]HITS622872:AD   [39]HITS622875:GT   [40]HITS622875:AD   [41]HITS622877:GT   [42]HITS622877:AD
1   715348  T   G   1   1   [RP11-206L10.2|hg19|chr1|715348|715348|FIVE_PRIME_FLANK||SNP|T|T|G|g.chr1:715348T>G|ENST00000428504.1|-||||||0.4937655860349127|GTGGAACCCTTTCTCTACAAA||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||true|false|0_%2C_1|false|false|0|false|false|false|LOC100288069:100288069|true|false|false|true|true|false|false|false|false|false|false|false|false|false|false|false|false|false|true|false|3131984|715348|true|false|0|true|0|false|0.000137372_%2C_0.999863|false|false|false|SNV|true|0x050100020005040136000100|1|false|103|rs3131984|]    8   4   4   ./. 1,0 ./. 0,0 ./. 1,0 ./. 1,0 ./. 1,0 ./. 0,0 1/1 0,2 ./. 0,0 ./. 0,0 ./. 0,0 ./. 0,0 ./. 0,0 ./. 0,0 1/1 0,2 ./. 0,0 ./. 0,0

I was using

awk 'BEGIN { OFS = "\t" } { gsub("\\|.*\\||\\[|\\]" ,"", $7); print $0 }'

to cut the FUNCOTATION field down to "RP11-206L10.2", but I realized I also need to pull out the 6th value of each annotation, and my regex powers just aren't there yet. Basically, I want to trim the leading and trailing brackets and keep only the string before the first bar and the string between the 5th and 6th bars.
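To see why the 6th value disappears, here is the command from the question run on a shortened row. The sample row is trimmed from the line above, and `old_trim` is a hypothetical wrapper name used only so the command can be fed test input. The alternative `\|.*\|` is greedy: it matches everything from the first bar to the last bar in one piece, so FIVE_PRIME_FLANK is deleted along with the rest.

```shell
# The command from the question, wrapped in a (hypothetical) function so it
# can read a test row on stdin. Default FS splits on whitespace, so the
# bracketed annotation is field 7 in this row.
old_trim() {
  awk 'BEGIN { OFS = "\t" } { gsub("\\|.*\\||\\[|\\]" ,"", $7); print $0 }'
}

# Shortened example row: six leading columns, then the FUNCOTATION value.
# "\|.*\|" matches from the FIRST "|" to the LAST "|", so only the text
# before the first bar survives; the brackets are removed by "\[" and "\]".
printf '1\t715348\tT\tG\t1\t1\t[RP11-206L10.2|hg19|chr1|715348|715348|FIVE_PRIME_FLANK||SNP|rs3131984|]\n' | old_trim
```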

Any help getting entries like this into the format below is appreciated.

1   715348  T   G   1   1   RP11-206L10.2   FIVE_PRIME_FLANK   8   4   4   ./. 1,0 ./. 0,0 ./. 1,0 ./. 1,0 ./. 1,0 ./. 0,0 1/1 0,2 ./. 0,0 ./. 0,0 ./. 0,0 ./. 0,0 ./. 0,0 ./. 0,0 1/1 0,2 ./. 0,0 ./. 0,0
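One way to produce that format, sketched under the assumption that the file is tab-separated with FUNCOTATION as the 7th column and one bracketed annotation per row, is to stop deleting text with a regex and instead let awk's `split()` break the annotation on `|`, then keep elements 1 and 6. In Funcotator's GENCODE annotation these two positions appear to be the gene name and the variant classification. The function name `trim_funcotation`, the header labels `GENE`/`VARIANT_CLASS`, and the file names in the usage line are all hypothetical.

```shell
# Hypothetical helper: reads a tab-separated table on stdin and replaces the
# FUNCOTATION column (assumed to be field 7) with two columns: the 1st and
# the 6th "|"-separated values of the annotation.
trim_funcotation() {
  awk 'BEGIN { FS = OFS = "\t" }
  # Header row (line 1, first column mentions CHROM): relabel column 7.
  # GENE and VARIANT_CLASS are made-up labels, rename as needed.
  NR == 1 && $1 ~ /CHROM/ {
    $7 = "GENE" OFS "VARIANT_CLASS"
    print
    next
  }
  {
    ann = $7
    sub(/^\[/, "", ann)           # drop the leading bracket
    sub(/\]$/, "", ann)           # drop the trailing bracket
    split(ann, parts, "|")        # a single-char separator is taken literally
    $7 = parts[1] OFS parts[6]    # before the 1st bar, and between bars 5 and 6
    print
  }'
}

# Usage (file names are placeholders):
#   trim_funcotation < table.tsv > trimmed.tsv
```

Because assigning to `$7` makes awk rebuild the line with `OFS`, the two new values come out as two tab-separated columns and every other column is left untouched. If a row carries more than one bracketed annotation (for example, several alleles), this sketch only reads the first one.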
SNP VCF FUNCOTATION