How to parse info field in Excel
2
0
Entering edit mode
8.0 years ago
kulvait ▴ 270

Hi, I have my vcf file and sending it as raw to my colleagues. They open it in Excel and they specifically need to extract one field called AB from INFO field. My INFO fileld looks like this

DP=623;AB=0.058;AO=36;RO=472;QR=10544;QA=561;SRF=472;SRR=0;SAF=36;SAR=0;TYPE=snp;CIGAR=1M1X1M;LEN=1;MQM=60;MQMR=60;PARSER=FREEBAYES

Could you please help me to find Excel function to extract particular info from that to separate cell. Thanks.

sequencing • 2.2k views
ADD COMMENT
1
Entering edit mode

Although this seems obvious, if you open the file specifying ; as field separator, you'll visualize each INFO field in separated columns and it could be easy to select only the AB column. Is this what you are looking for?

ADD REPLY
0
Entering edit mode

Well this will probably do the trick. The problem might be that if there is for example DP record missing then it is not guaranteed that it will be in the same column for all records.

ADD REPLY
6
Entering edit mode

I can't resist citing that tweet:

ADD REPLY
0
Entering edit mode

I did have known it is a provocative question! The answer to my question seems to be here http://stackoverflow.com/questions/21674222/extract-characters-after-certain-other-characters-excel

ADD REPLY
0
Entering edit mode

If you are colleagues are Biologists with no sense of command line programming you can actually make a table fetching the concerned field with the proper awk command since it is a tab delimited file and send them the metrics.

ADD REPLY
1
Entering edit mode
8.0 years ago

If that's the field they're interested in, just use a script to extract that field and send it as a tab-delimited file. Perhaps something like:

cut -f 1-3,8 myvcf | perl -a -F'\t' -ne 'if($F[3] =~ /AB=([^;]+)/){print join("\t",(@F[0..2],$1)) . "\n"' >outfile.tsv
ADD COMMENT
0
Entering edit mode
8.0 years ago
Sinji ★ 3.2k

Assuming the AB field is always in the same position in all your rows you can use awk to do this easily.

$ awk -F';' -v OFS='\t' '{print $2}' INPUTFILE > OUTPUTFILE
ADD COMMENT
1
Entering edit mode

Sounds like a dangerous assumption to me :) VCFs are often a mess.

ADD REPLY
0
Entering edit mode

Fair enough, unfortunately with only one line as an example I thought i'd offer a solution anyway.

ADD REPLY

Login before adding your answer.

Traffic: 2487 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6