Extracting variants from dbNSFP
3.5 years ago
NGSCanBioinf ▴ 10

Hello, is there any way to extract specific variants from a dbNSFP file by providing the chromosome location on the command line?

annotation • 1.3k views
3.5 years ago

Use tabix.

3.5 years ago

Just to elaborate a little on Pierre's answer...

I agree that tabix would be the fastest way to do it. However, dbNSFP comes as a single zip file containing per-chromosome files that are gzipped (not bgzipped), and tabix can only index bgzipped files. The best way to do it is therefore to unzip the particular chromosome you're interested in, bgzip it, index it with tabix, and then query it with tabix.

Say you're interested in position 21:5011803, then you should go for something like this:

unzip -p dbNSFP4.1a.zip dbNSFP4.1a_variant.chr21.gz \
| gunzip | bgzip > dbNSFP4.1a_variant.chr21.gz
tabix -b2 -e2 -S1 dbNSFP4.1a_variant.chr21.gz
tabix dbNSFP4.1a_variant.chr21.gz 21:5011803-5011803
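
If you expect to run more than one query, the steps above could be wrapped in a few small shell functions. This is only a sketch: the function names (region_string, prepare_chrom, query_region) and the output file name in the usage note are made up for illustration, and "-s 1" just spells out tabix's default sequence column.

```shell
# Sketch only: function names and the output file name are illustrative.

# Build a tabix region string; a single position becomes CHR:POS-POS.
region_string() {
    local chrom="$1"
    local start="$2"
    local end="${3:-$2}"
    printf '%s:%s-%s\n' "$chrom" "$start" "$end"
}

# Extract one chromosome from the dbNSFP zip, recompress it with bgzip,
# and build the tabix index. Does nothing if the index already exists.
prepare_chrom() {
    local zip="$1"
    local member="$2"
    local out="$3"
    if [ -f "$out.tbi" ]; then
        return 0
    fi
    unzip -p "$zip" "$member" | gunzip -c | bgzip > "$out" &&
        tabix -s 1 -b 2 -e 2 -S 1 "$out"
}

# Query a prepared file for one position (CHR POS) or a range (CHR START END).
query_region() {
    local file="$1"
    shift
    tabix "$file" "$(region_string "$@")"
}
```

Usage would then look something like this (the indexing step only runs the first time):

prepare_chrom dbNSFP4.1a.zip dbNSFP4.1a_variant.chr21.gz chr21.tsv.gz
query_region chr21.tsv.gz 21 5011803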

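You could even go for a simple grep if you don't want to generate any intermediate files. This scans the whole chromosome file on every query, so it is slower than tabix if you need more than a handful of lookups: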

unzip -p dbNSFP4.1a.zip dbNSFP4.1a_variant.chr21.gz | zcat | head -1 > result.tab
unzip -p dbNSFP4.1a.zip dbNSFP4.1a_variant.chr21.gz | zgrep -P "^21\t5011803\t" >> result.tab
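
The grep above matches a single position. For a range of positions, the same streaming idea could be done with awk instead. A minimal sketch, assuming the chromosome is in column 1, the position in column 2, and the first line is the header (filter_region is a made-up name):

```shell
# Sketch only: filter_region is an illustrative name.
# Reads tab-separated text on stdin. Prints the first line (header) plus
# every row whose column 1 equals the chromosome and whose column 2 lies
# within [lo, hi]. With only two arguments, hi defaults to lo.
filter_region() {
    awk -F '\t' -v chrom="$1" -v lo="$2" -v hi="${3:-$2}" '
        NR == 1 { print; next }                  # keep the header line
        ($1 "") == (chrom "") &&                 # compare chromosome as a string
        ($2 + 0) >= (lo + 0) &&                  # compare positions as numbers
        ($2 + 0) <= (hi + 0) { print }
    '
}
```

For example:

unzip -p dbNSFP4.1a.zip dbNSFP4.1a_variant.chr21.gz | gunzip -c | filter_region 21 5011803 5011900 > result.tab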

Thank you very much, this is useful!
