Uploading to dbSNP, need 25nt upstream and downstream of variants
9.2 years ago
apelin20 ▴ 480

Hello,

I have a VCF file that I want to submit to dbSNP. However, the genome will be released at the same time as the SNPs, which means I need to provide 25 nt of upstream and downstream sequence for every variant.

Does anyone have an easy R script to create these flank columns for every variant in a VCF file, given the contig names, the variant positions and a FASTA file?

Adrian

vcf dbSNP • 1.7k views
9.2 years ago

Well, you can just flank() things in R and use getSeq() with a BSgenome. Having said that, this is likely faster with bedtools (flank followed by getfasta).
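If neither R nor bedtools is at hand, the same flank-then-fetch idea can be sketched in plain Python. This is only an illustration, not the Bioconductor or bedtools implementation: the function names (read_fasta, flanks, flank_vcf) are made up here, the whole FASTA is assumed to fit in memory, and flanks are simply truncated at contig ends. POS is treated as 1-based and the downstream flank starts after the last REF base, which is how VCF defines them.

```python
def read_fasta(lines):
    """Parse FASTA lines into {contig: sequence}.
    The contig name is the first word of each header line."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def flanks(seqs, contig, pos, ref, width=25):
    """Return (upstream, downstream) sequence around one variant.
    pos is the 1-based VCF POS, ref is the VCF REF allele."""
    seq = seqs[contig]
    start = pos - 1           # 0-based index of the first REF base
    end = start + len(ref)    # 0-based index just past the REF allele
    upstream = seq[max(0, start - width):start]  # shorter near contig start
    downstream = seq[end:end + width]            # shorter near contig end
    return upstream, downstream


def flank_vcf(vcf_lines, seqs, width=25):
    """Yield (contig, pos, ref, upstream, downstream) per VCF record."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        contig, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        up, down = flanks(seqs, contig, int(pos), ref, width)
        yield contig, int(pos), ref, up, down
```

To use it, pass open file handles (for example open("genome.fa") and open("variants.vcf"), both placeholder names) and write the yielded tuples out as tab-separated columns.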

9.2 years ago

Perhaps convert to BED and pad:

$ bedops --everything --range 25 <(vcf2bed < variants.vcf) > padded_variants.bed

Then convert to FASTA with bed2fasta.pl or similar.
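For anyone who wants to check the coordinates by hand, the padding step amounts to the arithmetic below. This is a rough Python sketch of the idea, not what vcf2bed or bedops do internally; vcf_to_padded_bed is a hypothetical helper name, and it assumes contig lengths are unknown, so only the start is clamped (at 0).

```python
def vcf_to_padded_bed(vcf_lines, pad=25):
    """Yield (contig, start, end, id) BED-style intervals, padded by `pad` nt.
    VCF POS is 1-based; BED start is 0-based and BED end is exclusive."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        contig, pos, vid, ref = line.rstrip("\n").split("\t")[:4]
        start = int(pos) - 1      # convert to 0-based start
        end = start + len(ref)    # exclusive end covering the REF allele
        # Clamp the padded start at 0; the end is not clamped because
        # contig lengths are not known here.
        yield contig, max(0, start - pad), end + pad, vid
```

Intervals that run past the end of a contig would still need trimming against the contig lengths before sequence extraction.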
