Create short fasta sequences from a vcf file.
2.0 years ago
Kyle ▴ 10

Hi everyone,

I have a list of variants (SNPs, in a VCF file) that I'm trying to design allele-specific PCR primers for, using WASP (https://bioinfo.biotec.or.th/WASP).

The input required for WASP is a fasta file. The problem is that my reference sequence (GRCm38) is massive, so when I've created the fasta file using vcf-consensus (from VCFtools):

cat GRCm38_68.fa | vcf-consensus vcf_file.vcf.gz > out.fa

The output file is 2.8GB and the sequences are of entire chromosomes.

Is there any way to get, say, the nearest 20 bp (both upstream and downstream) of each variant, in a form that can be converted to fasta format?

Kind Regards,
Kyle

2.0 years ago

Quick solution: index out.fa with

samtools faidx out.fa

then extract the sub-fasta with

samtools faidx out.fa "chr1:234-456"


Yep that works - thank you Pierre!

I'm working on getting the co-ordinates of the variants via:

bcftools query -f '%CHROM %POS\n' vcf_file.vcf.gz

and then passing the co-ordinates +/- 25 bp to samtools faidx as regions.
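
For anyone following along, the plan above could be sketched roughly as below. This is only a sketch under a few assumptions: `make_regions` is a hypothetical helper name (not part of bcftools or samtools), `regions.txt` and `flanks.fa` are made-up file names, and the variants are all SNPs, so positions in the consensus out.fa still line up with the VCF co-ordinates (indels would shift them). The window is clamped at position 1 but not at the chromosome end. Note the `\n` at the end of the bcftools format string, which puts each variant on its own line.

```shell
# Hypothetical helper: turn "CHROM POS" lines (as printed by bcftools query)
# into samtools-style regions spanning POS +/- flank.
# The start is clamped so it never drops below 1.
make_regions() {
    flank="${1:-25}"   # flank size in bp; defaults to 25 as in the post above
    awk -v f="$flank" '{
        start = $2 - f
        if (start < 1) start = 1
        printf "%s:%d-%d\n", $1, start, $2 + f
    }'
}

# Possible usage (assumes a samtools version whose faidx accepts
# -r/--region-file, and that out.fa is indexable with samtools faidx):
#
#   bcftools query -f '%CHROM %POS\n' vcf_file.vcf.gz | make_regions 25 > regions.txt
#   samtools faidx out.fa -r regions.txt > flanks.fa
```

Each record in the resulting fasta should be named after its region (e.g. chr1:75-125), which makes it easy to trace a sequence back to its variant.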
