Masking Variable Sites in a Fasta File
6.4 years ago
Jautis ▴ 330

Hi, I have a fasta file representing a reference genome, and I would like to modify it so that variable sites are masked when I map reads to it. I'm interested in doing this because I have bisulfite reads from several related species, but BSmap and Bismark don't offer an option to mask variable sites while mapping.

The initial genome is in a fasta file. The sites I would like masked are in a vcf file.


Thank you!

fasta vcf variable • 1.9k views
6.4 years ago

If you can convert your VCF to BED format (see "Converting a VCF with SNPs and indels to BED format"), you can use the pyfaidx "faidx" command to mask those sites in your FASTA file, either with a special character or as lowercase letters:

vcf2bed < variable_sites.vcf | faidx genome.fasta --bed - -m

Note that the -m and -M options will modify your FASTA file in place, so you probably want to make a copy first and run the command on the copy.
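
If you would rather see what the masking does, or want to avoid the intermediate BED step, here is a minimal pure-Python sketch of the same idea. It is not how pyfaidx does it internally. The function name mask_variable_sites and the toy genome are made up for illustration, and it assumes every record should be masked across the full length of its REF allele (so a deletion masks all deleted bases). It works on sequences already loaded into a dict, so reading and writing the FASTA is left to you.

```python
def mask_variable_sites(sequences, vcf_lines, mask_char="N"):
    """Return a copy of `sequences` with every VCF site replaced by `mask_char`.

    sequences: dict mapping contig name -> sequence string
    vcf_lines: iterable of VCF text lines (header lines start with '#')
    """
    # Work on lists of characters so individual positions can be overwritten.
    masked = {name: list(seq) for name, seq in sequences.items()}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        if chrom not in masked:
            continue  # contig is not in the FASTA; nothing to mask
        start = pos - 1  # VCF POS is 1-based, Python indexing is 0-based
        # Assumption: mask the whole span covered by the REF allele.
        end = min(start + len(ref), len(masked[chrom]))
        for i in range(start, end):
            masked[chrom][i] = mask_char
    return {name: "".join(chars) for name, chars in masked.items()}


# Toy data, invented for this example.
genome = {"chr1": "ACGTACGTAC", "chr2": "GGGGCCCC"}
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t3\t.\tG\tA\t.\t.\t.",   # SNP at position 3
    "chr2\t5\t.\tCC\tC\t.\t.\t.",  # deletion; REF spans positions 5-6
]
masked = mask_variable_sites(genome, vcf)
print(masked["chr1"])
print(masked["chr2"])
```

Because this returns new strings instead of editing a file, the original genome is left untouched, which sidesteps the in-place caveat above. For a real genome you would still want an indexed reader rather than holding everything in a dict.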

