Masking Variable Sites in a Fasta File
9.0 years ago
Jautis ▴ 570

Hi, I have a fasta file representing a reference genome, and I would like to modify it so that variable sites are masked when I map reads to it. I'm interested in doing this because I have bisulfite reads from several related species, but BSmap and Bismark don't offer an option to mask variable sites while mapping.

The initial genome is in a fasta file. The sites I would like masked are in a vcf file.

Thank you!

fasta vcf • 2.6k views
9.0 years ago

If you can convert your VCF to BED format (see Converting a VCF with SNPs and indels to BED format), you can use the pyfaidx faidx command to mask those sites in your FASTA file, either by replacing them with a placeholder character (-m) or by converting them to lowercase letters (-M):

vcf2bed < variable_sites.vcf | faidx genome.fasta --bed - -m

Note that the -m and -M options will modify your FASTA file in-place, so you probably want to make a copy first.
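
If you would rather not touch the original file at all, the same idea can be sketched in plain Python with no extra dependencies. This is only a minimal illustration, not how pyfaidx does it: the function names (read_vcf_sites, mask_fasta, mask_files) and the output file name are made up for this example, it assumes an uncompressed VCF whose chromosome names match the first word of each FASTA header, it masks every reference base spanned by REF, and it holds all masked positions in memory.

```python
def read_vcf_sites(vcf_lines):
    """Collect 0-based positions to mask, per chromosome, from VCF lines."""
    sites = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        start = pos - 1  # VCF POS is 1-based
        # Mask every reference base covered by REF (so deletions are covered too).
        sites.setdefault(chrom, set()).update(range(start, start + len(ref)))
    return sites


def mask_fasta(fasta_lines, sites, mask_char="N"):
    """Yield FASTA lines with the listed positions replaced by mask_char."""
    chrom, offset = None, 0
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            words = line[1:].split()
            chrom = words[0] if words else ""
            offset = 0  # position counter restarts for each sequence
            yield line
            continue
        masked = sites.get(chrom, ())
        yield "".join(
            mask_char if (offset + i) in masked else base
            for i, base in enumerate(line)
        )
        offset += len(line)


def mask_files(fasta_path, vcf_path, out_path, mask_char="N"):
    """Write a masked copy of fasta_path to out_path; the input is left untouched."""
    with open(vcf_path) as vcf:
        sites = read_vcf_sites(vcf)
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for line in mask_fasta(fasta, sites, mask_char):
            out.write(line + "\n")
```

Calling mask_files("genome.fasta", "variable_sites.vcf", "genome.masked.fasta") would then write a masked copy next to the original, keeping the original line wrapping.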

