Question: Masking Variable Sites in a Fasta File
4.0 years ago by
Jautis270
United States

Hi, I have a FASTA file representing a reference genome, and I would like to mask its variable sites before mapping reads to it. I'm interested in doing this because I have bisulfite reads from several related species, but BSmap and Bismark don't offer an option to mask variable sites while mapping.

The initial genome is in a FASTA file. The sites I would like masked are in a VCF file.

 

Thank you!

variable vcf fasta • 1.3k views
4.0 years ago by
Matt Shirley9.0k
Cambridge, MA

If you can convert your VCF to BED format (see https://www.biostars.org/p/106249/), you can use the pyfaidx "faidx" command to mask those regions of your FASTA file, either by replacing them with a special character or by converting them to lowercase letters:

vcf2bed < variable_sites.vcf | faidx genome.fasta --bed - -m

Note that both the "-m" option (mask with a replacement character) and the "-M" option (mask as lowercase) modify your FASTA file in place, so you probably want to make a copy first.
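
If you would rather not modify the file in place, the same idea can be sketched in plain Python with no extra dependencies. This is only a minimal illustration, not what faidx does internally: the function names (read_vcf_sites, mask_fasta, mask_file) are made up for this example, it assumes a tab-delimited VCF whose CHROM values match the first word of each FASTA header, it masks every base spanned by the REF allele, and it writes a new file instead of touching the original.

```python
from collections import defaultdict


def read_vcf_sites(vcf_lines):
    """Collect 0-based positions to mask, keyed by chromosome name."""
    sites = defaultdict(set)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        start = pos - 1  # VCF POS is 1-based; convert to 0-based
        # Assumption: mask every base covered by the REF allele.
        for i in range(max(len(ref), 1)):
            sites[chrom].add(start + i)
    return sites


def mask_fasta(fasta_lines, sites, mask_char="N"):
    """Yield FASTA lines with the listed positions replaced by mask_char.

    Line wrapping of the input is preserved.
    """
    chrom_sites = ()
    offset = 0  # 0-based position of the first base on the current line
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            words = line[1:].split()
            chrom = words[0] if words else ""
            chrom_sites = sites.get(chrom, ())
            offset = 0
            yield line
            continue
        if chrom_sites:
            line_out = "".join(
                mask_char if offset + i in chrom_sites else base
                for i, base in enumerate(line)
            )
        else:
            line_out = line
        offset += len(line)
        yield line_out


def mask_file(fasta_path, vcf_path, out_path, mask_char="N"):
    """Write a masked copy of fasta_path to out_path (hypothetical helper)."""
    with open(vcf_path) as vcf:
        sites = read_vcf_sites(vcf)
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for line in mask_fasta(fasta, sites, mask_char):
            out.write(line + "\n")
```

Calling mask_file("genome.fasta", "variable_sites.vcf", "genome.masked.fasta") would then leave genome.fasta untouched. All masked positions are held in memory as Python sets, so for very dense VCFs the faidx approach above is likely the better choice.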
