Question: Masked or unmasked DNA reference for sam2vcf.pl
9.7 years ago by
Pi520 wrote:


I am converting pileup to VCF using sam2vcf.pl. The tool says I need the reference sequence when indels are present. While downloading the reference DNA sequence in FASTA format from Ensembl, I saw that the README lists the DNA as available in these formats:

  * 'dna' - unmasked genomic DNA sequences.
  * 'dna_rm' - masked genomic DNA.  Interspersed repeats and low
     complexity regions are detected with the RepeatMasker tool and masked
     by replacing repeats with 'N's.

Which should I use to make the conversion more reliable, or doesn't it matter? I believe mpileup has replaced pileup and that the indel predictions with pileup are more reliable, but I don't have the original data to rerun the analysis.


vcf samtools pileup • 5.0k views
modified 5.3 years ago by Biostar ♦♦ 20 • written 9.7 years ago by Pi520
9.7 years ago by
United States
Adam1.0k wrote:

You should use the unmasked reference. If you use the masked reference, then an indel in a repetitive region will have an 'N' as a reference base in the VCF, which (while not strictly invalid) is probably not what you want.
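
To make that concrete, here is a minimal Python sketch of how the REF and ALT alleles of a deletion are read off the reference. It is not what sam2vcf.pl does internally; the sequences and the `deletion_alleles` helper are made up for illustration, and it only assumes the usual VCF convention that an indel record starts at the base before the event (1-based).

```python
def deletion_alleles(reference: str, pos: int, deleted_len: int) -> tuple[str, str]:
    """Return (REF, ALT) for a deletion anchored at 1-based position `pos`.

    Assumes the VCF convention: REF is the anchor base plus the deleted
    bases, ALT is the anchor base alone. Hypothetical helper, for
    illustration only.
    """
    start = pos - 1                      # convert 1-based position to 0-based index
    ref = reference[start:start + 1 + deleted_len].upper()
    alt = ref[0]                         # only the anchor base remains
    return ref, alt


# Toy sequences (invented): the same region before and after hard-masking.
unmasked = "ACGTATATATATGCA"
masked = "ACGTNNNNNNNNGCA"

# A 2 bp deletion anchored at position 5, which lies inside the repeat.
print(deletion_alleles(unmasked, 5, 2))  # ('ATA', 'A')
print(deletion_alleles(masked, 5, 2))    # ('NNN', 'N')
```

With the masked sequence the record still has a position and a length, but the alleles no longer say which bases were deleted, which is the information you would normally want to keep.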
