I am converting pileup to VCF using sam2vcf. The tool says it needs the reference sequence when indels are present. While downloading the reference DNA sequence in FASTA format from Ensembl, I noticed the README says the DNA is available in these formats:
* 'dna' - unmasked genomic DNA sequences.
* 'dna_rm' - masked genomic DNA. Interspersed repeats and low complexity regions are detected with the RepeatMasker tool and masked by replacing repeats with 'N's.
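To make the difference between the two files concrete, here is a minimal Python sketch of what that masking does to a sequence. The toy sequence, the repeat interval and the helper names (`hard_mask`, `masked_fraction`) are all made up for illustration; this is not RepeatMasker or anything from Ensembl or sam2vcf.

```python
def hard_mask(sequence, repeats):
    """Replace each 0-based, half-open (start, end) interval with 'N's."""
    bases = list(sequence)
    for start, end in repeats:
        for i in range(start, min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


def masked_fraction(sequence):
    """Fraction of bases that are 'N' (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)


# Hypothetical 20-base reference with a low-complexity run at positions 8..15.
unmasked = "ACGTACGT" + "TTTTTTTT" + "ACGT"
repeats = [(8, 16)]  # assumed repeat annotation, for illustration only

masked = hard_mask(unmasked, repeats)

print(masked)                   # ACGTACGTNNNNNNNNACGT
print(masked_fraction(masked))  # 0.4
# Looking up the reference base at a position inside the repeat:
print(unmasked[10], masked[10])  # T N
```

In this toy case, any reference base looked up inside the masked interval comes back as 'N' rather than the real base, which is the behaviour I am worried about for indels that fall in repeats.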
Which should I use to make the conversion more reliable, or doesn't it matter? I believe mpileup has replaced pileup and that indel predictions with mpileup are more reliable, but I don't have the original data to rerun the analysis.