I've been handed a collection of pileup files which, as far as I can tell, were created with the deprecated samtools pileup command via Galaxy's 'generate pileup' tool. They are not the 10-column files with SNPs called, which is what I need. Is there a way to convert them into a more useful format, such as VCF? My ultimate goal is to calculate Fst using snpStats, SNP Pipeline, or some other toolkit.
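Whatever tool ends up being used, the information needed for allele frequencies should still be in the read-base column of each pileup line. The following is a minimal sketch, not a SNP caller. It assumes the classic samtools pileup encoding: '.' and ',' for reference matches, ACGTN for mismatches, '^' plus one mapping-quality character at read starts, '$' at read ends, '*' for deleted bases, and +N/-N for indels. The helper names count_bases and parse_pileup_line are hypothetical. The sketch applies no base-quality filtering. It also assumes the read bases are the second-to-last field, which fits both the 6-column and the 10-column (pileup -c) layouts but not output with an extra mapping-quality column appended.

```python
import re

BASES = "ACGTN"


def count_bases(ref, read_bases):
    """Tally alleles in the read-base column of one pileup line.

    Assumes the classic samtools pileup encoding; no quality filtering is done.
    """
    ref = ref.upper()
    counts = {b: 0 for b in BASES}
    counts["*"] = 0  # placeholder for a deleted base
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c in ".,":
            # Match to the reference on the forward (.) or reverse (,) strand.
            if ref in counts:
                counts[ref] += 1
            i += 1
        elif c == "^":
            # Read start: skip '^' and the mapping-quality character after it.
            i += 2
        elif c == "$":
            # Read-end marker carries no allele.
            i += 1
        elif c in "+-":
            # Indel following this position, +N<seq> or -N<seq>: skip it whole.
            m = re.match(r"\d+", read_bases[i + 1:])
            if m is None:
                raise ValueError(f"malformed indel at offset {i}: {read_bases!r}")
            i += 1 + len(m.group()) + int(m.group())
        else:
            # Mismatch (upper case = forward, lower case = reverse) or '*'.
            key = c.upper()
            if key in counts:
                counts[key] += 1
            i += 1
    return counts


def parse_pileup_line(line):
    """Return (chrom, pos, ref, counts), or None for indel lines (ref '*').

    Takes the second-to-last tab-separated field as the read bases.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    if ref == "*":
        return None
    return chrom, pos, ref.upper(), count_bases(ref, fields[-2])
```

Running this over each population's file gives per-site allele counts. From those counts, allele frequencies (for example, alternate count divided by the total ACGT count) could be fed into whatever Fst estimator the chosen toolkit expects.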
I have tried sam2vcf.pl, but it fails with a cryptic error message, perhaps because of the old file format:
$ sam2vcf.pl -r reference.fasta < Galaxy7-\[P2_pileup\].tabular > test.vcf
FIXME: what is this [N]?
at /usr/bin/sam2vcf.pl line 41, <STDIN> line 1408.
main::error("FIXME: what is this [N]?\x{a}") called at /usr/bin/sam2vcf.pl line 89
main::iupac_to_gtype("G", "N") called at /usr/bin/sam2vcf.pl line 214
main::do_pileup_to_vcf(HASH(0x6dc398)) called at /usr/bin/sam2vcf.pl line 32
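The traceback shows sam2vcf.pl's iupac_to_gtype being called with reference G and consensus N. This indicates a site where the consensus column holds N (no call), a code that function apparently does not handle. One possible workaround is to drop uncalled sites before they reach sam2vcf.pl. The sketch below assumes the file really carries a consensus column in the 10-column pileup -c layout (column 4), as the traceback suggests. The names IUPAC_GENOTYPES, iupac_to_genotype and drop_uncalled are hypothetical, and the table covers only the standard one- and two-allele IUPAC codes.

```python
# Diploid genotypes for IUPAC consensus codes; 'N' (no call) is deliberately absent.
IUPAC_GENOTYPES = {
    "A": ("A", "A"), "C": ("C", "C"), "G": ("G", "G"), "T": ("T", "T"),
    "R": ("A", "G"), "Y": ("C", "T"), "S": ("C", "G"),
    "W": ("A", "T"), "K": ("G", "T"), "M": ("A", "C"),
}


def iupac_to_genotype(code):
    """Return the two alleles for an IUPAC code, or None for N / unknown codes."""
    return IUPAC_GENOTYPES.get(code.upper())


def drop_uncalled(lines):
    """Yield pileup -c lines whose consensus (column 4) maps to a genotype.

    Indel lines (reference column '*') are passed through unchanged.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        is_indel = len(fields) > 2 and fields[2] == "*"
        if is_indel or (len(fields) > 3 and iupac_to_genotype(fields[3]) is not None):
            yield line
```

In a small script, sys.stdout.writelines(drop_uncalled(sys.stdin)) could be piped into sam2vcf.pl -r reference.fasta. Note that this silently discards every N site, so those positions will simply be absent from the resulting VCF.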
The person who did the processing is apparently unavailable, and I don't have access to the original raw data. Am I stuck until someone can rerun all the processing?
I have a hard time considering VCF to be superior to anything... it's the worst text-based format I've encountered in bioinformatics.
Superior in that it appears to work for my purposes. The files I have seem to be unusable for calling SNPs. I'm more than a bit green at this, however.