Conversion of old pileup file to more useful format
8.8 years ago
mpjuers • 0

I've been handed a collection of pileup files which, as far as I can tell, were created with the deprecated pileup command in samtools via Galaxy's 'generate pileup' tool. These are not the 10-column files with the SNPs called, which is what I need. Is there a way of converting them into a more useful format, such as VCF? My ultimate goal is to calculate Fst, using snpStats, SNP Pipeline, or some other toolkit.

I have tried sam2vcf.pl, but it returns a cryptic error message, perhaps owing to the old file type:

    $  sam2vcf.pl -r reference.fasta < Galaxy7-\[P2_pileup\].tabular > test.vcf
        FIXME: what is this [N]?
        at /usr/bin/sam2vcf.pl line 41, <STDIN> line 1408.
            main::error("FIXME: what is this [N]?\x{a}") called at /usr/bin/sam2vcf.pl line 89
            main::iupac_to_gtype("G", "N") called at /usr/bin/sam2vcf.pl line 214
            main::do_pileup_to_vcf(HASH(0x6dc398)) called at /usr/bin/sam2vcf.pl line 32

The person who did the processing is apparently unavailable and I don't have access to the original raw data. Am I stuck until someone can rerun all the processing?
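Since the traceback points at input line 1408 and the question hinges on whether the files are the 6-column or the 10-column pileup variant, a quick inspection can help before trying more converters. The following is a minimal sketch, not part of samtools or sam2vcf.pl: the helper names `column_counts` and `fields_at` are made up for illustration, and the code assumes the file is plain tab-separated text.

```python
from collections import Counter


def column_counts(lines):
    """Count how many tab-separated fields each non-empty line has.

    Returns a Counter mapping field count -> number of lines.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line:
            counts[len(line.split("\t"))] += 1
    return counts


def fields_at(lines, line_number):
    """Return the fields of the 1-based line_number, or None if out of range."""
    for i, line in enumerate(lines, start=1):
        if i == line_number:
            return line.rstrip("\n").split("\t")
    return None
```

To use it, open the pileup file and pass the file handle to `column_counts` to see which column counts occur, then reopen it and call `fields_at(handle, 1408)` to look at the line the error message mentions.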

galaxy samtools • 2.1k views

I have a hard time considering VCF to be superior to anything... it's the worst text-based format I've encountered in bioinformatics.


Superior in that it appears to work for my purposes. The files I have seem to be unusable for calling SNPs. I'm more than a bit green at this, however.

8.8 years ago
Fabio Marroni ★ 3.0k

I suggest trying VarScan, which takes pileup as input and does the SNP calling:

http://varscan.sourceforge.net/using-varscan.html

Another option (I am no expert on it, I just googled it) is to convert the pileup to VCF using the pileup2vcf script in Galaxy:

https://searchcode.com/codesearch/view/63202730/


I must have missed those. Thanks.


After a fruitless struggle with those tools, it appears that the person who handed this dataset off to me also used `filter pileup` and appended the extra 5 columns (number of base differences) to the original 6-column output. I increasingly suspect there isn't much I can do with the files in their overprocessed state. For comparison, I ran VarScan on a sample 10-column consensus file I had and it worked fine.
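If the extra columns really were appended after the original six, one thing that could be tried is cutting each line back to its first six fields before feeding the file to a pileup-based caller. This is only a sketch under that assumption: the function name `keep_first_columns` is hypothetical, and nothing here guarantees that a downstream tool will accept the trimmed file.

```python
def keep_first_columns(lines, n=6):
    """Yield each non-empty line reduced to its first n tab-separated fields.

    Assumes the original pileup columns come first and any columns added
    later (e.g. by a filtering step) were appended at the end.
    """
    for line_number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        if not stripped:
            continue  # skip blank lines
        fields = stripped.split("\t")
        if len(fields) < n:
            raise ValueError(
                f"line {line_number} has only {len(fields)} fields, expected at least {n}"
            )
        yield "\t".join(fields[:n]) + "\n"
```

Writing the result out is then a matter of opening the input and an output file and calling `out.writelines(keep_first_columns(handle))`.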
