Question

Where To Find Huge Sample Files? (Phylip, Nexus, Clustalw, Etc)

12.1 years ago

andreagarcia871 ▴ 60

I'm developing a parser for interleaved file formats such as PHYLIP, ClustalW and Nexus. One challenge is validating the parser against large samples and doing some stress testing. There are some nice short examples: PHYLIP format, Nexus DNA format. Would anyone mind sharing URLs for downloading .phy, .nxs, or .nex files with many samples/sequences?

clustalw • 5.2k views

updated 5.5 years ago by Biostar 20 • written 12.1 years ago by andreagarcia871 ▴ 60

Could you be a bit more clear on what kind of data you are looking to parse? Sounds like you already have large sequence data files in FASTA, etc., format, so I'm not exactly sure what you are looking for.

12.1 years ago by Josh Herr 5.8k

I've edited the question with links to sample files of the kind I'm looking for. I want the same kind of files, but with many more samples.

12.1 years ago by andreagarcia871 ▴ 60

score 2 · Answer 1 · 2012-10-09

Thanks for clarifying your question. A quick Google search turned up a few open-source options for large Nexus files. If you're looking for a LARGE Nexus file, this is the largest one I am aware of. Scroll to the bottom of the paper for a link to a tarball of the dataset. Here's another large Nexus dataset, linked in the supplementary data of this paper. Here's another paper with a large Nexus file. This paper has a large Nexus file in the supplementary data link. I don't see many people using the PHYLIP format anymore, but I'm sure there are some options out there. I hope these links help.

score 1 · Answer 2 · 2013-04-22

I am sure you found answers to this months ago, but in case others come here looking for answers, I will add to this. TreeBASE (a database for storing phylogenetic trees and data sets) has thousands of data sets, and some of them are large. One example is a data set of some 32 kb of nuclear and mitochondrial genes from each of over 100 different species of birds. Another is a similar data set for primates, with 8 megabases from 186 species.
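
If no download turns out to be big enough for stress testing, one alternative is to generate a synthetic file of whatever size you need. The following is only a minimal sketch, assuming the strict interleaved PHYLIP layout (10-character names in the first block only, blank lines between blocks). The function name `write_interleaved_phylip` and its parameters are made up for illustration, and the sequences are random DNA with no biological meaning.

```python
import io
import random


def write_interleaved_phylip(handle, n_taxa, n_sites, block_width=60, seed=0):
    """Write a random DNA alignment in strict interleaved PHYLIP layout.

    Hypothetical helper for stress testing; returns {name: sequence} so a
    parser's output can be compared against what was written.
    """
    rng = random.Random(seed)  # fixed seed keeps the test file reproducible
    # "taxon" + 5 digits is exactly 10 characters (holds for n_taxa <= 100000)
    names = [f"taxon{i:05d}" for i in range(n_taxa)]
    # All sequences are held in memory; fine for a sketch, not for huge sizes
    seqs = ["".join(rng.choices("ACGT", k=n_sites)) for _ in range(n_taxa)]

    handle.write(f" {n_taxa} {n_sites}\n")
    for start in range(0, n_sites, block_width):
        if start:
            handle.write("\n")  # blank line separates interleaved blocks
        for name, seq in zip(names, seqs):
            prefix = name if start == 0 else ""  # names only in first block
            handle.write(prefix + seq[start:start + block_width] + "\n")
    return dict(zip(names, seqs))


# Small in-memory demonstration; pass an open file handle for a real test file
demo = io.StringIO()
written = write_interleaved_phylip(demo, n_taxa=3, n_sites=130)
```

For a real stress test, something like `with open("big.phy", "w") as fh: write_interleaved_phylip(fh, 2000, 50000)` would give a file of roughly 100 MB, and the returned dictionary can be compared with what the parser reads back.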