EDIT: Since the data I'm looking for isn't available, my new question is whether it's possible to concatenate the sequence pieces from a fasta file that lists the sequence in pieces. How do I interpret what each part of the query template name in the fasta file means? I assume one of the numbers at the end refers to the chromosome and the others refer to start/end positions relative to the entire genome. If I know the start/end positions, I can put the pieces in order, noting the gaps in between. For instance, for individual Sid1253, this is a query template name and the sequence associated with it:
OLD POST: I'm looking to download several (3 to 6) Neanderthal genomes which have been mapped to a human reference genome. The file format should be fasta. I've checked the Neanderthal Genome Project and found several bam files, which I converted to fasta. These are the links to them: ftp://ftp.ebi.ac.uk/pub/databases/ensembl/neandertal/BAM_files/ http://cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal/bam/
However, the fasta files list each individual's genome as snippets (that's my interpretation; I've only just begun working with the fasta format). I think those pieces can be concatenated to give entire chromosome sequences, but I'm not sure how to do that. What I'm looking for is the entire, contiguous genome: more specifically, the chromosome-level sequence for each Neanderthal individual, where regions that haven’t been sequenced are masked as N's.
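To make the concatenation idea concrete, here is a rough Python sketch of what I have in mind. It assumes I already have, for each piece, a 1-based start position on a single chromosome and that I know the chromosome's total length. The function name `assemble_chromosome` and the toy data are made up for illustration, and I don't know yet whether the query template names actually encode these positions.

```python
from typing import Iterable, Tuple


def assemble_chromosome(pieces: Iterable[Tuple[int, str]], chrom_length: int) -> str:
    """Place sequence pieces on an N-filled chromosome.

    pieces: (start, sequence) pairs, where start is an assumed 1-based
            position on the chromosome.
    chrom_length: assumed total length of the chromosome.
    """
    # Start with every position masked as N (not sequenced).
    chrom = ["N"] * chrom_length

    # Process pieces in order of start position; where pieces overlap,
    # the later one simply overwrites the earlier one.
    for start, seq in sorted(pieces):
        offset = start - 1  # convert 1-based position to 0-based index
        if offset < 0 or offset + len(seq) > chrom_length:
            raise ValueError(f"piece starting at {start} does not fit on the chromosome")
        chrom[offset:offset + len(seq)] = seq

    return "".join(chrom)


# Toy example: two pieces on a chromosome of length 12.
toy_pieces = [(8, "GGA"), (3, "ACGT")]
print(assemble_chromosome(toy_pieces, 12))  # NNACGTNGGANN
```

Overlapping pieces are resolved naively here (last one wins), which is probably too crude for real reads, where some kind of consensus call would be needed.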
My questions are:
1. Where can I find this data?
2. If this data isn't available in the desired format, is it possible to concatenate the sequence pieces from those links? How do I interpret what each part of the query template name in the fasta file means?