The Logic Behind The Naming System In 1000 Genomes FTP Arrangements
data/NA18603/sequence_read/ERR000103.filt.fastq.gz


Now I'm trying to figure out:
1) Are they all from the same human?
2) Why does the NA18* number change?
3) Why are there 3 versions of each ERR000* file? (I thought a matched set was 2 files, one per paired read.)

They are not all from the same person. The NA18* number is the ID of the individual being sequenced. There are 3 sequence files: one for PE1, one for PE2, and a final one for reads whose mate didn't pass QC. The QC protocol, as well as a lot of other information on 1000 Genomes sequencing data (including most of what I've told you here), can be found here. In the past, when I've had questions about 1000 Genomes sequencing info, I've found their FTP site to be a great resource.
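
As a rough illustration of the naming described above, here's a minimal sketch that splits a path like the one in the question into sample ID, run ID, and read role. The `_1`/`_2` suffixes for the two paired-end files are an assumption (only the unsuffixed file name appears in this thread), and `parse_sequence_path` is just an illustrative helper name, not part of any 1000 Genomes tooling.

```python
import re
from typing import NamedTuple


class SequenceFile(NamedTuple):
    sample: str  # individual ID, e.g. NA18603
    run: str     # sequencing run accession, e.g. ERR000103
    role: str    # "PE1", "PE2", or "unpaired"


# Assumed layout: data/<sample>/sequence_read/<run>[_1|_2].filt.fastq.gz
# The _1/_2 suffix convention is an assumption, not confirmed in this thread.
_PATTERN = re.compile(
    r"data/(?P<sample>[^/]+)/sequence_read/"
    r"(?P<run>[A-Z]{3}\d+)(?:_(?P<end>[12]))?\.filt\.fastq\.gz$"
)

_ROLES = {"1": "PE1", "2": "PE2", None: "unpaired"}


def parse_sequence_path(path: str) -> SequenceFile:
    """Split a sequence_read path into sample ID, run ID, and read role."""
    match = _PATTERN.search(path)
    if match is None:
        raise ValueError(f"unrecognised path layout: {path}")
    return SequenceFile(
        sample=match.group("sample"),
        run=match.group("run"),
        role=_ROLES[match.group("end")],
    )


info = parse_sequence_path("data/NA18603/sequence_read/ERR000103.filt.fastq.gz")
print(info.sample, info.run, info.role)
```

Grouping the parsed results by `sample` would show which runs belong to which individual, and grouping by `run` would show the (up to) three files per run.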

The sequence alignments I've run before were grouped by chromosome. So do these simply have all 23 chromosomes in each file?

Yeah, should be. I've never checked the alignment files from 1000 Genomes, but generally, alignment output files cover all aligned regions. For 1000 Genomes, that means all autosomes, sex chromosomes, the mitochondrial chromosome, and non-chromosomal supercontigs. The 1000 Genomes Project's description of the alignment protocol is here: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/README.human_g1k_v37.fasta.txt

To be clear, that's all aligned regions in the one alignment output file...

