I would like to evaluate and compare a few NGS pipelines I have developed for learning purposes.
I have found that Genome in a Bottle is a good resource for that. However, I found it difficult to find exactly what I want when browsing the Index of /giab/ftp/data/NA12878.
What I need is:
- FASTQ files or, even better, the BAM file covering long regions of the genome
- The VCF file with the high-confidence variants
The following ones are optional, but they would help me a lot:
- The BED file used to target the search (I can create this myself, but if it is provided, even better)
- For my BaseRecalibrator step, I think (not sure) I should use the same resources they used, assuming they follow the GATK Best Practices.
- The pipeline used to generate the results.
I think that what I need is this (here is the link)
Index of /giab/ftp/data/NA12878/Nebraska_NA12878_HG001_TruSeq_Exome
Name                                    Last modified     Size
Parent Directory                                          -
NIST-hg001-7001-b-ensemble.vcf          2014-09-09 16:30  34M
NIST-hg001-7001-b-freebayes.vcf         2014-09-09 16:30  60M
NIST-hg001-7001-b-gatk-haplotype.vcf    2014-09-09 16:30  22M
NIST-hg001-7001-b-gatk.vcf              2014-09-09 16:30  24M
NIST-hg001-7001-b-ready.bam             2014-09-10 18:21  8.6G
NIST-hg001-7001-b-ready.bam.bai         2015-09-25 16:36  6.6M
NIST-hg001-7001-ensemble.vcf            2014-09-09 16:30  34M
NIST-hg001-7001-freebayes.vcf           2014-09-09 16:31  67M
NIST-hg001-7001-gatk-haplotype.vcf      2014-09-09 16:31  21M
NIST-hg001-7001-gatk.vcf                2014-09-09 16:31  24M
NIST-hg001-7001-ready.bam               2014-09-10 18:38  10G
NIST-hg001-7001-ready.bam.bai           2015-09-25 16:35  6.9M
TruSeq_exome_targeted_regions.hg19.bed  2015-07-17 10:39  12M
m5sum                                   2014-09-11 12:01  641
NIST-hg001-7001-b-ready.bam could be the BAM file I need.
NIST-hg001-7001-b-gatk-haplotype.vcf could be, for example, the VCF I want.
And TruSeq_exome_targeted_regions.hg19.bed could be the BED target!
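To make the goal concrete, here is a minimal sketch of how a reference VCF, a BED target file and a pipeline's own VCF could be compared. It is only an illustration under several assumptions: the function names (`parse_bed`, `in_targets`, `parse_vcf_sites`, `compare`) are made up for this example, the VCFs are uncompressed, chromosome names are written the same way in all files (for example `chr1` everywhere), and variants are matched by exact chromosome, position, REF and ALT. Exact matching ignores genotypes and differences in how the same variant can be represented, so the numbers would only be a rough first check.

```python
import bisect
from collections import defaultdict


def parse_bed(lines):
    """Return {chrom: merged, sorted list of (start, end)} from BED lines.

    BED coordinates are 0-based and half-open.
    """
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    regions = {}
    for chrom, intervals in raw.items():
        merged = []
        for start, end in sorted(intervals):
            # Merge overlapping or touching intervals so lookups stay simple.
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        regions[chrom] = merged
    return regions


def in_targets(regions, chrom, pos):
    """True if a 1-based VCF position falls inside a target interval."""
    intervals = regions.get(chrom, [])
    zero_based = pos - 1
    i = bisect.bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


def parse_vcf_sites(lines, regions):
    """Return the set of (chrom, pos, ref, alt) records inside the targets."""
    sites = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        if not in_targets(regions, chrom, int(pos)):
            continue
        for alt in alts.split(","):  # one entry per ALT allele
            sites.add((chrom, int(pos), ref, alt))
    return sites


def compare(truth, calls):
    """Count exact-match true/false positives and false negatives."""
    tp = len(truth & calls)
    fp = len(calls - truth)
    fn = len(truth - calls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}
```

With real files, each function would be fed an open file handle, for example `parse_bed(open("TruSeq_exome_targeted_regions.hg19.bed"))`, and the pipeline's output would be a file of your own (say a hypothetical `my_pipeline.vcf`).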
Question:
Among the many versions and files found in the main parent directory, is this what I am looking for, or is there something more appropriate? The README files explain some aspects quite well but lack details on what each thing is. For example, I haven't found what the difference is between Nebraska_NA12878 and NIST_NA12878.
Is it just me? Am I missing some important documentation somewhere?
Anyway, I am looking for some confirmation/help to get the right data before starting any analysis and wasting my time if the files are the wrong ones!
Thanks in advance!
Manuel
Perhaps you are not looking in the right location. How about this page?
There are several additional links on their main page.