Difficulty finding useful info from Genome in a Bottle
2.2 years ago
ManuelDB ▴ 80

I would like to evaluate and compare a few NGS pipelines I have developed for learning purposes.

I have found that Genome in a Bottle (GIAB) is a good source for doing that. However, I found it difficult to find exactly what I want when browsing the Index of /giab/ftp/data/NA12878.

What I need is:

  1. FASTQ files or, even better, BAM files covering long regions of the genome
  2. The VCF file with the high-confidence variants

The following ones are optional, but they would help me a lot:

  3. The BED file used to target the search (I can make this myself, but if it is provided, even better)
  4. For my BaseRecalibrator step, I think (not sure) I should use the same files they used, if they follow GATK Best Practices
  5. The pipeline used to generate the results

I think that what I need is in this directory:

Index of /giab/ftp/data/NA12878/Nebraska_NA12878_HG001_TruSeq_Exome

Name                                    Last modified      Size  
Parent Directory                                            -   
NIST-hg001-7001-b-ensemble.vcf         2014-09-09 16:30   34M  
NIST-hg001-7001-b-freebayes.vcf        2014-09-09 16:30   60M  
NIST-hg001-7001-b-gatk-haplotype.vcf   2014-09-09 16:30   22M  
NIST-hg001-7001-b-gatk.vcf             2014-09-09 16:30   24M  
NIST-hg001-7001-b-ready.bam            2014-09-10 18:21  8.6G  
NIST-hg001-7001-b-ready.bam.bai        2015-09-25 16:36  6.6M  
NIST-hg001-7001-ensemble.vcf           2014-09-09 16:30   34M  
NIST-hg001-7001-freebayes.vcf          2014-09-09 16:31   67M  
NIST-hg001-7001-gatk-haplotype.vcf     2014-09-09 16:31   21M  
NIST-hg001-7001-gatk.vcf               2014-09-09 16:31   24M  
NIST-hg001-7001-ready.bam              2014-09-10 18:38   10G  
NIST-hg001-7001-ready.bam.bai          2015-09-25 16:35  6.9M  
TruSeq_exome_targeted_regions.hg19.bed 2015-07-17 10:39   12M  
m5sum                                  2014-09-11 12:01  641   
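Since the two BAM files are 8.6G and 10G, it seems worth checking the downloads against the checksum file before running anything. Below is a minimal sketch of how I would do that. It assumes the `m5sum` file uses the usual `md5sum` layout (hex digest, then file name, one entry per line), which I have not confirmed; `verify` and the other function names are my own.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text):
    """Parse checksum lines into {file name: digest}.

    ASSUMPTION: md5sum-style lines, "<digest>  <name>" or "<digest> *<name>".
    File names containing spaces are not handled.
    """
    expected = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            expected[parts[-1].lstrip("*")] = parts[0].lower()
    return expected


def verify(directory, checksum_text):
    """Return {file name: True/False}; a missing file counts as False."""
    results = {}
    for name, expected in parse_checksums(checksum_text).items():
        path = Path(directory) / name
        results[name] = path.exists() and md5_of(path) == expected
    return results
```

Usage would be something like `verify("downloads", Path("downloads/m5sum").read_text())`, where `downloads` is a hypothetical local directory holding the files.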

NIST-hg001-7001-b-ready.bam could be the BAM file I need.

NIST-hg001-7001-b-gatk-haplotype.vcf could be, for example, the VCF I want.

And TruSeq_exome_targeted_regions.hg19.bed could be the BED target file!
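To make it concrete how I plan to use these three kinds of files, here is a rough sketch of the comparison: restrict both the reference VCF and my pipeline's VCF to the BED regions, then count shared and unshared calls. All function names are my own, the data is a toy example rather than real GIAB content, and matching is by exact (chrom, pos, ref, alt), so it does no variant normalization and ignores genotypes.

```python
from collections import defaultdict


def parse_bed(lines):
    """Return {chrom: [(start, end), ...]} from BED lines (0-based, half-open)."""
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    return regions


def in_regions(regions, chrom, pos):
    """True if a 1-based VCF position lies inside any BED interval.

    Linear scan: fine for a sketch, too slow for a full exome BED.
    """
    return any(start <= pos - 1 < end for start, end in regions.get(chrom, []))


def parse_vcf(lines, regions=None):
    """Return a set of (chrom, pos, ref, alt) calls, one per ALT allele."""
    calls = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        if regions is not None and not in_regions(regions, chrom, int(pos)):
            continue
        for allele in alt.split(","):
            calls.add((chrom, int(pos), ref, allele))
    return calls


def compare(truth, query):
    """Count exact-match TP/FP/FN and derive precision and recall."""
    tp = len(truth & query)
    fp = len(query - truth)
    fn = len(truth - query)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Toy data (hypothetical, not taken from the GIAB files).
bed = ["chr1\t100\t200"]
truth_vcf = ["#CHROM\tPOS\tID\tREF\tALT",
             "chr1\t150\t.\tA\tG",
             "chr1\t180\t.\tC\tT",
             "chr1\t500\t.\tG\tA"]   # outside the target, dropped
query_vcf = ["#CHROM\tPOS\tID\tREF\tALT",
             "chr1\t150\t.\tA\tG",
             "chr1\t190\t.\tT\tC"]

regions = parse_bed(bed)
result = compare(parse_vcf(truth_vcf, regions), parse_vcf(query_vcf, regions))
print(result)
```

The numbers only mean something if the "truth" side really is a high-confidence callset, which is exactly what I am trying to confirm below.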

Question:

Among the many versions and files in the main parent directory, is this what I am looking for, or is there something more appropriate? The README files explain some aspects quite well but lack detail on what each thing is. For example, I haven't found what the difference is between Nebraska NA12878 and NIST_NA12878.

Is it just me? Am I missing some important documentation somewhere?

Anyway, I am looking for some confirmation/help to get the right data before starting any analysis and wasting my time if the files are the wrong ones!

Thanks in advance!

Manuel


Perhaps you are not looking in the right location. How about this page?

There are several additional links on their main page.
