Where can I find somatic whole-genome or exome FASTQ files (from tumor samples) with validated variants and corresponding VCFs publicly available?

I'm testing my somatic variant calling pipeline and have been looking at data from the Cancer Genome in a Bottle project (part of GIAB). I found FASTQ files for the HG008-T sample (a pancreatic ductal adenocarcinoma), but they were generated with Hi-C sequencing:

HG008-T_HiC_PhaseGenomics_20241211_R1.fastq.gz

HG008-T_HiC_PhaseGenomics_20241211_R2.fastq.gz

https://42basepairs.com/browse/web/giab/data_somatic/HG008/NIST/HG008-T_bulk/20240508p21/PhaseGenomics_HiC-ILMN_20241211

Since Hi-C data isn't ideal for small variant calling (unlike standard WGS/WES from Illumina, Thermo Fisher, or Nanopore platforms), I was wondering:

Are these the correct validated VCFs for that sample? https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data_somatic/HG008/Liss_lab/analysis/NIST_HG008-T_somatic-stvar_DraftBenchmark_V0.3-20250220/

Any advice on how to proceed?

fastq NGS pipeline INDELs SNVs • 2.4k views

Recent public human tumor sequence data (with matching VCFs) is going to be rare because of patient privacy concerns. You could apply for controlled access to many types of cancer data via dbGaP (you would need to be a PI, or someone with signing authority, to submit such a project proposal).

There are some public datasets mentioned in this past thread: Publicly Available Tumor/Normal Illumina Data For Evaluation Of Somatic Variant Callers
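
Once you have a tumor/normal dataset that comes with a benchmark VCF, a quick first check of a pipeline is to compare its calls against that benchmark. Below is a minimal sketch, assuming uncompressed VCF text and exact matching on (CHROM, POS, REF, ALT). The function names `read_variants` and `compare_calls` are made up for illustration. It does not normalize variant representation or restrict to confident regions, so a dedicated benchmarking tool will likely give different (and more trustworthy) numbers.

```python
def read_variants(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys from an iterable of VCF text lines."""
    variants = set()
    for line in vcf_lines:
        # Skip blank lines and header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic records list ALT alleles separated by commas.
        for allele in alt.split(","):
            variants.add((chrom, int(pos), ref, allele))
    return variants


def compare_calls(called, truth):
    """Exact-match comparison of two variant sets (no normalization)."""
    tp = len(called & truth)   # in both sets
    fp = len(called - truth)   # called but not in the benchmark
    fn = len(truth - called)   # in the benchmark but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}
```

You would pass each function an open file handle (for example from `open("calls.vcf")`, or `gzip.open(path, "rt")` for a `.vcf.gz`); those file names are placeholders, not files from this thread.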


Can you also include a link to the page/source where this spreadsheet came from? A random Google Docs link seems a bit dodgy to click on and check.


The link is mentioned on the NIST website, just below Table 1.

https://www.nist.gov/programs-projects/cancer-genome-bottle


Thank you so much!! So kind!
