23 months ago
esimonova.me

Our lab sequenced the NA12763 sample in order to validate a bioinformatics analysis pipeline. Where can I find a validation VCF file to compare against the VCF generated by our pipeline (using the hap.py tool)?
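For context, a hap.py comparison takes the truth (validation) VCF and the query VCF as positional arguments, plus a reference FASTA (`-r`), an output prefix (`-o`) and, optionally, a BED file of confident regions (`-f`). The sketch below only assembles that argument list; it assumes hap.py is on the PATH, and every file name in it (`truth.vcf.gz`, `NA12763_pipeline.vcf.gz`, `reference.fa`, `confident.bed`) is a hypothetical placeholder, not a known NA12763 resource.

```python
import shlex
from typing import List, Optional


def build_happy_command(truth_vcf: str, query_vcf: str, reference_fasta: str,
                        output_prefix: str,
                        confident_bed: Optional[str] = None) -> List[str]:
    """Assemble (but do not run) a hap.py command line.

    All paths are hypothetical placeholders supplied by the caller.
    """
    # Positional arguments: truth VCF first, then the pipeline's query VCF.
    cmd = ["hap.py", truth_vcf, query_vcf,
           "-r", reference_fasta,   # reference genome the calls were made against
           "-o", output_prefix]     # prefix for hap.py's report files
    if confident_bed is not None:
        # -f restricts scoring to the truth set's confident regions.
        cmd += ["-f", confident_bed]
    return cmd


if __name__ == "__main__":
    cmd = build_happy_command("truth.vcf.gz", "NA12763_pipeline.vcf.gz",
                              "reference.fa", "happy_out/NA12763",
                              confident_bed="confident.bed")
    # Print a shell-quoted version that can be pasted into a terminal.
    print(shlex.join(cmd))
```

The command could then be run with `subprocess.run(cmd, check=True)`; keeping construction separate from execution makes it easy to inspect the arguments first.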

I found that the sample was sequenced in the 1000 Genomes (1000G) project, but the VCF does not seem to be the one I need. I am still a junior in bioinformatics, and this is the first time I have seen the ALT column described in the following way:

1   1   .   N   <CGA_NOCALL>    .   .   END=10000;NS=1;AN=0 GT:PS   ./.:.
1   10001   .   T   <CGA_CNVWIN>    .   .   NS=1;CGA_WINEND=12000
1   10001   .   T   <CGA_NOCALL>


The file is labelled as an SNV and indel file, even though I think it contains CGA array CNV data. The files are in this directory:

ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase3/data/NA12763/cg_data/
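In VCF, an ALT value enclosed in angle brackets (such as `<CGA_NOCALL>` or `<CGA_CNVWIN>`) is a symbolic allele: a labelled placeholder rather than an actual base sequence. As a rough way to see how much of such a file is ordinary SNV/indel calls, the sketch below keeps the header and drops every record whose ALT field contains a symbolic allele. It assumes a tab-separated VCF (optionally gzipped); the function names are hypothetical. Note that this only removes placeholder records; it does not turn the file into a curated validation set.

```python
import gzip
import sys
from typing import Iterable, Iterator, TextIO


def is_symbolic(alt_field: str) -> bool:
    """True if any comma-separated ALT allele is symbolic, e.g. <CGA_NOCALL>."""
    return any(a.startswith("<") and a.endswith(">") for a in alt_field.split(","))


def filter_symbolic(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and records with no symbolic ALT allele.

    A record is dropped if ANY of its ALT alleles is symbolic, so a
    multi-allelic line like "A,<NON_REF>" is dropped as a whole.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # keep ## meta lines and the #CHROM header
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip truncated records that lack an ALT column
        if not is_symbolic(fields[4]):  # ALT is the 5th VCF column
            yield line


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python filter_symbolic.py input.vcf.gz > filtered.vcf
    with open_vcf(sys.argv[1]) as src:
        sys.stdout.writelines(filter_symbolic(src))
```

Counting the records before and after filtering gives a quick sense of whether the file actually contains sequence-resolved variant calls.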

