Using 1000 genomes phase 3 variants for BQSR
3.2 years ago
prasundutta87 ▴ 660

Hi,

I have a general question about the known-variant datasets (truth sets) needed for the BQSR step in the GATK workflow. I am aware that many variant datasets (SNPs and indels) from phase 1 of the 1000 Genomes Project are currently used for this, but the consortium has since released phase 3 variants as well. Their biallelic SNVs and indels are available here: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.wgs.shapeit2_integrated_snvindels_v2a.GRCh38.27022019.sites.vcf.gz

Would it be okay to use this instead of the phase 1 datasets (SNPs and indels) that can be seen here? https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0;tab=objects?pli=1&prefix=&forceOnObjectsSortingFiltering=false
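For context, here is a minimal sketch of how such a resource would be handed to BQSR. It only builds a GATK4 `BaseRecalibrator` command line, passing each VCF through `--known-sites`; it does not run anything. The file names (`sample.bam`, `GRCh38.fa`, `phase1_snps.vcf.gz`, `recal.table`) and the helper `base_recalibrator_cmd` are hypothetical placeholders, and I am assuming each VCF has an index next to it.

```python
import shlex


def base_recalibrator_cmd(bam, reference, known_sites, out_table):
    """Build a GATK4 BaseRecalibrator command as an argument list.

    known_sites is a list of VCF paths; each one is passed with its own
    --known-sites flag, so phase 1 and phase 3 resources can be swapped
    or combined without changing anything else.
    """
    cmd = ["gatk", "BaseRecalibrator", "-I", bam, "-R", reference]
    for vcf in known_sites:
        cmd += ["--known-sites", vcf]
    cmd += ["-O", out_table]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = base_recalibrator_cmd(
        bam="sample.bam",
        reference="GRCh38.fa",
        known_sites=[
            "ALL.wgs.shapeit2_integrated_snvindels_v2a.GRCh38.27022019.sites.vcf.gz",
            "phase1_snps.vcf.gz",
        ],
        out_table="recal.table",
    )
    # Print a shell-safe command string instead of executing it.
    print(shlex.join(cmd))
```

Switching between the phase 1 and phase 3 resources then only changes the `known_sites` list, which makes it easy to generate both recalibration tables and compare them.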

I would like to know what the community thinks about this.

There is also this dataset, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz, but it contains multiallelic variants, structural variation, etc., and hence I won't be using it.
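If someone did want to use that release, one option would be to filter it down to biallelic SNVs and indels first. The sketch below is a rough illustration under simple assumptions, not a tested pipeline: it treats a record as multiallelic when the ALT column contains a comma, and as structural when ALT is symbolic (such as `<DEL>`) or a breakend. The function names are made up for this example.

```python
def is_biallelic_small_variant(vcf_line):
    """Return True for a record with a single, plain-sequence ALT allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    if not alt or "," in alt:
        # Empty ALT, or more than one ALT allele (multiallelic site).
        return False
    if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
        # Missing/spanning allele, symbolic allele (e.g. <DEL>), or breakend.
        return False
    bases = set("ACGTN")
    return set(ref.upper()) <= bases and set(alt.upper()) <= bases


def filter_sites(lines):
    """Yield header lines unchanged, plus biallelic SNV/indel records."""
    for line in lines:
        if line.startswith("#") or is_biallelic_small_variant(line):
            yield line


if __name__ == "__main__":
    # Tiny made-up records, for illustration only.
    demo = [
        "##fileformat=VCFv4.2\n",
        "1\t100\t.\tA\tG\t.\tPASS\t.\n",      # SNV, kept
        "1\t200\t.\tA\tG,T\t.\tPASS\t.\n",    # multiallelic, dropped
        "1\t300\t.\tA\t<DEL>\t.\tPASS\t.\n",  # symbolic SV, dropped
        "1\t400\t.\tAT\tA\t.\tPASS\t.\n",     # deletion, kept
    ]
    for kept in filter_sites(demo):
        print(kept, end="")
```

On the real file, the lines could be read with `gzip.open(path, "rt")`, and the output would need to be bgzip-compressed and indexed again before use.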

Regards, Prasun

SNP next-gen snp GATK • 875 views