Question: About the DP Value in Multiple Samples Calculated by UG
6.7 years ago by Chris40
Chris40 wrote:

Hi, enthusiastic people. My data has been successfully locally realigned and run through BQSR, and I am now processing it with UG (UnifiedGenotyper). However, in the VCF file the DP value is very large, several hundred, even though each of my samples has only 10x-20x average coverage. The data consist of 50 BAMs. Is the DP value calculated as 50 × (10-20)?

Also, the UG walker tells me it needs about 5 days to complete the process. The 50 BAMs total 150 GB; is that time about right?

Some warnings also appear, such as:

    WARN 16:49:00,627 ExactAFCalculationModel - this tool is currently set to genotype at most 3 alternate alleles in a given context, but the context at chr1:38228257 has 16 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument.

Does this matter?

The command I use for UG:

    java -jar -Djava.io.tmpdir=/data1/tmp /path/GenomeAnalysisTK-1.6-7-g2be5704/GenomeAnalysisTK.jar -R /path/ucsc.hg19.fasta -I /path/bam.list -T UnifiedGenotyper -D /data1/gatk/dbsnp_135.hg19.vcf -o SRR_50bam.raw.vcf -glm BOTH

A sample SNP from the result:

    chrM 152 rs117135796 T C 3176.34 . AC=27;AF=0.307;AN=88;BaseQRankSum=-3.503;DB;DP=530;Dels=0.01;FS=3.082;HRun=1;HaplotypeScore=5.2820;InbreedingCoeff=0.6907;MQ=35.09;MQ0=13;MQRankSum=-8.934;QD=16.90;ReadPosRankSum=1.860;SB=-1671.05 GT:AD:DP:GQ:PL 1/1:0,14:14:42.04:395,42,0 ./. 0/0:5,0:6:12.02:0,12,121 0/0:9,0:9:18.02:0,18,176 0/0:14,0:14:42.06:0,42,428 1/1:0,7:7:21.03:202,21,0 0/0:8,0:9:21.04:0,21,209 0/1:1,3:4:22.54:44,0,23 0/0:6,0:6:18.04:0,18,182 0/0:14,1:15:23.98:0,24,209 0/0:14,1:15:23.98:0,24,209 0/0:13,0:13:36.06:0,36,358 1/1:0,20:20:48.06:454,48,0 0/0:8,0:8:12.03:0,12,132 0/0:21,0:21:57.11:0,57,582 1/1:0,10:10:24.02:223,24,0 1/1:0,20:20:57.06:542,57,0 0/0:26,2:30:56.99:0,57,523 0/0:15,1:17:33.06:0,33,337 0/0:19,0:19:45.11:0,45,479 0/0:8,0:8:24.06:0,24,253 0/0:7,0:7:21.01:0,21,194 0/0:21,0:21:51.07:0,51,490 1/1:0,13:13:32.96:284,33,0 1/1:0,9:9:17.99:153,18,0 0/0:19,0:19:51.07:0,51,505 0/1:4,13:17:75.62:181,0,76 0/0:20,0:20:48.05:0,48,457 0/0:12,0:12:18.01:0,18,166 1/1:0,13:15:26.99:240,27,0 0/0:12,0:12:30.01:0,30,280 1/1:0,14:14:39.05:384,39,0 1/1:0,20:20:9.01:85,9,0 0/0:3,0:3:9.02:0,9,94 0/0:16,0:16:45.06:0,45,441 0/0:26,0:26:69.06:0,69,663 0/1:6,7:13:99:134,0,118 1/1:0,3:3:3.01:31,3,0 0/0:3,0:4:3:0,3,29

I have searched this forum for my question but am still confused. Sorry for my unprofessional question, and I appreciate your help. Thanks.

gatk • 1.6k views
modified 5.3 years ago by Biostar ♦♦ 20 • written 6.7 years ago by Chris40
6.7 years ago by Jorge Amigo11k (Santiago de Compostela, Spain)
Jorge Amigo11k wrote:

The fields in the INFO column of a VCF file describe the entire set analyzed, so if you are calling multiple samples, the DP in the INFO column (defined as "Approximate read depth") is the read depth at that site across all samples. If you look carefully at the sample columns, the ones that use the GT:AD:DP:GQ:PL format, you will find the DP value for each sample there.
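To make the difference between the two DP values concrete, here is a minimal Python sketch that reads one VCF record and returns the site-level DP from the INFO column alongside the sum of the per-sample DP values. The function name `depth_summary` and the shortened record are made up for illustration: the record keeps the first four sample columns from the question, and its `DP=30` is a hypothetical value, not the real one. Since the INFO value is defined as approximate, the two numbers need not match exactly.

```python
def depth_summary(vcf_line):
    """Return (INFO DP, sum of per-sample DP) for a single VCF record.

    Assumes whitespace-separated columns with no spaces inside any field.
    """
    fields = vcf_line.split()
    info, fmt, samples = fields[7], fields[8], fields[9:]

    # Site-level depth: the DP=... entry of the INFO column.
    info_dp = None
    for entry in info.split(";"):
        if entry.startswith("DP="):
            info_dp = int(entry[3:])

    # Per-sample depth: the position of DP within the FORMAT string.
    dp_index = fmt.split(":").index("DP")
    total = 0
    for sample in samples:
        values = sample.split(":")
        # Skip no-call samples such as "./." that carry no DP value.
        if len(values) <= dp_index or values[dp_index] == ".":
            continue
        total += int(values[dp_index])
    return info_dp, total


# Hypothetical shortened record (INFO reduced to DP, four sample columns).
RECORD = (
    "chrM 152 rs117135796 T C 3176.34 . DP=30 GT:AD:DP:GQ:PL "
    "1/1:0,14:14:42.04:395,42,0 ./. "
    "0/0:5,0:6:12.02:0,12,121 0/0:9,0:9:18.02:0,18,176"
)

info_dp, sample_dp_sum = depth_summary(RECORD)
print(info_dp, sample_dp_sum)
```

Here the three called samples contribute 14, 6 and 9 reads, and the no-call `./.` column is skipped.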

Regarding time performance, it takes ~2h on our cluster to process ~100 BAMs of ~1G each, so your 50 BAMs of ~150G each would take ~6 days. Considering that GATK constantly updates its estimate of the time needed, I guess seeing a 5-day notice on yours at startup is to be expected. Keep in mind, though, that UnifiedGenotyper can be run in parallel mode through the -nt option, which drastically reduces run times. On the wiki there is a page about GATK parallelism, which states that using 8 threads, if possible, gives the best scaling. We simply use 2 and get almost exactly half the original times.
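The estimate above is a simple proportional scaling, which can be sketched in Python as follows. This is a rough back-of-the-envelope calculation, assuming run time grows linearly with total input size and drops by an ideal factor of 1/threads; the function name `estimate_hours` is hypothetical, and the input figures are the ones quoted in this answer.

```python
def estimate_hours(ref_hours, ref_total_gb, target_total_gb, threads=1):
    """Scale a reference run time linearly by input size.

    Assumes an ideal 1/threads speedup, which is only an approximation.
    """
    return ref_hours * target_total_gb / ref_total_gb / threads


ref_total_gb = 100 * 1       # ~100 BAMs of ~1G each, processed in ~2h
target_total_gb = 50 * 150   # 50 BAMs of ~150G each, as assumed in the answer

single = estimate_hours(2, ref_total_gb, target_total_gb)
dual = estimate_hours(2, ref_total_gb, target_total_gb, threads=2)

print(single / 24)  # days with one thread
print(dual / 24)    # days with two threads
```

With these inputs the single-thread estimate is 150 hours (6.25 days), and two threads bring it to 75 hours, in line with the halving reported above.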

modified 6.7 years ago • written 6.7 years ago by Jorge Amigo11k

Thanks for your timely reply. I understand the question much better now, and your recommendation about the -nt option helps me a lot!

written 6.7 years ago by Chris40