Question: GATK Unified Genotyper
23 months ago by
pinninti1991reddy30 wrote:

Hi, I am trying to call variants from a BAM file. I ran the command below, and it produced mem_UG.vcf (481 MB). I can't tell from the messages at the end of the run whether it generated correct output.

**CMD**
 java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R hg38.fa -I /home/likithreddy/Documents/Cancergenomics/ReadgroupsSRR098401mem_pesort.bam -o mem_UG.vcf -glm SNP

likithreddy@likith:~/Downloads/GATK$ java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R hg38.fa -I /home/likithreddy/Documents/Cancergenomics/ReadgroupsSRR098401mem_pesort.bam -o mem_UG.vcf -glm SNP 
INFO  17:13:17,759 HelpFormatter - ---------------------------------------------------------------------------------- 
INFO  17:13:17,793 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50 
INFO  17:13:17,793 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
INFO  17:13:17,794 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
INFO  17:13:17,794 HelpFormatter - [Fri Jan 05 17:13:17 IST 2018] Executing on Linux 4.8.0-36-generic amd64 
INFO  17:13:17,794 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 
INFO  17:13:17,798 HelpFormatter - Program Args: -T UnifiedGenotyper -R hg38.fa -I /home/likithreddy/Documents/Cancergenomics/ReadgroupsSRR098401mem_pesort.bam -o mem_UG.vcf -glm SNP 
INFO  17:13:17,838 HelpFormatter - Executing as likithreddy@likith on Linux 4.8.0-36-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12. 
INFO  17:13:17,839 HelpFormatter - Date/Time: 2018/01/05 17:13:17 
INFO  17:13:17,839 HelpFormatter - ---------------------------------------------------------------------------------- 
INFO  17:13:17,839 HelpFormatter - ---------------------------------------------------------------------------------- 
ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/home/likithreddy/Downloads/GATK/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
INFO  17:13:18,129 GenomeAnalysisEngine - Deflater: JdkDeflater 
INFO  17:13:18,129 GenomeAnalysisEngine - Inflater: JdkInflater 
INFO  17:13:18,130 GenomeAnalysisEngine - Strictness is SILENT 
INFO  17:13:18,426 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250 
INFO  17:13:18,434 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  17:13:18,519 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.08 
INFO  17:13:18,843 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
INFO  17:13:19,168 GenomeAnalysisEngine - Done preparing for traversal 
INFO  17:13:19,168 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  17:13:19,169 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
INFO  17:13:19,169 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
INFO  17:13:19,247 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values. 
WARN  17:13:19,247 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples. 
INFO  17:13:19,248 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values. 
INFO  17:13:49,172 ProgressMeter -    chr1:5268065   5259264.0    30.0 s       5.0 s        0.2%     5.1 h       5.1 h 
INFO  17:14:19,173 ProgressMeter -   chr1:10664901   1.06496E7    60.0 s       5.0 s        0.3%     5.0 h       5.0 h 
INFO  17:14:49,174 ProgressMeter -   chr1:15275489   1.5269888E7    90.0 s       5.0 s        0.5%     5.3 h       5.2 h 


**At End** (END Message)

INFO  21:14:29,686 ProgressMeter - chrY_KI270740v1_random:37201   3.209248865E9     4.0 h       4.0 s      100.0%     4.0 h       0.0 s 
INFO  21:14:29,686 ProgressMeter -            done   3.209286105E9     4.0 h       4.0 s      100.0%     4.0 h       0.0 s 
INFO  21:14:29,687 ProgressMeter - Total runtime 14470.52 secs, 241.18 min, 4.02 hours 
INFO  21:14:29,687 MicroScheduler - 1053193 reads were filtered out during the traversal out of approximately 185715730 total reads (0.57%) 
INFO  21:14:29,687 MicroScheduler -   -> 0 reads (0.00% of total) failing BadCigarFilter 
INFO  21:14:29,688 MicroScheduler -   -> 836765 reads (0.45% of total) failing BadMateFilter 
INFO  21:14:29,688 MicroScheduler -   -> 0 reads (0.00% of total) failing DuplicateReadFilter 
INFO  21:14:29,688 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter 
INFO  21:14:29,688 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter 
INFO  21:14:29,688 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter 
INFO  21:14:29,688 MicroScheduler -   -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter 
INFO  21:14:29,688 MicroScheduler -   -> 216428 reads (0.12% of total) failing UnmappedReadFilter 
------------------------------------------------------------------------------------------
Done. There were 1 WARN messages, the first 1 are repeated below.
WARN  17:13:19,247 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
alignment • 2.1k views
modified 23 months ago by Raony Guimarães1.1k • written 23 months ago by pinninti1991reddy30

May I ask why you did not use GATK HaplotypeCaller?

The only error I see is in the logging setup: ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory

The rest looks normal to me.

modified 23 months ago • written 23 months ago by Medhat8.6k
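For anyone who wants to check the end of a run without reading the whole log, here is a minimal Python sketch. It assumes the GATK 3.x console layout shown in the logs in this thread (a `ProgressMeter - Total runtime` line and a `Done. There were N WARN messages` line); the function name `summarize_gatk_log` is made up for illustration and is not part of GATK. A log that reached the final runtime line only shows that the traversal completed, not that the calls are biologically sensible.

```python
import re

def summarize_gatk_log(log_text):
    """Summarize a GATK 3.x console log (layout assumed from this thread's logs)."""
    lines = log_text.splitlines()
    runtime = None
    warn_count = None
    for line in lines:
        m = re.search(r"ProgressMeter - Total runtime ([\d.]+) secs", line)
        if m:
            runtime = float(m.group(1))
        m = re.search(r"Done\. There were (\d+) WARN messages", line)
        if m:
            warn_count = int(m.group(1))
    # Lines whose level is ERROR, such as the log4j StatusLogger messages.
    error_lines = [l.strip() for l in lines if l.lstrip().startswith("ERROR")]
    return {
        "finished": runtime is not None,
        "runtime_secs": runtime,
        "warn_count": warn_count,
        "error_lines": error_lines,
    }

# A few lines abridged from the UnifiedGenotyper log in the question.
sample = "\n".join([
    "ERROR StatusLogger Log4j2 could not find a logging implementation. "
    "Please add log4j-core to the classpath. Using SimpleLogger to log to the console...",
    "INFO  21:14:29,686 ProgressMeter -            done   3.209286105E9     4.0 h",
    "INFO  21:14:29,687 ProgressMeter - Total runtime 14470.52 secs, 241.18 min, 4.02 hours",
    "Done. There were 1 WARN messages, the first 1 are repeated below.",
])

summary = summarize_gatk_log(sample)
print(summary["finished"], summary["runtime_secs"], summary["warn_count"])
```

On the abridged sample this reports a finished run with one warning and one ERROR line, which matches the reading above: the log4j message appears, but the traversal still ran to the end.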

Hi, can you check this HaplotypeCaller run? It generated a 308 MB VCF file. Is this the correct way to do it? I ran it on my workstation with 16 GB RAM, and the elapsed time was 8:45 h. Can you briefly explain HC vs. UG so I can understand the difference better?

**CMD**
likith@likith-VPCEG2AEN:~/Downloads/GenomeAnalysisTK-3.8-0-ge9d806836$ java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg38.fa -I /media/likith/REDDY/MEM/ReadgroupsSRR098401mem_pesort.bam -o mem.vcf
INFO 14:43:26,496 HelpFormatter - ----------------------------------------------------------------------------------
INFO 14:43:26,499 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50
INFO 14:43:26,499 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 14:43:26,499 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 14:43:26,499 HelpFormatter - [Fri Jan 05 14:43:26 IST 2018] Executing on Linux 4.10.0-42-generic amd64
INFO 14:43:26,499 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12
INFO 14:43:26,503 HelpFormatter - Program Args: -T HaplotypeCaller -R hg38.fa -I /media/likith/REDDY/MEM/ReadgroupsSRR098401mem_pesort.bam -o mem.vcf
INFO 14:43:26,506 HelpFormatter - Executing as likith@likith-VPCEG2AEN on Linux 4.10.0-42-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12.
INFO 14:43:26,506 HelpFormatter - Date/Time: 2018/01/05 14:43:26
INFO 14:43:26,506 HelpFormatter - ----------------------------------------------------------------------------------
INFO 14:43:26,506 HelpFormatter - ----------------------------------------------------------------------------------
ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/home/likith/Downloads/GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
INFO 14:43:26,637 GenomeAnalysisEngine - Deflater: IntelDeflater
INFO 14:43:26,637 GenomeAnalysisEngine - Inflater: IntelInflater
INFO 14:43:26,638 GenomeAnalysisEngine - Strictness is SILENT
INFO 14:43:26,942 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO 14:43:26,948 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 14:43:27,031 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.08
INFO 14:43:27,296 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 14:43:27,414 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 14:43:27,720 GenomeAnalysisEngine - Done preparing for traversal
INFO 14:43:27,721 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 14:43:27,721 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 14:43:27,721 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 14:43:27,722 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output
INFO 14:43:27,848 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
WARN 14:43:27,848 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
INFO 14:43:27,849 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
INFO 14:43:27,983 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO 14:43:34,599 VectorLoglessPairHMM - Using OpenMP multi-threaded AVX-accelerated native PairHMM implementation
[INFO] Available threads: 4
[INFO] Requested threads: 1
[INFO] Using 1 threads
WARN 14:43:34,677 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not HaplotypeCaller
INFO 14:43:57,724 ProgressMeter - chr1:2388093 0.0 30.0 s 49.6 w 0.1% 11.2 h 11.2 h
INFO 14:44:27,725 ProgressMeter - chr1:5671939 0.0 60.0 s 99.2 w 0.2% 9.4 h 9.4 h
INFO 14:44:57,726 ProgressMeter - chr1:9313903 0.0 90.0 s 148.8 w 0.3% 8.6 h 8.6 h
INFO 14:45:27,727 ProgressMeter - chr1:12327961 0.0 120.0 s 198.4 w 0.4% 8.7 h 8.6 h
INFO 14:46:07,728 ProgressMeter - chr1:15564203 0.0 2.7 m 264.6 w 0.5% 9.2 h 9.1 h

written 23 months ago by pinninti1991reddy30

*Last Few Lines*

INFO 22:26:49,098 ProgressMeter - chrY:57093381 3.15202145E9 7.7 h 8.0 s 100.0% 7.7 h 1.0 s
INFO 22:26:49,178 VectorLoglessPairHMM - Time spent in setup for JNI call : 8.726722330000001
INFO 22:26:49,180 PairHMM - Total compute time in PairHMM computeLikelihoods() : 2404.092605758
INFO 22:26:49,181 HaplotypeCaller - Ran local assembly on 1433982 active regions
INFO 22:26:49,721 ProgressMeter - done 3.209286105E9 7.7 h 8.0 s 100.0% 7.7 h 0.0 s
INFO 22:26:49,722 ProgressMeter - Total runtime 27802.00 secs, 463.37 min, 7.72 hours
INFO 22:26:49,723 MicroScheduler - 17692810 reads were filtered out during the traversal out of approximately 184916695 total reads (9.57%)
INFO 22:26:49,723 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
INFO 22:26:49,724 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO 22:26:49,724 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 22:26:49,725 MicroScheduler - -> 17692810 reads (9.57% of total) failing HCMappingQualityFilter
INFO 22:26:49,725 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 22:26:49,726 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 22:26:49,726 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 22:26:49,727 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter

Done. There were 4 WARN messages, the first 4 are repeated below.
WARN 14:43:27,848 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
WARN 14:43:34,677 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not HaplotypeCaller
WARN 16:55:01,654 HaplotypeCallerGenotypingEngine - location chr14_GL000225v1_random:67535: too many alternative alleles found (9) larger than the maximum requested with -maxAltAlleles (6), the following will be dropped: GGTGATGCAACTCTTGCCTAGGCTTTGCCTACAGGGTACATCGTGACATATCGCTTCAATGATCACCCAT, GGTGATGCAACTCTTGCCTAGGCTTTGCCTACAGGGTACATTGTGACATATCGCTTCAATGATCACCCAT, GGTGATGCAACTCTTGCCTAGGCTTTGCCTACAGGGGACATCGTGACATATCGCTTCAATGATCACCCAT.
WARN 19:10:59,577 HaplotypeCallerGenotypingEngine - location chr22:43972848-43972851: too many alternative alleles found (7) larger than the maximum requested with -maxAltAlleles (6), the following will be dropped: CTTT.

written 23 months ago by pinninti1991reddy30
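The MicroScheduler read-filter summaries in both logs can be cross-checked with a little arithmetic. The sketch below is only an illustration: the regular expressions assume the line layout seen in this thread, and `check_filter_summary` is a made-up helper name, not a GATK function. It recomputes the overall filtered percentage and checks whether the per-filter counts add up to the reported total.

```python
import re

# Patterns assume the MicroScheduler line layout shown in this thread's logs.
TOTAL_LINE = re.compile(
    r"(\d+) reads were filtered out during the traversal "
    r"out of approximately (\d+) total reads \(([\d.]+)%\)"
)
FILTER_LINE = re.compile(r"-> (\d+) reads \(([\d.]+)% of total\) failing (\w+)")

def check_filter_summary(log_text):
    """Recompute the filtered-read percentage and compare per-filter counts."""
    total_match = TOTAL_LINE.search(log_text)
    if total_match is None:
        raise ValueError("no MicroScheduler summary line found")
    filtered = int(total_match.group(1))
    total = int(total_match.group(2))
    per_filter = {name: int(n) for n, _pct, name in FILTER_LINE.findall(log_text)}
    return {
        "filtered": filtered,
        "total": total,
        "percent": round(100.0 * filtered / total, 2),
        "per_filter": per_filter,
        "counts_add_up": sum(per_filter.values()) == filtered,
    }

# Abridged from the UnifiedGenotyper log in the question (zero-count filters mostly omitted).
ug_tail = "\n".join([
    "INFO  21:14:29,687 MicroScheduler - 1053193 reads were filtered out during the "
    "traversal out of approximately 185715730 total reads (0.57%)",
    "INFO  21:14:29,687 MicroScheduler -   -> 0 reads (0.00% of total) failing BadCigarFilter",
    "INFO  21:14:29,688 MicroScheduler -   -> 836765 reads (0.45% of total) failing BadMateFilter",
    "INFO  21:14:29,688 MicroScheduler -   -> 216428 reads (0.12% of total) failing UnmappedReadFilter",
])

result = check_filter_summary(ug_tail)
print(result["percent"], result["counts_add_up"])
```

For the UnifiedGenotyper run, 836765 + 216428 = 1053193, so the two non-zero filters account for every filtered read. In the HaplotypeCaller run the whole 9.57% comes from HCMappingQualityFilter (reads with MAPQ < 20), which is one reason the two tools do not see exactly the same reads.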

First, regarding the error, please see this thread:
https://gatkforums.broadinstitute.org/gatk/discussion/10004/realignertargetcreator-hangs
which suggests a problem in the build.

Second, why HaplotypeCaller?

The HaplotypeCaller is a more recent and sophisticated tool than the UnifiedGenotyper. Its ability to call SNPs is equivalent to that of the UnifiedGenotyper, its ability to call indels is far superior, and it is now capable of calling non-diploid samples. It also comprises several unique functionalities such as the reference confidence model (which enables efficient and incremental variant discovery on ridiculously large cohorts) and special settings for RNAseq data.


modified 23 months ago • written 23 months ago by Medhat8.6k
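One practical way to see the SNP/indel difference described above is to count record types in each VCF. The UnifiedGenotyper command in the question used `-glm SNP`, so its output should contain SNPs only, while HaplotypeCaller reports SNPs and indels together. Below is a rough Python sketch that classifies records by comparing REF and ALT lengths; the function names and the toy records are invented for illustration (they are not from the actual mem_UG.vcf or mem.vcf), and sites mixing SNP and indel alleles are simply counted as INDEL.

```python
def classify_record(ref, alt):
    """Classify one VCF REF/ALT pair; ALT may hold several comma-separated alleles."""
    alleles = alt.split(",")
    # Symbolic alleles (<NON_REF>, <DEL>), spanning deletions (*) and missing ALT (.)
    if any(a.startswith("<") or a in ("*", ".") for a in alleles):
        return "OTHER"
    # Any allele whose length differs from REF is an insertion or deletion.
    if any(len(a) != len(ref) for a in alleles):
        return "INDEL"
    if len(ref) == 1:
        return "SNP"
    return "OTHER"  # same-length multi-base substitution (MNP)

def count_variant_types(vcf_lines):
    """Count SNP / INDEL / OTHER records in an iterable of VCF text lines."""
    counts = {"SNP": 0, "INDEL": 0, "OTHER": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        counts[classify_record(fields[3], fields[4])] += 1
    return counts

# Toy records, made up for this example only.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\t.",
    "chr1\t200\t.\tAT\tA\t50\t.\t.",
    "chr1\t300\t.\tC\tT,G\t50\t.\t.",
    "chr1\t400\t.\tG\tGTT,GT\t50\t.\t.",
]

counts = count_variant_types(toy_vcf)
print(counts)
```

For a real file you could pass an open file handle, for example `count_variant_types(open("mem.vcf"))`, since the function only iterates over lines.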

I'm not going to add formatting to your post again. I told you in another comment how to do that. You should put some more effort in this yourself.

written 23 months ago by WouterDeCoster42k

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar.

written 23 months ago by WouterDeCoster42k
23 months ago by
Dublin / Ireland
Raony Guimarães1.1k wrote:

This is likely happening because you are not giving Java enough memory with the -Xmx parameter.

Try running it again with the following command:

java -Xmx32g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R hg38.fa -I /home/likithreddy/Documents/Cancergenomics/ReadgroupsSRR098401mem_pesort.bam -o mem_UG.vcf -glm SNP

modified 23 months ago • written 23 months ago by Raony Guimarães1.1k
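A small note on the heap option: the JVM expects the size attached directly to the flag (`-Xmx32g`, with no space), and the value should fit within the machine's physical RAM. If you launch GATK from a script, a sketch like the following keeps the flag well formed. The helper names and the 12 GB figure are assumptions for illustration only; pick a value that suits your own machine.

```python
def java_heap_flag(gigabytes):
    """Format a JVM maximum-heap flag such as -Xmx12g (no space before the size)."""
    size = int(gigabytes)
    if size < 1:
        raise ValueError("heap size must be at least 1 GB")
    return f"-Xmx{size}g"

def gatk_command(heap_gb, gatk_args, jar="GenomeAnalysisTK.jar"):
    """Build the argument list for a GATK 3.x run: java -XmxNg -jar <jar> <args>."""
    return ["java", java_heap_flag(heap_gb), "-jar", jar] + list(gatk_args)

# Arguments taken from the command in the question; 12 GB is an illustrative value.
cmd = gatk_command(
    12,
    ["-T", "UnifiedGenotyper", "-R", "hg38.fa",
     "-I", "/home/likithreddy/Documents/Cancergenomics/ReadgroupsSRR098401mem_pesort.bam",
     "-o", "mem_UG.vcf", "-glm", "SNP"],
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` if you want the script to start the job itself.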

But it ran successfully?

written 23 months ago by WouterDeCoster42k

Yes, this is only a warning.

written 23 months ago by Raony Guimarães1.1k