Question: Why does GATK RealignerTargetCreator output an empty .intervals file?
Chris wrote (7.6 years ago):

Hello, everyone! I am running RealignerTargetCreator, the first step of GATK local realignment, to create the .intervals file from a raw .BAM file (the .bai index and the REF.fasta, REF.fai and REF.dict files are all present). After hours of running, with NO ERROR during the process, I get an empty .intervals output file. The command is below:

java -jar /path/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /home/chris/data/hg/hg19.fasta -I /home/chris/data/reorder.test.sorted.bam -o reorder.test.sorted.intervals

GATK runs with NO ERROR, but the process ends with an empty output file (0 bytes).

I also tried the small sample files in the resources/ folder of the GATK distribution:

java -jar /path/GenomeAnalysisTK.jar -T RealignerTargetCreator -R resources/exampleFASTA.fasta -I resources/exampleBAM.bam -o example.intervals

The output is still empty.

Has anyone run into this problem? I am new to GATK, and I would be grateful if someone could tell me why this happens and how I can solve it!

chris@chris-OptiPlex-780:~/install/GenomeAnalysisTK-1.5-9-ga05a7f2$ java -jar GenomeAnalysisTK.jar -I /home/chris/data/rat_rel65_MT_validated.bam -R /home/chris/data/rat_rel65_MT_validated.fasta -T RealignerTargetCreator -o 123456.intervals

INFO  17:17:14,667 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:17:14,686 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.5-9-ga05a7f2,   Compiled 2012/03/17 00:05:08 
INFO  17:17:14,686 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  17:17:14,686 HelpFormatter - Please view our documentation at    http://www.broadinstitute.org/gsa/wiki 
INFO  17:17:14,686 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa 
INFO  17:17:14,687 HelpFormatter - Program Args: -I /home/chris/data/rat_rel65_MT_validated.bam -R /home/chris/data/rat_rel65_MT_validated.fasta -T RealignerTargetCreator -o 123456.intervals 
INFO  17:17:14,687 HelpFormatter - Date/Time: 2012/04/07 17:17:14 
INFO  17:17:14,687 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:17:14,688 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  17:17:14,703 GenomeAnalysisEngine - Strictness is SILENT 
INFO  17:17:14,882 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  17:17:14,976 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.08 
INFO  17:17:16,658 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING] 
INFO  17:17:16,659 TraversalEngine -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
INFO  17:17:46,578 TraversalEngine -  chr10:17409001        1.74e+07   30.0 s        1.7 s      0.6%        78.1 m    77.6 m 
INFO  17:18:16,762 TraversalEngine -  chr10:37666001        3.77e+07   60.2 s        1.6 s      1.4%        72.4 m    71.4 m 
INFO  17:18:46,763 TraversalEngine -  chr10:54320001        5.43e+07   90.2 s        1.7 s      2.0%        75.2 m    73.7 m 
INFO  17:19:17,276 TraversalEngine -  chr10:69417001        6.94e+07    2.0 m        1.7 s      2.6%        78.8 m    76.8 m 
INFO  17:19:47,533 TraversalEngine -  chr10:70190001        7.02e+07    2.5 m        2.2 s      2.6%        97.5 m    94.9 m 
INFO  17:20:17,534 TraversalEngine -  chr10:90199001        9.02e+07    3.0 m        2.0 s      3.3%        90.9 m    87.9 m 
..............................................
..............................................
INFO  18:03:31,900 TraversalEngine -   chr9:93072115        2.54e+09   71.9 m        1.7 s     93.3%        77.0 m     5.1 m 
INFO  18:04:01,912 TraversalEngine -  chr9:109658115        2.55e+09   72.4 m        1.7 s     93.9%        77.1 m     4.7 m 
INFO  18:04:09,814 TraversalEngine - Total runtime 4352.99 secs, 72.55 min, 1.21 hours 
INFO  18:04:09,814 TraversalEngine - 180568 reads were filtered out during traversal out of 26944143 total (0.67%) 
INFO  18:04:09,815 TraversalEngine -   -> 180568 reads (0.67% of total) failing MappingQualityZeroFilter 
INFO  18:04:16,423 GATKRunReport - Uploaded run statistics report to AWS S3

Another dataset also gave an empty .intervals file.

I am so sad!

Thank you!

Answer by Johan (Sweden), 7.6 years ago:

The RealignerTargetCreator needs a set of known indels to realign against. These are supplied with the "--known" option and can come either from an external source such as dbSNP or from your own raw indel calls.

Here is a link explaining it in more detail: http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels.
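To make the idea concrete, here is a simplified sketch (not GATK's actual algorithm) of how candidate indel sites, whether observed in the reads or supplied as known indels, could be padded and merged into `contig:start-end` lines like those in an .intervals file. The function name `merge_sites`, the `window` padding of 10 bp and the example positions are all hypothetical. The point it illustrates: with no candidate sites at all, the output is empty.

```python
"""Simplified sketch: turning candidate indel sites into realignment intervals.

NOT GATK's implementation. It only illustrates that target intervals are built
around candidate sites, so adding known indels adds candidates.
"""
from typing import Dict, Iterable, List, Tuple

Site = Tuple[str, int]  # (contig, 1-based position)


def merge_sites(sites: Iterable[Site], window: int = 10) -> List[str]:
    """Pad each site by `window` bp and merge overlapping or adjacent windows.

    `window` is a hypothetical padding, not a documented GATK default.
    Contigs are sorted by name here; GATK would follow the reference dictionary.
    """
    by_contig: Dict[str, List[int]] = {}
    for contig, pos in sites:
        by_contig.setdefault(contig, []).append(pos)

    intervals: List[str] = []
    for contig in sorted(by_contig):
        spans: List[List[int]] = []
        for pos in sorted(set(by_contig[contig])):
            start, end = max(1, pos - window), pos + window
            if spans and start <= spans[-1][1] + 1:
                spans[-1][1] = max(spans[-1][1], end)  # extend the previous span
            else:
                spans.append([start, end])  # start a new span
        intervals.extend(f"{contig}:{s}-{e}" for s, e in spans)
    return intervals


if __name__ == "__main__":
    observed = [("chr20", 1000), ("chr20", 1015)]  # hypothetical sites seen in reads
    known = [("chr20", 5000)]                       # hypothetical known indel
    print(merge_sites(observed))          # ['chr20:990-1025']
    print(merge_sites(observed + known))  # ['chr20:990-1025', 'chr20:4990-5010']
    print(merge_sites([]))                # [] -> nothing to write to .intervals
```

In this toy model, supplying known indels can only add intervals, which is consistent with the advice above; it says nothing about how GATK weighs or filters the candidates.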

Hope this helps. :)


Thanks a lot! Actually, I have NO dbSNP file, which the --known argument needs, and --known is optional. Even without it, I should get the right output, am I right?

(reply written 7.6 years ago by Chris)

My guess is that you will only get intervals for realignment if the walker detects a region that needs realignment. You might try adjusting the other parameters (see http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_walkers_indels_RealignerTargetCreator.html ). Without having seen your data, it's difficult to say whether realignment is needed.
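One plausible cause of an empty result (an assumption, not something confirmed in this thread) is that the BAM contains few or no gapped alignments, so there is little for the walker to flag when no known indels are given. A quick check is to count mapped reads whose CIGAR contains an insertion (`I`) or deletion (`D`). The sketch below reads SAM text lines, such as the output of `samtools view your.bam`; the function name `count_indel_reads` is hypothetical, and the MAPQ cut-off only loosely mirrors the MappingQualityZeroFilter seen in the log above.

```python
"""Diagnostic sketch: does a BAM contain any indel-bearing alignments?

Feed it SAM text (e.g. from `samtools view your.bam`); it counts mapped reads
whose CIGAR has an insertion (I) or deletion (D) operation.
"""
import re
from typing import Iterable, Tuple

INDEL_OP = re.compile(r"\d+[ID]")  # e.g. "2D" or "1I" inside "20M2D30M"


def count_indel_reads(sam_lines: Iterable[str], min_mapq: int = 1) -> Tuple[int, int]:
    """Return (reads_with_indels, reads_considered).

    Skips header lines ('@'), blank lines, unmapped reads (FLAG bit 0x4),
    reads without a CIGAR ('*') and reads with MAPQ below `min_mapq`.
    """
    with_indels = considered = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq, cigar = int(fields[1]), int(fields[4]), fields[5]
        if flag & 0x4 or cigar == "*" or mapq < min_mapq:
            continue
        considered += 1
        if INDEL_OP.search(cigar):
            with_indels += 1
    return with_indels, considered
```

To run it on a real file, pipe `samtools view your.bam` into a small wrapper that calls `count_indel_reads(sys.stdin)` and prints the two counts. A count of zero (or close to it) would suggest the aligner produced ungapped alignments, which fits the mismatch-based discussion later in this thread.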

(reply written 7.6 years ago by Johan)

By the way, I now see that the analysis took a long time to run. If you can, try giving the Java VM more memory with the -Xmx flag, e.g. java -Xmx4g -jar GenomeAnalysisTK.jar [...]; that might make it run faster.

(reply written 7.6 years ago by Johan)

Thanks! You mean my .BAM file may not need to be realigned? But what about the sample data? The result is still empty. What result do you get for the sample?

(reply written 7.6 years ago by Chris)

I get the same result as you for the sample file with the same settings.

(reply written 7.6 years ago by Johan)

From http://www.broadinstitute.org/gsa/wiki/index.php/Best_Practice_Variant_Detection_with_the_GATK_v3: "Fully local realignment uses mismatching bases to determine if a site should be realigned, and relies on sufficient coverage to discover the correct indel allele in the reads for alignment. It is much slower (involves SW step) but can discover new indel sites in the reads. If you have a database of known indels (for human, this database is extensive) then at this stage you would also include these indels during realignment, which vastly improves sensitivity, specificity, and speed."

(reply written 7.6 years ago by Johan)

My interpretation of this is that you either include previously known indels, or you have to change the "--mismatchFraction" parameter to get it to realign regions where indels might have messed up your raw alignments.
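The idea behind "--mismatchFraction" can be illustrated with a conceptual sketch (not GATK's code): at each position, compare the summed base quality of bases that mismatch the reference with the summed base quality of all bases, and flag the position when that fraction reaches the threshold. The function name, the pileup representation and the rule that a threshold outside (0, 1] disables the check are choices made for this sketch only; consult the GATK documentation for the real semantics and default.

```python
"""Conceptual sketch of mismatch-based ("high entropy") site flagging.

NOT GATK's implementation; it only illustrates a quality-weighted mismatch
fraction compared against a --mismatchFraction-style threshold.
"""
from typing import Dict, List, Tuple

Pileup = List[Tuple[str, int]]  # (read base, base quality) at one position


def high_entropy_positions(
    pileups: Dict[int, Pileup], reference: Dict[int, str], mismatch_fraction: float
) -> List[int]:
    """Return positions whose mismatching-quality fraction >= mismatch_fraction.

    In this sketch a threshold <= 0 or > 1 disables the check entirely.
    """
    if mismatch_fraction <= 0 or mismatch_fraction > 1:
        return []
    flagged = []
    for pos in sorted(pileups):
        total = sum(q for _, q in pileups[pos])
        if total == 0:
            continue  # no usable bases at this position
        mismatched = sum(q for base, q in pileups[pos] if base != reference[pos])
        if mismatched / total >= mismatch_fraction:
            flagged.append(pos)
    return flagged
```

For example, a position where half of the Q30 bases disagree with the reference gives a fraction of 0.5 and is flagged at a threshold of 0.15, while a position where every base matches is never flagged. With low coverage, few positions accumulate enough evidence, which is one more reason the quote above stresses sufficient coverage.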

(reply written 7.6 years ago by Johan)

Thanks for the timely reply, and sorry for my delay. I downloaded the data hg19.20.bam and its FASTA file from the GATK resource bundle (b37) to check the approach. In the end I got a very good result, the same as the provided one, and the indels could be called. Also, the official reply to me said that the problem may be caused by my .BAM file. I think so too.

(reply written 7.6 years ago by Chris)

Hi Chris, I have the same problem. How did you fix it? With additional indel files? Could you post your code for it? And what do you mean by it being caused by the .BAM file? How can I check whether there are problems in a BAM file?

(reply written 7.3 years ago by C Shao)

Hi C Shao, I have the same problem: an empty result file from RealignerTargetCreator. How did you fix it?

(reply written 4.9 years ago by liudl20130)