Picard SortSam Complains That BAM File Generated by Picard AddOrReplaceReadGroups Is Truncated
12.2 years ago
Anna S ▴ 510

Hello,

I am trying to run the GATK UnifiedGenotyper, but it complained that the BAM file did not have read group information. After looking at the documentation, I ran the Picard AddOrReplaceReadGroups tool as follows:

java -jar AddOrReplaceReadGroups.jar I=sample.bam O=sample_addGroup.bam SORT_ORDER=coordinate CREATE_INDEX=true RGPL=illumina RGID=184 RGSM=sample184 RGLB=bar RGPU=pu184 VALIDATION_STRINGENCY=LENIENT

This created the BAM file sample_addGroup.bam. However, when I tried to run GATK, it complained that the contigs were out of order (reads contigs = [chr1, chr10, chr11, etc...] instead of coordinate order), even though I had used the parameter SORT_ORDER=coordinate above.

So I ran the picard SortSam tool as follows:

java -jar SortSam.jar I=sample_addGroup.bam O=sample_addGroupSorted.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=SILENT

However, I'm getting the following error. Any ideas? Thank you very much!

.
.
.
INFO    2012-02-13 09:42:51     SortSam Read 630000000 records.
INFO    2012-02-13 09:43:43     SortSam Read 640000000 records.
INFO    2012-02-13 09:44:36     SortSam Read 650000000 records.
INFO    2012-02-13 09:45:30     SortSam Read 660000000 records.
INFO    2012-02-13 09:46:20     SortSam Read 670000000 records.
[Mon Feb 13 09:46:22 EST 2012] net.sf.picard.sam.SortSam done. Elapsed time: 59.17 minutes.
Runtime.totalMemory()=1423900672
Exception in thread "main" net.sf.samtools.FileTruncatedException: Premature end of file
        at net.sf.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:359)
        at net.sf.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:109)
        at net.sf.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:234)
        at java.io.DataInputStream.read(DataInputStream.java:132)
        at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:394)
        at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:371)
        at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:357)
        at net.sf.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:188)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:514)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:488)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:478)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:444)
        at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:641)
        at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:619)
        at net.sf.picard.sam.SortSam.doWork(SortSam.java:67)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
        at net.sf.picard.sam.SortSam.main(SortSam.java:79)
picard gatk bam • 15k views
12.2 years ago

This error probably means that the contigs in the file are not ordered as GATK expects them: chr1, chr2, chr3 etc. This ordering must be present in addition to coordinate ordering.

http://www.broadinstitute.org/gsa/wiki/index.php/Input_files_for_the_GATK

As mentioned on the page above you may be able to use ReorderSam to fix the error:

http://www.broadinstitute.org/gsa/wiki/index.php/ReorderSam
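
Before reordering, it may help to see exactly how the contig order in the BAM header differs from the reference. The following is only a minimal sketch: it assumes `samtools` is installed and that the reference FASTA has a `.fai` index next to it. The function name `contig_order` and the file names in the usage comments are illustrative, not part of Picard or GATK.

```shell
#!/bin/sh
# Print the contig names from a SAM header (read on stdin), in header order.
# Only @SQ lines are considered; the SN: tag is located wherever it appears.
contig_order() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Example usage (file names are placeholders):
#   samtools view -H sample_addGroup.bam | contig_order > bam_contigs.txt
#   cut -f1 reference.fasta.fai > ref_contigs.txt
#   diff bam_contigs.txt ref_contigs.txt && echo "contig order matches"
```

If `diff` reports differences only in ordering (for example chr1, chr10, chr11 versus chr1, chr2, chr3), that is the situation ReorderSam is meant to address.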

Thank you so much Istvan! You're my guardian angel :-) In fact, ReorderSam sorted the file in the order that I was expecting, while SortSam did not. The truncation problem was something else (see my response to Matt below), but your response helped me tremendously once I got that first problem resolved. Thanks again!

12.2 years ago

All of those references to truncated files and files ending prematurely make me think that your file was corrupted when the new read groups were added. You might want to try sorting your file before adding your read groups, then add the read groups using the proper sort order. Also, make sure that the reference index you are using with Picard is build 37 (b37), which you can download in the GATK resource bundle: http://www.broadinstitute.org/gsa/wiki/index.php/GATK_resource_bundle.

It is important that you are using a reference genome with the correct convention for chromosome naming. For instance, hg19 from UCSC names the chromosomes "chr1-chr22, chrX, chrY, chrM", whereas b37 names them "1-22, X, Y, MT". GATK expects the contig names and their order in the BAM header to match the reference you give it.
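
A quick way to check which naming convention a file uses is to classify its contig names. This is a rough sketch rather than an official tool: the function name `naming_style` is made up for illustration, and the usage comments assume `samtools` and a `.fai` index are available. Whichever convention the reference uses, the BAM header has to use the same one.

```shell
#!/bin/sh
# Classify contig names (one per line on stdin) as "chr-prefixed",
# "unprefixed", "mixed", or "empty" if no names were given.
naming_style() {
    awk '
        /^chr/ { chr++; next }
        NF     { plain++ }
        END {
            if (chr && plain) print "mixed"
            else if (chr)     print "chr-prefixed"
            else if (plain)   print "unprefixed"
            else              print "empty"
        }'
}

# Example usage (file names are placeholders):
#   cut -f1 reference.fasta.fai | naming_style
#   samtools view -H sample.bam | grep '^@SQ' | tr '\t' '\n' \
#       | sed -n 's/^SN://p' | naming_style
```

If the two commands print different styles, the BAM was aligned against a reference with a different naming convention than the one being passed to GATK.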

Thank you very much Matt for your thoughtful answer!!! I'm thinking that the files were corrupted due to running into the disk quota. The output files were very large and no error messages were logged, but as you noted they were indeed corrupted. Once I was given more disk space and reran the add group command then it worked. Thank you again!
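
For anyone who hits the same thing: a BAM that was cut short (for example by a full disk or quota) can usually be spotted without reading the whole file, because a complete BAM ends with a fixed 28-byte empty BGZF block defined in the SAM/BAM specification. Here is a minimal sketch of that check using only standard shell tools. The function name `has_bgzf_eof` is just illustrative, and passing this check does not prove the rest of the file is intact.

```shell
#!/bin/sh
# Hex of the 28-byte empty BGZF block that terminates a complete BAM file
# (the EOF marker from the SAM/BAM specification).
BGZF_EOF="1f8b08040000000000ff0600424302001b0003000000000000000000"

# Succeeds (exit status 0) if the file's last 28 bytes are the BGZF EOF marker.
has_bgzf_eof() {
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF" ]
}

# Example usage (file name is a placeholder):
#   has_bgzf_eof sample_addGroup.bam || echo "BAM looks truncated"
#   df -h .    # check free space before rerunning
```

Newer samtools releases also provide `samtools quickcheck`, which performs a similar end-of-file check.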
