SAM validation error
7.2 years ago
fire_water ▴ 80

Command

java -jar picard-tools-1.118/MarkDuplicates.jar I=file.Aligned.sortedByCoord.bam O=file_out.bam METRICS_FILE=file_out.metrics

Error Message

Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 801397832, Read name K00135:24:H3V7TBBXX:1:1101:13078:1332, Mate Alignment start (1651771652) must be <= reference sequence length (59128983) on reference chr19

file.Aligned.sortedByCoord.bam was created by STAR.

Has anyone else encountered a similar error? If so, what was the fix? Thanks!

software error • 6.3k views

What happens if you run samtools view file.Aligned.sortedByCoord.bam chr19 | grep K00135:24:H3V7TBBXX:1:1101:13078:1332?

What was the STAR command that created this (it's probably in the BAM header)?

What version of STAR is this?


The SAMtools command returns:

"[main_samview] random alignment retrieval only works for indexed BAM or CRAM files."

STAR command in BAM header:

@PG ID:STAR PN:STAR VN:STAR_2.5.1b  CL:STAR_2.5.1   --runThreadN 16   --genomeDir /Human/Hg19/   --readFilesIn file_R1_val_1.fq.gz   file_R2_val_2.fq.gz      --readFilesCommand zcat      --outSAMtype BAM   SortedByCoordinate   
@CO user command line: STAR_2.5.1 --runThreadN 16 --genomeDir /Human/Hg19/ --readFilesIn file_R1_val_1.fq.gz   file_R2_val_2.fq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate

STAR version 2.5.1b


Index the file and then try the samtools command again.
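
A minimal sketch of that sequence, assuming samtools is on the PATH. The name lookup_read is a hypothetical helper for illustration, not part of samtools. It builds the index that region queries need, then searches the chr19 records for the read name.

```shell
# Hypothetical helper: index a coordinate-sorted BAM, then search one
# reference sequence for a read name.
lookup_read() {
    bam=$1; region=$2; read_name=$3
    # Region queries need an index (.bai) next to the BAM.
    samtools index "$bam" || return 1
    # -F makes grep treat the read name as a fixed string, not a regex.
    samtools view "$bam" "$region" | grep -F "$read_name"
}

# Usage:
# lookup_read file.Aligned.sortedByCoord.bam chr19 K00135:24:H3V7TBBXX:1:1101:13078:1332
```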


Results:

K00135:24:H3V7TBBXX:1:1101:13078:1332   73  chr19   18876676    255 151M    *   0   0   TGAGAGGATCACTTGAGCCCAGGGGGTGGAGGCTGCAGTGAGCCATGATCACACCACTGCACTCTACCCTGGGAGACAGAGTGAGACTCTGTTTCAAAAAAAAAGAAAAAACTCAAGAGGTTAGCTTTTGATTTTTCAATTTGCTGTATTT <AA-FFAJF<FAFAFJJJJFJJJJFFJ-7F<-FJJJJJJJJFJJJAFJJFJFJ7-F<--7F-7--<<A77---FFJAFAF<<-<AFFA-<A7FAJJJFJJJJJJAFAAFAJF-AFAAJF7-77A----7-7-7<<FFAF-77<<7-7---< NH:i:1  HI:i:1  AS:i:149    nM:i:0
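
For reference, the record above has flag 73 (paired, mate unmapped, first in pair), mate reference "*" and mate start 0, so as printed by samtools it carries no mate position. To see whether any records in the text view really do have a mate start past the end of the reference, something like the following could be used. This is only a sketch: find_bad_mate_starts is a hypothetical helper, and it assumes SAM text with the @SQ header lines on stdin.

```shell
# Hypothetical helper: read SAM text (with header) on stdin and print the
# read name, mate reference and mate start of every record whose mate start
# (PNEXT, column 8) exceeds the length of the mate reference (RNEXT, column 7).
find_bad_mate_starts() {
    awk -F '\t' '
        # Header: remember each reference length from the SN: and LN: tags.
        $1 == "@SQ" {
            name = ""; len = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len = substr($i, 4) + 0
            }
            reflen[name] = len
            next
        }
        /^@/ { next }    # skip the other header lines
        {
            # "=" means the mate is on the same reference as the read.
            mate_ref = ($7 == "=") ? $3 : $7
            if (mate_ref in reflen && $8 + 0 > reflen[mate_ref])
                print $1, mate_ref, $8
        }
    '
}

# Usage:
# samtools view -h file.Aligned.sortedByCoord.bam chr19 | find_bad_mate_starts
```

Records with an unmapped mate (mate reference "*") are skipped, since there is no reference length to compare against.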

If that still happens with the most recent version of Picard, then it's a bug in Picard.

7.2 years ago
picard-tools-1.118/MarkDuplicates.jar

First: the version of Picard you're using is too old.

Second: try running MarkDuplicates with VALIDATION_STRINGENCY=LENIENT.


Upgraded to picard-2.8.2 and re-ran using VALIDATION_STRINGENCY=LENIENT, as recommended above. The job failed after ~7 hours, having created a 68 GB BAM, a 13 MB BAI, and a 200 MB log file.

The log contains a large number of "Ignoring SAM validation error" messages. Below are the last few lines of the log:

Ignoring SAM validation error: ERROR: Record 805313152, Read name K00135:43:H3VC2BBXX:2:2228:1681:49107, Mate Alignment start (1916272743) must be <= reference sequence length (90354753) on reference chr16
INFO    2017-01-27 17:56:07     MarkDuplicates  Before output close freeMemory: 18524841008; totalMemory: 18696110080; maxMemory: 22487236608
INFO    2017-01-27 17:56:10     MarkDuplicates  After output close freeMemory: 18524838000; totalMemory: 18696110080; maxMemory: 22487236608
[Fri Jan 27 17:56:10 EST 2017] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 398.98 minutes.
Runtime.totalMemory()=18696110080

real    399m0.405s
user    384m57.886s
sys     9m45.964s

[Fri Jan 27 17:56:11 EST 2017] picard.sam.BuildBamIndex INPUT=dedup_CB1_valstr.bam    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Fri Jan 27 17:56:11 EST 2017] Executing as me@comp205t on Linux 2.6.32-642.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_101-b13; Picard version: 2.8.2-SNAPSHOT
[Fri Jan 27 18:29:42 EST 2017] picard.sam.BuildBamIndex done. Elapsed time: 33.51 minutes.
Runtime.totalMemory()=2044723200
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 801397832, Read name K00135:24:H3V7TBBXX:1:1101:13078:1332, Mate Alignment start (1651771652) must be <= reference sequence length (59128983) on reference chr19
        at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:665)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:650)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:620)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:569)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:543)
        at htsjdk.samtools.BAMIndexer.createIndex(BAMIndexer.java:305)
        at htsjdk.samtools.BAMIndexer.createIndex(BAMIndexer.java:289)
        at picard.sam.BuildBamIndex.doWork(BuildBamIndex.java:147)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)

I'm surprised you got an error at this point:

at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)

because it's clearly raised when the validation stringency is STRICT: https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/samtools/SAMUtils.java#L447

Can you please show me the complete command line?


Furthermore, the program used here is not MarkDuplicates but BuildBamIndex (???)

    at picard.sam.BuildBamIndex.doWork(BuildBamIndex.java:147)

The commands were run in this order:

java -jar picard-2.8.2.jar MarkDuplicates I=file.Aligned.sortedByCoord.bam O=file.out.bam M=file.metrics VALIDATION_STRINGENCY=LENIENT

java -jar picard-2.8.2.jar BuildBamIndex I=file.out.bam

Thank you for your continued help :)


You said:

 "and re-ran using VALIDATION_STRINGENCY=LENIENT"

... but that option was not in the command you posted.


Command updated above (accidentally copied the old command from an old SLURM script). Thanks.


But you didn't use VALIDATION_STRINGENCY=LENIENT with BuildBamIndex.
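
In other words, both steps read the BAM and both validate it, so the option has to be given to each one. A sketch using the file names from the commands posted earlier; the PICARD_JAR variable and the dedup_and_index function name are assumptions for illustration.

```shell
# Sketch: pass the same validation stringency to both Picard steps.
# PICARD_JAR is an assumed variable; adjust it to where the jar lives.
PICARD_JAR=${PICARD_JAR:-picard-2.8.2.jar}

dedup_and_index() {
    # Step 1: mark duplicates, tolerating records that fail validation.
    java -jar "$PICARD_JAR" MarkDuplicates \
        I=file.Aligned.sortedByCoord.bam O=file.out.bam M=file.metrics \
        VALIDATION_STRINGENCY=LENIENT || return 1
    # Step 2: index the output; this step validates records too.
    java -jar "$PICARD_JAR" BuildBamIndex \
        I=file.out.bam VALIDATION_STRINGENCY=LENIENT
}

# Usage:
# dedup_and_index
```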


Re-ran MarkDuplicates then BuildBamIndex using VALIDATION_STRINGENCY=LENIENT and the job completed without errors this time! However, a 400 MB log file was created with 2 million lines similar to the following:

Ignoring SAM validation error: ERROR: Record 801397832, Read name K00135:24:H3V7TBBXX:1:1101:13078:1332, Mate Alignment start (1651771652) must be <= reference sequence length (59128983) on reference chr19

Should I be concerned? If not, can these messages be turned off so that such a large log isn't created? Thanks.
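
On the log size: Picard's VALIDATION_STRINGENCY also accepts SILENT, which skips printing these messages. Whether the affected records are a concern is a separate question. One way to get a picture of them is to summarize the log that already exists. A small sketch, where summarize_ignored_errors is a hypothetical helper and the log file name in the usage line is an assumption:

```shell
# Hypothetical helper: read a Picard log on stdin and count the
# "Ignoring SAM validation error" lines per reference name.
summarize_ignored_errors() {
    grep 'Ignoring SAM validation error' |
        # Keep only the reference name that ends each message.
        sed -n 's/.* on reference \([^ ]*\)$/\1/p' |
        sort | uniq -c |
        # uniq -c prints "count name"; swap to "name count".
        awk '{ print $2, $1 }'
}

# Usage:
# summarize_ignored_errors < markduplicates.log
```

If every message is the same kind of mate-position complaint, the counts per reference at least show how widespread it is before deciding whether to silence it.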
