Question: SAM validation error
fire_water (80), United States, wrote 3.2 years ago:

Command

java -jar picard-tools-1.118/MarkDuplicates.jar I=file.Aligned.sortedByCoord.bam O=file_out.bam METRICS_FILE=file_out.metrics

Error Message

Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 801397832, Read name K00135:24:H3V7TBBXX:1:1101:13078:1332, Mate Alignment start (1651771652) must be <= reference sequence length (59128983) on reference chr19

file.Aligned.sortedByCoord.bam was created by STAR.

Has anyone else encountered a similar error? If so, what was the fix? Thanks!

software error • 3.0k views
modified 3.2 years ago by Pierre Lindenbaum (127k) • written 3.2 years ago by fire_water (80)

What happens if you run samtools view file.Aligned.sortedByCoord.bam chr19 | grep K00135:24:H3V7TBBXX:1:1101:13078:1332?

What was the STAR command that created this (it's probably in the BAM header)?

What version of STAR is this?

written 3.2 years ago by Devon Ryan (94k)

The SAMtools command returns:

"[main_samview] random alignment retrieval only works for indexed BAM or CRAM files."

STAR command in BAM header:

@PG ID:STAR PN:STAR VN:STAR_2.5.1b  CL:STAR_2.5.1   --runThreadN 16   --genomeDir /Human/Hg19/   --readFilesIn file_R1_val_1.fq.gz   file_R2_val_2.fq.gz      --readFilesCommand zcat      --outSAMtype BAM   SortedByCoordinate   
@CO user command line: STAR_2.5.1 --runThreadN 16 --genomeDir /Human/Hg19/ --readFilesIn file_R1_val_1.fq.gz   file_R2_val_2.fq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate

STAR version 2.5.1b

written 3.2 years ago by fire_water (80)

Index the file and then try the samtools command again.

written 3.2 years ago by Devon Ryan (94k)
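
Region queries such as samtools view file.bam chr19 need an index next to the BAM, which is why the first attempt returned the "random alignment retrieval" message. A minimal sketch of the two steps follows. The helper name find_read is made up for illustration, and it assumes the index sits at <bam>.bai, which is where samtools index writes it by default.

```shell
# Hypothetical helper: look up one read by name within a region of a
# coordinate-sorted BAM.
# $1 = BAM file, $2 = region (e.g. chr19), $3 = read name
find_read() {
    # Region queries need an index; build one if <bam>.bai is missing.
    [ -e "$1.bai" ] || samtools index "$1"
    # grep -F treats the read name as a fixed string, not a regex.
    samtools view "$1" "$2" | grep -F "$3"
}

# Usage (file and read name taken from the thread):
# find_read file.Aligned.sortedByCoord.bam chr19 K00135:24:H3V7TBBXX:1:1101:13078:1332
```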

Results:

K00135:24:H3V7TBBXX:1:1101:13078:1332   73  chr19   18876676    255 151M    *   0   0   TGAGAGGATCACTTGAGCCCAGGGGGTGGAGGCTGCAGTGAGCCATGATCACACCACTGCACTCTACCCTGGGAGACAGAGTGAGACTCTGTTTCAAAAAAAAAGAAAAAACTCAAGAGGTTAGCTTTTGATTTTTCAATTTGCTGTATTT <AA-FFAJF<FAFAFJJJJFJJJJFFJ-7F<-FJJJJJJJJFJJJAFJJFJFJ7-F<--7F-7--<<A77---FFJAFAF<<-<AFFA-<A7FAJJJFJJJJJJAFAAFAJF-AFAAJF7-77A----7-7-7<<FFAF-77<<7-7---< NH:i:1  HI:i:1  AS:i:149    nM:i:0
written 3.2 years ago by fire_water (80)
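
For reference, the second column of the record above is the SAM FLAG (73), and the seventh and eighth columns (RNEXT, PNEXT) are * and 0. Below is a small sketch for decoding the FLAG bits, using the bit values from the SAM specification. The short labels and the function name decode_flag are shorthand chosen here, not official names.

```shell
# Hypothetical helper: print the names of the bits set in a SAM FLAG value.
decode_flag() {
    flag="$1"
    out=""
    # value:label pairs; values follow the SAM specification, labels are shorthand.
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:first_in_pair 128:second_in_pair \
                256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
        bit=${pair%%:*}
        name=${pair#*:}
        # Append the label when the corresponding bit is set.
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_flag 73
```

For 73 this prints paired,mate_unmapped,first_in_pair, so this particular record describes a read whose mate is unmapped, and it does not carry the mate position 1651771652 quoted in the Picard error.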

If that still happens with the most recent version of Picard, then it's a bug in Picard.

written 3.2 years ago by Devon Ryan (94k)
Pierre Lindenbaum (127k), France/Nantes/Institut du Thorax - INSERM UMR1087, wrote 3.2 years ago:
picard-tools-1.118/MarkDuplicates.jar

First: the version of Picard you're using is just too old.

Second: try running MarkDuplicates with VALIDATION_STRINGENCY=LENIENT.

written 3.2 years ago by Pierre Lindenbaum (127k)

Upgraded to picard-2.8.2 and re-ran using VALIDATION_STRINGENCY=LENIENT, as recommended above. The job failed after ~7 hours, having created a 68 GB BAM, a 13 MB BAI, and a 200 MB log file.

The log contains a large number of "Ignoring SAM validation error" messages. Below are the last few lines of the log:

Ignoring SAM validation error: ERROR: Record 805313152, Read name K00135:43:H3VC2BBXX:2:2228:1681:49107, Mate Alignment start (1916272743) must be <= reference sequence length (90354753) on reference chr16
INFO    2017-01-27 17:56:07     MarkDuplicates  Before output close freeMemory: 18524841008; totalMemory: 18696110080; maxMemory: 22487236608
INFO    2017-01-27 17:56:10     MarkDuplicates  After output close freeMemory: 18524838000; totalMemory: 18696110080; maxMemory: 22487236608
[Fri Jan 27 17:56:10 EST 2017] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 398.98 minutes.
Runtime.totalMemory()=18696110080

real    399m0.405s
user    384m57.886s
sys     9m45.964s

[Fri Jan 27 17:56:11 EST 2017] picard.sam.BuildBamIndex INPUT=dedup_CB1_valstr.bam    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Fri Jan 27 17:56:11 EST 2017] Executing as me@comp205t on Linux 2.6.32-642.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_101-b13; Picard version: 2.8.2-SNAPSHOT
[Fri Jan 27 18:29:42 EST 2017] picard.sam.BuildBamIndex done. Elapsed time: 33.51 minutes.
Runtime.totalMemory()=2044723200
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 801397832, Read name K00135:24:H3V7TBBXX:1:1101:13078:1332, Mate Alignment start (1651771652) must be <= reference sequence length (59128983) on reference chr19
        at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:665)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:650)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:620)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:569)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:543)
        at htsjdk.samtools.BAMIndexer.createIndex(BAMIndexer.java:305)
        at htsjdk.samtools.BAMIndexer.createIndex(BAMIndexer.java:289)
        at picard.sam.BuildBamIndex.doWork(BuildBamIndex.java:147)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
written 3.2 years ago by fire_water (80)

I'm surprised you got an error at this point:

at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)

because it's clearly raised when the validation stringency is STRICT: https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/samtools/SAMUtils.java#L447

Can you please show me the complete command line?

written 3.2 years ago by Pierre Lindenbaum (127k)

Furthermore, the program that failed here is not MarkDuplicates but BuildBamIndex (???)

    at picard.sam.BuildBamIndex.doWork(BuildBamIndex.java:147)
written 3.2 years ago by Pierre Lindenbaum (127k)

The commands are run in this order:

java -jar picard-2.8.2.jar MarkDuplicates I=file.Aligned.sortedByCoord.bam O=file.out.bam M=file.metrics VALIDATION_STRINGENCY=LENIENT

java -jar picard-2.8.2.jar BuildBamIndex I=file.out.bam

Thank you for your continued help :)

modified 3.2 years ago • written 3.2 years ago by fire_water (80)

you said:

 and re-ran using VALIDATION_STRINGENCY=LENIENT,

...

written 3.2 years ago by Pierre Lindenbaum (127k)

Command updated above (accidentally copied the old command from an old SLURM script). Thanks.

modified 3.2 years ago • written 3.2 years ago by fire_water (80)

But you didn't use VALIDATION_STRINGENCY=LENIENT with BuildBamIndex...

written 3.2 years ago by Pierre Lindenbaum (127k)
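
The stack trace above comes from BuildBamIndex, and its logged options show VALIDATION_STRINGENCY=STRICT, so the setting has to be passed to each Picard tool separately. A minimal sketch of running both steps with the same stringency follows. The function name dedup_and_index and the PICARD_JAR variable are illustrative, and the jar path is assumed to match the commands quoted earlier.

```shell
# Path to the Picard jar; adjust to your installation (assumed default).
PICARD_JAR="${PICARD_JAR:-picard-2.8.2.jar}"

# Hypothetical wrapper: mark duplicates, then index the result, passing the
# same validation stringency to both tools.
# $1 = input BAM, $2 = output BAM, $3 = metrics file
dedup_and_index() {
    java -jar "$PICARD_JAR" MarkDuplicates \
        I="$1" O="$2" M="$3" VALIDATION_STRINGENCY=LENIENT &&
    java -jar "$PICARD_JAR" BuildBamIndex \
        I="$2" VALIDATION_STRINGENCY=LENIENT
}

# Usage (file names from the thread):
# dedup_and_index file.Aligned.sortedByCoord.bam file.out.bam file.metrics
```

The && means the index step only runs if MarkDuplicates exits successfully.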

Re-ran MarkDuplicates then BuildBamIndex using VALIDATION_STRINGENCY=LENIENT and the job completed without errors this time! However, a 400 MB log file was created with 2 million lines similar to the following:

Ignoring SAM validation error: ERROR: Record 801397832, Read name K00135:24:H3V7TBBXX:1:1101:13078:1332, Mate Alignment start (1651771652) must be <= reference sequence length (59128983) on reference chr19

Should I be concerned? If not, can these messages be turned off so that such a large log isn't created? Thanks.

written 3.2 years ago by fire_water (80)
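
On the log-size question: the repeated line appears to be written once per offending record, so a per-reference tally is likely more useful than the raw 400 MB log. Below is a rough sketch, assuming the log keeps the exact message format quoted above; the function name summarize_ignored is hypothetical. Separately, Picard's standard options list SILENT as a third VALIDATION_STRINGENCY value next to STRICT and LENIENT, which should suppress these messages. Whether hiding them is wise depends on why the mate positions are out of range, which this thread does not establish.

```shell
# Hypothetical helper: count "Ignoring SAM validation error" lines per
# reference sequence in a Picard log.
# $1 = log file; prints "<reference><TAB><count>", most frequent first.
summarize_ignored() {
    grep 'Ignoring SAM validation error' "$1" |
        # Keep only the reference name that follows " on reference ".
        sed -n 's/.* on reference \([^ ]*\).*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $2 "\t" $1 }'
}

# Usage:
# summarize_ignored picard.log
```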