Question: Picard - MergeBamAlignment: Reads remaining on alignment iterator
1
gravatar for godth13teen
3 months ago by
godth13teen10
godth13teen10 wrote:

Hi, I'm new to genome analysis and I need some advice on this. I'm currently follow GATK best practice, now I'm stuck at the MergeBamAlignment step of picard. I'm using the data from https://www.internationalgenome.org/, the reference genome is also downloaded from International genome database.

My command:

MergeBamAlignment -ALIGNED aln/HG02142.mapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam -UNMAPPED /aln/HG02142.unmapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam -O test.bam -R refgen/hs37d5.fa.gz -TMP_DIR tmp

I got many warning, and didn't get the output file:

WARNING 2020-02-23 21:54:56     SamAlignmentMerger      Exception merging bam alignment - attempting to sort aligned reads and try again: Underlying iterator is not queryname sorted: ERR233223.95759674 2/2 101b aligned to 1:47264561-47264660. > ERR233223.14389540 2/2 101b aligned to 1:47264588-47264671.
INFO    2020-02-23 21:54:56     SamAlignmentMerger      Finished reading 28356 total records from alignment SAM/BAM.
WARNING 2020-02-23 21:54:56     SAMSequenceDictionary   Found sequence entry for which tags differ: 1 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz and file:/data4t/KHV1000G/refgen/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
WARNING 2020-02-23 21:54:56     SAMSequenceDictionary   Found sequence entry for which tags differ: 2 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz and file:/data4t/KHV1000G/refgen/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
.... (many warnings in here, similar structure)
[Sun Feb 23 21:54:56 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: ERR233223.100003813!
        at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:558)
        at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
        at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:358)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

What have go wrong in my case? I downloaded all of my files directly from international genomes database ftp, without any modification. Please advice me on this, thank you

alignment genome • 248 views
ADD COMMENTlink modified 3 months ago by Pierre Lindenbaum128k • written 3 months ago by godth13teen10

Please do not delete previous questions that received comments or answers. This is considered disrespectful towards the users who invested time to help you.

ADD REPLYlink written 3 months ago by ATpoint35k

I think you didn't tell us that you post-processed one of the bam using another REF sequence....

file:/data4t/KHV1000G/refgen/hs37d5.fa.gz
ADD REPLYlink written 3 months ago by Pierre Lindenbaum128k

I download both bam files from international genome, the REF sequence you state is just the path on my disk. I didn't process any of those files

ADD REPLYlink written 3 months ago by godth13teen10

Ah sorry, but where did you get this file refgen/hs37d5.fa.gz ? it's not the same as the one in the BAM. The associated dict file should look like this.

@SQ SN:1    LN:249250621    M5:1b22b98cdeb4a9304cb5d48026a85128 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37   SP:Human
@SQ SN:2    LN:243199373    M5:a0d9851da00400dec1098a9255ac712e UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37   SP:Human
(...)
@SQ SN:NC_007605    LN:171823   M5:6743bd63b3ff2b5b8985d8933c53290a UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37   SP:Human
@SQ SN:hs37d5   LN:35477943 M5:5b6a4b3a81a2d3c134b7d14bf6ad39f1 UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz AS:NCBI37   SP:Human
ADD REPLYlink written 3 months ago by Pierre Lindenbaum128k

I used samtools to check @SQ tag of the bam files and got this link: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz

I downloaded directly from it

ADD REPLYlink written 3 months ago by godth13teen10

what is the version of picard ? do you use the latest version ?

ADD REPLYlink written 3 months ago by Pierre Lindenbaum128k

did you create the associated dict file using https://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionary

ADD REPLYlink written 3 months ago by Pierre Lindenbaum128k

I'm using picard 2.21.8, the latest version I think. I also created the dict using

java -jar picard.jar CreateSequenceDictionary R=refgen/hs37d5.fa.gz O=refgen/hs37d5.dict
ADD REPLYlink modified 3 months ago • written 3 months ago by godth13teen10

So sorry for that, I didn't state clear enough in that one

ADD REPLYlink written 3 months ago by godth13teen10
1
gravatar for Pierre Lindenbaum
3 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum128k wrote:

Re-generate the dict file with CreateSequenceDictionary but use the following values so they match the ones in the BAM

java -jar picard.jar CreateSequenceDictionary R=refgen/hs37d5.fa.gz O=refgen/hs37d5.dict U=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz SPECIES=Human  GENOME_ASSEMBLY=NCBI37
ADD COMMENTlink written 3 months ago by Pierre Lindenbaum128k

Hi, thank you for your answer, I followed that step, this time there is less warning, but it still didn't output anyfile. The full log is here:

09:02:54.970 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/thanhnguyen/picard/build/libs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Feb 24 09:02:54 ICT 2020] MergeBamAlignment UNMAPPED_BAM=aln/HG02142.unmapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam ALIGNED_BAM=[aln/HG02142.mapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam] OUTPUT=HG02142/HG02142_fin.bam TMP_DIR=[tmp] REFERENCE_SEQUENCE=refgen/hs37d5.fa.gz    ADD_PG_TAG_TO_READS=true PAIRED_RUN=true CLIP_ADAPTERS=true IS_BISULFITE_SEQUENCE=false ALIGNED_READS_ONLY=false MAX_INSERTIONS_OR_DELETIONS=1 ATTRIBUTES_TO_REVERSE=[OQ, U2] ATTRIBUTES_TO_REVERSE_COMPLEMENT=[E2, SQ] READ1_TRIM=0 READ2_TRIM=0 ALIGNER_PROPER_PAIR_FLAGS=false SORT_ORDER=coordinate PRIMARY_ALIGNMENT_STRATEGY=BestMapq CLIP_OVERLAPPING_READS=true INCLUDE_SECONDARY_ALIGNMENTS=true ADD_MATE_CIGAR=true UNMAP_CONTAMINANT_READS=false MIN_UNCLIPPED_BASES=32 MATCHING_DICTIONARY_TAGS=[M5, LN] UNMAPPED_READ_STRATEGY=DO_NOT_CHANGE VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Mon Feb 24 09:02:54 ICT 2020] Executing as myuser@abc.com on Linux 3.13.0-147-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.8-SNAPSHOT
INFO    2020-02-24 09:02:55     SamAlignmentMerger      Processing SAM file(s): [aln/HG02142.mapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam]
WARNING 2020-02-24 09:02:55     SamAlignmentMerger      Exception merging bam alignment - attempting to sort aligned reads and try again: Underlying iterator is not queryname sorted: ERR233223.95759674 2/2 101b aligned to 1:47264561-47264660. > ERR233223.14389540 2/2 101b aligned to 1:47264588-47264671.
INFO    2020-02-24 09:02:55     SamAlignmentMerger      Finished reading 28356 total records from alignment SAM/BAM.
[Mon Feb 24 09:02:55 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: ERR233223.100003813!
        at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:558)
        at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
        at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:358)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
ADD REPLYlink written 3 months ago by godth13teen10

I also tried using another sample, HG02088.mapped.ILLUMINA.bwa.KHV.low_coverage.20130415.bam (still from Human genome project), and it has the same problem, the warning is a bit different in those lines:

WARNING 2020-02-24 09:30:58     SamAlignmentMerger      Exception merging bam alignment - attempting to sort aligned reads and try again: Underlying iterator is not queryname sorted: SRR741416.45399016 2/2 101b aligned to 1:47264572-47264658. > SRR741416.1601628 1/2 101b aligned to 1:47264588-47264688.
INFO    2020-02-24 09:30:58     SamAlignmentMerger      Finished reading 34465 total records from alignment SAM/BAM.
[Mon Feb 24 09:30:58 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: SRR741415.10010200!
ADD REPLYlink modified 3 months ago • written 3 months ago by godth13teen10

at this point I think you want MergeSamFiles, NOT MergeBamAlignment Difference Between Picards Mergebamalignment And Mergesamfiles

ADD REPLYlink modified 3 months ago • written 3 months ago by Pierre Lindenbaum128k
1

Underlying iterator is not queryname sorted: SRR741416.45399016 2/2 101b aligned to 1:47264572-47264658. > SRR741416.1601628 1/2 101b aligned to 1:47264588-47264688.

so did you sort the unampped BAM on read-names (not coordinates). Beware, it should be sorted using picard SortSam (NOT samtools) because of : https://github.com/samtools/htsjdk/issues/766

ADD REPLYlink modified 3 months ago • written 3 months ago by Pierre Lindenbaum128k

I downloaded the file from human genome project without any modification. I checked their documents, it seems like they sorted the mapped file:

 samtools view -bSu $sam_file | samtools sort -n -o - samtools_nsort_tmp | 
 samtools fixmate /dev/stdin /dev/stdout | samtools sort -o - samtools_csort_tmp |
 samtools fillmd -u - $reference_fasta > $fixed_bam_file

How can I sort the unmapped bam in the same manner?

ADD REPLYlink written 3 months ago by godth13teen10

try to use picard SortSam using the correct sorting 'queryname'

ADD REPLYlink written 3 months ago by Pierre Lindenbaum128k

I have tried two cases: 1, Sorting unmapped then MergeAlignment

INFO    2020-02-25 12:13:30     SamAlignmentMerger      Processing SAM file(s): [aln/HG01842.mapped.ILLUMINA.bwa.KHV.low_coverage.20120522.bam]
WARNING 2020-02-25 12:13:30     SAMSequenceDictionary   Found sequence entry for which tags differ: 1 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human
(...)
WARNING 2020-02-25 12:13:31     SAMSequenceDictionary   Found sequence entry for which tags differ: hs37d5 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human
[Tue Feb 25 12:13:31 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: ERR042514.10000846!
        at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:558)
        at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
        at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:358)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

2, Sorting both files then MergeAlignment

INFO    2020-02-25 12:11:56     SamAlignmentMerger      Processing SAM file(s): [HG            01842.mapped.sorted.bam]
WARNING 2020-02-25 12:11:56     SAMSequenceDictionary   Found sequence entry for which tags differ: 1 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz                    AS:NCBI37       SP:Human and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human
(...)
WARNING 2020-02-25 12:11:57     SAMSequenceDictionary   Found sequence entry for which tags differ: hs37d5 and tag UR has the two values: ftp://ftp.1000genomes.ebi.ac            .uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz                    AS:NCBI37       SP:Human and ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz. Using ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz        AS:NCBI37       SP:Human
[Tue Feb 25 12:11:57 ICT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=886571008
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Reads remaining on alignment iterator: ERR042514.10000846!
        at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:558)
        at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
        at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:358)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

And it still output nothing

ADD REPLYlink written 3 months ago by godth13teen10

no it's not, as you can see I got one mapped file and one unmapped file, I need the MergeBamAlignment not the Mergesamfiles

ADD REPLYlink written 3 months ago by godth13teen10

oh, I'm wrong again sorry.

ADD REPLYlink written 3 months ago by Pierre Lindenbaum128k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1839 users visited in the last hour