Question: Sorting Bam File For Gatk Read-Pair Walkers
Pascal (Barcelona) wrote, 7.1 years ago:

Hi

I am trying to run a read-pair GATK walker (e.g. CountPairs) on a BAM file. I thought it would be easy, but I'm stuck on issues related to BAM file sorting and indexing.

If I understand correctly, the pipeline should be:

  1. sort the BAM file using the Picard SortSam tool with "queryname" as the sort order,
  2. index the resulting BAM file with samtools index,
  3. run the GATK walker on the resulting BAM file.

But step 2 fails with the following error:

[bam_index_core] the alignment is not sorted (SRR003480.10000060): 23501638 > 19966740 in 22-th chr
[bam_index_build2] fail to index the BAM file.

I also tried the pipeline with "coordinate" as the sort order. In this case samtools index works, but GATK complains that it can only process "queryname"-sorted files.

Both "pipelines" of commands come after my signature, if you want to have a look.

Apart from the troubleshooting, I would really appreciate an explanation of what these sort orders mean. Thanks in advance for any help!

BTW, I know there is a similar post here. I tried using AddOrReplaceReadGroups instead of SortSam, but it didn't help.

Regards, Pascal

Using SO=queryname

$ java -Xmx1024m -jar ~/tools/picard/SortSam.jar I=NA11992.chrom22.ILLUMINA.bwa.CEU.exon_targetted.20100311.bam O=NA11992_sorted_queryname.bam SO=queryname
$ samtools index NA11992_sorted_queryname.bam
[bam_index_core] the alignment is not sorted (SRR003480.10000060): 23501638 > 19966740 in 22-th chr
[bam_index_build2] fail to index the BAM file.

Using SO=coordinate

$ java -Xmx1024m -jar ~/tools/picard/SortSam.jar I=NA11992.chrom22.ILLUMINA.bwa.CEU.exon_targetted.20100311.bam O=NA11992_sorted_coordinate.bam SO=coordinate
$ samtools index NA11992_sorted_coordinate.bam
$ java -Xmx2g -jar ../../gatk/GenomeAnalysisTK-1.2-60-g585a45b/GenomeAnalysisTK.jar -R ../references/human_g1k_v37.fasta -T CountPairs -o output.txt -I NA11992_sorted_coordinate.bam
[...]
##### ERROR MESSAGE: Missorted Input SAM/BAM files: files are not sorted in queryname order; Read pair walkers can only walk over query name-sorted data.  Please resort your input BAM file.

One thing is not clear to me from the GATK FAQ: it states that the BAM file "must be sorted in coordinate order (not by queryname and not unsorted)". But when I run GATK against such a file, it complains: "Missorted Input SAM/BAM files: files are not sorted in queryname order". This is inconsistent, isn't it?

Comment written 7.1 years ago by Pascal
Bpow (United States) wrote, 6.8 years ago:

In order to run a ReadPairWalker in GATK, the SORT_ORDER has to be 'queryname'.
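
To see which sort order a file declares, you can look at the SO tag on the @HD line of its header. The following is only a minimal sketch: the header line is made up for illustration, and `header_line` and `sort_order` are hypothetical variable names. With a real file, the header would come from `samtools view -H`.

```shell
# Hypothetical @HD header line, tab-separated as in the SAM format.
# With a real file it would come from: samtools view -H your.bam | grep '^@HD'
header_line=$(printf '@HD\tVN:1.0\tSO:queryname')

# Put each tab-separated field on its own row, then keep the value of the SO tag.
sort_order=$(printf '%s\n' "$header_line" | tr '\t' '\n' | sed -n 's/^SO://p')

echo "declared sort order: $sort_order"
```

Note that this only reports what the header claims; it does not verify that the records are actually in that order.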

However, there is a universal warning in GATK against processing a bam file that is not coordinate sorted, so you have to specify "unsafe" mode by adding the "-U" option to your command line.
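
As for what the sort orders mean, a toy illustration may help. This is a sketch under simplifying assumptions: the records below are made-up three-column lines (read name, chromosome, position), not real SAM records, and plain `sort` only approximates what SortSam does.

```shell
# Toy records: read name, chromosome, position (tab-separated, invented data).
tab=$(printf '\t')
records=$(printf '%s\t%s\t%s\n' \
  readB 22 300 \
  readA 22 100 \
  readB 22 150 \
  readA 22 400)

# "coordinate"-like order: by chromosome, then by position.
# Positions only increase, which is what an index builder expects,
# but the two mates of a pair can end up far apart.
by_coordinate=$(printf '%s\n' "$records" | LC_ALL=C sort -t "$tab" -k2,2n -k3,3n)

# "queryname"-like order: by read name.
# Both mates of a pair are adjacent, but positions jump back and forth.
by_queryname=$(printf '%s\n' "$records" | LC_ALL=C sort -t "$tab" -k1,1 -k3,3n)

echo "coordinate:"; echo "$by_coordinate"
echo "queryname:";  echo "$by_queryname"
```

In the name-sorted output the positions run 100, 400, 150, 300. A drop like 400 to 150 is the same kind of jump that samtools index reports in the question ("23501638 > 19966740"), which is why a queryname-sorted file cannot be indexed that way, while a read-pair walker benefits from having both mates next to each other.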
