Sorting BAM File for GATK Read-Pair Walkers
12.4 years ago
Pascal ★ 1.5k

Hi

I am trying to run a read-pair GATK walker (e.g. CountPairs) on a BAM file. I thought it would be easy, but I am stuck on issues related to BAM file sorting and indexing.

If I understood correctly, the pipeline should be:

  1. sorting the BAM file using the Picard SortSam tool with "queryname" as the sort order,
  2. indexing the resulting BAM file with samtools index,
  3. running the GATK walker on the resulting BAM file.

But step 2 fails with the following error:

[bam_index_core] the alignment is not sorted (SRR003480.10000060): 23501638 > 19966740 in 22-th chr
[bam_index_build2] fail to index the BAM file.

I also tried the pipeline with the "coordinate" sort order. In that case samtools index works, but GATK complains that it can only process queryname-sorted files.

Both command pipelines are listed after my signature, if you want to have a look.

Apart from the troubleshooting, I would really appreciate an explanation of what these sort orders mean. Thanks in advance for any help!

BTW, I know there is a similar post here. I tried using AddOrReplaceReadGroups instead of SortSam, but it didn't help.

Regards, Pascal

Using SO=queryname

$ java -Xmx1024m -jar ~/tools/picard/SortSam.jar I=NA11992.chrom22.ILLUMINA.bwa.CEU.exon_targetted.20100311.bam O=NA11992_sorted_queryname.bam SO=queryname
$ samtools index NA11992_sorted_queryname.bam
[bam_index_core] the alignment is not sorted (SRR003480.10000060): 23501638 > 19966740 in 22-th chr
[bam_index_build2] fail to index the BAM file.

Using SO=coordinate

$ java -Xmx1024m -jar ~/tools/picard/SortSam.jar I=NA11992.chrom22.ILLUMINA.bwa.CEU.exon_targetted.20100311.bam O=NA11992_sorted_coordinate.bam SO=coordinate
$ samtools index NA11992_sorted_coordinate.bam
$ java -Xmx2g -jar ../../gatk/GenomeAnalysisTK-1.2-60-g585a45b/GenomeAnalysisTK.jar -R ../references/human_g1k_v37.fasta -T CountPairs -o output.txt -I NA11992_sorted_coordinate.bam
[...]
##### ERROR MESSAGE: Missorted Input SAM/BAM files: files are not sorted in queryname order; Read pair walkers can only walk over query name-sorted data.  Please resort your input BAM file.

Something is unclear in the GATK FAQ: it states that the BAM file "must be sorted in coordinate order (not by queryname and not unsorted)". But when I run GATK against such a file, it complains: "Missorted Input SAM/BAM files: files are not sorted in queryname order". That is inconsistent, isn't it?

12.1 years ago
Bpow ▴ 280

In order to run a ReadPairWalker in GATK, the SORT_ORDER has to be 'queryname'.
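To see which order a file declares, you can look at the SO tag on the @HD line of its header, which `samtools view -H` prints. Below is a minimal sketch, not anything GATK or Picard ships: `sam_sort_order` is a hypothetical helper name, and it reads header text from standard input so it can be tried on a hand-written toy header without a real BAM.

```shell
# Hypothetical helper: print the SO (sort order) value found in SAM header
# text on stdin. Prints "unknown" when no @HD line carries an SO tag.
sam_sort_order() {
  awk -F'\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
    }
    END { if (!found) print "unknown" }
  '
}

# Toy header, made up for illustration only.
printf '@HD\tVN:1.0\tSO:queryname\n@SQ\tSN:22\tLN:51304566\n' | sam_sort_order
```

On a real file the same helper could be fed with `samtools view -H NA11992_sorted_queryname.bam | sam_sort_order`. Keep in mind that the header only records what the file claims; it does not prove the records are actually in that order.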

However, GATK has a general check that rejects any BAM file that is not coordinate-sorted, so you have to enable "unsafe" mode by adding the "-U" option to your command line.
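On the meaning of the two sort orders asked about in the question: "coordinate" orders records by reference sequence and then by position, while "queryname" orders them by read name so that both mates of a pair sit next to each other. The sketch below imitates both with plain `sort` on made-up, tab-separated toy records (read name, chromosome, position). It is only an illustration under those assumptions: Picard's exact name comparison may differ, and `is_position_sorted` is a hypothetical helper that handles a single chromosome.

```shell
# Toy records: read name, chromosome, position (made-up values, tab-separated).
records=$(printf '%s\t%s\t%s\n' \
  readB 22 300 \
  readA 22 100 \
  readB 22 150 \
  readA 22 400)
tab=$(printf '\t')

# Coordinate order: sort by chromosome, then numerically by position.
by_coordinate=$(printf '%s\n' "$records" | LC_ALL=C sort -t "$tab" -k2,2 -k3,3n)

# Queryname order: stable sort by read name, so mates end up adjacent.
by_queryname=$(printf '%s\n' "$records" | LC_ALL=C sort -s -t "$tab" -k1,1)

# Hypothetical helper: succeed only if column 3 never decreases.
is_position_sorted() {
  awk -F'\t' '$3 + 0 < prev + 0 { bad = 1 } { prev = $3 } END { exit bad }'
}

printf 'coordinate:\n%s\n\nqueryname:\n%s\n' "$by_coordinate" "$by_queryname"
printf '%s\n' "$by_queryname" | is_position_sorted || echo "queryname copy is not position-sorted"
```

In the queryname copy the positions run 100, 400, 300, 150, which is the kind of out-of-order position that `samtools index` reports as "the alignment is not sorted". An index is built on positions, so it only makes sense for coordinate-sorted files, whereas a read-pair walker wants the mates adjacent.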
