Question: Reordering Bam files based on chromosome number, when the chromosome format is chr1..etc
0
gravatar for jparker4
7 months ago by
jparker420
jparker420 wrote:

When trying to call variants I get the error message:

##### ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reads. Please see https://software.broadinstitute.org/gatk/documentation/article?id=1328for more information. Error details: reads contigs = [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrM, chrX, chrY]

Both my fasta file and my bams have the chromosomes sorted in this order. Therefore I need to re-sort my bams and possibly my fasta to be in numerical order: chr1,chr2 etc.

However, due to the chr in the chromosome name samtools sort does not sort the chromosomes into the desired order. Can anyone help with this sorting?

snp rna-seq • 440 views
ADD COMMENTlink modified 7 months ago by finswimmer11k • written 7 months ago by jparker420

It would be useful to mention what program you are using to call variants.

ADD REPLYlink written 7 months ago by genomax63k

Apologies, that was misleading. It is part of a variant calling pipeline but the actual error occurs when trying to run SplitNCigarReads. The full command I am running is posted below the comment underneath this one.

ADD REPLYlink written 7 months ago by jparker420
0
gravatar for finswimmer
7 months ago by
finswimmer11k
Germany
finswimmer11k wrote:

Hello,

in the error message there is a link to the documentation about what is wrong and how you can solve this.

Version-specific alert for GATK 3.5

In version 3.5, we added some beefed-up VCF sequence dictionary validation. Unfortunately, as a side effect of the additional checks, some users have experienced an error that starts with "ERROR MESSAGE: Lexicographically sorted human genome sequence detected in variant." that is due to unintentional activation of a check that is not necessary. This will be fixed in the next release; in the meantime -U ALLOW_SEQ_DICT_INCOMPATIBILITY can be used (with caution) to override the check.

fin swimmer

(Aargh, third edit)

ADD COMMENTlink modified 7 months ago • written 7 months ago by finswimmer11k

Hi, thanks for the reply. Unfortunately the error is occurring when I run SplitNCigarReads, which requires the -U option ALLOW_N_CIGAR_READS. The tool will not let me pass two -U options in, so I'm a bit stuck.

Below is the error causing command:

java -Xmx10G -Djava.io.tmpdir=/fastdata/mbp15jdp/tmp -jar ~/Downloads/GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar                    -T SplitNCigarReads                     -R /shared/sudlab1/General/mirror/genomes/hisat/hg19.fa                    -I deduped.dir/GTEX-12WSH-Cerebellum.hisat.bam                     -o split.dir/GTEX-12WSH-Cerebellum.hisat.split.bam                     -rf ReassignOneMappingQuality                     -RMQF 255                     -RMQT 60                     -U ALLOW_N_CIGAR_READS
ADD REPLYlink modified 7 months ago by genomax63k • written 7 months ago by jparker420

Hello,

ok than we have to resort your contig (that was my first solution before I saw the above information) even if I have no idea why GATK cares about the current order.

1. Sort your reference

$ samtools faidx reference.fa
$ cut -f1 reference.fa.fai|sort -k1V|parallel -k 'samtools faidx reference.fa {}' > ref_sorted.fa

2. Create a dictionary for the sorted ref

$ java -jar picard.jar CreateSequenceDictionary R=ref_sorted.fa O=ref_sorted.dict

3. Reorder bam file

$ java -jar picard.jar ReorderSam I=original.bam O=reordered.bam R=ref_sorted.fa CREATE_INDEX=TRUE

fin swimmer

ADD REPLYlink modified 7 months ago • written 7 months ago by finswimmer11k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1764 users visited in the last hour