GATK UnifiedGenotyper error: the number of base qualities does not match the number of bases in M_SOLEXA-GA04_JK_PE_SL19_repeat:6:56:1719:1324
12.2 years ago
Lds ▴ 450

Hi all,

When I ran GATK UnifiedGenotyper to call variants from three Neanderthal BAM files, the run failed with the following error:

    $ java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R human_ucsc_hg19.fasta -I SLVi33.16.hg19.sorted.bam -I SLVi33.25.hg19.sorted.bam -I SLVi33.26.hg19.sorted.bam -o snps.calling.vcf

    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 1.2-2-g8143def):
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ##### ERROR
    ##### ERROR MESSAGE: SAM/BAM file SAMFileReader{SLVi33.16.hg19.sorted.bam} is malformed: Error: the number of base qualities does not match the number of bases in M_SOLEXA-GA04_JK_PE_SL19_repeat:6:56:1719:1324.
    ##### ERROR ------------------------------------------------------------------------------------------

    $ samtools view SLVi33.16.hg19.sorted.bam | grep "M_SOLEXA-GA04_JK_PE_SL19_repeat:6:56:1719:1324"
    M_SOLEXA-GA04_JK_PE_SL19_repeat:6:56:1719:1324    0    chr1    74071    0    53M    *    0    0    CACCTATGAGTGAGAATATGCGGTGTTTGGTTTTTTGTTCTTGCGATAGTTTA    *    RG:Z:SLVi33.16    UQ:i:0

So my question is: how can I fix this problem? Also, has anyone called Neanderthal SNPs from the BAM files downloaded from UCSC?

ftp://hgdownload.cse.ucsc.edu/gbdb/hg19/neandertal/seqAlis/SLVi33.16.hg19.bam
ftp://hgdownload.cse.ucsc.edu/gbdb/hg19/neandertal/seqAlis/SLVi33.25.hg19.bam
ftp://hgdownload.cse.ucsc.edu/gbdb/hg19/neandertal/seqAlis/SLVi33.26.hg19.bam

Alternatively, perhaps someone has a better protocol for calling these Neanderthal SNPs. Thanks in advance.

gatk • 4.0k views

Does any line of your BAM file contain qualities at all?


Yes, most of the lines in the BAM file contain a quality value for each base in the read.
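To put a number on "most", the reads can be tallied by the state of their QUAL column. The read quoted in the question has `*` in that column, which in SAM means no base qualities are stored for it. Below is a minimal sketch, assuming the input is the text output of `samtools view` piped on stdin; the function names and the script name `tally_quals.py` are made up for illustration.

```python
import sys
from collections import Counter


def quality_status(sam_line):
    """Classify one SAM alignment line by the state of its QUAL column."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return "truncated"        # fewer than the 11 mandatory SAM columns
    seq, qual = fields[9], fields[10]
    if qual == "*":
        return "missing"          # qualities not stored for this read
    if len(seq) != len(qual):
        return "length_mismatch"  # SEQ and QUAL differ in length
    return "ok"


def tally(lines):
    """Count alignment lines per status, skipping header and blank lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        counts[quality_status(line)] += 1
    return counts


if __name__ == "__main__":
    for status, n in sorted(tally(sys.stdin).items()):
        print(f"{status}\t{n}")
```

It could be run as `samtools view SLVi33.16.hg19.sorted.bam | python tally_quals.py`. A large "missing" count would suggest the problem affects more than a handful of reads.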

12.2 years ago
Doctoroots ▴ 800

A good place to post questions and find answers regarding GATK is their support forum, GSA Get Satisfaction (http://getsatisfaction.com/gsa).

I found some questions related to your problem. This one has a suggested solution:

UnifiedGenotyper over BAM files without base qualities

And here's another question dealing with this issue:

Error: the number of base qualities does not match the number of bases in

If neither of these helps, consider posting your question there (and update the answer here on BioStar).

12.2 years ago
Johan ▴ 890

If you cannot regenerate the alignment data, you will want to find a way to filter out the malformed reads.

One way of doing this, if you are comfortable modifying the GATK source, is to build your own version of the UnifiedGenotyper in which you add MalformedReadFilter to the list of read filters to use. It should look something like this: @ReadFilters( {BadMateFilter.class, MappingQualityUnavailableFilter.class, MalformedReadFilter.class} )

Hopefully that will filter out any reads with these problems.

For instructions on how to develop the GATK have a look at: http://www.broadinstitute.org/gsa/wiki/index.php/GATK_Development
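If rebuilding GATK is not an option, a similar effect might be obtained by dropping the offending reads before the BAM ever reaches GATK. The following is only a sketch of that idea, not the MalformedReadFilter itself: it assumes SAM text (with header) arrives on stdin, and it discards every read whose QUAL column is `*` or differs in length from SEQ. The function names and the script name `drop_bad_quals.py` are hypothetical.

```python
import sys


def has_usable_qualities(sam_line):
    """Return True if the alignment line has one quality per base."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False              # not a complete SAM alignment line
    seq, qual = fields[9], fields[10]
    # "*" in QUAL means qualities are not stored for this read.
    return qual != "*" and len(seq) == len(qual)


def filter_sam(lines):
    """Yield header lines unchanged and only well-formed alignment lines."""
    for line in lines:
        if line.startswith("@") or has_usable_qualities(line):
            yield line


if __name__ == "__main__":
    sys.stdout.writelines(filter_sam(sys.stdin))
```

A possible invocation is `samtools view -h SLVi33.16.hg19.sorted.bam | python drop_bad_quals.py | samtools view -b -o SLVi33.16.filtered.bam -` (older samtools releases may also need `-S` to read SAM input; the output file name is just an example). Note that this throws the affected reads away entirely, so it is worth checking how many reads are lost before calling variants on the result.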
