Question: How can I use BBMap reformat.sh to correctly retrieve fastq quality scores from a bam file?
0
gravatar for thequokka
25 days ago by
thequokka0
thequokka0 wrote:

Hi,

I'm trying to use BBMap version 38.08 to retrieve fastq sequences from a bam file. However, I keep getting a problem where the quality output is merely: JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ

Here's some lines from the bam file:

HISEQ:378:C7F64ANXX:3:1207:13039:83924  97  smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues  15  60  125M    =   644 754 GGGGGAGTGATAAAAATATATTTATTTCATCTAACTGATGAAATAACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGA   bbbbbfffffffffffffffffffffffffffffffffffffbffffffffef_ebffffffffffffffdfcefffffbfffffffbbfffffffdfefd_\ebOdefOWZW_bWefffdWce[   NM:i:0  MD:Z:125    MC:Z:125M   AS:i:125    XS:i:0
HISEQ:378:C7F64ANXX:3:1106:10647:86342  97  smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues  30  60  125M    =   669 764 ATATATTTATTTCATCTAACTGATGAAATAACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAG   aabbbbffffffffffffffbbeffffffffffePaaebcfeefffffffeffYeffff\efPe]PePPPbc\bbPedP^PeaP]\dYbc]edcfOPOYd_bfeOcfOYOZ\\OdefffNObeOf   NM:i:0  MD:Z:125    MC:Z:125M   AS:i:125    XS:i:0
HISEQ:378:C7F64ANXX:3:1208:17933:95359  97  smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues  32  60  125M    =   620 713 ATATTTATTTCATCCAATTGATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAA   aab_`dedbefe]ZPPP^PePdZbbPPY_cbffbfffdfPePPbPP[d][effdfffcebbffcbYPbP\P[PYdb\d]Pd\NdP]P]\eaP[aeePec_OYYOOOYea\O]_O_^dW]edcfef   NM:i:11 MD:Z:14T2C9A1C23T8A0C11G3G23G10G10  MC:Z:125M   AS:i:70 XS:i:0
HISEQ:378:C7F64ANXX:3:2305:17850:4846   97  smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues  52  60  125M    =   625 690 ATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAATGCATCAAACCTTGGGAAGG   abbbbffffffffffffffffffffffffffffeffffffffffffeffffff]db\aeffffffffffffffP]ecaeffff]efffeeff_fffffffffffeffffffffffffffffffef   NM:i:9  MD:Z:7A1C23T8A0C11G3G23G10G30   MC:Z:117M8S AS:i:80 XS:i:0
HISEQ:378:C7F64ANXX:3:2205:15096:32122  97  smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues  59  60  125M    =   675 741 AACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGGTTGAGGATGCATCAAACCTTGGGAAGGAATAAGT   `aa_`ffaefP^Z\bPdP^_cebPPYbadfffbfbecac_a_Pef]P\Y[PbedPP[e\ed_facYPefff_efePbYYbP]\PP[deO\NN]e[aOOZbOOYaeef]bcb_OeOWb]ZWbOObO   NM:i:2  MD:Z:90T5A28    MC:Z:125M   AS:i:115    XS:i:0
HISEQ:378:C7F64ANXX:3:1307:17567:99979  97  smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues  68  60  125M    =   718 775 GCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCC   ababbfffffffffffffffffffffffffffffffffffcffdfeffff]efffffefcffffffffefffffffecfffaefffffffffffffffffffeffffffffffffffb]e]fa]e   NM:i:0  MD:Z:125    MC:Z:125M   AS:i:125    XS:i:0
HISEQ:378:C7F64ANXX:3:2211:20485:88833  97  smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues  68  60  125M    =   654 711 GCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCC   aabaaecfffffffffffffffffffffffffffffdefffffefffffffdeffffdefffffffffdffffefffffffefffffffff]cfdfdffffffffffcffffffffbeffffeff   NM:i:0  MD:Z:125    MC:Z:125M   AS:i:125    XS:i:0
HISEQ:378:C7F64ANXX:3:1212:7300:28109   97  smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues  71  60  125M    =   636 690 CTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTC   bbbbbfffffffffffffffffffffffffffffffffffffaffffeefffff^efffffffffcffPeff]efffffffffffefffffffffffffffdfffffdfffff]bfdffffffff   NM:i:7  MD:Z:14T8A0C11G3G23G10G49   MC:Z:125M   AS:i:90 XS:i:0
HISEQ:378:C7F64ANXX:3:2303:16430:40702  97  smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues  72  60  125M    =   694 747 TTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTCC   abbbaffffffffffffdffffffeffffffffffeefffeffffefffefffcffffffffffdffffff_fffffaefffffff]edfffffffff]fffffffffdffcefffffff]ae_b   NM:i:0  MD:Z:125    MC:Z:125M   AS:i:125    XS:i:0
HISEQ:378:C7F64ANXX:3:2116:16496:33002  97  smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues  95  60  125M    =   722 752 CAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTCCAAAACTATATAGATAGATAGAGC   bbbbbffffffffffffdffffeeffffffdffffffeffffffffffdfffff]ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff   NM:i:0  MD:Z:125    MC:Z:125M   AS:i:125    XS:i:0

Here's the command and STDERR (NB in this case I was using qin=64 qout=33 for troubleshooting but I get the same result without these flags):

/home/xub/host/opt/bbmap/bbmap/reformat.sh qin=64 qout=33 requiredbits=16 overwrite=t in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz
java -ea -Xmx200m -cp /home/xub/host/opt/bbmap/bbmap/current/ jgi.ReformatReads qin=64 qout=33 requiredbits=16 overwrite=t in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz
Executing jgi.ReformatReads [qin=64, qout=33, requiredbits=16, overwrite=t, in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam, out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz]

Could not find sambamba.
Found samtools 1.8
Input is being processed as unpaired
Input:                      464 reads           58000 bases
Output:                     230 reads (49.57%)  28750 bases (49.57%)

Time:                           0.634 seconds.
Reads Processed:         464    0.73k reads/sec
Bases Processed:       58000    0.09m bases/sec[

Here's some of the output:

@HISEQ:378:C7F64ANXX:3:2205:16922:87749
CAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGGTCCGATTTTATAATTTGAGCCTATT
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@HISEQ:378:C7F64ANXX:3:1210:20568:23121
AAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGGTCCGATTTTATAATTT
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@HISEQ:378:C7F64ANXX:3:2212:11893:40357
GGTTTATGGTTTGACTTGGTTTGAAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAAT
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@HISEQ:378:C7F64ANXX:3:2210:7117:7877
GGTTTGACTTGGTTTGAAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGG
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ

Thanks for any advice.

Cheers.

ADD COMMENTlink written 25 days ago by thequokka0

I wonder if that unnaturally long smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues name is causing a problem.

You should also see if primaryonly=t helps.

ADD REPLYlink modified 25 days ago • written 25 days ago by genomax73k

I tried your suggestions but am still getting the same results. Things work fine for bam/sam files with q+33 quality scores but not for q+64 scores. Basically I believe this is a bug. java --version output:

openjdk 12.0.1 2019-04-16
OpenJDK Runtime Environment (build 12.0.1+12)
OpenJDK 64-Bit Server VM (build 12.0.1+12, mixed mode, sharing)

Some test data:

@HD VN:1.6  SO:coordinate
@SQ SN:kp_RWP_minus_insertion   LN:4810
@PG ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:/scratch//bwa-0.7.17/bwa-0.7.17/bwa mem /scratch/input/index/kp_RWP_minus_insertion.fa -t 8 -M -p -
HISEQ:378:C7F64ANXX:3:1210:6785:19474   97  kp_RWP_minus_insertion  13  60  125M    =   329 437 GGGGGGGAGTGATAAAAATATATTTATTTCATCCAATTGATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATT   bbbbbdfffffffffaffffdffffffffffffdfdffffffffffffdfffffffffffffffffffffffdffffdffdfffffffffffffffffffbbffcbdcfffffffffffffffdf   NM:i:10 MD:Z:33T2C9A1C23T8A0C11G3G23G2  MC:Z:3S20M1D92M2I8M AS:i:77 XS:i:0
HISEQ:378:C7F64ANXX:3:2107:8631:14257   97  kp_RWP_minus_insertion  14  60  125M    =   365 476 GGGGGGAGTGATAAAAATATATTTATTTCATCTAACTGATGAAATAACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTG   bbbbbfffeffffffffffffffffffffffeffffcfeffefbcbeefbeefffffffffffffffffffffffffdffffffNcfffffffffffffdfffdfdffffffffffdfaf\cW_b   NM:i:0  MD:Z:125    MC:Z:125M   AS:i:125    XS:i:0
HISEQ:378:C7F64ANXX:3:1206:7024:39152   161 kp_RWP_minus_insertion  18  60  125M    =   329 413 GGAGTGATAAAAATATATTTATTTCATCCAATTGATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAG   aa`aafffffffffffffffffffffffffffffdffffffffffffffffffffffffdbcfffffffffff]fffeffffffffdfffffdfffafefffffadefdfffffeef_Oef]f_e   NM:i:10 MD:Z:28T2C9A1C23T8A0C11G3G23G7  MC:Z:24S20M1D81M    AS:i:75 XS:i:0
HISEQ:378:C7F64ANXX:3:2114:8479:91657   161 kp_RWP_minus_insertion  18  60  125M    =   329 425 GGAGTGATAAAAATATATTTATTTCATCCAATTGATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAG   `_``[effffefffeffffffffffdefffeceee]efecdccffeffffffffeffffffefffffdfffffffdfcfffffffffffffd]efc\eYefdOdcefZe_efacfffffdbedff   NM:i:10 MD:Z:28T2C9A1C23T8A0C11G3G23G7  MC:Z:8S20M1D93M4S   AS:i:75 XS:i:0
HISEQ:378:C7F64ANXX:3:1206:2070:48041   97  kp_RWP_minus_insertion  21  60  125M    =   329 409 GTGATAAAAATATATTTATTTCATCCAATTGATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGG   bbbbbffffffeffffffffffffffffffffffffffffffffffffdfffffffffffffffffffffffffffdffffffbfdeffffffffffeedfffffebcffdffffdffffdeeef   NM:i:10 MD:Z:25T2C9A1C23T8A0C11G3G23G10 MC:Z:25S20M1D80M    AS:i:75 XS:i:0
HISEQ:378:C7F64ANXX:3:2210:5304:50132   97  kp_RWP_minus_insertion  21  60  125M    =   329 409 GTGATAAAAATATATTTATTTCATCCAATTGATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGG   bbbabfffffffffbfffffffffffffffeffffffffffffffffffffffffffffffff_ffffffffffYefebffffffffffffffffeefffdaffeffffffffffffff]fffff   NM:i:10 MD:Z:25T2C9A1C23T8A0C11G3G23G10 MC:Z:25S20M1D80M    AS:i:75 XS:i:0
HISEQ:378:C7F64ANXX:3:2112:14989:9630   97  kp_RWP_minus_insertion  23  60  125M    =   329 420 GATAAAAATATATTTATTTCATCCAATTGATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAA   bbbbbfffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffbefefffffffdfffffffefff   NM:i:11 MD:Z:23T2C9A1C23T8A0C11G3G23G10G1   MC:Z:7S20M1D93M5S   AS:i:73 XS:i:0
HISEQ:378:C7F64ANXX:3:2216:15747:2928   161 kp_RWP_minus_insertion  26  60  125M    =   329 409 AAAAATATATTTATTTCATCCAATTGATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTT   aababffffffffffffffffffffe_ffbfeefcffffffffeffffffffeffffffdffffffffffffffffffffffffffffaeafffffffffcfffedeffffdbefffOeffdfff   NM:i:11 MD:Z:20T2C9A1C23T8A0C11G3G23G10G4   MC:Z:20S20M1D85M    AS:i:70 XS:i:0
HISEQ:378:C7F64ANXX:3:2216:1981:31620   161 kp_RWP_minus_insertion  30  60  125M    =   331 425 ATATATTTATTTCATCCAATTGATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAG   bbabbffffffffffffbbfffffffffffffffffffffdffdffffffffffbffffffffdffbbcffffffffffffdfffffeffffafffffdffffffffff_ec]\_eOefdfcfff   NM:i:11 MD:Z:16T2C9A1C23T8A0C11G3G23G10G8   MC:Z:18M1D92M2I13M  AS:i:70 XS:i:0
HISEQ:378:C7F64ANXX:3:1109:14994:72804  161 kp_RWP_minus_insertion  33  60  125M    =   394 486 TATTTATTTCATCTAACTGATGAAATAACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAAT   aaabbffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffffdfffffffffffffffffffeefffffffffffffffffffffeff   NM:i:0  MD:Z:125    MC:Z:125M   AS:i:125    XS:i:0

Thanks for your help.

ADD REPLYlink modified 11 days ago • written 11 days ago by thequokka0

You can either create an issue on BBMap site and try emailing Brian Bushnell directly about this (you will find his email address in in-line help for any BBMap program).

Perhaps it is biostars site code but I am not able to make your data work with reformat.sh so was not able to try myself.

Update 1: You have an extra -t 8 -M -p - hyphen at end of @PG line. After taking that out I can confirm that the output fastq gets all J as quality scores.

Update 2: You can use samtools fastq for doing the conversion for now. That seems to work fine.

ADD REPLYlink modified 10 days ago • written 11 days ago by genomax73k

Ok. Thanks genomax. Submitted a ticket to Brian. Strange that the extra command options cause a problem for you. The extra hyphen is there due to piping in from seqtk mergepe/STDOUT .

ADD REPLYlink written 10 days ago by thequokka0

It may have been a non-printable character that came along since I just copied and pasted the example into a file. I see that Brian has acknowledged your ticket so hopefully he will fix the issue in a future release.

ADD REPLYlink written 10 days ago by genomax73k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1368 users visited in the last hour