Entering edit mode
4.5 years ago
thequokka
•
0
Hi,
I'm trying to use BBMap version 38.08 to retrieve fastq sequences from a bam file. However, I keep getting a problem where the quality output is merely: JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
Here's some lines from the bam file:
HISEQ:378:C7F64ANXX:3:1207:13039:83924 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 15 60 125M = 644 754 GGGGGAGTGATAAAAATATATTTATTTCATCTAACTGATGAAATAACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGA bbbbbfffffffffffffffffffffffffffffffffffffbffffffffef_ebffffffffffffffdfcefffffbfffffffbbfffffffdfefd_\ebOdefOWZW_bWefffdWce[ NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0
HISEQ:378:C7F64ANXX:3:1106:10647:86342 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 30 60 125M = 669 764 ATATATTTATTTCATCTAACTGATGAAATAACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAG aabbbbffffffffffffffbbeffffffffffePaaebcfeefffffffeffYeffff\efPe]PePPPbc\bbPedP^PeaP]\dYbc]edcfOPOYd_bfeOcfOYOZ\\OdefffNObeOf NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0
HISEQ:378:C7F64ANXX:3:1208:17933:95359 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 32 60 125M = 620 713 ATATTTATTTCATCCAATTGATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAA aab_`dedbefe]ZPPP^PePdZbbPPY_cbffbfffdfPePPbPP[d][effdfffcebbffcbYPbP\P[PYdb\d]Pd\NdP]P]\eaP[aeePec_OYYOOOYea\O]_O_^dW]edcfef NM:i:11 MD:Z:14T2C9A1C23T8A0C11G3G23G10G10 MC:Z:125M AS:i:70 XS:i:0
HISEQ:378:C7F64ANXX:3:2305:17850:4846 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 52 60 125M = 625 690 ATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAATGCATCAAACCTTGGGAAGG abbbbffffffffffffffffffffffffffffeffffffffffffeffffff]db\aeffffffffffffffP]ecaeffff]efffeeff_fffffffffffeffffffffffffffffffef NM:i:9 MD:Z:7A1C23T8A0C11G3G23G10G30 MC:Z:117M8S AS:i:80 XS:i:0
HISEQ:378:C7F64ANXX:3:2205:15096:32122 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 59 60 125M = 675 741 AACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGGTTGAGGATGCATCAAACCTTGGGAAGGAATAAGT `aa_`ffaefP^Z\bPdP^_cebPPYbadfffbfbecac_a_Pef]P\Y[PbedPP[e\ed_facYPefff_efePbYYbP]\PP[deO\NN]e[aOOZbOOYaeef]bcb_OeOWb]ZWbOObO NM:i:2 MD:Z:90T5A28 MC:Z:125M AS:i:115 XS:i:0
HISEQ:378:C7F64ANXX:3:1307:17567:99979 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 68 60 125M = 718 775 GCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCC ababbfffffffffffffffffffffffffffffffffffcffdfeffff]efffffefcffffffffefffffffecfffaefffffffffffffffffffeffffffffffffffb]e]fa]e NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0
HISEQ:378:C7F64ANXX:3:2211:20485:88833 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 68 60 125M = 654 711 GCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCC aabaaecfffffffffffffffffffffffffffffdefffffefffffffdeffffdefffffffffdffffefffffffefffffffff]cfdfdffffffffffcffffffffbeffffeff NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0
HISEQ:378:C7F64ANXX:3:1212:7300:28109 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 71 60 125M = 636 690 CTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTC bbbbbfffffffffffffffffffffffffffffffffffffaffffeefffff^efffffffffcffPeff]efffffffffffefffffffffffffffdfffffdfffff]bfdffffffff NM:i:7 MD:Z:14T8A0C11G3G23G10G49 MC:Z:125M AS:i:90 XS:i:0
HISEQ:378:C7F64ANXX:3:2303:16430:40702 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 72 60 125M = 694 747 TTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTCC abbbaffffffffffffdffffffeffffffffffeefffeffffefffefffcffffffffffdffffff_fffffaefffffff]edfffffffff]fffffffffdffcefffffff]ae_b NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0
HISEQ:378:C7F64ANXX:3:2116:16496:33002 97 smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues 95 60 125M = 722 752 CAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTCCAAAACTATATAGATAGATAGAGC bbbbbffffffffffffdffffeeffffffdffffffeffffffffffdfffff]ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff NM:i:0 MD:Z:125 MC:Z:125M AS:i:125 XS:i:0
Here's the command and STDERR (NB in this case I was using qin=64 qout=33 for troubleshooting but I get the same result without these flags):
/home/xub/host/opt/bbmap/bbmap/reformat.sh qin=64 qout=33 requiredbits=16 overwrite=t in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz
java -ea -Xmx200m -cp /home/xub/host/opt/bbmap/bbmap/current/ jgi.ReformatReads qin=64 qout=33 requiredbits=16 overwrite=t in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz
Executing jgi.ReformatReads [qin=64, qout=33, requiredbits=16, overwrite=t, in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam, out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz]
Could not find sambamba.
Found samtools 1.8
Input is being processed as unpaired
Input: 464 reads 58000 bases
Output: 230 reads (49.57%) 28750 bases (49.57%)
Time: 0.634 seconds.
Reads Processed: 464 0.73k reads/sec
Bases Processed: 58000 0.09m bases/sec[
Here's some of the output:
@HISEQ:378:C7F64ANXX:3:2205:16922:87749
CAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGGTCCGATTTTATAATTTGAGCCTATT
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@HISEQ:378:C7F64ANXX:3:1210:20568:23121
AAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGGTCCGATTTTATAATTT
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@HISEQ:378:C7F64ANXX:3:2212:11893:40357
GGTTTATGGTTTGACTTGGTTTGAAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAAT
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@HISEQ:378:C7F64ANXX:3:2210:7117:7877
GGTTTGACTTGGTTTGAAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGG
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
Thanks for any advice.
Cheers.
I wonder if that unnaturally long
smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues
name is causing a problem.You should also see if
primaryonly=t
helps.I tried your suggestions but am still getting the same results. Things work fine for bam/sam files with q+33 quality scores but not for q+64 scores. Basically I believe this is a bug. java --version output:
Some test data:
Thanks for your help.
You can either create an issue on BBMap site and try emailing Brian Bushnell directly about this (you will find his email address in in-line help for any BBMap program).
Perhaps it is biostars site code but I am not able to make your data work with reformat.sh so was not able to try myself.
Update 1:
You have an extraI can confirm that the output fastq gets all-t 8 -M -p -
hyphen at end of@PG
line. After taking that outJ
as quality scores.Update 2: You can use
samtools fastq
for doing the conversion for now. That seems to work fine.Ok. Thanks genomax. Submitted a ticket to Brian. Strange that the extra command options cause a problem for you. The extra hyphen is there due to piping in from seqtk mergepe/STDOUT .
It may have been a non-printable character that came along since I just copied and pasted the example into a file. I see that Brian has acknowledged your ticket so hopefully he will fix the issue in a future release.