I'm about to take delivery of large numbers of unmapped (demultiplexed) BAM files from a BGI sequencer with pairs of reads in the following form:
A1 77 * 0 0 * * 0 0 <sequence> <quality> RG:Z:rL1 RX:Z:GCGCCCC QX:Z:B(,:.)' BC:Z:GTCTAAACAG QT:Z:.',(/-+*;&
A1 141 * 0 0 * * 0 0 <sequence> <quality> RG:Z:rL1 RX:Z:GCGCCCC QX:Z:B(,:.)' BC:Z:GTCTAAACAG QT:Z:.',(/-+*;&
The barcodes (BC:/QT:) and the unique molecular indices (RX:/QX:) have already been removed from the reads and deposited in the relevant fields of the BAM file.
I need to convert these into paired-end FASTQ files (flag 77 marks read 1 and flag 141 marks read 2 of an unmapped pair) so that I can eventually run UMI-tools on the mapped data.
Is there a script anywhere that will take an unmapped BAM file in this format and turn it into paired FASTQ files suitable for mapping and then deduplicating with UMI-tools? I can brew my own, but if there's an off-the-shelf solution I could use, then I'd be grateful.
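For concreteness, here is a minimal sketch of the kind of conversion I have in mind, not a tested pipeline. It assumes the BAM has been dumped to tab-delimited SAM text (for example with `samtools view`), that the reads are unmapped so no reverse-complementing is needed, and that the UMI should end up appended to the read name after an underscore, which as far as I understand is where UMI-tools looks for it by default. The function names `sam_line_to_fastq` and `convert` are my own placeholders.

```python
def sam_line_to_fastq(line, umi_tag="RX", sep="_"):
    """Turn one tab-delimited SAM record into (mate, FASTQ record text).

    mate is 1 for first-in-pair (flag bit 0x40, e.g. flag 77),
    2 for second-in-pair (flag bit 0x80, e.g. flag 141), else 0.
    Assumes the read is unmapped, so the sequence is used as stored.
    """
    fields = line.rstrip("\n").split("\t")
    name, flag = fields[0], int(fields[1])
    seq, qual = fields[9], fields[10]

    # Optional fields look like TAG:TYPE:VALUE; the value itself may
    # contain ':' (quality strings often do), so split at most twice.
    tags = {}
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value

    # Assumption: the UMI goes on the end of the read name as NAME_UMI.
    umi = tags.get(umi_tag)
    if umi:
        name = f"{name}{sep}{umi}"

    if flag & 0x40:
        mate = 1
    elif flag & 0x80:
        mate = 2
    else:
        mate = 0
    return mate, f"@{name}\n{seq}\n+\n{qual}\n"


def convert(sam_lines, r1_out, r2_out, umi_tag="RX"):
    """Write read 1 and read 2 records to two open text handles."""
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        mate, record = sam_line_to_fastq(line, umi_tag=umi_tag)
        if mate == 1:
            r1_out.write(record)
        elif mate == 2:
            r2_out.write(record)
```

`convert` could be fed `sys.stdin` with `samtools view` piped in, plus two output files opened for writing. Both mates get the same renamed read ID, so the pairs stay in sync for the aligner. The sample barcode (BC:) is ignored here because the files are already demultiplexed.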