Formatting RNA-seq with UMIs in unmapped BAM files to UMI-tools compatible FASTQ files
5.1 years ago
graeme.thorn ▴ 100

I'm about to take delivery of large numbers of unmapped (demultiplexed) BAM files from a BGI sequencer with pairs of reads in the following form:

A1  77  *   0   0   *   *   0   0   <sequence>  <quality>   RG:Z:rL1    RX:Z:GCGCCCC    QX:Z:B(,:.)'    BC:Z:GTCTAAACAG QT:Z:.',(/-+*;&
A1  141 *   0   0   *   *   0   0   <sequence>  <quality>   RG:Z:rL1    RX:Z:GCGCCCC    QX:Z:B(,:.)'    BC:Z:GTCTAAACAG QT:Z:.',(/-+*;&

The barcodes (BC:/QT:) and the unique molecular indices (RX:/QX:) have already been removed from the reads and deposited in the relevant fields of the BAM file.

I need to process this into FASTQ format for the paired ends (flags 77 and 141 for a mate pair) to eventually use UMI-tools on the mapped data.
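For reference, the two flag values can be decoded with a few lines of Python. This is only a sketch covering the bits relevant to unmapped pairs (bit meanings as defined in the SAM specification); `decode_flag` is a hypothetical helper name, not part of any library.

```python
# Subset of SAM flag bits relevant to unmapped paired reads.
FLAG_BITS = {
    0x1: "paired",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x40: "first in pair",
    0x80: "second in pair",
}


def decode_flag(flag):
    """Return the names of the listed bits that are set in a SAM flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


print(decode_flag(77))   # 77  = 1 + 4 + 8 + 64
print(decode_flag(141))  # 141 = 1 + 4 + 8 + 128
```

So 77 marks read 1 and 141 marks read 2 of a pair in which neither mate is mapped.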

Is there a script anywhere that will take an unmapped BAM file in this format and turn it into a FASTQ of the format required to map before deduplicating through UMI-tools? I can brew my own, but if there's an off-the-shelf solution I could use, then I'd be grateful.
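To make the request concrete, here is a rough sketch of the kind of home-brewed conversion I have in mind. It assumes tab-separated SAM text records (for example as printed by `samtools view`), that every read is unmapped (so no reverse-complementing is needed), and that the UMI should be appended to the read name as `_<UMI>`, which is the read-name convention UMI-tools looks for by default. The function names and the demo sequences are made up for illustration.

```python
import io


def sam_record_to_fastq(line, umi_tag="RX"):
    """Convert one SAM text record to (mate, fastq_text).

    mate is 2 if flag bit 0x80 is set, otherwise 1. The value of the
    UMI tag (if present) is appended to the read name as _<UMI>.
    Assumes the read is unmapped, so SEQ/QUAL are in original orientation.
    """
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    tags = {}
    for field in fields[11:]:
        # Optional fields look like TAG:TYPE:VALUE; VALUE may contain ':'.
        key, _type, value = field.split(":", 2)
        tags[key] = value
    umi = tags.get(umi_tag)
    if umi is not None:
        name = f"{name}_{umi}"
    mate = 2 if flag & 0x80 else 1
    return mate, f"@{name}\n{seq}\n+\n{qual}\n"


def split_sam(lines, out1, out2):
    """Write read 1 records to out1 and read 2 records to out2."""
    for line in lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        mate, record = sam_record_to_fastq(line)
        (out1 if mate == 1 else out2).write(record)


# Demo with two made-up records in the same layout as the example above.
demo = io.StringIO(
    "A1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tRG:Z:rL1\tRX:Z:GCGCCCC\n"
    "A1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tFFFF\tRG:Z:rL1\tRX:Z:GCGCCCC\n"
)
r1, r2 = io.StringIO(), io.StringIO()
split_sam(demo, r1, r2)
print(r1.getvalue(), end="")
print(r2.getvalue(), end="")
```

In real use the input lines would come from the BAM file rather than a string, and the two outputs would be the R1 and R2 FASTQ files.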

RNA-Seq umi pre-processing • 2.2k views

If you can find a mapper that will handle the unmapped BAM as input, UMI-tools is more than happy to take the library barcodes and UMI sequences from a BAM tag, rather than the read name.


Thanks, I've just spotted the options --umi-tag and --extract-umi-method=tag. STAR can take unmapped BAM files as input and (according to its docs) retains all tags when mapping, so it can be run before UMI-tools. Now I just need to find a way to clip/trim the unmapped BAM before aligning.
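As a sketch of how those two options might be put together for the deduplication step, the snippet below assembles a `umi_tools dedup` command line in Python. The file names are placeholders, `umi_tools_dedup_args` is a hypothetical helper, and using RX as the UMI tag is an assumption based on the example records above. UMI-tools expects the input BAM to be coordinate-sorted and indexed.

```python
import shlex


def umi_tools_dedup_args(in_bam, out_bam, umi_tag="RX", paired=True):
    """Build an argument list for `umi_tools dedup` reading UMIs from a BAM tag."""
    args = [
        "umi_tools", "dedup",
        "-I", in_bam,                   # input BAM (sorted and indexed)
        "-S", out_bam,                  # deduplicated output BAM
        "--extract-umi-method=tag",     # take the UMI from a tag, not the read name
        f"--umi-tag={umi_tag}",         # tag holding the UMI sequence
    ]
    if paired:
        args.append("--paired")         # treat reads as paired-end
    return args


# Placeholder file names; pass the list to subprocess.run(cmd, check=True) to execute.
cmd = umi_tools_dedup_args("aligned.sorted.bam", "dedup.bam")
print(shlex.join(cmd))
```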


"Now just to find a solution to clip/trim the unmapped BAM before trying to align"

STAR should soft clip during alignment.


Is this 10x data?


Not as far as I'm aware; I only know that it will be in the form above. The example is taken from an initial run with some other known samples, so that I can familiarise myself with the format and develop pipelines before the data arrives.
