Question: Formatting RNA-seq with UMIs in unmapped BAM files to UMI-tools compatible FASTQ files
25 days ago by
graeme.thorn40
London, United Kingdom
graeme.thorn40 wrote:

I'm about to take delivery of large numbers of unmapped (demultiplexed) BAM files from a BGI sequencer with pairs of reads in the following form:

A1  77  *   0   0   *   *   0   0   <sequence>  <quality>   RG:Z:rL1    RX:Z:GCGCCCC    QX:Z:B(,:.)'    BC:Z:GTCTAAACAG QT:Z:.',(/-+*;&
A1  141 *   0   0   *   *   0   0   <sequence>  <quality>   RG:Z:rL1    RX:Z:GCGCCCC    QX:Z:B(,:.)'    BC:Z:GTCTAAACAG QT:Z:.',(/-+*;&

The barcodes (BC:/QT:) and the unique molecular indices (RX:/QX:) have already been removed from the reads and deposited in the relevant fields of the BAM file.

I need to process this into FASTQ format for the paired ends (flags 77 and 141 for a mate pair) to eventually use UMI-tools on the mapped data.
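As a quick sanity check on those flag values, here is a minimal Python sketch that decodes them. The bit meanings come from the SAM specification; the helper name `decode_flag` is just made up for illustration.

```python
# SAM flag bits relevant to unmapped read pairs (values from the SAM spec).
FLAG_BITS = {
    0x1: "paired",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x40: "first in pair",
    0x80: "second in pair",
}


def decode_flag(flag):
    """Return the names of the bits (from FLAG_BITS only) set in a SAM flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


for flag in (77, 141):
    print(flag, decode_flag(flag))
```

So 77 (1 + 4 + 8 + 64) is an unmapped first mate and 141 (1 + 4 + 8 + 128) is an unmapped second mate.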

Is there a script anywhere that will take an unmapped BAM file in this format and turn it into a FASTQ of the format required to map before deduplicating through UMI-tools? I can brew my own, but if there's an off-the-shelf solution I could use, then I'd be grateful.
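For reference, a home-brewed version could look roughly like the sketch below. It assumes the records arrive as tab-delimited SAM text (for example from `samtools view`), and that the UMI should be appended to the read name with an underscore, which is the read-name layout UMI-tools reads by default. The function names and the `ACGT`/`IIII` sequence and quality strings are placeholders I made up; the real records above have those fields elided.

```python
def parse_tags(fields):
    """Map optional SAM fields such as 'RX:Z:GCGCCCC' to {'RX': 'GCGCCCC'}."""
    tags = {}
    for field in fields:
        # maxsplit=2 keeps any ':' characters inside the value (e.g. qualities)
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def sam_to_fastq_record(line, umi_tag="RX"):
    """Convert one unmapped SAM line to (mate_number, fastq_text)."""
    cols = line.rstrip("\n").split("\t")
    name, flag, seq, qual = cols[0], int(cols[1]), cols[9], cols[10]
    umi = parse_tags(cols[11:])[umi_tag]
    mate = 1 if flag & 0x40 else 2  # 0x40 = first in pair
    return mate, f"@{name}_{umi}\n{seq}\n+\n{qual}\n"


def split_pairs(lines, umi_tag="RX"):
    """Return (read1_fastq, read2_fastq) text for an iterable of SAM lines."""
    r1, r2 = [], []
    for line in lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and headers
            continue
        mate, record = sam_to_fastq_record(line, umi_tag)
        (r1 if mate == 1 else r2).append(record)
    return "".join(r1), "".join(r2)


# Hypothetical records: sequence and quality are placeholders.
TAGS = "RG:Z:rL1\tRX:Z:GCGCCCC\tQX:Z:B(,:.)'\tBC:Z:GTCTAAACAG\tQT:Z:.',(/-+*;&"
example = [
    f"A1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\t{TAGS}",
    f"A1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\t{TAGS}",
]
fq1, fq2 = split_pairs(example)
print(fq1, fq2, sep="")
```

Because the reads are unmapped, they are stored in their original orientation, so no reverse-complementing is needed here. For real files the lines would be streamed rather than held in memory.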

rna-seq umi pre-processing • 150 views
modified 25 days ago • written 25 days ago by graeme.thorn40

If you can find a mapper that will handle the unmapped BAM as input, UMI-tools is more than happy to take the library barcodes and UMI sequences from a BAM tag, rather than the read name.

written 25 days ago by i.sudbery4.3k

Thanks, just spotted those options: --umi-tag and --extract-umi-method=tag. STAR can take unmapped BAM files as input and (according to its docs) retains all tags when mapping, so it can be run before UMI-tools. Now just to find a solution to clip/trim the unmapped BAM before trying to align.
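A rough sketch of how the two steps might be wired together is below. It only builds the command lines and does not run them. The STAR option names (`--readFilesType SAM PE`, `--readFilesCommand samtools view`, `--readFilesSAMattrKeep`) are as I understand them from the STAR manual, so check them against your installed version; the file names and helper functions are hypothetical.

```python
import shlex


def star_command(ubam, genome_dir, prefix):
    """Build a STAR call that reads a paired-end unmapped BAM directly."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", ubam,
        "--readFilesType", "SAM", "PE",            # mates on consecutive lines
        "--readFilesCommand", "samtools", "view",  # decode BAM to SAM text
        "--readFilesSAMattrKeep", "All",           # carry RX/QX/BC/QT through
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]


def dedup_command(in_bam, out_bam, umi_tag="RX"):
    """Build a umi_tools dedup call that takes the UMI from a BAM tag."""
    return [
        "umi_tools", "dedup",
        "--extract-umi-method=tag",
        f"--umi-tag={umi_tag}",
        "--paired",
        "-I", in_bam,
        "-S", out_bam,
    ]


# Hypothetical paths, for illustration only.
star = star_command("sample.unmapped.bam", "star_index", "sample.")
dedup = dedup_command("sample.Aligned.sortedByCoord.out.bam", "sample.dedup.bam")
print(shlex.join(star))
print(shlex.join(dedup))
```

The aligned BAM needs to be indexed (for example with `samtools index`) before deduplication, and either list can be handed to `subprocess.run` once the options have been checked.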

written 24 days ago by graeme.thorn40

"Now just to find a solution to clip/trim the unmapped BAM before trying to align"

STAR should soft clip during alignment.
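To see how much was actually clipped after alignment, the `S` operations at either end of each CIGAR string can be counted. This is a minimal sketch with a made-up helper name and made-up CIGAR strings, not anything taken from STAR itself.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside the soft clips, so drop them first.
    ops = [item for item in ops if item[1] != "H"]
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail


# Hypothetical CIGAR strings, for illustration only.
for cigar in ("5S90M5S", "3S97M", "100M"):
    print(cigar, soft_clipped_bases(cigar))
```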

modified 24 days ago • written 24 days ago by genomax65k

Is this 10x data?

written 25 days ago by genomax65k

I'm not aware that it is; I only know that it will be in the above form. The example is taken from an initial run with some other known samples, so that I can familiarise myself with the format and develop pipelines before the data arrives.

written 25 days ago by graeme.thorn40