Removing umi split off as a separate fastq (RNA-seq)
16 months ago
YJ • 0

Hi,

I used xGen UDI-UMI adapters from IDT to prepare my RNA-seq libraries and ran a single-end RNA-seq experiment. I received two FASTQ files for each sample: R1 from my 100 bp SE run, and R2 containing the 9 bp UMI sequence, split off into its own file. I normally analyze my RNA-seq experiments by aligning to the transcriptome with STAR and calculating expression with RSEM. I would like to incorporate UMI-based deduplication into this workflow.

I've tried a few methods.

  1. I ignored R2 and ran umi_tools extract with --bc-pattern NNNNNNNNN as instructed on the website, followed by STAR alignment and umi_tools dedup. This gave me deduplicated files, but they were reduced to about 1/20 of the original size.
  2. I converted my R1 FASTQ file into an unmapped BAM (uBAM) with Picard FastqToSam, then added the UMIs from the R2 FASTQ with fgbio AnnotateBamWithUmis. I converted the uBAM, with UMIs stored in the RX tag, back to FASTQ; at that point I could see all my UMIs tagged with RX in the BAM file. I then aligned to the transcriptome with STAR. After alignment, I ran umi_tools dedup with the options --extract-umi-method=tag, --tag=RX. However, I get a warning that at least one read is missing a UMI and/or cell tag, and I'm left with a much smaller file than the original BAM.

Does anyone have experience with this situation? I guess I could also try Picard MarkDuplicates with the REMOVE_DUPLICATES=true option instead of umi_tools dedup, but I'm concerned that I'm losing a big chunk of my data. I would like to stick to my already established STAR-RSEM pipeline as much as possible. I would appreciate any help! Thank you very much in advance!

umi_tools RNA-seq STAR • 1.2k views

For specialized kits like this, you should follow the recommendations from IDT for analyzing the data (Appendix G here). You may be doing this already, but I wanted to check.


Interestingly, that manual doesn't mention UMIs...


I think they are extended adapters that can be used with the xGen RNA kit: https://www.idtdna.com/pages/support/faqs/how-do-i-sequence-the-umi-in-the-xgen-udi-umi-adapters

16 months ago

Here is what I would do:

You need to use umi_tools to extract the UMI from read 2 and add it to the header of read 1. To do this, use umi_tools extract as follows:

$ umi_tools extract --stdin=R2.fastq.gz --read2-in=R1.fastq.gz --stdout=discard.fastq --read2-out=map_this_one.fastq.gz --bc-pattern=NNNNNNNNN

You can then follow up with STAR and umi_tools dedup. Note that you will see a decrease in the size of your BAM as duplicates are removed.
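Before aligning, it may be worth confirming that every read name in map_this_one.fastq.gz actually carries a UMI. umi_tools extract appends the UMI to the read name after an underscore, so a quick check is to count FASTQ records whose read ID does not end in an underscore followed by bases. The sketch below is only an illustration: the function name count_reads_without_umi is made up here, and it assumes standard four-line FASTQ records.

```shell
# Hypothetical helper (not part of umi_tools): counts FASTQ records on stdin
# whose read ID does not end in "_<bases>", the suffix umi_tools extract adds.
# Assumes plain 4-line FASTQ records (no wrapped sequence lines).
count_reads_without_umi() {
  awk 'NR % 4 == 1 && $1 !~ /_[ACGTN]+$/ { n++ } END { print n + 0 }'
}
```

Usage would be something like `zcat map_this_one.fastq.gz | count_reads_without_umi`; a result of 0 suggests the extraction step worked for all reads.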

Alternatively, your second approach should have worked, but the option is --umi-tag, not --tag. You can find reads that don't have a UMI tag as follows:

$ samtools view mapped_reads.bam | grep -v 'RX:'
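If you only want a count, or want to avoid matching "RX:" anywhere else on the line (a read name, for example), a slightly stricter variant is to look only at the optional tag columns of each SAM record, which start at column 12. This is a sketch rather than an established tool: count_missing_rx is a name invented for this example, and it assumes the UMI is stored as a string tag (RX:Z:).

```shell
# Hypothetical helper: counts SAM records on stdin that have no RX:Z: tag.
# Header lines (starting with @) are skipped; optional tags begin at column 12.
count_missing_rx() {
  awk -F'\t' '
    !/^@/ {
      found = 0
      for (i = 12; i <= NF; i++) if ($i ~ /^RX:Z:/) found = 1
      if (!found) n++
    }
    END { print n + 0 }'
}
```

Usage would be something like `samtools view mapped_reads.bam | count_missing_rx`. A non-zero count means some reads lost their RX tag before the dedup step.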
