Tool to analyse bulk RNA-seq data with UMIs
23 months ago
Sara ▴ 180

I have bulk RNA-seq data, and the library protocol also used UMIs. I am looking for a tool that can handle UMIs in bulk RNA-seq, but I have not found any (the ones I found are all made for single-cell RNA-seq). So my question is: is there any tool that can work with bulk RNA-seq data containing UMIs?


UMI-tools can handle any UMI-tagged sequencing data where deduplication happens after mapping (https://umi-tools.readthedocs.io/en/latest/index.html).

The process is to extract the UMIs from the read sequence and add them to the read names. There are two ways to do this (a simple string pattern or a regular expression), and between them they provide the flexibility to handle any read configuration I can think of (see https://umi-tools.readthedocs.io/en/latest/regex.html).
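To show what the extraction step does to each read, here is a minimal Python sketch (not UMI-tools' own code), assuming single-end reads whose first `umi_len` bases are the UMI; the `extract_umi` function and the example read are hypothetical. It appends the UMI to the read ID with an underscore and trims the UMI bases and their qualities from the read.

```python
def extract_umi(record, umi_len):
    """Move the first `umi_len` bases of a FASTQ record into its read name.

    `record` is a (header, sequence, plus, quality) tuple of strings.
    """
    header, seq, plus, qual = record
    if len(seq) < umi_len:
        raise ValueError("read is shorter than the UMI")
    umi = seq[:umi_len]
    # Append the UMI to the read ID (the first whitespace-delimited field),
    # keeping any comment such as "1:N:0:..." after the space.
    read_id, sep, comment = header.partition(" ")
    new_header = f"{read_id}_{umi}{sep}{comment}"
    # Trim the UMI bases (and their qualities) from the read itself.
    return new_header, seq[umi_len:], plus, qual[umi_len:]


if __name__ == "__main__":
    rec = ("@read1 1:N:0:ACGT", "AAAACCCCGGGGTTTT", "+", "IIIIJJJJKKKKLLLL")
    print(extract_umi(rec, 4))
    # ('@read1_AAAA 1:N:0:ACGT', 'CCCCGGGGTTTT', '+', 'JJJJKKKKLLLL')
```

With UMI-tools itself, the equivalent for a fixed-length 5' UMI is `umi_tools extract` with a string `--bc-pattern` containing one `N` per UMI base.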

After mapping, the next step depends on whether your technique fragments the cDNA before or after PCR. If fragmentation happens after PCR, duplicates of one molecule can map to different positions, so the next step is to assign reads to features (e.g. genes) using featureCounts. If PCR happens after fragmentation, duplicates share a mapping position, so you do the read assignment/quantification after deduplicating.

Then you run umi_tools group, dedup or count (depending on your downstream application). If fragmentation happened after PCR, you need to do this on a per-gene basis.
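To make the per-position versus per-gene distinction concrete, here is a simplified Python sketch of deduplication, assuming each aligned read has already been reduced to a small record; the `Read` fields and the `dedup` function are hypothetical, not UMI-tools' API. Reads sharing a key and an identical UMI are treated as PCR duplicates and only the first is kept. UMI-tools' default `directional` method additionally merges UMIs that are one mismatch away from a more abundant UMI (likely sequencing errors), so exact matching like this can overcount molecules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    name: str    # read name ending in "_<UMI>", as written by the extract step
    chrom: str
    strand: str  # "+" or "-"
    pos: int     # 5' end of the alignment on its strand (assumed precomputed)
    gene: str    # gene assigned by e.g. featureCounts; "" if unassigned


def umi_of(read):
    # The UMI is everything after the last underscore in the read name.
    return read.name.rsplit("_", 1)[1]


def dedup(reads, per_gene=False):
    """Keep the first read seen for each (key, UMI) pair.

    per_gene=False: key is the mapping position (PCR after fragmentation).
    per_gene=True:  key is the assigned gene (fragmentation after PCR).
    """
    kept, seen = [], set()
    for read in reads:
        if per_gene:
            if not read.gene:
                continue  # unassigned reads cannot be grouped per gene
            key = (read.gene,)
        else:
            key = (read.chrom, read.strand, read.pos)
        tag = key + (umi_of(read),)
        if tag not in seen:
            seen.add(tag)
            kept.append(read)
    return kept


if __name__ == "__main__":
    reads = [
        Read("r1_AAAA", "chr1", "+", 100, "geneA"),
        Read("r2_AAAA", "chr1", "+", 100, "geneA"),  # PCR duplicate of r1
        Read("r3_AAAA", "chr1", "+", 250, "geneA"),  # same UMI and gene, new position
        Read("r4_CCCC", "chr1", "+", 100, "geneA"),  # different UMI
    ]
    print([r.name for r in dedup(reads)])                 # ['r1_AAAA', 'r3_AAAA', 'r4_CCCC']
    print([r.name for r in dedup(reads, per_gene=True)])  # ['r1_AAAA', 'r4_CCCC']
```

Position-based dedup keeps r3 because it maps elsewhere, while per-gene dedup removes it because it carries the same UMI within the same gene, which is what you want when the molecule was fragmented after PCR. In UMI-tools, per-gene mode is, for example, `umi_tools dedup` or `count` with `--per-gene` and a gene tag such as the `XT` tag that `featureCounts -R BAM` writes.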

Hi, I have collected single-end HTS data from the (full) E. coli ribosome on the Illumina platform. I find UMI-tools very interesting and useful. I used an 18-nt random barcode at the 5' end to deal with read duplication. I want to count the number of UMIs and reads at each position after mapping to a reference sequence. I have read the UMI-tools manual but could not figure out how to do this; could you please suggest how I can proceed? Here is an example showing what I am aiming for and how much I have understood:

Say I have used UMI-tools to extract the 18-nt random barcode from the 5' end of each read and add it to the read header ('_'-separated), as below. I then map the reads to the reference sequence using bowtie2. Now I want to count, from the SAM/BAM file, the number of reads at each position of the reference and the number of barcodes unique to those reads; that is, I want the number of molecules at each position and their UMIs. For example, if 100 reads map to position 15 and those 100 reads contain 75 distinct barcodes, I want to get the number of reads (100) and unique barcodes (75) for that position.

@ST-E00205:943:HCF3YCCX2:4:1101:11495:1678_CCAGCCCAAAGCCACCCG 1:N:0:NCCACGCG+NGATCTCG
ACCGGATGGTAGACCTGGAGGAGGGGAAAGCCGAGGTGGTGACGGGAGCGGCTGGGGGGGGAGTCCGGGATGGTAGGCGGAGCGGGCAGAGCACAGCAGCTCGTGTAGAAATGG
+
7-<--7--7-7F-----77----7---7-------------------7----77-7-----7------7---------7-7------7--7----77----------77-7---
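Once each alignment's position and UMI are known, the counts asked for here reduce to a grouping operation. A minimal sketch, assuming the positions and UMIs have already been pulled out of the BAM (for instance with a BAM-reading library) into `(position, umi)` pairs; the function name is hypothetical, and UMIs are compared as exact strings rather than with UMI-tools' error-aware grouping:

```python
from collections import defaultdict


def reads_and_umis_per_position(pairs):
    """Return {position: (n_reads, n_unique_umis)} from (position, umi) pairs."""
    umis_at = defaultdict(list)
    for pos, umi in pairs:
        umis_at[pos].append(umi)
    # Number of reads = all UMIs seen there; molecules = distinct UMIs.
    return {pos: (len(umis), len(set(umis))) for pos, umis in sorted(umis_at.items())}


if __name__ == "__main__":
    pairs = [(15, "AAAA"), (15, "AAAA"), (15, "CCCC"), (16, "GGGG")]
    print(reads_and_umis_per_position(pairs))  # {15: (3, 2), 16: (1, 1)}
```

For the example in the question, 100 reads at position 15 carrying 75 distinct barcodes would give `(100, 75)` for that position. If the reference has several contigs or both strands matter, the position can be a `(contig, strand, position)` tuple instead of a plain integer.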


This is a separate question. Could you please start a new post?


Okay, thank you very much.


Have you looked at umi_tools?


@swbarnes2: I think umi_tools is only for scRNAseq, right?


scRNA-seq data are still ordinary sequencing reads. Depending on the scheme you are using for your UMIs, you should be able to apply umi_tools. See the FAQ for examples of regular expressions you can use.
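For reference, UMI-tools' regex extraction method marks UMI bases with named capture groups (`umi_1`, `umi_2`, ...) and bases to drop with `discard_1`, .... The sketch below only imitates that naming convention with Python's `re` module to show how such a pattern splits a read; the read layout (an 8-nt UMI followed by a fixed `TATA` linker) and the `split_read` helper are hypothetical, not UMI-tools code.

```python
import re

# Hypothetical read layout: 8-nt UMI, then a fixed "TATA" linker to discard,
# then the cDNA insert. Group names follow UMI-tools' umi_N / discard_N style.
PATTERN = re.compile(r"(?P<umi_1>.{8})(?P<discard_1>TATA)")


def split_read(seq):
    """Return (umi, insert), or None if the read does not match the layout."""
    m = PATTERN.match(seq)  # anchored at the start of the read
    if m is None:
        return None
    return m.group("umi_1"), seq[m.end():]


if __name__ == "__main__":
    print(split_read("ACGTACGTTATAGGGCCC"))  # ('ACGTACGT', 'GGGCCC')
    print(split_read("ACGTACGTCCCCGGG"))     # None
```

With UMI-tools, a pattern like this would be passed to `umi_tools extract` via `--extract-method=regex` and `--bc-pattern`; reads that do not match the pattern are filtered out rather than guessed at.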


UMI-tools was actually first created to analyse iCLIP data! There is absolutely no reason it shouldn't work with bulk RNA-seq; in fact, we are analysing some UMI-tagged bulk RNA-seq data with it ourselves right now.


It pulls the UMI out of a read and puts it in the read name; that's not specific to scRNA-seq.