Bulk RNA-Seq pipeline suggestion incorporating UMIs
4.3 years ago
steveh ▴ 70

Hi,

I have 149 bulk RNA-Seq samples (100 bp, paired-end, Illumina) which have come from sequencing in the form of fastq triplets, i.e. pairs of reads plus a third fastq which contains only UMIs.

My first question is - do I need to use the UMIs at all or just ignore them?

So far I've ignored them, and used this workflow (on just 10 samples to begin with):

  1. FastQC on raw reads
  2. Align to full ref human genome using STAR (in: fastq, out: BAMs, sortedByCoord)
  3. Produce counts using featureCounts
  4. MultiQC on results produced so far
  5. Analyse using DESeq2
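For reference, steps 2 and 3 of the workflow above might look roughly like the sketch below. The index directory, GTF path, thread count and function names are placeholders chosen for illustration, not values taken from this thread. `--countReadPairs` is only understood by newer Subread releases (2.0.2 and later); older versions count fragments with `-p` alone.

```shell
# Sketch only: paths and thread count are placeholders, not values from the thread.
GENOME_DIR=${GENOME_DIR:-star_index}   # hypothetical STAR index directory
GTF=${GTF:-annotation.gtf}             # hypothetical annotation file
THREADS=${THREADS:-8}

# Step 2: align one read pair; STAR writes <sample>.Aligned.sortedByCoord.out.bam
align_sample() {
    local sample=$1 r1=$2 r2=$3
    STAR --runThreadN "$THREADS" \
         --genomeDir "$GENOME_DIR" \
         --readFilesIn "$r1" "$r2" \
         --readFilesCommand zcat \
         --outSAMtype BAM SortedByCoordinate \
         --outFileNamePrefix "${sample}."
}

# Step 3: count fragments per gene for every BAM passed after the output name.
# Drop --countReadPairs on Subread versions older than 2.0.2.
count_features() {
    local out=$1
    shift
    featureCounts -p --countReadPairs -T "$THREADS" -a "$GTF" -o "$out" "$@"
}

# Example calls (not executed here):
#   align_sample S1 S1_R1.fastq.gz S1_R2.fastq.gz
#   count_features counts.txt ./*.Aligned.sortedByCoord.out.bam
```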

This works, but ignores the UMIs completely. Results from multiQC show, from STAR:

[Image: STAR alignment plot]

and from featureCounts:

[Image: featureCounts assignment plot]

(note the fairly large percentages of unassigned multimapping reads there).

Alternatively, I've tried to incorporate the UMIs with this changed workflow:

  1. FastQC on raw reads
  2. Align to full ref human genome using STAR (in: fastq, out: BAMs, sortedByCoord)
  3. Add the UMIs from the fastq files to the BAMs produced by STAR, using fgbio’s AnnotateBamWithUmis

but I'm getting lost down a rabbit hole now, adding more and more steps to this pipeline just to satisfy various errors I'm getting from downstream tools, e.g.

  1. fgbio SortBam
  2. fgbio SetMateInformation
  3. fgbio GroupReadsByUmi
  4. fgbio CallMolecularConsensusReads
  5. samtools reheader, to add SM tag to BAMs
  6. fgbio FilterConsensusReads (results in vastly reduced BAM file sizes)

For the moment I've stopped here. Maybe I can use these BAM files, but this workflow is starting to feel over-complicated and I don't have confidence it's the correct way to go.

So to summarise:

  • Do I need to incorporate the UMIs at all?
  • If so, could anybody suggest a workflow?

Many thanks, Steve

RNA-Seq bulk UMI • 3.2k views

Cannot see the images.


Apologies, corrected now.


Have you tried de-duplicating reads with umi_tools, using the UMIs alone or in combination with read alignment starts?

4.3 years ago

Here is what I would recommend with umi-tools.

Extract the UMIs from the fastqs before mapping. You'll need to do this once for each of the non-UMI reads.

umi_tools extract --bc-pattern=NNNNNNNNNN -I umi_reads.fastq.gz --read2-in=reads_R1.fastq.gz --read2-stdout | gzip > reads_R1.extracted.fastq.gz
umi_tools extract --bc-pattern=NNNNNNNNNN -I umi_reads.fastq.gz --read2-in=reads_R2.fastq.gz --read2-stdout | gzip > reads_R2.extracted.fastq.gz

where the number of Ns in the bc-pattern matches the number of bases in the UMI.
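If the UMI length is not known in advance, one way to build the pattern is to measure the first record of the UMI fastq. This is a minimal sketch that assumes every UMI read in the file has the same length; `umi_bc_pattern` is a hypothetical helper name, not part of umi_tools.

```shell
# Print one N per base of the first UMI read in a gzipped fastq.
# Assumes all UMI reads have the same length as the first one.
umi_bc_pattern() {
    local umi_fastq=$1 first_umi
    # Line 2 of a fastq file holds the sequence of the first record
    first_umi=$(gzip -cd "$umi_fastq" | sed -n '2{p;q;}')
    # Print as many spaces as there are bases, then turn each space into N
    printf '%*s\n' "${#first_umi}" '' | tr ' ' 'N'
}

# Example call (not executed here):
#   umi_tools extract --bc-pattern="$(umi_bc_pattern umi_reads.fastq.gz)" ...
```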

You can then proceed to map these reads using STAR as before.

Once the reads are mapped, sorted and indexed, deduplicate the BAMs with umi_tools dedup:

umi_tools dedup -I mapped_reads.bam -S deduplicated_reads.bam --paired

Now you can proceed to quantify with featureCounts and analyse with DESeq2 as before.
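Put together for many samples, the steps above could be wrapped in one function per sample, as in the sketch below. The file naming scheme (`<sample>_R1.fastq.gz`, `<sample>_R2.fastq.gz`, `<sample>_UMI.fastq.gz`), the index directory and the function name are assumptions made for illustration; adjust them to the actual layout.

```shell
# Hypothetical layout: <sample>_R1.fastq.gz, <sample>_R2.fastq.gz, <sample>_UMI.fastq.gz
GENOME_DIR=${GENOME_DIR:-star_index}   # placeholder STAR index directory
BC_PATTERN=${BC_PATTERN:-NNNNNNNNNN}   # one N per UMI base

process_sample() {
    local s=$1 mate
    # 1. Move the UMI into the read name of each mate
    for mate in R1 R2; do
        umi_tools extract --bc-pattern="$BC_PATTERN" \
            -I "${s}_UMI.fastq.gz" \
            --read2-in="${s}_${mate}.fastq.gz" --read2-stdout \
            | gzip > "${s}_${mate}.extracted.fastq.gz"
    done
    # 2. Map with STAR, writing a coordinate-sorted BAM
    STAR --genomeDir "$GENOME_DIR" \
         --readFilesIn "${s}_R1.extracted.fastq.gz" "${s}_R2.extracted.fastq.gz" \
         --readFilesCommand zcat \
         --outSAMtype BAM SortedByCoordinate \
         --outFileNamePrefix "${s}." || return 1
    # 3. Index the sorted BAM, which umi_tools dedup needs
    samtools index "${s}.Aligned.sortedByCoord.out.bam" || return 1
    # 4. Deduplicate on UMI plus mapping position
    umi_tools dedup --paired \
        -I "${s}.Aligned.sortedByCoord.out.bam" \
        -S "${s}.dedup.bam"
}

# Example loop over all samples (not executed here):
#   for umi in *_UMI.fastq.gz; do process_sample "${umi%_UMI.fastq.gz}"; done
```

The deduplicated BAMs (`<sample>.dedup.bam` in this sketch) are then the input to featureCounts.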


Thanks Ian - would that be sorted by coordinate? (asking because the fgbio workflow seems to require re-sorting by Queryname)


Yes, sorted by coordinate.
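A quick way to confirm how a BAM is sorted is to read the `SO` field of its `@HD` header line. The helper name below is hypothetical, and the sketch assumes the header actually records a sort order, which STAR's sorted output normally does.

```shell
# Print the SO (sort order) value from a BAM's @HD header line, e.g. "coordinate"
bam_sort_order() {
    samtools view -H "$1" \
        | awk -F'\t' '$1 == "@HD" { for (i = 2; i <= NF; i++) if ($i ~ /^SO:/) print substr($i, 4) }'
}

# Example calls (not executed here):
#   bam_sort_order mapped_reads.bam
#   samtools index mapped_reads.bam   # indexing requires a coordinate-sorted BAM
```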


thanks - and for the dedup step, do I need the --paired option or is that assumed?


Oops. Yes, you will need the --paired option, I'll edit the post.


great, thanks so much for taking the time to answer at this time of year!


Just to update after lots of testing - this is the method I settled on, although adding the UMIs to the already-aligned BAMs and then using umi_tools dedup also works fine.
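For the alternative route, where the UMIs are added to already-aligned BAMs, umi_tools dedup can read the UMI from a BAM tag instead of the read name. The sketch below assumes the UMIs were written to the `RX` tag (the SAM-spec tag for UMI sequences); the wrapper name and file names are illustrative only.

```shell
# Deduplicate a coordinate-sorted, indexed BAM whose UMIs live in a tag.
# The tag defaults to RX here; pass a third argument if another tag was used.
dedup_by_tag() {
    local in_bam=$1 out_bam=$2 tag=${3:-RX}
    umi_tools dedup --paired \
        --extract-umi-method=tag --umi-tag="$tag" \
        -I "$in_bam" -S "$out_bam"
}

# Example call (not executed here):
#   dedup_by_tag annotated.bam annotated.dedup.bam
```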

I don't recommend the method mentioned in my original post, using fgbio.


Not to be pedantic, but does "this" mean the method/answer suggested by @i.sudbery above? If so I can move that comment to an answer, which you can then accept to provide closure to this thread.


Yes, that's correct: the @i.sudbery answer. The general pointer to umi_tools is also useful, but Ian's answer is very specific.


You are able to accept more than one answer. Ian's comment has been moved to an answer now.

4.3 years ago

Have you looked at umi-tools?

https://github.com/CGATOxford/UMI-tools

4.3 years ago
padwalmk ▴ 140

Hi, check out the number of reads in the UMI. If it's less than 1 or 2% of total reads, then you do not have to worry at all. But if it's more than 10%, then you have to do something about it.
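One possible reading of this advice is to look at the share of reads that UMI deduplication removes. Under that assumption, a rough estimate can be made by counting mapped primary alignments before and after `umi_tools dedup`. Both helper names are hypothetical, and the result is only an approximation of the duplication rate.

```shell
# Count mapped primary alignments (skips unmapped, secondary and supplementary records)
count_primary() {
    samtools view -c -F 0x904 "$1"
}

# Percentage of reads removed, given the counts before and after deduplication
dup_percent() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { if (before <= 0) exit 1; printf "%.1f\n", 100 * (before - after) / before }'
}

# Example calls (not executed here):
#   dup_percent "$(count_primary mapped_reads.bam)" "$(count_primary deduplicated_reads.bam)"
```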
