Question

Can these RNAseq runs be combined?

0

Entering edit mode

10 months ago

mdrnao • 0

I've stepped into a little bit of a mess and I'm not sure how to fix it.

An RNA sequencing run was performed on 19 samples which yielded approx 20m reads per sample, it showed promising results but lacked statistical significance so I suggested re-sequencing the library to a better depth.

The samples were indeed rerun with the following issues:

The samples were re-pooled without two of the samples (which were questionable) so now there are 17 samples
The length of the reads changed from 36 bp to 75 bp
The sequencing was sent to another location and not performed on the same machine
The number of reads per sample was approximately the same

I was hoping that either a) it would be an identical run and I could combine the reads or b) the library would be sequenced to a greater depth so it would supersede the previous run. My main concern is that these sets of fastq files are too different to comfortably combine as I thought in (a).

How might I proceed?

Thanks in advance!

transcriptomics replicates RNAseq • 906 views

ADD COMMENT • link updated 10 months ago by GenoMax 141k • written 10 months ago by mdrnao • 0

2

Entering edit mode

Resequencing is generally a neglectable source of batch effect. Trim the new ones down to 36bp to compensate for read length prior to alignment. Then do PCA, coloring by resequencing or not, then see whether there is any evidence for batch effect. If not then just combine.

ADD REPLY • link 10 months ago by ATpoint 82k

1

Entering edit mode

The sequencing was sent to another location and not performed on the same machine

While Illumina says that the sequence should be identical no matter which sequencer/chemistry, you may want to include that as a variable in your PCA and see if there is a difference (especially if a different chemistry was used).

ADD REPLY • link 10 months ago by GenoMax 141k

0

Entering edit mode

Thanks for the reply.

I've pseudo-addressed your comment from the count tables which were processed independently albeit without trimming as you have suggested. This is what set my initial alarm bells off about the validity of combining these runs. I'll have a go trimming the reads down and see if it changes the picture.

PCA plot (dim1 vs dim2 of independently processed RNAseq runs

ADD REPLY • link 10 months ago by mdrnao • 0

0

Entering edit mode

Your mappings are going to be better at 76 bp. With 36 bp reads you may end up with more multimappers. Can you drop the short reads if you have enough data from long ones. I am assuming the libraries are amenable to longer reads i.e. there are no short inserts etc. and you are not losing data to adapter read-through.

ADD REPLY • link 10 months ago by GenoMax 141k

0

Entering edit mode

Indeed, I'm getting the reads re-aligned tonight and will find out in the morning what the PCA looks like! In the meantime, here's the alignment output of one of the samples for flavour.

Thanks for your input!

enter image description here

ADD REPLY • link 10 months ago by mdrnao • 0

0

Entering edit mode

Are these bowtie2 stats? You ought to be using a splice-aware aligner for RNAseq data (like STAR, bbmap, unless you are working with bacteria). While these numbers look fine and you do not appear to have a lot of multimappers.

ADD REPLY • link 10 months ago by GenoMax 141k

0

Entering edit mode

This looks like hisat2 output to be, although it's been a very long time since I've mapped something myself! I have to confess to being spoilt and have people who do the primary alignments and processing for me.

I can also see a STAR folder, which puts run 2 at 9.36% multi-mappers and run 1 at 13.89%.

ADD REPLY • link 10 months ago by mdrnao • 0

0

Entering edit mode

HISAT2 is great. Developed by same group that does bowtie. Longer reads make for lower multi-mappers in STAR so that makes sense.

ADD REPLY • link 10 months ago by GenoMax 141k