Question

Merging a low quality run of a sample with a high quality run (RNAseq)


Posted 2.0 years ago by DeNovo

Is there a way to merge a low-quality run of a sample with a high-quality run?

I was thinking of doing it at the FASTQ level, i.e. merging the low-read FASTQ files with the resequenced FASTQ files. But I'm not sure whether the FASTQ headers would cause problems for any software, since they would have a different flowcell, sample number, and cluster location. Alternatively, is there a way to merge the data at a later point?
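If you do merge at the FASTQ level, a minimal Python sketch is below. It assumes gzipped FASTQ files, one per run for the same sample, and the function name is hypothetical. Records are copied unchanged, so each read keeps its run-specific header (flowcell, lane, cluster coordinates). Aligners generally process each record independently and do not need headers from different runs to match, but it is worth checking the tools in your own pipeline.

```python
import gzip
import shutil


def concat_fastq_gz(inputs, output):
    """Concatenate gzipped FASTQ files from several runs into one file.

    Records are copied byte-for-byte, so the original headers are kept.
    For paired-end data, call this once for R1 and once for R2 with the
    runs listed in the SAME order, so mates stay paired record-for-record.
    """
    with gzip.open(output, "wb") as out:
        for path in inputs:
            with gzip.open(path, "rb") as src:
                # Stream decompressed bytes from each run into the output.
                shutil.copyfileobj(src, out)
```

For plain concatenation the shell equivalent `cat run1.fastq.gz run2.fastq.gz > merged.fastq.gz` also works, because concatenated gzip members form a valid gzip file.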

Tags: RNAseq, expression, rna, gene, fastq, raw

Updated 2.0 years ago by GenoMax


I would like to help, but I'm having a hard time understanding your question. Did you submit two different libraries of the same samples for sequencing, where one gave poor results and the other was successful? If so:

How do you define "low quality"? Attaching some FastQC results would help.
Why do you want to keep the "low quality" results at all? I would just discard them and proceed downstream with the "high quality" ones.

Reply by Marco Pannone, 2.0 years ago


It's the same library. There was a screw-up in the normalization and the first run was overdiluted, so we renormalized, resequenced, and achieved better results. But some samples even after resequencing are low in reads.

When I said "low quality" I meant that the FASTQ had a small number of reads. We resequenced it and achieved better results.
I'm not sure about this, but I'm assuming that by merging the runs I could get more reads for the samples where the resequencing was still low.
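To see how much depth merging would actually add for the samples that are still low, one option is to count reads per run: a merged file simply contains the sum. A minimal sketch, assuming the standard 4-line FASTQ layout without wrapped sequence lines; the function names and the `min_reads` threshold are hypothetical and would be chosen by you.

```python
import gzip


def count_reads(path):
    """Return the number of reads in a plain or gzipped FASTQ file.

    Assumes 4 lines per record (header, sequence, '+', quality).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def merged_depth(run_paths):
    """Total reads a sample would have after concatenating all its runs."""
    return sum(count_reads(p) for p in run_paths)


def low_depth_samples(depths, min_reads):
    """Names of samples whose read count is below a user-chosen threshold."""
    return sorted(name for name, n in depths.items() if n < min_reads)
```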

Reply by DeNovo, 2.0 years ago


But some samples even after resequencing are low in reads.

So it is specific samples that are low in read numbers (and may still be of fine quality), not the overall data, as you made it sound at the beginning. Sequencing the same libraries/pool again produces a "technical" replicate. With Illumina sequencing there is no discernible difference between technical replicates (one of the reasons people don't sequence them as such).

You may be fine simply merging the two sets of files together, but to save time later (in case there are any other oddities), you can follow the two-step approach advocated by others.

Reply by GenoMax, 2.0 years ago


Okay, now it's clearer: by "low quality" you just meant too few reads in the first run. In that case, my approach would be to first process the data from the first and second runs separately and do some data exploration such as PCA, as Soheil mentioned, to get an overview of how the data from both runs cluster. If the clustering is good enough, you can merge the .bam files (after alignment to the reference genome) for each sample and proceed with the rest of the downstream analysis. As Soheil also recommended, remember to account for the batch effect in the DE analysis.
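The "process separately, then check the clustering" step can be sketched in numpy. This is an assumed, simplified approach rather than DESeq2's `plotPCA` (which works on variance-stabilised data): it uses log2-CPM with a pseudocount of 1. Rows of `counts` are genes and columns are libraries, with each sample's run-1 and run-2 data kept as separate columns; the function name is hypothetical.

```python
import numpy as np


def pca_scores(counts, n_components=2):
    """PCA of a genes x libraries count matrix on log2-CPM values.

    Returns a libraries x n_components array of principal-component scores.
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)                      # total reads per library
    log_cpm = np.log2(counts / lib_sizes * 1e6 + 1)     # depth-normalised, log scale
    centered = log_cpm.T - log_cpm.T.mean(axis=0)       # libraries x genes, gene-centred
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]       # scores = U * S
```

If a sample's two runs land next to each other and near other samples of the same phenotype, merging its per-run BAMs (for example with `samtools merge`) is a reasonable next step.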

Reply by Marco Pannone, 2.0 years ago

Answer · 2022-04-11

I don't think merging the FASTQ files is a good idea. I would treat the runs separately, and I suggest running proper QC and trimming on your FASTQ files. Then look at the PCA plot: if your low-quality samples cluster well (with or without batch correction) with samples of the same phenotype, I would still use them. But make sure to account for the batch effect when performing differential expression analysis.
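Accounting for the batch effect usually means putting the sequencing run into the DE model alongside the condition, e.g. a design of `~ batch + condition` in DESeq2. The numpy sketch below only illustrates what that additive design does for one gene's log-expression values using ordinary least squares; it is not a substitute for a count-based DE tool, and the function names and level names (`run1`, `ctrl`, ...) are hypothetical.

```python
import numpy as np


def design_matrix(conditions, batches):
    """Treatment-coded model matrix for ~ batch + condition.

    The alphabetically first batch and condition are the reference levels.
    """
    conditions, batches = list(conditions), list(batches)
    columns = [np.ones(len(conditions))]
    names = ["(Intercept)"]
    for level in sorted(set(batches))[1:]:
        columns.append(np.array([b == level for b in batches], dtype=float))
        names.append(f"batch_{level}")
    for level in sorted(set(conditions))[1:]:
        columns.append(np.array([c == level for c in conditions], dtype=float))
        names.append(f"condition_{level}")
    X = np.column_stack(columns)
    # If every treated sample sits in one run, batch and condition cannot be separated.
    if np.linalg.matrix_rank(X) < X.shape[1]:
        raise ValueError("batch and condition are confounded")
    return X, names


def fit_gene(log_expr, X):
    """Least-squares coefficients for one gene under the additive design."""
    coef, *_ = np.linalg.lstsq(X, np.asarray(log_expr, dtype=float), rcond=None)
    return coef
```

The condition coefficient is estimated after removing the per-run shift, which is only possible when each condition has samples in more than one run.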