Question

Different comparisons in RNASeq Analysis

5 days ago

SomeOne ▴ 200

Hi, I am a bit confused about different comparisons in RNA-seq data analysis.

Here is what I have and what I have already done.

Dataset 1: I have a mutant strain and its wildtype. These were used to infect plants. RNA was then extracted and sequenced with Illumina paired-end 150 bp sequencing, with 3 replicates each.

Analysis 1: DEGs analysis of plant-infection-mutants(A) vs plant-infection-wildtype(B) (A-vs-B)

Quality check of fastq files with FastQC
Quality trimming with fastp (I did not perform any quality trimming, as the data already seemed to be quality trimmed)
Read splitting to get reference-specific reads using bbsplit from the BBMap package (this gave me plant-specific reads and mutant-strain-specific reads)
Reads alignment using Hisat2 version 2.2.1 to ref-genome using paired-end reads
Quantification using featureCounts with paired-end flags
DEG analysis using DESeq2

Dataset 2: Again, a mutant strain and its wildtype. These were grown in flasks. RNA was then extracted and sequenced with Illumina single-end 75 bp sequencing, with 4 replicates each (this is older data).

Analysis 2: DEGs analysis of flask-grown-mutants(C) vs flask-grown-wildtype(D) (C-vs-D)

Quality check of fastq files with FastQC
Quality trimming with fastp (I did not perform any quality trimming, as the data already seemed to be quality trimmed)
Reads alignment using Hisat2 version 2.2.1 to ref-genome using paired-end reads
Quantification using featureCounts with paired-end flags
DEG analysis using DESeq2

Now there are two more analyses that I want to do.

Analysis 3: DEGs analysis of plant-infection-mutants(A) vs flask-grown-mutants(C) (A-vs-C)

For Analysis 3, I have tried this approach so far:

1. Extracted the plant-infection-mutants (A) featureCounts data from the Analysis 1 featureCounts matrix, which is based on Illumina paired-end 150 bp sequencing (first 6 columns GeneID Chr Start End Strand Length + 3 columns which contain mutant expression values in plants)
2. Extracted the flask-grown-mutants (C) featureCounts data from the Analysis 2 featureCounts matrix, which is based on Illumina single-end 75 bp sequencing (first 6 columns GeneID Chr Start End Strand Length + 4 columns which contain mutant expression values in flask)
3. Merged both based on the GeneIDs (note: A has 3 replicates, C has 4 replicates)
4. DEG analysis using DESeq2, with the same commands as in A-vs-B and C-vs-D

Analysis 4: DEGs analysis of plant-infection-wildtype(B) vs flask-grown-wildtype(D) (B-vs-D)

For this, I used a similar approach to Analysis 3.

Questions

For Analysis 1 and Analysis 2, is my approach correct? As far as I know, DESeq2 itself performs median ratio normalization (MRN), so I did not perform any other normalization.
I am confused about Analysis 3 and Analysis 4. Are they correct? Or will the different sequencing type (paired vs single) and the difference in read length (150bp-x2 vs 75bp-x1) introduce a technical or batch effect?
If analyses 3 and 4 are not correct, what should I do? Do I need to normalize them, and by what method?
Any other thoughts or points you would like to raise.
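
For context on the normalization mentioned above, here is a minimal sketch of the median-of-ratios idea in plain Python. It is an illustration only, not DESeq2's code; the function name `size_factors` and the toy counts are made up for the example. It shows that the result is one scaling factor per sample, so it adjusts for sequencing depth but cannot undo gene-specific differences between two library types.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: dict mapping sample name -> list of raw counts,
            with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean of each gene across samples.
    # Genes with a zero in any sample are skipped (marked None).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor = median over genes of (count / geometric mean).
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors

# Toy data: sample B was sequenced exactly twice as deeply as sample A.
toy = {"A": [10, 20, 30, 0], "B": [20, 40, 60, 5]}
factors = size_factors(toy)
print({s: round(f, 3) for s, f in factors.items()})  # {'A': 0.707, 'B': 1.414}
```

Dividing each sample's counts by its factor puts A and B on the same scale here, because the only difference between them is depth.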

Your thoughts and suggestions will be really helpful.

Regards

DEGs RNAseq • 699 views

ADD COMMENT • link updated 19 hours ago by mbyvcm ▴ 460 • written 5 days ago by SomeOne ▴ 200

For analyses 3 and 4, you will need to account for batch effects, which I expect to be substantial given the differences between the datasets, e.g. read length, read type (paired-end vs. single-end), sequencing machines, and sample preparation by different technicians.

ADD REPLY • link 5 days ago by jkim ▴ 190

Hi! Thank you for your response. Can you share some information on how to remove this batch effect?

ADD REPLY • link 5 days ago by SomeOne ▴ 200

See the post.

ADD REPLY • link 5 days ago by jkim ▴ 190

You cannot remove batch effects where they overlap perfectly with a biological condition of interest.
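
One way to see this is to look at the design matrix. The sketch below (plain Python, with a small hand-written rank helper; the 0/1 encodings and the balanced layout are hypothetical) builds the matrix for Analysis 3, where every plant sample comes from the paired-end run and every flask sample from the single-end run. The condition and batch columns are identical, so a model cannot estimate them separately.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Analysis 3 layout: 3 plant samples (A), then 4 flask samples (C).
condition = [1, 1, 1, 0, 0, 0, 0]  # 1 = plant, 0 = flask
batch = [1, 1, 1, 0, 0, 0, 0]      # 1 = paired-end run, 0 = single-end run

# Columns: intercept, condition, batch.
confounded = [[1, c, b] for c, b in zip(condition, batch)]
print(matrix_rank(confounded))  # 2, although the matrix has 3 columns

# Hypothetical balanced layout: each batch contains both conditions.
balanced = [[1, c, b] for c, b in zip([1, 0, 1, 0], [1, 1, 0, 0])]
print(matrix_rank(balanced))  # 3, so condition and batch are separable
```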

ADD REPLY • link 5 days ago by swbarnes2 15k

score 0 · Answer 1 · 2025-06-13

5 days ago

swbarnes2 15k

or the Different sequence-type (paired vs single), difference in read-length (150bp-x2 vs 75bp-x1) will have any technical or batch effect ?

Yes, they absolutely will. This is fixable by trimming the paired one to be 75 bases, and just using the R1 fastq for aligning. Different instruments should not introduce too many artifacts. But the more substantial problem is that they were prepped on different dates. This will introduce a pretty big batch effect. I'm not sure analyses 3 and 4 are worth doing, because of the batch effect.
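
As a rough illustration of the trimming step, the sketch below truncates each read of a FASTQ stream to 75 bases. It assumes standard 4-line FASTQ records; `trim_fastq_records` is a made-up helper name, and in practice an existing trimming tool would normally do this job. The trimmed R1 reads would then be aligned as single-end data.

```python
from itertools import islice

def trim_fastq_records(lines, max_len=75):
    """Yield FASTQ lines with sequence and quality cut to max_len bases.

    Assumes each record is exactly 4 lines: header, sequence, '+', quality.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        header, seq, plus, qual = record
        yield header
        yield seq[:max_len]
        yield plus
        yield qual[:max_len]  # keep quality string the same length as seq

# In-memory demo: one 150-base read, as in the paired-end R1 file.
demo = ["@read1\n", "A" * 150 + "\n", "+\n", "I" * 150 + "\n"]
trimmed = list(trim_fastq_records(demo))
print(len(trimmed[1]), len(trimmed[3]))  # 75 75
```

To use it on a real file, the same generator can be fed an open file handle for the R1 FASTQ and its output written line by line to a new file.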

ADD COMMENT • link 5 days ago by swbarnes2 15k

Hi. Thank you for replying.

If I trim the PE data to SE at 75 bp and perform the analysis, wouldn't that affect the final comparison, where we want to see A-vs-B, C-vs-D, A-vs-C, and B-vs-D all together in one table?

ADD REPLY • link 5 days ago by SomeOne ▴ 200

You will get slightly different alignments with longer paired reads. You don't want that if you want to compare the two sets of reads to each other. It won't necessarily make a huge difference. The artifactual batch effects are far more concerning.

ADD REPLY • link 5 days ago by swbarnes2 15k

score 0 · Answer 2 · 2025-06-13

5 days ago

mbyvcm ▴ 460

For analysis 3, you could think about something like:

(Plant Mut vs. Plant WT) vs (Flask Mut vs. Flask WT)

In doing so, you are internally controlling each MUT using the corresponding WT.

ADD COMMENT • link 5 days ago by mbyvcm ▴ 460

That's not really the comparison OP was asking about, but I don't think they can meaningfully do the comparisons they wanted.

ADD REPLY • link 5 days ago by swbarnes2 15k

The OP wanted to directly compare the Mutants (Analysis 3), and the analysis I have proposed for their consideration will do that, whilst controlling for some of the batch effect. To what extent this is meaningful I suppose is up to the OP to decide.

ADD REPLY • link 4 days ago by mbyvcm ▴ 460

Hi mbyvcm, thank you for your reply.

Actually, I have already done the analysis you suggest:

(Plant Mut vs. Plant WT) is same as

Analysis 1: DEGs analysis of plant-infection-mutants(A) vs plant-infection-wildtype(B) (A-vs-B)

(Flask Mut vs. Flask WT) is same as

Analysis 2: DEGs analysis of flask-grown-mutants(C) vs flask-grown-wildtype(D) (C-vs-D)

Or did you mean something different? Can you elaborate a bit?

ADD REPLY • link 2 days ago by SomeOne ▴ 200

My suggestion is that you consider performing what is commonly referred to as a "delta-delta" contrast. You should be able to find plenty of examples and tutorials (the limma manual, for example).
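
To make the arithmetic behind such a contrast concrete, here is a small sketch in plain Python. The counts are invented for a single gene and `delta_delta` is a hypothetical helper; in a real analysis the estimate and its statistics would come from an interaction term in a fitted model, not from this calculation. The point is only that a shift affecting a whole dataset cancels inside each within-dataset fold change.

```python
import math

def mean_log2(values):
    """Mean of log2-transformed, strictly positive expression values."""
    return sum(math.log2(v) for v in values) / len(values)

def delta_delta(plant_mut, plant_wt, flask_mut, flask_wt):
    """(Plant Mut vs. Plant WT) minus (Flask Mut vs. Flask WT), in log2 units."""
    plant_lfc = mean_log2(plant_mut) - mean_log2(plant_wt)
    flask_lfc = mean_log2(flask_mut) - mean_log2(flask_wt)
    return plant_lfc - flask_lfc

# Invented normalized counts for one gene.
plant_mut, plant_wt = [64, 64, 64], [16, 16, 16]          # log2FC = 2
flask_mut, flask_wt = [32, 32, 32, 32], [16, 16, 16, 16]  # log2FC = 1

print(delta_delta(plant_mut, plant_wt, flask_mut, flask_wt))  # 1.0

# Mimic a dataset-wide batch shift: every flask value is 8x higher.
shifted_mut = [8 * v for v in flask_mut]
shifted_wt = [8 * v for v in flask_wt]
print(delta_delta(plant_mut, plant_wt, shifted_mut, shifted_wt))  # still 1.0
```

A shift that hits mutant and wildtype samples differently within a dataset would not cancel this way, which is why the contrast controls for only part of the batch effect.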

ADD REPLY • link 19 hours ago by mbyvcm ▴ 460