Different comparisons in RNASeq Analysis
5 days ago
SomeOne ▴ 200

Hi, I am a bit confused about the different comparisons in RNA-seq data analysis.

Here is what I have and what I have already done.

Dataset 1: I have a mutant strain and its wildtype. These were used to infect plants. RNA was then extracted and sequenced with Illumina paired-end 150bp sequencing, with 3 replicates each.

Analysis 1: DEGs analysis of plant-infection-mutants(A) vs plant-infection-wildtype(B) (A-vs-B)

  1. Quality check of fastq files with FastQC
  2. Quality trimming with fastp (I did not perform any quality trimming, as the data already seemed to be quality-trimmed)
  3. Read splitting to get reference-specific reads using bbsplit from the bbmap package (this gave me plant-specific reads and mutant-strain-specific reads)
  4. Read alignment to the reference genome using HISAT2 version 2.2.1 with paired-end reads
  5. Quantification using featureCounts with paired-end flags
  6. DEG analysis using DESeq2

Dataset 2: Again, a mutant strain and its wildtype. These were grown in flasks. RNA was then extracted and sequenced with Illumina single-end 75bp sequencing, with 4 replicates each. (This is an older dataset.)

Analysis 2: DEGs analysis of flask-grown-mutants(C) vs flask-grown-wildtype(D) (C-vs-D)

  1. Quality check of fastq files with FastQC
  2. Quality trimming with fastp (I did not perform any quality trimming, as the data already seemed to be quality-trimmed)
  3. Read alignment to the reference genome using HISAT2 version 2.2.1 with single-end reads
  4. Quantification using featureCounts in single-end mode
  5. DEG analysis using DESeq2

Now there are two more analyses that I want to do.

Analysis 3: DEGs analysis of plant-infection-mutants(A) vs flask-grown-mutants(C) (A-vs-C)

For Analysis 3, this is the approach I have tried so far:

  1. Extracted the plant-infection-mutants (A) featureCounts data from the Analysis 1 count matrix, which is based on Illumina paired-end 150bp sequencing (first 6 columns, GeneID Chr Start End Strand Length, plus 3 columns containing the mutant expression values in plants)
  2. Extracted the flask-grown-mutants (C) featureCounts data from the Analysis 2 count matrix, which is based on Illumina single-end 75bp sequencing (first 6 columns, GeneID Chr Start End Strand Length, plus 4 columns containing the mutant expression values in flasks)
  3. Merged both on the GeneIDs (note: A has 3 replicates, C has 4 replicates)
  4. DEG analysis with DESeq2, using the same commands as in A-vs-B and C-vs-D
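The merge step described above could look roughly like the sketch below. It is only an illustration, not the OP's actual commands: the sample names (A1..A3, C1..C4), gene IDs, and counts are made up, and it assumes the standard featureCounts layout (a leading `#` comment line, then a `Geneid` header column followed by the annotation and sample columns).

```python
import csv

# Annotation columns that featureCounts writes before the per-sample counts
ANNOTATION_COLS = ["Geneid", "Chr", "Start", "End", "Strand", "Length"]

def read_counts(text):
    """Parse a featureCounts-style table into ({Geneid: row}, sample column names)."""
    # Skip empty lines and the '#' comment line that featureCounts puts first
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    rows = {row["Geneid"]: row for row in reader}
    samples = [c for c in reader.fieldnames if c not in ANNOTATION_COLS]
    return rows, samples

def merge_counts(text_a, text_c):
    """Inner-join two count tables on Geneid, keeping only the sample columns."""
    rows_a, samples_a = read_counts(text_a)
    rows_c, samples_c = read_counts(text_c)
    merged = {}
    for gene, row in rows_a.items():
        if gene in rows_c:  # genes missing from either table are dropped
            merged[gene] = ([int(row[s]) for s in samples_a]
                            + [int(rows_c[gene][s]) for s in samples_c])
    return samples_a + samples_c, merged

# Hypothetical toy tables: 3 plant replicates (A) and 4 flask replicates (C)
table_a = "\n".join([
    "# Program:featureCounts (comment line)",
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tA1\tA2\tA3",
    "gene1\tchr1\t1\t900\t+\t900\t10\t12\t11",
    "gene2\tchr1\t1500\t2500\t-\t1001\t0\t3\t1",
])
table_c = "\n".join([
    "# Program:featureCounts (comment line)",
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tC1\tC2\tC3\tC4",
    "gene1\tchr1\t1\t900\t+\t900\t40\t38\t41\t39",
    "gene2\tchr1\t1500\t2500\t-\t1001\t5\t4\t6\t5",
    "gene3\tchr2\t1\t500\t+\t500\t7\t7\t8\t6",
])

columns, merged = merge_counts(table_a, table_c)
print(columns)
for gene, counts in merged.items():
    print(gene, counts)
```

Unequal replicate numbers (3 vs 4) are not a problem for the merge itself. The merge does nothing about the batch differences between the two datasets, though.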

Analysis 4: DEGs analysis of plant-infection-wildtype(B) vs flask-grown-wildtype(D) (B-vs-D)

For this I used an approach similar to Analysis 3.

Questions

  1. For Analysis 1 and Analysis 2, is my approach correct? As far as I know, DESeq2 itself performs median-of-ratios normalization (MRN), so I did not perform any other normalization.
  2. I am confused about Analysis 3 and Analysis 4. Are they correct? Or will the different sequencing type (paired vs single) and the difference in read length (150bp-x2 vs 75bp-x1) introduce a technical or batch effect?
  3. If Analyses 3 and 4 are not correct, what should I do? Do I need to normalize them, and by what method?
  4. Any other thoughts or points you would like to raise.
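For readers unfamiliar with the median-of-ratios normalization mentioned in question 1, here is a minimal sketch of the idea in plain Python. It is not DESeq2's code, and the counts are invented. Each sample's size factor is the median, over genes, of that sample's count divided by the gene's geometric mean across samples, with genes containing a zero count left out.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: {gene: [raw count per sample]}. Returns one size factor per sample."""
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean per gene; genes with any zero count are skipped
    # because their geometric mean would be zero
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(counts[gene][j]) - lgm
                      for gene, lgm in log_geo_means.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Hypothetical counts: sample 2 was sequenced exactly twice as deeply as sample 1
toy_counts = {
    "gene1": [10, 20],
    "gene2": [100, 200],
    "gene3": [50, 100],
    "gene4": [0, 5],   # contains a zero, so it is ignored for the size factors
}
factors = size_factors(toy_counts)
normalized = {g: [c / f for c, f in zip(row, factors)] for g, row in toy_counts.items()}
print(factors)
print(normalized["gene1"])
```

This corrects for sequencing depth and composition between samples. It does not correct for differences in library preparation, read type, or read length between datasets.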

Your thoughts and suggestions will be really helpful.

Regards

DEGs RNAseq • 699 views

For analyses 3 and 4, you will need to account for batch effects, which I expect to be substantial given the differences between the datasets, e.g. read length, read type (paired-end vs. single-end), sequencing machines, and sample preparation by different technicians.


Hi! Thank you for your response. Can you share some information on how to remove this batch effect?


See the post.


You cannot remove batch effects where they overlap perfectly with a biological condition of interest.
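A small sketch of why this is so, under the assumption that "batch" here simply means which dataset a sample came from. In the A-vs-C comparison the condition column and the batch column of the design matrix are identical, so the matrix loses rank and the two effects cannot be estimated separately. The `matrix_rank` helper is written for this illustration only.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small matrix via exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry in this column
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

# Columns: intercept, condition (plant = 1), batch (dataset 1 = 1)
# A-vs-C: every plant sample is from dataset 1, every flask sample from dataset 2
design_a_vs_c = [[1, 1, 1]] * 3 + [[1, 0, 0]] * 4

# Columns: intercept, batch (dataset 1 = 1), genotype (mutant = 1)
# All four groups: genotype varies within each dataset, so it is not confounded
design_all = ([[1, 1, 1]] * 3 + [[1, 1, 0]] * 3      # A, B
              + [[1, 0, 1]] * 4 + [[1, 0, 0]] * 4)   # C, D

rank_confounded = matrix_rank(design_a_vs_c)
rank_full = matrix_rank(design_all)
print(rank_confounded, "of 3 columns")
print(rank_full, "of 3 columns")
```

The second design is full rank only because mutant and wildtype were sequenced together within each dataset. That is why the genotype comparisons (A-vs-B, C-vs-D) are fine while the environment comparisons are not.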

5 days ago

Or will the different sequencing type (paired vs single) and the difference in read length (150bp-x2 vs 75bp-x1) introduce a technical or batch effect?

Yes, they absolutely will. The read-type and read-length difference is fixable by trimming the paired-end reads to 75 bases and aligning only the R1 fastq. Different instruments should not introduce too many artifacts. But the more substantial problem is that the samples were prepped on different dates, which will introduce a pretty big batch effect. Because of that, I'm not sure analyses 3 and 4 are worth doing.
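To make the trimming suggestion concrete, here is a minimal sketch that truncates reads from an R1 FASTQ to 75 bases. It assumes plain four-line FASTQ records with no line wrapping, and the read shown is made up. In practice a dedicated trimming tool would normally do this; the function only illustrates the operation.

```python
def trim_fastq(lines, max_len=75):
    """Yield FASTQ lines with each sequence and quality string cut to max_len."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:  # header, sequence, '+', qualities
            header, seq, plus, qual = record
            yield from (header, seq[:max_len], plus, qual[:max_len])
            record = []

# Hypothetical 150-base R1 read
r1_lines = [
    "@read1/1\n",
    "ACGT" * 37 + "AC\n",
    "+\n",
    "I" * 150 + "\n",
]
trimmed = list(trim_fastq(r1_lines))
print(len(trimmed[1]), len(trimmed[3]))
```

Aligning the trimmed R1 reads in single-end mode makes the two datasets comparable at the alignment step only. The sample-preparation batch effect remains.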


Hi. Thank you for replying.

If I trim the PE data to single-end reads of 75bp and then perform the analysis, wouldn't that affect the final comparison, where we want to see A-vs-B, C-vs-D, A-vs-C and B-vs-D all together in one table?


Longer paired reads align slightly differently from short single-end reads, and you don't want that difference if you are going to compare the two sets of reads to each other. It won't necessarily make a huge difference, though. The artifactual batch effects are far more concerning.

5 days ago
mbyvcm ▴ 460

For analysis 3, you could think about something like:

(Plant Mut vs. Plant WT) vs (Flask Mut vs. Flask WT)

In doing so, you are internally controlling each mutant using the corresponding wildtype.


That's not really the comparison OP was asking about, but I don't think they can meaningfully do the comparisons they wanted.


The OP wanted to directly compare the Mutants (Analysis 3), and the analysis I have proposed for their consideration will do that, whilst controlling for some of the batch effect. To what extent this is meaningful I suppose is up to the OP to decide.


Hi mbyvcm, thank you for your reply.

Actually, I have already done the analysis you suggest.

(Plant Mut vs. Plant WT) is the same as

Analysis 1: DEGs analysis of plant-infection-mutants(A) vs plant-infection-wildtype(B) (A-vs-B)

(Flask Mut vs. Flask WT) is the same as

Analysis 2: DEGs analysis of flask-grown-mutants(C) vs flask-grown-wildtype(D) (C-vs-D)

Or did you mean something different? Can you elaborate a bit?


My suggestion is that you consider performing what is commonly referred to as a "delta-delta" contrast. You should be able to find plenty of examples / tutorials (limma manual for example)
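As a rough numerical illustration of what a "delta-delta" contrast computes (not limma or DESeq2 code), the sketch below takes, for one gene, the mutant-vs-wildtype log2 fold change in plants minus the same fold change in flasks. The normalized counts and the pseudocount of 1 are assumptions chosen for the example. In DESeq2 or limma the analogous quantity would be the interaction term of a two-factor design (environment and genotype).

```python
import math
from statistics import mean

def log2fc(numerator, denominator, pseudocount=1.0):
    """log2 fold change between the group means of two sets of normalized counts."""
    return (math.log2(mean(numerator) + pseudocount)
            - math.log2(mean(denominator) + pseudocount))

def delta_delta(a, b, c, d):
    """(A vs B) minus (C vs D): the difference between the two mutant effects."""
    return log2fc(a, b) - log2fc(c, d)

# Hypothetical normalized counts for one gene
plant_mut = [800, 820, 780]        # A
plant_wt = [400, 410, 390]         # B
flask_mut = [210, 190, 200, 200]   # C
flask_wt = [100, 100, 100, 100]    # D

direct_a_vs_c = log2fc(plant_mut, flask_mut)
interaction = delta_delta(plant_mut, plant_wt, flask_mut, flask_wt)
print(round(direct_a_vs_c, 2), round(interaction, 2))
```

In this toy gene the direct A-vs-C comparison shows a large difference, which could be batch or environment. The mutant effect itself is about twofold in both settings, so the delta-delta value is close to zero. Any shift that hits mutant and wildtype equally within a dataset cancels out of this contrast.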
