Question: Is it OK to merge biological replicates for novel isoform discovery?
2.1 years ago by
rajeshkumar_vinod30 wrote:

Hello everyone, I am using a dataset to identify novel transcripts under infected conditions. My data are laid out as shown in the following image (the replicates are biological replicates, each taken from a pool of a few plants, for control and infected separately).

[Image not preserved: sample layout with two biological replicates each for control and infected]

Now, to analyze this using TopHat:
Procedure 1: Map both control replicates together, and likewise both infected replicates together, using TopHat to get 2 BAM files. Use these BAM files as Cufflinks input to get 2 GTF files (1 for control, 1 for infected), merge them using Cuffmerge, and use the merged annotation for expression quantification and differential expression analysis of the 2 BAM files.
Procedure 2: Map each biological replicate separately using TopHat to get 4 BAM files. Use these 4 BAM files as Cufflinks input to get 4 GTF files, merge them with Cuffmerge, and use the merged annotation for expression quantification of the 4 BAM files separately and for differential expression analysis.

Is there any other way to analyze this data, and which of these two procedures should I choose? The problem with Procedure 1, I think, is that I will not have replicates for the differential expression analysis.
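For concreteness, Procedure 2 could be sketched roughly as below. This is only an illustration that builds the command lines without running them. The sample names, FASTQ file names, index name, annotation and genome paths are all hypothetical, paired-end reads are assumed, and the optional reference annotation (`-G`/`-g`) is an assumption not mentioned in the question.

```python
# Sketch only: assembles the Procedure 2 command lines, nothing is executed.
# All sample names and file paths below are hypothetical placeholders.

SAMPLES = {
    "control": ["control_rep1", "control_rep2"],
    "infected": ["infected_rep1", "infected_rep2"],
}


def procedure2_commands(samples, index="genome_index", ref_gtf="genes.gtf",
                        genome_fa="genome.fa"):
    """Return the list of commands for per-replicate mapping and assembly."""
    cmds = []
    for reps in samples.values():
        for s in reps:
            # One TopHat run per biological replicate (paired-end assumed).
            cmds.append(["tophat", "-G", ref_gtf, "-o", f"{s}_tophat", index,
                         f"{s}_1.fastq", f"{s}_2.fastq"])
            # -g uses the annotation as a guide but still reports novel transcripts.
            cmds.append(["cufflinks", "-g", ref_gtf, "-o", f"{s}_cufflinks",
                         f"{s}_tophat/accepted_hits.bam"])
    # assemblies.txt should list the four transcripts.gtf files, one per line.
    cmds.append(["cuffmerge", "-g", ref_gtf, "-s", genome_fa,
                 "-o", "merged_asm", "assemblies.txt"])
    # Replicates of one condition are comma-separated; conditions are separate arguments.
    bam_groups = [",".join(f"{s}_tophat/accepted_hits.bam" for s in reps)
                  for reps in samples.values()]
    cmds.append(["cuffdiff", "-o", "diff_out", "-L", ",".join(samples),
                 "merged_asm/merged.gtf", *bam_groups])
    return cmds


if __name__ == "__main__":
    for cmd in procedure2_commands(SAMPLES):
        print(" ".join(cmd))
```

The point of the sketch is the last command: because each replicate keeps its own BAM file, Cuffdiff receives two replicates per condition, which Procedure 1 cannot provide.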

Thank you.

rna-seq • 1.0k views
modified 2.1 years ago by EagleEye6.2k • written 2.1 years ago by rajeshkumar_vinod30

You should ask your supervisor. However, it seems you have already answered your own question, since the choice of method is determined by the downstream analysis.

written 2.1 years ago by theobroma221.1k

"You should ask your supervisor."

OP didn't come to biostars to hear that. In addition, I moved your post to a comment since you are not adding additional information.

written 2.1 years ago by WouterDeCoster37k
2.1 years ago by
WouterDeCoster37k wrote:

As you suggest yourself, your procedure 1 is not optimal, since you'll lose the replicates that are important for differential expression analysis. However, 2 replicates per condition is rather low. I don't know if you were involved in the design of this study, but the usual bare minimum is 3 replicates, and adding a few more (if the budget permits) gives better results.

That being said, you should know that TopHat is no longer the "advisable" tool for RNA-seq analysis. The software is deprecated (in low maintenance) and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.)
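As a rough sketch of what the per-replicate version of that newer workflow might look like, the snippet below only builds the command lines and does not run anything. The sample names, FASTQ names, index name and annotation path are hypothetical, paired-end reads are assumed, and the use of `samtools sort` for the SAM to BAM step is my assumption.

```python
# Sketch only: builds HISAT2 / StringTie command lines, nothing is executed.
# Sample names and file paths are hypothetical placeholders.

REPLICATES = ["control_rep1", "control_rep2", "infected_rep1", "infected_rep2"]


def stringtie_commands(samples, index="genome_index", ref_gtf="genes.gtf"):
    """Return commands for per-sample assembly, merging, and re-quantification."""
    cmds = []
    for s in samples:
        # --dta reports alignments tailored for transcript assemblers.
        cmds.append(["hisat2", "--dta", "-x", index,
                     "-1", f"{s}_1.fastq", "-2", f"{s}_2.fastq", "-S", f"{s}.sam"])
        cmds.append(["samtools", "sort", "-o", f"{s}.bam", f"{s}.sam"])
        # Assemble each replicate on its own, guided by the reference annotation.
        cmds.append(["stringtie", f"{s}.bam", "-G", ref_gtf, "-l", s, "-o", f"{s}.gtf"])
    # Merge the per-replicate assemblies into one non-redundant transcript set.
    cmds.append(["stringtie", "--merge", "-G", ref_gtf, "-o", "merged.gtf",
                 *[f"{s}.gtf" for s in samples]])
    for s in samples:
        # -e restricts estimation to merged.gtf; -B writes Ballgown input tables.
        cmds.append(["stringtie", "-e", "-B", "-G", "merged.gtf",
                     "-o", f"ballgown/{s}/{s}.gtf", f"{s}.bam"])
    return cmds


if __name__ == "__main__":
    for cmd in stringtie_commands(REPLICATES):
        print(" ".join(cmd))
```

Each replicate is quantified separately against the same merged annotation, so the replicate structure is kept for the differential expression step.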

And now for your actual question.
I'm not entirely sure whether this is applicable to HISAT2 (but I would expect there is a similar option): you could try a two-step alignment. You map all samples separately and, based on these results combined, you modify/create a GTF file to capture all identified splicing events/isoforms. You then use your new GTF file for mapping a second time (thereby incorporating the knowledge of isoforms from all samples). You can delete the alignments you generated the first time.

I use this approach with the STAR aligner, which is excellent and accurate but requires quite a lot of RAM. The manual is very clear, and the part most interesting for you would be chapter 8: 2-pass mapping.
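A minimal sketch of the multi-sample 2-pass idea with STAR follows. Note that in this variant the second pass is given the first-pass junction files (`SJ.out.tab`) from all samples through `--sjdbFileChrStartEnd`, rather than a rebuilt GTF. Sample names, FASTQ names, the index directory and the output prefixes are hypothetical, paired-end reads are assumed, and the commands are only built, not run.

```python
# Sketch only: builds STAR command lines for a multi-sample 2-pass run.
# Sample names, paths and output prefixes are hypothetical placeholders.

REPLICATES = ["control_rep1", "control_rep2", "infected_rep1", "infected_rep2"]


def star_two_pass_commands(samples, genome_dir="star_index"):
    """Return (first_pass, second_pass, junction_files) for the given samples."""

    def base(sample, prefix):
        return ["STAR", "--genomeDir", genome_dir,
                "--readFilesIn", f"{sample}_1.fastq", f"{sample}_2.fastq",
                "--outFileNamePrefix", prefix]

    # Pass 1: map every sample separately; only the junction tables are kept.
    first_pass = [base(s, f"pass1/{s}_") for s in samples]
    junctions = [f"pass1/{s}_SJ.out.tab" for s in samples]

    # Pass 2: remap every sample, supplying the junctions found in ALL samples.
    second_pass = [
        base(s, f"pass2/{s}_")
        + ["--outSAMtype", "BAM", "SortedByCoordinate",
           "--sjdbFileChrStartEnd", *junctions]
        for s in samples
    ]
    return first_pass, second_pass, junctions


if __name__ == "__main__":
    first, second, _ = star_two_pass_commands(REPLICATES)
    for cmd in first + second:
        print(" ".join(cmd))
```

The first-pass alignments can be deleted once the junction files have been collected; only the second-pass BAM files go into downstream analysis.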

Good luck!

2.1 years ago by
cdsouthan1.8k wrote:

It seems to me that the choice of replicates is obviously related to the design of the experiments and the variability you can detect. You should start with technical replicates (multiple extractions and multiple, successive, nominally identical machine runs) on a single control plant and see how tight (reproducible) the transcript reads and alternative splice detections are. Logically, you should then move on to biological replicates, as different plants or small pools of plants (but still controls). Having gained an appreciation of both sources of variation, you can move on to infected plants, but I guess you then have an extra variable from differences in the infection procedure each time.

2.1 years ago by
EagleEye6.2k wrote:

Check the supplementary methods section of this recent article. My advice is to follow the methods of a recent article.

Or else you can follow the procedure I wrote in a previous post.

In both methods you combine the biological replicates to avoid technical complexity.
