Question: RNAseq analysis with very close species
Asked 3 months ago by corend (70)

I am currently working on an RNA-seq data set from 2 conditions in 1 species. For this species the genome is good (I have full chromosomes) and well annotated. I used the new Tuxedo pipeline to analyse my data, and this worked very well.

I have now retrieved some RNA-seq data from a very closely related species with the same 2 conditions. I would like to perform the same analysis on this species and find, for example, genes that are up-regulated in the first species but down-regulated in the other. But for this second species, the genome is quite bad compared to the first one (400 000 scaffolds).

I thought about different ways of doing it:

  1. Run the new Tuxedo pipeline again using the "bad" genome, then figure out which gene corresponds to which gene in the first species.

  2. Use the genome of the first species directly for the second species (OK, it is a different species, but at least the genome is good).

  3. Use the transcripts I assembled in the first species and quantify their expression with the reads of the second species (i.e. without the new Tuxedo pipeline).

I don't know which solution would be better. If you have any ideas, thanks!

EDIT:

Overall alignment rate for the "good" genome RNA-seq vs the "good" genome: 90%

Overall alignment rate for the "bad" genome RNA-seq vs the "good" genome: 47%

Overall alignment rate for the "bad" genome RNA-seq vs the "bad" genome: 92%
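For anyone wanting to collect these rates automatically, here is a minimal sketch. It assumes the aligner writes a summary whose last line looks like `47.00% overall alignment rate` (the style HISAT2 and Bowtie2 print); the summary strings and labels below are illustrative stand-ins carrying the three rates quoted above, not real log files.

```python
import re

def overall_alignment_rate(summary_text):
    """Return the overall alignment rate (in percent) found in an aligner summary."""
    match = re.search(r"(\d+(?:\.\d+)?)% overall alignment rate", summary_text)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))

# Illustrative summaries (assumed format) with the rates reported in this post.
summaries = {
    "species 1 reads vs good genome": "90.00% overall alignment rate",
    "species 2 reads vs good genome": "47.00% overall alignment rate",
    "species 2 reads vs bad genome": "92.00% overall alignment rate",
}

rates = {label: overall_alignment_rate(text) for label, text in summaries.items()}
for label, rate in rates.items():
    print(f"{label}: {rate:.1f}%")
```

In practice the text would come from each sample's alignment log rather than from hard-coded strings.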


What about performing a de novo transcriptome assembly for the species with the fragmented genome? rnaSPAdes works fine for me.

written 3 months ago by Buffo (1.2k)

I could add this idea to the previous list, but would it be better and why?

written 3 months ago by corend (70)

Yes, exactly as Carlo asked: option 2 depends on how closely related the genomes are, which you could probably gauge by comparing the percentage of aligned reads (excluding multi-mapping reads). Option 1 would also work after some genome refinement (filtering out redundant scaffolds, low-coverage scaffolds, etc.).
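As a rough illustration of that kind of refinement, here is a minimal sketch that drops low-coverage scaffolds and exact duplicate sequences from a FASTA file. Everything specific in it is an assumption: the `min_coverage` threshold, the per-scaffold `coverage` table (which you would have to compute yourself, e.g. from read alignments), and the toy scaffold names. Exact-match deduplication is also a simplification; it will not catch reverse complements or near-identical scaffolds.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the scaffold name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def filter_scaffolds(records, coverage, min_coverage=5.0):
    """Keep scaffolds with enough coverage, skipping exact duplicate sequences."""
    seen = set()
    kept = []
    for name, seq in records:
        # Scaffolds missing from the coverage table are treated as uncovered.
        if coverage.get(name, 0.0) < min_coverage:
            continue
        key = seq.upper()
        if key in seen:
            continue
        seen.add(key)
        kept.append((name, seq))
    return kept

# Toy input: scaf2 duplicates scaf1, scaf3 has low coverage (hypothetical values).
fasta = """>scaf1
ACGTACGT
>scaf2
ACGTACGT
>scaf3
GGGCCC
""".splitlines()
coverage = {"scaf1": 30.0, "scaf2": 28.0, "scaf3": 1.2}

kept = filter_scaffolds(read_fasta(fasta), coverage)
print([name for name, _ in kept])  # prints ['scaf1']
```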

written 3 months ago by Buffo (1.2k)

I am aligning the reads to the "good" genome to see whether the percentage of uniquely mapped reads is acceptable.

written 3 months ago by corend (70)

How close are the two species (% genome identity, etc.)?

You can always try option 3, because it is the easiest, and see how it goes. If most of your reads are unmapped, then I'm afraid that you will have to use option 1 (or de novo transcriptomic assembly).

written 3 months ago by Carlo Yague (4.3k)

I edited my post: 47% of the reads from species 2 align to the genome of my first species.

written 3 months ago by corend (70)

OK, it's not that bad, but not so good either. Now it is up to you to decide:

Is it OK to miss about 50% of the information with option 3, knowing that the results will be simpler to interpret (you can directly compare differentially expressed genes from species 1 and 2)? Or do you have the time and resources for a more complex analysis involving (A) either the Tuxedo pipeline or de novo transcriptome assembly and (B) finding the homologs between species 1 and 2? The second approach will obviously take more time but will probably be more accurate.
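Once step (B) has produced a homolog table, the cross-species comparison itself is simple. A minimal sketch, assuming you already have per-gene log2 fold changes for each species and a one-to-one ortholog mapping; the gene IDs, fold-change values and the `min_abs_lfc` cutoff below are all made up for illustration:

```python
def opposite_regulation(lfc_sp1, lfc_sp2, orthologs, min_abs_lfc=1.0):
    """Return ortholog pairs that are up in species 1 and down in species 2.

    lfc_sp1, lfc_sp2: dicts mapping gene ID -> log2 fold change.
    orthologs: dict mapping species 1 gene ID -> species 2 gene ID.
    """
    hits = []
    for gene1, gene2 in orthologs.items():
        # Skip pairs for which either species has no expression estimate.
        if gene1 not in lfc_sp1 or gene2 not in lfc_sp2:
            continue
        lfc1, lfc2 = lfc_sp1[gene1], lfc_sp2[gene2]
        if lfc1 >= min_abs_lfc and lfc2 <= -min_abs_lfc:
            hits.append((gene1, gene2, lfc1, lfc2))
    return hits

# Hypothetical inputs.
lfc_sp1 = {"sp1_g1": 2.3, "sp1_g2": 1.5, "sp1_g3": -0.2}
lfc_sp2 = {"sp2_g1": -1.8, "sp2_g2": 1.1, "sp2_g3": -2.5}
orthologs = {"sp1_g1": "sp2_g1", "sp1_g2": "sp2_g2", "sp1_g3": "sp2_g3"}

hits = opposite_regulation(lfc_sp1, lfc_sp2, orthologs)
print(hits)  # prints [('sp1_g1', 'sp2_g1', 2.3, -1.8)]
```

A real analysis would also filter on adjusted p-values and deal with one-to-many homologs, which this sketch ignores.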

You can also do both analyses and see whether they (hopefully) converge on the same conclusion.
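One simple way to put a number on that convergence is the overlap between the two lists of differentially expressed genes. A minimal sketch using the Jaccard index; the gene IDs are hypothetical, and both lists are assumed to use the same identifiers (e.g. species 1 gene IDs after homolog mapping):

```python
def jaccard(genes_a, genes_b):
    """Jaccard index of two gene collections: shared genes / all genes."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    # Two empty lists are treated as identical.
    return len(a & b) / len(union) if union else 1.0

# Hypothetical DE gene lists from the two analyses.
de_simple = {"g1", "g2", "g3"}    # e.g. reads mapped to the species 1 reference
de_complex = {"g2", "g3", "g4"}   # e.g. own assembly plus homolog mapping

shared = sorted(de_simple & de_complex)
score = jaccard(de_simple, de_complex)
print(shared, score)  # prints ['g2', 'g3'] 0.5
```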

written 3 months ago by Carlo Yague (4.3k)