Differential gene expression across multiple species
19 months ago

Hi friends, how are you? I need your help with my study project. I have RNA-seq data for 3 species (6 replicates per species); together these species form a genus within the bedbugs. I would like to analyze differential expression between these species, but I am not sure whether the results I get are real or methodological artifacts. At the moment I am using all assemblies (18 = 6 per species) as the reference for quantifying expression, and I have interesting results, but I don't know whether they could be due to technical bias.

All samples were sequenced in the same batch and under the same conditions. My concern is that, because I don't have a genome or transcriptome to use as a reference, I may be obtaining "non-real" expression data. I chose to use all assemblies (n = 18, 6 per species) to build the reference and to analyze against this "super reference". My collaborators are unsure about the results, although the methodology is consistent with "good practices".

I would like to be sure that I can continue with this study.

# build the reference index from all assemblies (n = 18); reference.fasta is the combined assembly

kallisto index -i reference.idx reference.fasta

# quantify each sample (paired-end reads)

kallisto quant -i reference.idx -o output --rf-stranded -b 100 r1.fasta r2.fasta

# estimates: build the count and TMM-normalized expression matrices
Trinity/util/abundance_estimates_to_matrix.pl \
--est_method kallisto --gene_trans_map reference.fasta.gene_trans_map \
--name_sample_by_basedir --cross_sample_norm TMM --out_prefix outdir \
sample1, sample2 ...sample18

Trinity/Analysis/DifferentialExpression/run_DE_analysis.pl --matrix  gene.counts.matrix --method edgeR --output out --dispersion 0.1

Trinity/Analysis/DifferentialExpression/analyze_diff_expr.pl --matrix gene.TMM.EXPR.matrix --max_genes_clust 1000000 -P 1e-3 -C 4

Trinity/Analysis/DifferentialExpression/define_clusters_by_cutting_tree.pl -R diffExpr.P1e-3_C4.matrix.RData --Ptree 60


results


I'm not sure I understand what you are trying to do. To perform a differential gene expression analysis, one compares the expression of the same genes between conditions. The issue here is that, since you are studying different species, which presumably do not have the same set of genes, it does not make sense to compare gene expression directly.


In this case I am comparing different species under the same conditions (developmental stage, environment, etc.).


Yes, but since they do not have the same genes, you are basically trying to compare apples and bananas.


Inter-species analysis is really not my field, but intuitively I would have the following ideas / see these obstacles:

• you would need a reference transcriptome for each species
• you would need to identify the exact orthologs between the species for this analysis to make sense
• you would need to determine mappability and GC content for every gene in each species to be sure that no obvious technical or sequence-content differences drive the differential expression. As genes are large and reads are short, one would get multiple values per gene but would probably need a single value per gene to use these factors as covariates. I have no idea how this would be done.
• the GC content and mappability would need to go into the DE model as covariates
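
As a rough illustration of the "single value per gene" idea in the list above, the sketch below computes GC content and length per transcript and collapses them to one value per gene. It is only a minimal sketch under stated assumptions: the function names, the toy sequences and IDs, and the choice of a length-weighted average are all made up for illustration, and mappability is not covered at all.

```python
from collections import defaultdict


def gc_content(seq):
    """Fraction of G/C bases among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gene_covariates(transcripts, gene_of):
    """Collapse per-transcript GC and length into one value per gene.

    transcripts: dict transcript_id -> sequence
    gene_of:     dict transcript_id -> gene_id (e.g. parsed from a gene_trans_map)
    Returns dict gene_id -> (length-weighted GC, mean transcript length).
    """
    acc = defaultdict(lambda: [0.0, 0, 0])  # [sum(gc * len), sum(len), n transcripts]
    for tid, seq in transcripts.items():
        if not seq:
            continue  # skip empty records so no gene ends up with zero total length
        entry = acc[gene_of[tid]]
        entry[0] += gc_content(seq) * len(seq)
        entry[1] += len(seq)
        entry[2] += 1
    return {
        gene: (gc_len / total_len, total_len / n)
        for gene, (gc_len, total_len, n) in acc.items()
    }


# Toy input (hypothetical IDs and sequences, not real data).
transcripts = {"t1": "GGCC", "t2": "ATATGC", "t3": "AAAA"}
gene_of = {"t1": "g1", "t2": "g1", "t3": "g2"}
cov = gene_covariates(transcripts, gene_of)
```

The resulting per-gene values could then be supplied as covariates or offsets to whichever DE model is used; how best to do that is beyond this sketch.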

There are probably additional factors to consider here. Be sure to read the literature on this and on specific tools for it. Others have probably done it before, and there may be dedicated approaches. Be sure not to reinvent the wheel. I doubt that a wrapper script such as the one in Trinity will do here, for the reasons mentioned above.


Very good. As we don't have a reference to date, it would be wrong for me to just determine the orthologs and evaluate the expression from the orthologs. An important point is that these are very closely related species: they form a genus of only 3 species that have probably evolved recently through hybridization.
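
For readers wondering what "evaluating expression from the orthologs" could look like in practice, here is a minimal sketch. It assumes a hypothetical ortholog table (gene ID to orthogroup ID per species) already exists, sums counts per orthogroup within each sample, and keeps only orthogroups found in every species. All names and numbers are invented for illustration; the sketch does not address how the orthologs are inferred or how length differences between orthologs should be handled.

```python
def ortholog_matrix(counts_by_species, orthogroup_of):
    """Build an orthogroup x sample count table shared by all species.

    counts_by_species: dict species -> dict sample -> dict gene_id -> count
    orthogroup_of:     dict species -> dict gene_id -> orthogroup_id
    Returns (sorted orthogroup ids, dict sample -> list of counts in that order).
    """
    # Keep only orthogroups that have at least one gene in every species.
    shared = None
    for mapping in orthogroup_of.values():
        groups = set(mapping.values())
        shared = groups if shared is None else shared & groups
    rows = sorted(shared or [])

    matrix = {}
    for species, samples in counts_by_species.items():
        mapping = orthogroup_of[species]
        for sample, gene_counts in samples.items():
            totals = dict.fromkeys(rows, 0.0)
            for gene, count in gene_counts.items():
                group = mapping.get(gene)
                if group in totals:  # genes without a shared orthogroup are dropped
                    totals[group] += count
            matrix[sample] = [totals[g] for g in rows]
    return rows, matrix


# Toy input (hypothetical species, samples, genes and counts).
counts = {
    "spA": {"A_rep1": {"a1": 10, "a2": 5, "a3": 7}},
    "spB": {"B_rep1": {"b1": 20, "b2": 3}},
}
og = {
    "spA": {"a1": "OG1", "a2": "OG1", "a3": "OG2"},
    "spB": {"b1": "OG1", "b2": "OG3"},
}
rows, matrix = ortholog_matrix(counts, og)
```

Dropping everything outside the shared orthogroups is a deliberate simplification: it makes the rows comparable across species at the cost of ignoring species-specific genes.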


Hi, I basically want to do a similar thing - comparing gene expression levels between different species and finding significance.

Generally, I am interested to hear how it went for you, since I am a beginner in bioinformatics and still on a steep learning curve, so at the moment I am trying to find as much information relevant to this topic as possible :)