Hello! I have a question for you. I have a FASTA file of transposons, with names and sequences. I would like to quantify the expression of these transposons in several different transcriptomes. What kind of analysis do you suggest? What software could I use? Thanks a lot
I have used SalmonTE in the past and had good experiences. It uses salmon in the background but then aggregates the counts per element, family and class. The results tables are also ready to use with
How long are these sequences on average (ok 500-1000bp), and are they polyadenylated? There are two things to consider:
First, if they are not polyadenylated they will be missed in most RNA-seq samples, as most libraries are polyA-enriched. Second, they must be roughly 200bp or longer, as shorter sequences typically get excluded during library preparation unless it is small-RNA sequencing. Transposons are not my field, so make sure it is common to detect them in RNA-seq: some RNA species are rapidly degraded and might require special library prep techniques to preserve them, which most standard RNA-seq samples will not have had.
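If you want to check the length point quickly, here is a minimal sketch in Python that reports the average length of the sequences in a FASTA file and lists any below a cutoff. The function names (`read_fasta`, `length_report`) and the example records are made up for illustration, and the 200bp default is just the rough figure mentioned above, not a hard rule.

```python
from statistics import mean


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first whitespace-separated token as the record name.
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_report(records, min_len=200):
    """Summarise sequence lengths; min_len is an assumed cutoff."""
    lengths = {name: len(seq) for name, seq in records}
    return {
        "n": len(lengths),
        "mean_length": mean(lengths.values()) if lengths else 0,
        "short": sorted(n for n, size in lengths.items() if size < min_len),
    }


# Hypothetical in-memory example; for a real file, pass an open file
# handle to read_fasta instead of this list.
example = [">TE1 family=A", "ACGT" * 100, ">TE2", "ACGT" * 10]
report = length_report(read_fasta(example))
print(report)
```

Anything listed under `short` is a candidate for being lost during a standard library prep, so it is worth looking at those entries before interpreting their counts.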
From the technical side, first check whether these sequences are already present in the respective reference transcriptome. If so, use a tool such as salmon to quantify your data against it. If not, include the sequences (without polyA tails) in that reference and then use salmon. Alternatively, align the data against a reference genome with a tool such as hisat2 and make sure you have an annotation file (GTF) that includes the coordinates of these sequences. Tools such as featureCounts can then assign the aligned reads to the features in the GTF. This is all pretty much standard, so please first get a background in RNA-seq and the related analysis techniques.
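The "check whether present, otherwise add without polyA tails" step could be sketched roughly like this in Python. This is only an illustration under assumptions: the helper names are hypothetical, the tail rule (a run of at least 10 trailing A's) is an arbitrary choice, and "already present" is reduced to an exact sequence match, which will miss near-identical copies. The resulting combined FASTA would then be indexed and quantified with salmon in the usual way.

```python
import re

# Assumption: a run of 10 or more trailing A's is treated as a polyA tail.
POLYA_TAIL = re.compile(r"A{10,}$", re.IGNORECASE)


def strip_polya(seq):
    """Remove a trailing polyA run (as defined above) from a sequence."""
    return POLYA_TAIL.sub("", seq)


def sequences_to_add(transposons, reference):
    """Return trimmed transposon sequences not yet in the reference.

    Both arguments are dicts mapping name -> sequence. Presence is
    tested by exact (case-insensitive) sequence match only.
    """
    known = {seq.upper() for seq in reference.values()}
    to_add = {}
    for name, seq in transposons.items():
        trimmed = strip_polya(seq).upper()
        if trimmed and trimmed not in known and seq.upper() not in known:
            to_add[name] = trimmed
    return to_add


# Hypothetical toy data, not real transposon sequences.
tes = {"TE1": "ACGTACGT" + "A" * 15, "TE2": "GGGCCC"}
ref = {"tx1": "GGGCCC"}
new_entries = sequences_to_add(tes, ref)
print(new_entries)
```

Here `TE2` is skipped because an identical sequence is already in the reference, and `TE1` is kept with its tail removed.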