Question

Expression of transposons in transcriptomes

1

4.8 years ago

SeaStar ▴ 50

Hello! I have a FASTA file of transposons, with names and sequences. I would like to quantify the expression of these transposons in several different transcriptomes. What kind of analysis would you suggest? What software could I use? Thanks a lot

sequence alignment • 1.6k views

updated 4.8 years ago by A. Domingues ★ 2.7k • written 4.8 years ago by SeaStar ▴ 50

score 1 · Answer 1 · 2019-07-04

1

4.8 years ago

A. Domingues ★ 2.7k

I have used SalmonTE in the past and had good experiences. It uses salmon in the background but then aggregates the counts per element, family and class. The result tables are also ready to use with DESeq2.
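To illustrate what "aggregating the counts per element, family and class" means, here is a minimal sketch in plain Python. It is not SalmonTE's actual code: the `aggregate` function, the `ANNOTATION` table and the counts are made up for illustration, and the counts stand in for something like the `NumReads` column of a salmon `quant.sf` file.

```python
from collections import defaultdict

# Hypothetical annotation: element name -> (family, class).
# Illustrative entries only; a real table would cover every element in the reference.
ANNOTATION = {
    "L1HS": ("L1", "LINE"),
    "L1PA2": ("L1", "LINE"),
    "AluYa5": ("Alu", "SINE"),
}


def aggregate(element_counts, annotation):
    """Sum per-element read counts up to the family and class level."""
    by_family = defaultdict(float)
    by_class = defaultdict(float)
    for element, count in element_counts.items():
        family, te_class = annotation[element]
        by_family[family] += count
        by_class[te_class] += count
    return dict(by_family), dict(by_class)


# Made-up per-element counts for a single sample.
counts = {"L1HS": 120.0, "L1PA2": 30.5, "AluYa5": 400.0}
families, classes = aggregate(counts, ANNOTATION)
print(families)
print(classes)
```

With one such table per sample, the family-level or class-level counts can be assembled into a count matrix for differential expression analysis.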

4.8 years ago by A. Domingues ★ 2.7k

0

Hi Domingues! Thanks a lot!

4.8 years ago by SeaStar ▴ 50

0

Is it possible to use your own reference index, built from a FASTA file of transposable elements generated by RepeatScout, instead of the ones present in the SalmonTE database?

4.8 years ago by SeaStar ▴ 50

1

No idea. I suggest asking the developers on GitHub. They have been quite responsive whenever I had similar questions.

4.8 years ago by A. Domingues ★ 2.7k

score 0 · Answer 2 · 2019-07-04

~~How long are these sequences on average~~ (OK, 500-1000 bp), and are they polyadenylated? There are two things to consider:

First, if they are not polyadenylated, they will be missed in most RNA-seq samples, since most libraries are polyA-enriched. Second, they must be roughly 200 bp or longer, as shorter sequences typically get excluded during library preparation unless it is small-RNA sequencing. Transposons are not my field, so make sure that it is common to detect them in RNA-seq: some RNA species are rapidly degraded and might require special library prep techniques to preserve them, which most standard RNA-seq samples will not have had.
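A quick sanity check of the transposon sequences against these two points could look like the sketch below. It assumes the sequences are already loaded into a dict of name to sequence; `flag_sequences` is a hypothetical helper, the 200 bp cutoff comes from the rule of thumb above, and the 10-base A run is an arbitrary threshold. A trailing A run in a FASTA entry only shows that a tail is present in the sequence as given, not that the transcript is polyadenylated in vivo.

```python
def flag_sequences(seqs, min_len=200, tail_len=10):
    """Return (names shorter than min_len, names ending in >= tail_len A's).

    seqs is a dict mapping sequence name -> nucleotide string.
    Both thresholds are assumptions and should be adapted to the library prep.
    """
    too_short = [name for name, seq in seqs.items() if len(seq) < min_len]
    has_tail = [
        name for name, seq in seqs.items()
        if seq.upper().endswith("A" * tail_len)
    ]
    return too_short, has_tail


# Made-up example sequences.
demo = {
    "te1": "ACGT" * 60,              # 240 bp, no A run at the end
    "te2": "ACGT" * 10 + "A" * 12,   # 52 bp, ends in 12 A's
    "te3": "ACGT" * 50 + "A" * 15,   # 215 bp, ends in 15 A's
}
too_short, has_tail = flag_sequences(demo)
print("shorter than 200 bp:", too_short)
print("ends in an A run:", has_tail)
```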

On the technical side, first check whether these sequences are already present in the respective reference transcriptome. If so, use a tool such as salmon to quantify your data against it. If not, add the sequences (without polyA tails) to that reference and then use salmon. Alternatively, align the data against a reference genome with a tool such as STAR or HISAT2, and make sure you have an annotation file (GTF) that includes the coordinates of these sequences. Tools such as featureCounts can then assign the aligned reads to the features in the GTF. This is all pretty much standard, so please first get some background in RNA-seq and the related analysis techniques.
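For the salmon route, the "check, then add without polyA tails" step could be sketched roughly as follows. The assumptions are mine: `read_fasta` and `merge_references` are hypothetical helpers, "already present" is decided by sequence name only (comparing the sequences themselves would be more robust), and a tail is taken to be a run of 10 or more A's at the 3' end.

```python
import io
import re


def read_fasta(handle):
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].strip()
            name, chunks = (header.split()[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def merge_references(transcriptome, transposons, out):
    """Copy the transcriptome to out, then append transposons not already in it.

    Presence is checked by name only (an assumption). Trailing runs of
    10 or more A's are stripped from the appended transposon sequences.
    Returns the list of transposon names that were added.
    """
    seen, added = set(), []
    for name, seq in read_fasta(transcriptome):
        seen.add(name)
        out.write(f">{name}\n{seq}\n")
    for name, seq in read_fasta(transposons):
        if name in seen:
            continue
        seq = re.sub(r"A{10,}$", "", seq, flags=re.IGNORECASE)
        seen.add(name)
        added.append(name)
        out.write(f">{name}\n{seq}\n")
    return added


# In-memory demo with made-up records; real use would pass open file handles.
tx = io.StringIO(">geneA\nACGTACGT\n>TE1\nGGGG\n")
te = io.StringIO(">TE1\nGGGG\n>TE2\nCCGGTT" + "A" * 12 + "\n")
merged = io.StringIO()
added = merge_references(tx, te, merged)
print(added)
print(merged.getvalue())
```

The merged FASTA (written to a file, here called `merged.fa` as a placeholder) could then be indexed with something like `salmon index -t merged.fa -i merged_index` and quantified with `salmon quant -i merged_index -l A -1 reads_1.fq.gz -2 reads_2.fq.gz -o sample_quant`; all file names in these commands are placeholders.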