Transcript quantification

10 months ago

firefox91 • 0

Hello,

I would like to quantify the expression of a specific transcript (24 nt) in many RNA-seq files (transcriptomes) from the SRA database. Which tools are the easiest to use for this? (I can't install Salmon.)

Thanks!

rna-seq transcriptome SRA

updated 10 months ago by ATpoint 82k • written 10 months ago by firefox91 • 0

You won't be able to search all of SRA, but if you have specific datasets you want to look into, you can use SRA BLAST, available through NCBI web BLAST.

I searched with a 24 bp example sequence, and this is what you will see (the link will only stay valid for a couple of days): https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=8SV7GGWE016
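If the datasets of interest are already downloaded locally as FASTQ (for example with the SRA Toolkit's `fasterq-dump`), a crude alternative to web BLAST is to count the reads that contain an exact copy of the 24-mer or of its reverse complement. The following is a minimal base-R sketch under these assumptions: exact matches only (no mismatches), standard 4-line FASTQ records, and a hypothetical helper name `count_kmer_reads`. The reads-per-million value it reports is only a rough normalization across libraries built with different protocols.

```r
# Reverse complement of a DNA string (A, C, G, T, N).
revcomp <- function(seq) {
  comp <- chartr("ACGTN", "TGCAN", toupper(seq))
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical helper: count reads in a FASTQ file (plain or gzipped) that
# contain an exact copy of `target` on either strand. Assumes 4-line records.
count_kmer_reads <- function(fastq_path, target, chunk_records = 1e6) {
  target <- toupper(target)
  rc <- revcomp(target)
  con <- file(fastq_path, open = "r")  # file() in read mode also reads gzip
  on.exit(close(con))
  hits <- 0
  total <- 0
  repeat {
    chunk <- readLines(con, n = 4 * chunk_records)  # whole records per chunk
    if (length(chunk) == 0) break
    reads <- toupper(chunk[seq(2, length(chunk), by = 4)])  # sequence lines
    total <- total + length(reads)
    hits <- hits + sum(grepl(target, reads, fixed = TRUE) |
                         grepl(rc, reads, fixed = TRUE))
  }
  c(hits = hits, total = total, per_million = 1e6 * hits / max(total, 1))
}

# Usage (hypothetical file name and target sequence):
# count_kmer_reads("sample_1.fastq.gz", "GATTACAGATTACAGATTACAGAT")
```

Reading in chunks keeps memory bounded for large runs; a read is counted once even if it matches both strands.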

10 months ago by GenoMax 141k

I forgot to mention that the transcriptomes are not annotated.

10 months ago by firefox91 • 0

I think we are missing information. Is the transcript 24nt or are you looking for a motif that is 24nt? Is this polyA RNA-seq or some sort of small RNA-seq? Do you have a reference genome or transcriptome?

10 months ago by biofalconch ★ 1.1k

The transcript I am looking for is 24 nt, and I am searching for it in human embryo transcriptomes.

I am going to collect many transcriptomes from the SRA database, so the RNA-seq method varies from one experiment to another.

I have a reference genome and transcriptome for Homo sapiens.

10 months ago by firefox91 • 0

I think the issue you are going to have is that most RNA-seq protocols treat 24-mers as junk and filter such short sequences out. You will have to research the library prep protocols to see which ones would actually preserve your target transcript.
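One rough way to see whether a given library could contain such a short molecule at all is to look at its read-length distribution after adapter trimming: raw reads from a run usually all have the same length, so only trimmed reads reveal the insert size. A minimal base-R sketch, assuming 4-line FASTQ records; the function names are hypothetical, the 18-30 nt window is an illustrative choice rather than an established cutoff, and only the first `max_records` reads are inspected.

```r
# Hypothetical helper: tabulate read lengths from the first `max_records`
# records of a 4-line FASTQ file (plain or gzipped; readLines() opens a path
# with file(), which also reads gzip).
read_length_table <- function(fastq_path, max_records = 1e6) {
  lines <- readLines(fastq_path, n = 4 * max_records)
  if (length(lines) < 2) stop("no FASTQ records found")
  reads <- lines[seq(2, length(lines), by = 4)]  # sequence lines only
  table(nchar(reads))
}

# Fraction of reads whose length lies in [min_len, max_len];
# the default 18-30 nt window is an illustrative small-RNA range.
short_read_fraction <- function(len_table, min_len = 18, max_len = 30) {
  lens <- as.integer(names(len_table))
  sum(len_table[lens >= min_len & lens <= max_len]) / sum(len_table)
}

# Usage (hypothetical file name, reads already adapter-trimmed):
# tab <- read_length_table("sample_1.trimmed.fastq.gz")
# short_read_fraction(tab)
```

A fraction near zero suggests the library was size-selected against molecules this short, although it says nothing about why.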

10 months ago by swbarnes2 14k

There are almost no reliable annotated transcripts in human GENCODE (see bottom) that are that short. Small RNAs such as miRNAs are post-transcriptionally processed and trimmed to that size, but, as others have mentioned, you will not find them in standard RNA-seq. My recommendation is to answer the essentials first:

- How did you come to the idea that this is a real transcript, and why do you think it really exists? This is probably the core of your research, and confidentiality might prevent you from sharing it here, but it is the most important question: unless it is a dedicated small RNA, a 24 nt transcript almost certainly looks like an artifact.
- Is it even polyadenylated, so that you would see it in typical (small) RNA-seq, or do you need ribodepletion protocols?
- Have you aligned this sequence to the genome to see whether it even exists as a DNA template in the human genome?
- Is 24 nt the actual transcribed size, or is the transcript longer and post-transcriptional modifications make it shorter?

You should find all of this out, either with strangers on the internet or, much better, with a local experienced person who knows RNA-seq inside out. Based on that, you can decide whether there is a realistic chance of answering your question or whether you are chasing ghosts.
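To check whether the sequence even exists in the genome (the third question above), a short-read aligner or Bioconductor's Biostrings `matchPattern()` would be the usual route. The base-R sketch below only illustrates the logic of an exact search on both strands of sequences read from a FASTA file; the helper names are hypothetical, and reading a whole human genome this way is slow and needs several GB of memory.

```r
# Reverse complement of a DNA string (A, C, G, T, N).
revcomp <- function(seq) {
  comp <- chartr("ACGTN", "TGCAN", toupper(seq))
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical helper: read a FASTA file into a named character vector
# (one upper-case sequence per record, named by the first header word).
read_fasta <- function(path) {
  lines <- readLines(path)
  is_header <- startsWith(lines, ">")
  ids <- sub("^>(\\S+).*", "\\1", lines[is_header], perl = TRUE)
  record <- cumsum(is_header)
  groups <- factor(record[!is_header], levels = seq_along(ids))
  seqs <- vapply(split(lines[!is_header], groups),
                 function(x) toupper(paste(x, collapse = "")), character(1))
  names(seqs) <- ids
  seqs
}

# Hypothetical helper: 1-based start positions of exact matches of `target`
# on both strands. Minus-strand hits are reported as the leftmost position of
# the reverse complement on the forward sequence. A zero-width perl lookahead
# lets gregexpr() report overlapping hits.
find_exact_hits <- function(seqs, target) {
  target <- toupper(target)
  strands <- c("+" = target, "-" = revcomp(target))
  if (strands[["-"]] == strands[["+"]]) strands <- strands["+"]  # palindrome: report once
  hits <- list()
  for (chr in names(seqs)) {
    for (s in names(strands)) {
      pos <- gregexpr(paste0("(?=", strands[[s]], ")"), seqs[[chr]], perl = TRUE)[[1]]
      if (pos[1] != -1) {
        hits[[length(hits) + 1]] <- data.frame(seqname = chr, strand = s,
                                               start = as.integer(pos))
      }
    }
  }
  if (length(hits) == 0) {
    return(data.frame(seqname = character(), strand = character(), start = integer()))
  }
  do.call(rbind, hits)
}

# Usage (hypothetical file name and target sequence):
# genome <- read_fasta("genome.fa")
# find_exact_hits(genome, "GATTACAGATTACAGATTACAGAT")
```

An empty result means there is no exact genomic template; it does not rule out near matches with a few mismatches.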

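```r
library(rtracklayer)
library(tidyverse)

# GENCODE release 43 comprehensive human annotation (GTF)
url <- "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.annotation.gtf.gz"
gtf <- rtracklayer::import(url)

gtf %>%
  as.data.frame() %>%
  filter(type == "transcript") %>%      # transcript-level records only
  mutate(length = end - start + 1) %>%  # genomic span, including any introns
  filter(length <= 24) %>%              # transcripts spanning at most 24 nt
  pull(gene_type) %>%
  table()
```

```
           IG_D_gene processed_pseudogene       protein_coding            TR_D_gene 
                  24                    1                    1                    4
```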

10 months ago by ATpoint 82k