Question

How do I get a transcriptome reference?


15 months ago

ธาดา • 0

I have RNA sequencing reads from a nanopore run, and I want to align them to a transcriptome reference.

However, I don't know where to get a transcriptome reference, for example from NCBI or another database. So my questions are:

1. In NCBI, if a complete genome is available and I download it in coding-sequence format, can I use those coding sequences as a transcriptome reference?
2. If not, how do I get a transcriptome reference?

Thank you.

reference mRNA RNA Virus modification • 1.4k views



For question 1, yes, the protein-coding sequences should be fine to use as a transcriptome reference. You just won't be able to map against non-coding RNA.

Reply 15 months ago by dsull ★ 5.8k


You tagged this "Virus". Whether you need to download a transcriptome at all depends on your method of alignment or pseudo-alignment. Two things:

Viral genomes are small, so you won't have many challenges with indexing or aligning to them.
There is little if any splicing, and there are few intergenic regions, so the transcriptome will be more or less identical to the genome (of course, if you want to use salmon or kallisto, you need the transcriptome for pseudoalignment).

In principle, you can simply align against the genome sequence using e.g. BWA-MEM, which is quite straightforward. A splicing-aware aligner will also work but isn't required for viral sequences (just check the genome annotation if in doubt).
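The genome-alignment route described above could be scripted roughly as follows. This is a minimal sketch, assuming `bwa` is installed and on the PATH. The function names `bwa_commands` and `run_alignment` and all file names are hypothetical, and the `-x ont2d` preset (BWA-MEM's Oxford Nanopore setting) is an assumption about the read type that may need adjusting.

```python
import shutil
import subprocess


def bwa_commands(genome_fasta, reads_fastq, sam_out):
    """Return the two BWA command lines: index the genome, then align the reads."""
    index_cmd = ["bwa", "index", genome_fasta]
    # "-x ont2d" selects BWA-MEM's Oxford Nanopore preset (assumed to suit the reads);
    # "-o" writes the SAM output to a file instead of stdout.
    align_cmd = ["bwa", "mem", "-x", "ont2d", "-o", sam_out, genome_fasta, reads_fastq]
    return index_cmd, align_cmd


def run_alignment(genome_fasta, reads_fastq, sam_out):
    """Run both commands in order, stopping at the first failure."""
    if shutil.which("bwa") is None:
        raise RuntimeError("bwa was not found on the PATH")
    for cmd in bwa_commands(genome_fasta, reads_fastq, sam_out):
        subprocess.run(cmd, check=True)


# Example call (hypothetical file names):
# run_alignment("virus_genome.fasta", "nanopore_reads.fastq", "aligned.sam")
```

Building the commands separately from running them makes it easy to print and inspect them before anything is executed.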

Reply 15 months ago by Michael 54k


Thank you, everyone. I understand it better now and will follow your guidance.

Reply 15 months ago by ธาดา • 0

score 1 · Answer 1 · 2022-12-29


15 months ago

barslmn ★ 2.1k

There is a manual about it here: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf


score 1 · Answer 2 · 2022-12-29

There are plenty of places where you can get transcriptome references. For example, GENCODE (https://www.gencodegenes.org/human/) provides FASTA files designated "Transcript sequences".

Otherwise, you can use a tool to extract cDNA sequences from a genome FASTA based on annotations in a GTF file. For example, in the kb-python package (for kallisto and bustools), you can supply the kb ref command with a FASTA and a GTF, and it will output a transcriptome FASTA. There are other tools out there with similar functionality.
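To illustrate what extracting transcripts from a genome FASTA plus a GTF involves, here is a minimal pure-Python sketch. It is not how kb ref is implemented. The function names (`parse_fasta`, `reverse_complement`, `extract_transcripts`) and the toy genome and annotation are made up for illustration, and the sketch assumes a standard nine-column, tab-separated GTF with a `transcript_id` attribute on each exon line.

```python
import re


def parse_fasta(text):
    """Parse FASTA text into {sequence_name: sequence}."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the ID before the first space
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_transcripts(genome, gtf_text):
    """Build {transcript_id: spliced exon sequence} from the exon lines of a GTF."""
    exons = {}  # transcript_id -> (chrom, strand, [(start, end), ...])
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        exons.setdefault(match.group(1), (chrom, strand, []))[2].append((start, end))

    transcripts = {}
    for tid, (chrom, strand, coords) in exons.items():
        # GTF coordinates are 1-based and inclusive, hence the "start - 1".
        seq = "".join(genome[chrom][s - 1:e] for s, e in sorted(coords))
        if strand == "-":
            seq = reverse_complement(seq)
        transcripts[tid] = seq
    return transcripts


# Toy data, invented for this example.
FASTA = ">virus1 toy genome\nAAACCCGGG\nTTTACGT\n"
GTF = "\n".join([
    "\t".join(["virus1", "demo", "exon", "1", "3", ".", "+", ".",
               'gene_id "g1"; transcript_id "tx1";']),
    "\t".join(["virus1", "demo", "exon", "7", "9", ".", "+", ".",
               'gene_id "g1"; transcript_id "tx1";']),
    "\t".join(["virus1", "demo", "exon", "10", "16", ".", "-", ".",
               'gene_id "g2"; transcript_id "tx2";']),
])

transcripts = extract_transcripts(parse_fasta(FASTA), GTF)
for tid, seq in transcripts.items():
    print(f">{tid}\n{seq}")
```

For real data, an established tool is the safer choice, since it also handles large genomes, compressed files, and annotation edge cases that this sketch ignores.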