How to get a transcriptome reference?
12 weeks ago

I have RNA sequencing reads from Nanopore, and I want to align them to a transcriptome reference.

But I don't know where to get a transcriptome reference, for example from NCBI or another database. So my questions are:

1. On NCBI, if a complete genome is available and I download its coding sequences (CDS), can I use those coding sequences as a transcriptome reference?

2. If not, how can I get a transcriptome reference?

Thank you.

Tags: reference, mRNA, RNA, Virus, modification

For question 1, yes, the protein-coding sequences should be fine to use as a transcriptome reference. You just won't be able to map against non-coding RNA.
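Before using a downloaded coding-sequence FASTA as a reference, it can help to load it and check which records and lengths it contains. Below is a minimal sketch of a FASTA reader; the record IDs and header text in the example are hypothetical, and it assumes the reference name is the first whitespace-delimited token of each header line (the convention most aligners follow).

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_id: sequence}.

    The record ID is the first whitespace-delimited token after '>'.
    Sequence lines are upper-cased and joined; blank lines are skipped.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Hypothetical two-record CDS file, inlined for illustration.
example = """>cds_1 [gene=geneA] [protein=hypothetical protein A]
ATGAAACCC
GGGTAA
>cds_2 [gene=geneB]
ATGTTTTAG
"""
refs = read_fasta(example.splitlines())
for name, seq in refs.items():
    print(name, len(seq))
```

In a real run you would pass an open file handle instead of `example.splitlines()`. Anything not in the file (e.g. non-coding RNAs) simply won't be available to map against.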


You tagged this "Virus". Whether you need to download a transcriptome at all depends on your method of alignment or pseudo-alignment. Two things:

• viral genomes are small, so you won't face many challenges indexing or aligning to them
• there is little or no splicing and few intergenic regions, so the transcriptome will be more or less identical to the genome (of course, if you want to use salmon or kallisto, you need the transcriptome for pseudoalignment)

In principle, you can simply align against the genome sequence using e.g. BWA-MEM, which is quite straightforward. A splice-aware aligner will also work but isn't required for viral sequences (just check the genome annotation if in doubt).
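A minimal sketch of the genome-alignment route, driving the `bwa` command line from Python: index the genome once, then run `bwa mem` and write its SAM output to a file. The file names are hypothetical, and the `-x ont2d` read-type preset is my own assumption for Nanopore reads (it is not part of the advice above); check `bwa mem` help for your installed version before relying on it.

```python
import shlex
import subprocess
from typing import List


def bwa_index_cmd(genome_fasta: str) -> List[str]:
    """Command that builds the BWA index next to the genome FASTA."""
    return ["bwa", "index", genome_fasta]


def bwa_mem_cmd(genome_fasta: str, reads_fastq: str, preset: str = "ont2d") -> List[str]:
    """Command that aligns reads with BWA-MEM; the preset is an assumption."""
    return ["bwa", "mem", "-x", preset, genome_fasta, reads_fastq]


def align(genome_fasta: str, reads_fastq: str, out_sam: str) -> None:
    """Index the genome, then align reads; bwa mem writes SAM to stdout."""
    subprocess.run(bwa_index_cmd(genome_fasta), check=True)
    with open(out_sam, "w") as sam:
        subprocess.run(bwa_mem_cmd(genome_fasta, reads_fastq), stdout=sam, check=True)


if __name__ == "__main__":
    # Only print the command here; call align(...) to actually run bwa.
    print(shlex.join(bwa_mem_cmd("virus_genome.fa", "nanopore_reads.fastq")))
```

Keeping command construction separate from execution makes it easy to inspect or log exactly what will run before calling `align`.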


Thank you, everyone. I understand it better now and will follow your guidance.

12 weeks ago
barslmn ★ 1.4k

There is a manual about it here: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

12 weeks ago
dsull ★ 4.0k

There are plenty of places where you can get transcriptome references. E.g. https://www.gencodegenes.org/human/ -- has FASTA files designated "Transcript sequences".
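If you use a GENCODE "Transcript sequences" FASTA, the headers carry more than just an ID, and tools that summarize to gene level usually need a transcript-to-gene map. A minimal sketch, assuming the pipe-delimited header layout GENCODE uses for that file (transcript ID, gene ID, two Havana IDs, transcript name, gene name, length, biotype, with a trailing `|`); verify the field order against the file you download. The example header is hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class TranscriptHeader(NamedTuple):
    transcript_id: str
    gene_id: str
    transcript_name: str
    gene_name: str
    length: int
    biotype: str


def parse_gencode_header(header: str) -> TranscriptHeader:
    """Split an assumed GENCODE-style '>a|b|...|' header into named fields."""
    fields = header.strip().lstrip(">").rstrip("|").split("|")
    return TranscriptHeader(
        transcript_id=fields[0],
        gene_id=fields[1],
        transcript_name=fields[4],
        gene_name=fields[5],
        length=int(fields[6]),
        biotype=fields[7],
    )


def transcript_to_gene(headers: Iterable[str]) -> Dict[str, str]:
    """Map each transcript ID to its gene ID."""
    return {h.transcript_id: h.gene_id for h in map(parse_gencode_header, headers)}


# Hypothetical header in the assumed layout.
example = ">ENST00000000001.1|ENSG00000000001.1|-|-|GENEA-201|GENEA|1500|protein_coding|"
print(parse_gencode_header(example))
```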

Otherwise, you can use a tool to extract cDNA regions from a genome FASTA based on the annotations in a GTF file. For example, in the kb-python package (for kallisto and bustools), you can supply the kb ref command with a FASTA and a GTF, and it will output a transcriptome FASTA. There are other tools out there with similar functionality.
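To show what "extracting cDNA regions based on a GTF" means, here is a simplified sketch in plain Python. It is an illustration of the idea, not how kb ref or any other tool is implemented: it keeps only `exon` rows, groups them by the `transcript_id` attribute, slices the genome using GTF's 1-based inclusive coordinates, joins the exons in genomic order, and reverse-complements minus-strand transcripts. The tiny genome and GTF rows are made up, and it assumes every exon of a transcript sits on the same sequence and strand.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
TX_ID = re.compile(r'transcript_id "([^"]+)"')


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract_transcripts(genome: Dict[str, str], gtf_lines: List[str]) -> Dict[str, str]:
    """Build {transcript_id: spliced sequence} from exon rows of a GTF."""
    exons: Dict[str, List[Tuple[str, int, int, str]]] = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # skip gene/transcript/CDS rows
        match = TX_ID.search(cols[8])
        if match is None:
            continue
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        exons[match.group(1)].append((chrom, start, end, strand))

    transcripts: Dict[str, str] = {}
    for tx_id, parts in exons.items():
        parts.sort(key=lambda p: p[1])  # genomic order by start coordinate
        chrom, strand = parts[0][0], parts[0][3]
        # GTF is 1-based inclusive, Python slices are 0-based half-open.
        seq = "".join(genome[chrom][s - 1:e] for _, s, e, _ in parts)
        transcripts[tx_id] = reverse_complement(seq) if strand == "-" else seq
    return transcripts


def gtf_row(start: int, end: int, strand: str, feature: str, tx: str) -> str:
    return "\t".join(["chr1", "demo", feature, str(start), str(end), ".", strand, ".",
                      f'gene_id "g_{tx}"; transcript_id "{tx}";'])


genome = {"chr1": "AAAACCCCGGGGTTTT"}
gtf = [
    gtf_row(1, 12, "+", "transcript", "tx1"),
    gtf_row(1, 4, "+", "exon", "tx1"),
    gtf_row(9, 12, "+", "exon", "tx1"),
    gtf_row(13, 14, "-", "exon", "tx2"),
    gtf_row(5, 6, "-", "exon", "tx2"),
]
transcripts = extract_transcripts(genome, gtf)
print(transcripts)
```

Real tools also handle details this sketch ignores, such as multiple sequences per file, missing attributes, and writing the result back out as FASTA.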