Question: Human transcriptome download
KVC_bioinfo (450), Boston, wrote:

Hello,

I am going to align RNA-seq data to the human transcriptome. However, I am not sure which database I should use: NCBI's RefSeq, RefSeqGene, or something else?

Can anyone help me with that? Thank you in advance.

Tags: rna-seq, transcriptome
Question written 2.9 years ago by KVC_bioinfo (450); modified 2.8 years ago by Kevin Blighe (63k)

You should use the whole genome for alignment and then use a GFF file to do your counting.

Reply written 2.9 years ago by genomax (87k)
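To make "use a GFF file to do your counting" a bit more concrete, here is a minimal Python sketch, assuming GTF-style input (tab-separated, with a `gene_id "..."` attribute on exon lines). It collects exon intervals per gene and assigns aligned read positions to genes by overlap. This is only a toy illustration of the general idea behind read counters, not how any particular counting tool works; the function names and sample records are hypothetical.

```python
from collections import defaultdict

def parse_gtf_exons(lines):
    """Collect exon intervals per gene_id from GTF lines (1-based, inclusive)."""
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, start, end, attrs = fields[0], int(fields[3]), int(fields[4]), fields[8]
        # GTF attributes look like: gene_id "X"; transcript_id "Y";
        gene_id = None
        for attr in attrs.split(";"):
            attr = attr.strip()
            if attr.startswith("gene_id "):
                gene_id = attr.split(" ", 1)[1].strip('"')
        if gene_id is not None:
            exons[gene_id].append((chrom, start, end))
    return dict(exons)

def count_reads(exons, read_positions):
    """Toy counter: a read (chrom, pos) counts toward every gene with an exon covering pos.

    Real counters also handle reads spanning exons, strandedness and
    ambiguous multi-gene overlaps; this sketch ignores all of that.
    """
    counts = {gene: 0 for gene in exons}
    for chrom, pos in read_positions:
        for gene, intervals in exons.items():
            if any(c == chrom and s <= pos <= e for c, s, e in intervals):
                counts[gene] += 1
    return counts

# Hypothetical three-exon annotation and four read positions.
gtf = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";',
    'chr1\tsrc\texon\t1000\t1100\t.\t-\t.\tgene_id "GENE_B"; transcript_id "TX_B1";',
]
exons = parse_gtf_exons(gtf)
reads = [("chr1", 150), ("chr1", 350), ("chr1", 250), ("chr1", 1050)]
print(count_reads(exons, reads))  # {'GENE_A': 2, 'GENE_B': 1}
```

The read at position 250 falls in the intron of GENE_A, so it is not counted; this is why genome alignment plus an annotation file is enough to get per-gene counts.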

We plan to use the human transcriptome. I will also need the GTF file for it. Where can I get that?

Reply written 2.9 years ago by KVC_bioinfo (450)

While you could get that data from multiple places, Illumina's iGenomes site hosts bundles containing matched sequence, annotation, and index files (for Bowtie2/BWA) for many genomes, including human.

Reply written 2.9 years ago by genomax (87k)

Yes, but those are human genomes. I am looking to download the human transcriptome.

Reply written 2.9 years ago by KVC_bioinfo (450)

If you are referring to a set of transcript sequences (i.e., spliced sequences without the introns and intergenic regions), then the Ensembl human genome page is as good a place as any. Look under "gene annotation" on the right side.

Reply written 2.9 years ago by genomax (87k)

I found this:

From that page, I downloaded the RefSeq transcripts. Is that correct? Also, I need an annotation file when aligning with STAR. Could you please tell me how to get it? Thank you very much.

Reply written 2.9 years ago by KVC_bioinfo (450)

Make this easy on yourself and follow the directions for getting pre-made STAR indexes in this thread: C: Pre made STAR Index?

Reply written 2.9 years ago by genomax (87k)

I looked into it, but it does not have a pre-made index for the human transcriptome.

Reply written 2.9 years ago by KVC_bioinfo (450)

Have you looked at the STAR manual? It would be worth spending some time going through it.

Reply written 2.9 years ago by genomax (87k)

Answer by Tom_L (320):

I recommend using the UCSC Table Browser. Pick your genome version (hg19 or hg38), choose your annotation track (Ensembl, RefSeq, etc.), and select GTF as the output format. RefSeq is a good starting point. If you need a transcriptome FASTA file, you can use the gtf_to_fasta tool from the TopHat2 package, which extracts the transcript sequences from the genome FASTA using the GTF coordinates. The resulting file can be used with many aligners (not restricted to TopHat2).

Answer written 2.9 years ago by Tom_L (320)
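Conceptually, building a transcriptome FASTA from a GTF means splicing each transcript's exons out of the genome sequence. Below is a minimal Python sketch of that idea, assuming the genome is already loaded into a dict keyed by chromosome name and that exon coordinates are grouped per transcript (1-based, inclusive, as in GTF). It is not TopHat2's implementation, and `transcript_sequence` and the toy genome are hypothetical.

```python
# Translation table for DNA complement (upper and lower case, N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def transcript_sequence(genome, chrom, strand, exons):
    """Join exon sequences (1-based inclusive GTF coordinates) into a spliced transcript."""
    # Sort by genomic start so exons are concatenated in genome order.
    ordered = sorted(exons)
    # GTF start is 1-based inclusive, so slice [start - 1:end] in Python.
    spliced = "".join(genome[chrom][start - 1:end] for start, end in ordered)
    # For minus-strand transcripts, the mRNA is the reverse complement of the joined exons.
    return reverse_complement(spliced) if strand == "-" else spliced

# Toy genome: exons at 1-3 ("AAA") and 7-9 ("GGG"); positions 4-6 act as an intron.
genome = {"chr1": "AAACCCGGGTTTACGT"}
print(transcript_sequence(genome, "chr1", "+", [(7, 9), (1, 3)]))  # AAAGGG
print(transcript_sequence(genome, "chr1", "-", [(1, 3), (7, 9)]))  # CCCTTT
```

Sorting the exons first means the input order from the GTF does not matter; strand is handled once, after concatenation.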
Answer by Kevin Blighe (63k):

Just use GENCODE's reference transcriptome FASTA:

https://www.gencodegenes.org/releases/current.html

[Direct link to gzipped FASTA: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.transcripts.fa.gz]

I have done this for hundreds of RNA-seq samples.

Answer written 2.8 years ago by Kevin Blighe (63k); modified 2.8 years ago
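If you use the GENCODE transcript FASTA with a transcript-level quantifier, you will usually want a transcript-to-gene table. Here is a minimal Python sketch, assuming the headers are pipe-delimited with the transcript ID first, the gene ID second, the transcript name fifth, the gene name sixth, the length seventh and the biotype eighth (check this against the headers in your downloaded file before relying on it). The function names and the two example records are hypothetical.

```python
import gzip

def parse_gencode_header(header):
    """Split a GENCODE-style transcript FASTA header into named fields (assumed layout)."""
    fields = header.lstrip(">").rstrip("\n").rstrip("|").split("|")
    return {
        "transcript_id": fields[0],
        "gene_id": fields[1],
        "transcript_name": fields[4],
        "gene_name": fields[5],
        "length": int(fields[6]),
        "biotype": fields[7],
    }

def tx2gene(lines):
    """Map transcript ID -> gene ID using only the header lines of a FASTA stream."""
    mapping = {}
    for line in lines:
        if line.startswith(">"):
            rec = parse_gencode_header(line)
            mapping[rec["transcript_id"]] = rec["gene_id"]
    return mapping

def tx2gene_from_file(path):
    """Read a gzipped FASTA (e.g. the transcripts .fa.gz) as text and build the map."""
    with gzip.open(path, "rt") as fh:
        return tx2gene(fh)

# Hypothetical records in the assumed header layout.
example = [
    ">ENST00000000001.1|ENSG00000000001.1|-|-|GENE1-201|GENE1|1500|protein_coding|\n",
    "ACGT\n",
    ">ENST00000000002.1|ENSG00000000001.1|-|-|GENE1-202|GENE1|900|retained_intron|\n",
    "TTGA\n",
]
print(tx2gene(example))
```

Only header lines are parsed, so the sequence lines are skipped cheaply even for the full transcriptome file.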
Powered by Biostar version 2.3.0