Question: How to blast against published transcriptome data
Yongjie Zhang (UC Berkeley, USA / Shanxi Univ, China) wrote, 3.2 years ago:

Dear All,

For the fungus I study, transcriptome data were submitted to NCBI by other researchers. I now want to find out whether certain genes are expressed. Do you know how to BLAST against the transcriptome data?

I know that we can BLAST against a genome online on the NCBI website, but can we BLAST against a transcriptome in a similar way? If not, how can I download the transcriptome data in order to run a local BLAST? This is the information I got from the literature: "The RNA-seq expression dataset is available at the NCBI’s Gene Expression Omnibus under the accession code GSE28001."

Thanks in advance for any explanations.



I just checked, and I don't think they submitted any gene expression tables. If you want to keep it simple and clean, the best option would be to email the authors and ask for the count/FPKM files.

written 3.2 years ago by Sukhdeep Singh
Sukhdeep Singh wrote, 3.2 years ago:

You don't need to perform BLAST for that. If they have not submitted the expression tables, you need to download the raw data and convert it to gene expression or count tables.

The first step would be to download the raw data; see
How to download raw sequence data from GEO/SRA

The second would be to process the data using something like the workflow discussed in
Searching For A Simple Yet Powerful Workflow For Rna-Seq

Then, it's just a table in which you can look up the expression of any gene you want.
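
Looking a gene up in such a table could be sketched as follows. This is only an illustration: it assumes the processed data end up as a tab-separated table with one row per gene and one read-count column per sample. The gene IDs, sample names, and counts are invented, and the cutoff of 10 reads is an arbitrary assumption rather than a value from this thread.

```python
import csv
import io

# Hypothetical count table: gene IDs, sample names and counts are invented.
COUNT_TABLE = """gene_id\tsample_A\tsample_B
geneA\t152\t98
geneB\t0\t1
geneC\t12\t0
"""


def load_counts(handle):
    """Read a tab-separated count table into {gene_id: {sample: count}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["gene_id"]: {s: int(v) for s, v in row.items() if s != "gene_id"}
        for row in reader
    }


def expressed_in(counts, gene_id, min_reads=10):
    """Return the samples in which the gene has at least min_reads reads."""
    return [s for s, n in counts[gene_id].items() if n >= min_reads]


counts = load_counts(io.StringIO(COUNT_TABLE))
for gene in counts:
    print(gene, expressed_in(counts, gene))
```

With a real table, `io.StringIO(COUNT_TABLE)` would be replaced by an open file handle.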

written 3.2 years ago by Sukhdeep Singh

Thanks. But what I want to know is whether certain genes are transcribed or not; I do not care about their expression levels. Do you know if there is a BLAST-like approach to search against assembled transcript data (although I'm not sure whether such data exist)?

written 3.2 years ago by Yongjie Zhang

What I don't understand is why you want to use a BLAST-like approach. I believe what has been published in the above study is expression profiles or an RNA-Seq dataset, which consists of short fragments of reverse-transcribed RNA. These are mapped to the genome ("transcriptome"), and the number of fragments mapped to a locus is then counted and converted to an expression value. Based on this expression value, you can say whether a gene is expressed or not. When you say BLAST, what I can think of is that you have a short-read sequencing dataset that you can map (BLAST) to the transcriptome to generate the same expression counts, which will tell you whether a gene is expressed. Mere BLASTing will just tell you the identity and percentage of the match.
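
The conversion from a fragment count to an expression value can be done in several ways. One common choice is FPKM (the unit mentioned earlier in this thread): fragments per kilobase of transcript per million mapped fragments. A minimal sketch, with purely illustrative numbers:

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments: fragments mapped to this gene
    gene_length_bp: length of the gene's transcript in base pairs
    total_mapped_fragments: all mapped fragments in the sample
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Illustrative values only: 100 fragments on a 2,000 bp transcript
# in a library of 1,000,000 mapped fragments.
print(fpkm(100, 2000, 1_000_000))
```

Dividing by transcript length and library size is what makes values comparable between long and short genes and between samples sequenced to different depths.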

written 3.2 years ago by Sukhdeep Singh

I see. People generally use the number of raw reads to measure whether a gene is expressed, but why not assemble the raw reads and then BLAST against the assembled transcripts? Is the latter approach feasible? Thanks for any comments.
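
If an assembled transcriptome were available as a FASTA file, the approach asked about here might look like the sketch below. It assumes NCBI BLAST+ is installed, and every file name, sequence ID, and cutoff is hypothetical. The script only parses BLAST's 12-column tabular output (`-outfmt 6`) and keeps query genes with at least one strong hit; the sample hits are invented.

```python
# Commands one might run first with NCBI BLAST+ (file names are hypothetical):
#   makeblastdb -in transcripts.fasta -dbtype nucl -out transcripts_db
#   blastn -query my_genes.fasta -db transcripts_db -outfmt 6 -out hits.tsv

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Invented example hits standing in for the contents of hits.tsv.
SAMPLE_HITS = """geneA\ttranscript_17\t99.2\t850\t7\t0\t1\t850\t120\t969\t0.0\t1530
geneA\ttranscript_203\t82.0\t60\t11\t0\t400\t459\t5\t64\t2e-05\t52.8
geneB\ttranscript_88\t95.0\t40\t2\t0\t10\t49\t300\t339\t1e-08\t63.9
"""


def genes_with_hits(lines, min_identity=95.0, min_length=200, max_evalue=1e-10):
    """Map each query gene to the transcripts it matches above the cutoffs."""
    found = {}
    for line in lines:
        if not line.strip():
            continue
        hit = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        if (
            float(hit["pident"]) >= min_identity
            and int(hit["length"]) >= min_length
            and float(hit["evalue"]) <= max_evalue
        ):
            found.setdefault(hit["qseqid"], []).append(hit["sseqid"])
    return found


hits = genes_with_hits(SAMPLE_HITS.splitlines())
print(hits)
```

A strong hit suggests the gene is represented in the assembly. A missing hit is weaker evidence, since lowly expressed transcripts may simply not have been assembled.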

modified 3.2 years ago • written 3.2 years ago by Yongjie Zhang

The short reads that are sequenced come from millions of cells, so to speak. So you get the same information more than once, which is also the basis for quantifying gene expression. Part of what you said is right: raw reads are assembled (mapped) to a genome and a transcriptome (which includes exon-exon junctions), and the read pileup then tells you how strongly a gene is expressed. The approach you describe has two problems, I reckon.

  1. It's hard to differentiate between noise (sequencing artifacts etc.) and real reads.
  2. This will only tell us whether a gene is expressed or not, i.e. whether BLAST finds a match, whereas with expression profiling you can assign expression scores that quantify gene expression at various levels.
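
To illustrate the second point, a normalised expression value such as FPKM can be binned into levels, which a yes/no BLAST match cannot provide. The cutoffs below are arbitrary assumptions chosen for the example, not established thresholds:

```python
def expression_level(value, cutoffs=(1.0, 10.0, 100.0)):
    """Bin a normalised expression value (e.g. FPKM) into a coarse level.

    The default cutoffs are illustrative assumptions only.
    """
    low, medium, high = cutoffs
    if value < low:
        return "not detected"
    if value < medium:
        return "low"
    if value < high:
        return "medium"
    return "high"


for value in (0.2, 5.0, 50.0, 500.0):
    print(value, expression_level(value))
```

Treating very low values as "not detected" is also one simple way to guard against the noise mentioned in the first point.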

Also, it would be better to go through some reviews, which will enlighten you more.

We have a thread specifically for that:
Rna-Seq Review Papers

written 3.2 years ago by Sukhdeep Singh