How to blast against published transcriptome data
8.8 years ago
Yongjie Zhang ▴ 110

Dear All,

For the fungus I study, transcriptome data were submitted to NCBI by other researchers. I now want to find out whether certain genes are expressed. Do you know how to BLAST against the transcriptome data?

I know that we can BLAST against a genome online on the NCBI website (http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome). Can we BLAST against a transcriptome in a similar way? If not, how can I download the transcriptome data in order to run a local BLAST? This is the information I found in the literature: "The RNA-seq expression dataset is available at the NCBI's Gene Expression Omnibus under the accession code GSE28001."

Thanks in advance for any explanations.

Yongjie

transcriptome blast • 7.7k views

I just checked, and I don't think they submitted any gene expression tables. If you want to keep it simple and clean, the best option would be to email the authors and ask for the count/FPKM files.

8.8 years ago

You don't need to perform BLAST for that. If they have not submitted the expression tables, you need to download the raw data and convert it into gene expression or count tables.

The first step would be to download the raw data; see:
How to download raw sequence data from GEO/SRA

The second would be to process the data using something like:
Searching For A Simple Yet Powerful Workflow For Rna-Seq

After that, it's just a table in which you can look up the expression of any gene you want.
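As a rough illustration of the download step, here is a minimal sketch that only builds the command lines for the SRA Toolkit programs `prefetch` and `fasterq-dump`; it does not run them. The run accessions and the output directory name are placeholders (assumptions, not taken from the study); the real SRR IDs would have to be looked up from the SRA entries linked to GSE28001.

```python
import shlex


def sra_download_commands(run_accessions, outdir="fastq"):
    """Return SRA Toolkit command lines (as argument lists) for each run.

    Nothing is executed here; the lists can be passed to subprocess.run()
    on a machine where the SRA Toolkit is installed.
    """
    commands = []
    for run in run_accessions:
        # prefetch downloads the .sra archive for one run accession
        commands.append(["prefetch", run])
        # fasterq-dump converts the archive to FASTQ;
        # --split-files writes the mates of paired reads to separate files
        commands.append(
            ["fasterq-dump", run, "--split-files", "--outdir", outdir]
        )
    return commands


# Placeholder accessions (hypothetical): replace with the real SRR IDs
# listed under the GEO series.
runs = ["SRR0000001", "SRR0000002"]

for cmd in sra_download_commands(runs):
    print(shlex.join(cmd))
```

The resulting FASTQ files are what an RNA-Seq workflow would then align and count.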


Thanks. But what I want to know is whether certain genes are transcribed or not. I do not care about their expression levels. Do you know of a BLAST-like approach for searching assembled transcript data (although I'm not sure such data exist)?


What I don't understand is why you want to use a BLAST-like approach. I believe what was published in the above study is an expression profile, i.e. an RNA-Seq dataset, which consists of short fragments of reverse-transcribed RNA. These fragments are mapped to the genome ("transcriptome"), and the number of fragments mapped to each locus is counted and converted to an expression value. Based on this expression value, you can say whether a gene is expressed or not. When you say BLAST, what I can think of is that you have a short-read dataset that you can map (BLAST) to the transcriptome to generate the same expression counts, which will tell you whether a gene is expressed. Mere BLASTing will just tell you the identity and percentage of the match.
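To make the "count, then convert to an expression value" step concrete, here is a minimal sketch using the standard FPKM formula (fragments per kilobase of transcript per million mapped fragments). The gene names, counts, lengths and the 1 FPKM cutoff are all made-up illustration values, not results from the study, and the cutoff is only an assumed convention.

```python
def fpkm(count, gene_length_bp, total_mapped):
    """FPKM = count * 1e9 / (gene length in bp * total mapped fragments)."""
    return count * 1e9 / (gene_length_bp * total_mapped)


def call_expressed(counts, lengths, threshold=1.0):
    """Return {gene: (fpkm_value, is_expressed)} for a table of raw counts.

    `threshold` is an assumed cutoff; a sensible value depends on the data.
    """
    total = sum(counts.values())
    calls = {}
    for gene, count in counts.items():
        value = fpkm(count, lengths[gene], total) if total else 0.0
        calls[gene] = (value, value >= threshold)
    return calls


# Toy input (hypothetical): raw fragment counts and gene lengths in bp.
counts = {"geneA": 500, "geneB": 0, "geneC": 3}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 1500}

for gene, (value, expressed) in call_expressed(counts, lengths).items():
    print(gene, round(value, 2), "expressed" if expressed else "not detected")
```

A gene with zero mapped fragments gets an FPKM of 0 and is reported as not detected, which is the yes/no answer asked about; the same table also gives the expression level for free.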


I see. People generally use the number of raw reads to measure whether a gene is expressed, but why not assemble the raw reads and then BLAST against the assembled transcripts? Is the latter approach feasible? Thanks for any comments.


The short reads that are sequenced come from millions of cells, so to speak. So you get the same information more than once, which is also the basis for quantifying gene expression. Part of what you said is right: raw reads are aligned (mapped) to a genome and a transcriptome (which includes exon-exon junctions), and the read pileup then tells you how strongly a gene is expressed. I reckon the approach you describe has two problems.

  1. It's hard to differentiate between noise (sequencing artifacts etc.) and real reads.
  2. It will only tell us whether a gene is expressed or not, i.e. whether BLAST matches it or not, whereas expression profiling lets you assign expression scores that readily quantify gene expression at various levels.
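For comparison, the presence/absence check described in point 2 could be sketched as follows. It assumes the gene sequences have already been searched against an assembled transcript set with BLAST+ tabular output (`-outfmt 6`, which has 12 default columns), and it only parses that output. The identity, length and e-value cutoffs, as well as the gene and contig names, are illustrative assumptions.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def genes_with_hits(tabular_text, min_identity=95.0, min_length=200,
                    max_evalue=1e-10):
    """Map each query gene to the transcripts it matches above the cutoffs.

    The cutoffs are assumed values chosen for illustration only.
    """
    found = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        if (float(hit["pident"]) >= min_identity
                and int(hit["length"]) >= min_length
                and float(hit["evalue"]) <= max_evalue):
            found.setdefault(hit["qseqid"], []).append(hit["sseqid"])
    return found


# Toy BLAST output (hypothetical): one strong hit and one weak, short hit.
example = "\n".join([
    "\t".join(["geneA", "contig_12", "99.1", "850", "8", "0",
               "1", "850", "101", "950", "0.0", "1500"]),
    "\t".join(["geneB", "contig_40", "82.0", "60", "11", "0",
               "5", "64", "300", "359", "2e-05", "50.1"]),
])

print(genes_with_hits(example))
```

The result is only a list of genes with a convincing match. It says nothing about how strongly each gene is expressed, and a gene that is missing may simply have been too weakly expressed to be assembled.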

Also, it would be better to go through some reviews, which will tell you more.

We have a thread specifically for that: Rna-Seq Review Papers

