We need to find the expression level of a gene under different conditions. We have assembled the RNA-seq data from both samples using Velvet/Oases. Now I need to find the gene from a full-length cDNA library among the assembled contigs, and then map the reads to the gene to find its expression level. I think I need to use BLAST to find the gene from the cDNA library and then some alignment tool to find the expression level. Am I heading in the right direction? If not, can you please suggest what I should do? Also, what are the best tools for this? I am new to bioinformatics, so I need some help from experts.
I think this is a fairly common question. Here are the pointers I would suggest:
If you have assembled contigs and RNA-seq data, use an aligner such as Bowtie to map the reads to your contigs so that you can count them. Your contigs represent your "genes". Once you have counts per gene, you can test for relative expression differences between conditions using R with the edgeR or DESeq packages, or other methods. (You may have to look up how to summarize counts on genes; several resources are available.)

You can also estimate relative expression levels between contigs within a single condition, but "expression level" will only be relative to other transcripts (i.e. your contigs). edgeR has a function for computing a normalized RPKM value, but it is simply a convenience, as RPKM can be affected by many things.

To determine the actual expression level, you would have to use externally added spike-in controls. A set of 92 is available from Ambion (Life Tech, ERCC Spike-In Mix). Since these are present at known concentrations, you can determine the relationship between read counts and the concentration of a molecule in your experiment (i.e. expression level). Otherwise, you're just guessing.
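To make the RPKM and spike-in ideas above concrete, here is a minimal Python sketch. It is not how edgeR does it; it only shows the arithmetic. All counts, lengths, and concentrations are made-up numbers, the function names are my own, and the log-log straight-line fit is an assumed calibration model that you should check against your own spike-in data.

```python
import math


def rpkm(count, length_bp, total_mapped_reads):
    """Reads per kilobase of contig per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped_reads)


def fit_loglog(concentrations, signals):
    """Least-squares fit of log10(signal) = slope * log10(conc) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(signal, slope, intercept):
    """Invert the fitted line to turn a signal back into a concentration."""
    return 10 ** ((math.log10(signal) - intercept) / slope)


# Hypothetical library size (reads mapped to contigs plus spike-ins).
TOTAL_MAPPED = 1_000_000

# Hypothetical spike-ins: (known concentration, length in bp, read count).
spikes = [
    (1.0, 1000, 2),
    (10.0, 2000, 40),
    (100.0, 500, 100),
]

spike_conc = [conc for conc, _, _ in spikes]
spike_rpkm = [rpkm(count, length, TOTAL_MAPPED) for _, length, count in spikes]

# Calibrate RPKM against the known spike-in concentrations.
slope, intercept = fit_loglog(spike_conc, spike_rpkm)

# Hypothetical contig of interest: 300 reads on a 1500 bp contig.
contig_rpkm = rpkm(300, 1500, TOTAL_MAPPED)
estimated = estimate_concentration(contig_rpkm, slope, intercept)

print(f"contig RPKM: {contig_rpkm:.1f}")
print(f"estimated concentration (same units as the spike-ins): {estimated:.1f}")
```

Without the spike-in step, the RPKM value can only be compared with the other contigs in the same sample, which is the point made above.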
If I follow you correctly, you would use BLAST to determine which of your assembled contigs represents your gene of interest. However, the contig itself would be the mapping target for your RNA-seq analysis, since it was derived from, and is supported by, your own data (though this could be a sticky point).
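As a rough illustration of that BLAST step, the sketch below picks the best-scoring contig for each query from BLAST tabular output (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). It assumes the cDNA sequence was the query and the assembled contigs were the database. The sequence names and hit lines are invented for the example, and `best_hits` is a hypothetical helper, not part of BLAST.

```python
def best_hits(blast_lines):
    """Return the highest-bitscore subject for each query in outfmt 6 lines."""
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        hit = {
            "contig": subject,
            "pident": float(fields[2]),
            "aln_length": int(fields[3]),
            "evalue": float(fields[10]),
            "bitscore": float(fields[11]),
        }
        # Keep only the strongest hit seen so far for this query.
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


# Made-up hits for one query gene against two assembled contigs.
example = [
    "geneX\tcontig_12\t98.50\t1200\t18\t0\t1\t1200\t35\t1234\t0.0\t2100",
    "geneX\tcontig_47\t91.20\t400\t35\t1\t10\t409\t1\t400\t1e-150\t540",
]

hits = best_hits(example)
for query, hit in hits.items():
    print(query, "->", hit["contig"], "bitscore", hit["bitscore"])
```

The contig reported here would then be the one whose read counts you look at after mapping. A single best hit is a simplification: Oases can produce several isoform contigs per locus, so it is worth inspecting the runner-up hits as well.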