4.3 years ago by
United States
I'm very inexperienced with bioinformatics, and I have a transcriptome that I am hoping to analyze. The three things I would like to do are: find the expression level of each gene, BLAST specific genes against the transcriptome, and BLAST every gene of the transcriptome online to see whether it shares homology with any other known genes. There is currently no reference transcriptome or genome for this organism, and the only file I have is a single FASTA file. I have downloaded the local BLAST executables for searching specific sequences against the transcriptome, but I'm having trouble with the expression analysis and with BLASTing every gene in the transcriptome. I have been looking into different analysis software online and found the Galaxy site, but it looks like there is not much I can do with only a FASTA file. Most of the programs seem to require FASTQ, SAM/BAM, or GFF/GTF files, so I am not sure whether or how I can do any analysis with only a FASTA file.

Any ideas about what software and analyses I can use with this FASTA file? Any advice for me on this process?
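For the local BLAST searches described above, here is a minimal sketch that drives the NCBI BLAST+ tools from Python. It assumes `makeblastdb` and `blastn` are on your PATH; the function names and the file/database names (`transcriptome.fasta`, `my_genes.fasta`, `transcriptome_db`, `hits.tsv`) are hypothetical placeholders, and the e-value cutoff of 1e-5 is just a common choice, not a recommendation for your data.

```python
import shutil
import subprocess


def build_blast_commands(transcriptome_fasta, query_fasta, db_name, out_tsv, evalue=1e-5):
    """Return the two BLAST+ command lines: index the transcriptome, then search it."""
    makedb = [
        "makeblastdb",
        "-in", transcriptome_fasta,
        "-dbtype", "nucl",   # the transcripts are nucleotide sequences
        "-out", db_name,
    ]
    search = [
        "blastn",
        "-query", query_fasta,
        "-db", db_name,
        "-evalue", str(evalue),
        "-outfmt", "6",      # tabular output, one hit per line
        "-out", out_tsv,
    ]
    return makedb, search


def run_blast(transcriptome_fasta, query_fasta, db_name="transcriptome_db", out_tsv="hits.tsv"):
    """Run both steps in order; raises CalledProcessError if either tool fails."""
    if shutil.which("makeblastdb") is None or shutil.which("blastn") is None:
        raise FileNotFoundError("BLAST+ executables not found on PATH")
    for cmd in build_blast_commands(transcriptome_fasta, query_fasta, db_name, out_tsv):
        subprocess.run(cmd, check=True)
```

Usage would be something like `run_blast("transcriptome.fasta", "my_genes.fasta")`. `blastn` compares nucleotide queries against nucleotide sequences; if your genes of interest are protein sequences, `tblastn` is the analogous program for searching a nucleotide database.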

4.3 years ago by
Bergen, Norway
Can you give an example sequence from your file? Most likely the FASTA file contains transcripts assembled from the raw data. There is certainly a lot you can do with 'only' a FASTA file, e.g. annotating the sequences using Blast2GO, which is easy enough for a beginner to use. But you should definitely get hold of the raw data; then you can also map the reads back to the transcripts and do some quantification, e.g. using Galaxy. My general advice for someone new to a field is to follow the established standard by following the methods of published papers. You might as well replicate exactly what others have done before. Here is an example: the transcriptome of the zooplankton Calanus finmarchicus. That way you can also judge what your data and effort are worth in terms of publication.
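Once the reads have been mapped back to the transcripts, expression is usually reported as a length-normalized value such as TPM (transcripts per million). A minimal sketch, assuming you already have a raw read count and a length for each transcript; the transcript IDs and numbers are made up for illustration. Quantification tools typically use an "effective" length that accounts for fragment size, whereas this sketch uses the plain transcript length for simplicity.

```python
def tpm(counts, lengths):
    """Compute TPM from raw read counts and transcript lengths (dicts keyed by transcript ID).

    Step 1: divide each count by the transcript length in kilobases (reads per kilobase).
    Step 2: rescale so that all values sum to one million.
    """
    rpk = {tid: counts[tid] / (lengths[tid] / 1000.0) for tid in counts}
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        # No reads at all: every transcript gets zero.
        return {tid: 0.0 for tid in rpk}
    return {tid: value / scale for tid, value in rpk.items()}


# Hypothetical counts and lengths for three transcripts.
counts = {"tx1": 100, "tx2": 300, "tx3": 0}
lengths = {"tx1": 1000, "tx2": 2000, "tx3": 500}
print(tpm(counts, lengths))
```

Note that tx2 has three times the reads of tx1 but is twice as long, so its TPM is only 1.5 times higher. With data from a single condition you can compare transcripts within the sample, but not test for differential expression.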


  • get the raw data
  • annotate the transcripts using BLAST
  • try to replicate the methods of a similar paper on your data
  • adapt the methods (only) if necessary
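For the annotation step, BLAST's tabular output (`-outfmt 6`) has 12 default columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. A minimal sketch that keeps the highest-bitscore hit for each transcript; the function name `best_hits` and the example lines are invented for illustration.

```python
import csv
import io

FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: hit} keeping the hit with the highest bitscore per query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Invented example: two hits for tx1, one for tx2.
example = io.StringIO(
    "tx1\tgeneA\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-90\t350\n"
    "tx1\tgeneB\t80.0\t150\t30\t0\t5\t154\t1\t150\t1e-20\t120\n"
    "tx2\tgeneC\t75.2\t300\t74\t2\t1\t300\t1\t298\t3e-40\t180\n"
)
for query, hit in best_hits(example).items():
    print(query, hit["sseqid"], hit["evalue"])
```

For a real results file, pass an open file handle instead of the `StringIO` object.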
Here is an example sequence from the file:

OK, I will see if I can get the raw data from the group that sent me the sequences. Thanks for the tips; I will look into more papers.

4.3 years ago by
Spain. Universidad de Córdoba
I would say you haven't provided enough information.

How much data do you have?

How did you get your data ?

When you say you don't have FASTQ, do you mean you don't have quality scores for your reads?

Do you have data coming from one condition only?

I have a file of ~41,000 sequences (35 MB) that was sent to me by a lab in Europe. An example sequence from the file is in my reply above. And yes, the data are from one condition.
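A quick way to check whether a file like this holds assembled transcripts rather than raw reads is to look at the number and length distribution of the sequences: 35 MB over ~41,000 records already suggests several hundred bases per sequence, longer than typical short reads. A minimal sketch in plain Python that computes count, mean, min, max, and N50 (the length L such that sequences of length L or longer contain at least half of all bases); the function names and the file name `transcriptome.fasta` are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def summarize(lengths):
    """Return count, total, mean, min, max and N50 for a list of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        if running * 2 >= total:  # reached half of all bases
            n50 = length
            break
    return {
        "count": len(ordered),
        "total": total,
        "mean": total / len(ordered) if ordered else 0,
        "min": ordered[-1] if ordered else 0,
        "max": ordered[0] if ordered else 0,
        "n50": n50,
    }


# Usage with a real file (placeholder name):
# with open("transcriptome.fasta") as handle:
#     print(summarize([len(seq) for _, seq in read_fasta(handle)]))
```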

