Transcriptome analysis with only a FASTA file
9.4 years ago
coreyhowe99

I'm very inexperienced with bioinformatics, and I have a transcriptome that I am hoping to analyze. There are three things I would like to do: find the expression level of each gene, BLAST specific genes against the transcriptome, and BLAST every gene in the transcriptome online to see whether it shares homology with any other genes. There is currently no reference transcriptome or genome for the organism, and the only file I have is a single FASTA file. I have downloaded the local BLAST executables for searching specific sequences against the transcriptome, but I'm having trouble with the expression analysis and with BLASTing every gene in the transcriptome. I have been looking into different analysis software online and found the Galaxy site, but it looks like there is not much I can do with only a FASTA file. Most of the programs seem to require FASTQ, SAM/BAM, or GFF/GTF files, so I am not sure whether or how I can do any analysis with only a FASTA file.

Any ideas about what software I can use and what analyses I can do with this FASTA file? Any advice on the process?
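For the local BLAST part, here is a minimal sketch (not from the original thread) of driving the two BLAST+ steps from Python, assuming BLAST+ is installed and on your PATH. The file names `transcripts.fasta` and `my_genes.fasta` and the database name `transcriptome_db` are hypothetical placeholders. The transcriptome is first indexed with `makeblastdb`, then the genes of interest are searched against it with `blastn`; `-outfmt 6` writes one tab-separated line per hit.

```python
import shlex
import subprocess


def blast_commands(transcriptome_fasta, query_fasta, db_name="transcriptome_db",
                   out_tsv="hits.tsv", evalue=1e-5):
    """Build the BLAST+ command lines for a nucleotide-vs-nucleotide search
    of query sequences against a transcriptome FASTA (placeholder names)."""
    # Step 1: index the transcriptome once so it can be used as a database.
    make_db = ["makeblastdb", "-in", transcriptome_fasta,
               "-dbtype", "nucl", "-out", db_name]
    # Step 2: search the query genes against that database;
    # -outfmt 6 gives one tab-separated line per hit.
    search = ["blastn", "-query", query_fasta, "-db", db_name,
              "-evalue", str(evalue), "-outfmt", "6", "-out", out_tsv]
    return make_db, search


def run_blast(transcriptome_fasta, query_fasta, **kwargs):
    """Run both steps in order; requires BLAST+ on PATH."""
    for cmd in blast_commands(transcriptome_fasta, query_fasta, **kwargs):
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises if a step fails


if __name__ == "__main__":
    # Only print the commands; call run_blast(...) to actually execute them.
    for cmd in blast_commands("transcripts.fasta", "my_genes.fasta"):
        print(shlex.join(cmd))
```

Because `blast_commands` only builds the argument lists, you can inspect them before calling `run_blast`. For annotating every transcript, the same pattern would likely apply with `blastx` against a locally downloaded protein database, which is generally more practical for a large set of sequences than submitting them in batches through the NCBI BLAST website; which database to use is left as an assumption.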

rna-seq
9.4 years ago
Michael

Can you give an example sequence from your file? Most likely the FASTA file contains assembled transcripts made from the raw data. There is certainly a lot you can do with 'only' a FASTA file, e.g. annotating the sequences using Blast2GO, which is easy enough for a beginner to use. But you should definitely get hold of the raw data; then you can also map the reads back to the transcripts and do some quantification, e.g. using Galaxy. A general piece of advice for someone new to a field is to follow the established standard by following the methods of other published papers. You might as well replicate exactly what others have done before. Here is an example: the transcriptome of the zooplankton Calanus finmarchicus. That way you can also determine what your data and effort are worth in terms of publication.

Summary:

  • get raw data
  • annotate transcripts using blast
  • try to replicate methods of a similar paper on your data
  • adapt methods (only) if necessary
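Once the raw reads have been mapped back to the transcripts, expression is commonly reported as TPM (transcripts per million): TPM_i = (c_i / l_i) / sum_j (c_j / l_j) × 10^6, where c is the read count and l the transcript length. A minimal sketch, assuming you already have a read count and a length for each transcript; it uses plain length rather than the "effective length" that dedicated quantification tools compute, so treat it as illustrative. The transcript IDs are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and transcript lengths.

    counts:     dict transcript_id -> number of reads mapped to it
    lengths_bp: dict transcript_id -> transcript length in bases
    """
    # Reads per base: corrects for longer transcripts collecting more reads.
    rates = {t: counts[t] / lengths_bp[t] for t in counts}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    # Scale so that the values for one sample sum to one million.
    return {t: r / total * 1e6 for t, r in rates.items()}
```

For example, `tpm({"C1": 100, "C2": 100}, {"C1": 1000, "C2": 500})` gives C2 twice the TPM of C1 (about 666,667 vs 333,333), because the same number of reads landed on a transcript half as long.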

Here is an example sequence from the file:

>C100000_a_3_0_l_322
CACTTTGCAACAGAAACAATATGGTCGATATCTCCGCGTACAATGCATTCGTTCTATGGACTTCAATCAACCCTGGTTGGAATGGAAACAAACTGACTAAAGGAAGAAGTCTCTTATAAGCGCGTATATTACATCAAGAAAATATATGCCAAGAACTGAGGAGTCGCGTAATATTGTAATGAAAATGCAACAAGTAAATCAAGTCGTTCCTTCAGGATCTACTACGACGACTAATACTAAACGTGCTTGTTGTTTGCCCATGAAGCCATGACAGCAGAACAAATATTTTGTAGAAATTGTGAGAAACACATTTGTAACTCGC

Ok, I will see if I can get the raw data from the group that sent the sequences to me. Thanks for the tips, I will look into more papers.

9.4 years ago

I would say that you don't provide enough information.

How much data do you have ?

How did you get your data ?

When you say you don't have FASTQ files, do you mean you don't have quality scores for your reads?

Do you have data coming from one condition only?
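To illustrate what the FASTQ question is getting at: a FASTQ record has four lines (header, sequence, a `+` separator, and a quality string with one character per base), whereas a FASTA record has only a header and the sequence. A small sketch, assuming the common Phred+33 encoding (Sanger / Illumina 1.8+) and unwrapped four-line records, that reads FASTQ records and converts quality characters to per-base error probabilities, with Q = ASCII code - 33 and p = 10^(-Q/10). The record name `read1` is hypothetical.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from FASTQ lines, 4 lines per record."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header[1:], seq, qual


def phred33_to_error_probs(quality_line):
    """Convert a Phred+33 quality string into per-base error probabilities."""
    probs = []
    for ch in quality_line:
        q = ord(ch) - 33              # Phred quality score for this base
        probs.append(10 ** (-q / 10))  # probability the base call is wrong
    return probs
```

A quality character `I` (Q40) means roughly a 1-in-10,000 chance the base is wrong; a FASTA file simply has no equivalent of this line.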


I have a file of ~41,000 sequences (35 MB) that was sent to me by a lab in Europe. An example of a sequence from the file is in my reply above. And yes, the data are from one condition.
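As a first look at a file like this, a minimal sketch that reads FASTA records and summarizes their lengths, including N50 (the length L such that sequences of length L or longer contain at least half of all bases), a common assembly statistic. For scale, 35 MB over ~41,000 records is roughly 850 bytes per record, headers and line breaks included. The file name in the usage note is hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines
    (an open file handle works); multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_stats(lengths):
    """Count, total bases, mean length and N50 of a list of lengths."""
    lengths = sorted(lengths, reverse=True)
    total = sum(lengths)
    running, n50 = 0, 0
    for n in lengths:
        running += n
        if running * 2 >= total:  # half of all bases reached
            n50 = n
            break
    mean = total / len(lengths) if lengths else 0
    return {"count": len(lengths), "total_bp": total, "mean_bp": mean, "n50": n50}
```

Usage would look like `with open("transcripts.fasta") as fh: print(length_stats([len(s) for _, s in read_fasta(fh)]))`.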
