Hello
I have some doubts about my analysis of a 454 EST assembly built from 3 different datasets. Each dataset comes from the same organism, but under different conditions (resistance to fungi).
I assembled them separately (I also tried assembling them all together, but my question doesn't apply to that approach) and ran BLAST on all of them. After that I ran Blast2GO on the BLAST results. How can I find out what is important for the conditions being tested?
I have tried this approach:
- compare the best BLAST hits from each dataset with each other and assume that, if a contig has the same best BLAST hit as another contig, they are the same thing
- that way, I define a group of sequences that exists in all conditions and other groups that exist only in some conditions
- I ran the Blast2GO enrichment analysis on these groups of sequences, but because Blast2GO doesn't use only the best BLAST hit, the sequences that I had assumed to be the same don't have the same GO terms
Any suggestions on how to continue?
Do you have a reference sequence or reference transcripts for your organism?
It's not clear to me what the point of your experiment is. You have three conditions, and you want to assemble a transcriptome from piles of reads measured under each condition, for the purpose of quantifying which genes are expressed in response to that condition? If you have no reference (as Sean asked), then I think you have a distillation problem. You'll likely have to assemble them all together to create a kind of reference, then pull sequences from that for your Blast2GO analysis. The key is your statement "sequences that I had assume to be the same": you have to resolve this ambiguity.
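To make the pooled idea a bit more concrete, here is a minimal sketch of what you could do once everything is assembled together: count how many reads from each condition ended up in each contig. It assumes you can get a read-to-contig mapping out of your assembler and that you know which condition each read came from; how you extract those depends on your tools, so the two input dictionaries and the function names here are purely hypothetical.

```python
from collections import Counter, defaultdict


def reads_per_contig(read_to_contig, read_to_condition):
    """Count reads per contig, broken down by condition.

    read_to_contig:    {read_id: contig_id} for reads placed in the pooled assembly
    read_to_condition: {read_id: condition} for every sequenced read
    Both inputs are hypothetical; build them from your own assembler output.
    """
    counts = defaultdict(Counter)
    for read_id, contig in read_to_contig.items():
        condition = read_to_condition.get(read_id)
        if condition is not None:  # skip reads with unknown origin
            counts[contig][condition] += 1
    return {contig: dict(per_cond) for contig, per_cond in counts.items()}


def normalize(counts, read_to_condition):
    """Divide each count by the total number of reads in that condition,
    so that conditions with different sequencing depth are comparable."""
    totals = Counter(read_to_condition.values())
    return {
        contig: {cond: n / totals[cond] for cond, n in per_cond.items()}
        for contig, per_cond in counts.items()
    }
```

Contigs whose normalized counts differ strongly between the no-fungi dataset and the resistant datasets would be the candidates to look at first. This is only a crude comparison, not a statistical test.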
Your question is too broad; try to narrow it down and simplify it. No one here can easily advise you in general terms about what might be wrong with your data.
No, I don't have a reference for my organism. My 3 conditions are no fungi, resistance to fungus A, and resistance to fungus B. I think the dataset with no fungi will be my baseline. By "sequences i had assume to be the same" I mean that, when I compare the datasets to see what they have in common, I classify a sequence in dataset A as the same as a sequence in dataset B if their best hit is the same.
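For clarity, the best-hit rule described above could be sketched roughly like this. It assumes the BLAST results are in tabular format (`-outfmt 6`, with the query ID in column 1, the subject ID in column 2, and the bit score in column 12) and takes the highest bit score as the "best" hit; the function names and condition labels are only illustrative.

```python
from collections import defaultdict


def best_hits(blast_lines):
    """Return {contig_id: subject_id}, keeping the highest-bitscore hit per contig.

    Assumes BLAST tabular output (-outfmt 6): 12 tab-separated columns.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def split_shared_and_specific(hits_by_condition):
    """Group best-hit subjects by the conditions in which they appear.

    hits_by_condition: {condition: {contig_id: subject_id}}
    Returns (shared, specific): subjects hit in every condition, and a dict
    of the remaining subjects mapped to the conditions where they were hit.
    """
    seen = defaultdict(set)
    for condition, hits in hits_by_condition.items():
        for subject in hits.values():
            seen[subject].add(condition)
    all_conditions = set(hits_by_condition)
    shared = {s for s, conds in seen.items() if conds == all_conditions}
    specific = {s: conds for s, conds in seen.items() if conds != all_conditions}
    return shared, specific
```

Note that this treats two contigs as "the same" purely because they share a best hit, which is exactly the assumption that breaks down when the GO annotation draws on more than the top hit.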