I followed the Trinity/RSEM/edgeR pipeline and have produced FPKM spreadsheets for genes and isoforms.
I am currently analysing the isoforms:
I have 3 sample conditions and I am comparing two of them. So far I have calculated log2 fold changes between the two conditions for all of the isoforms, which gives me the change in expression level between the 2 samples.
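For illustration, a log2 fold change ranking of this kind can be sketched in Python as follows. The isoform names, the FPKM values and the pseudocount of 1 (added so that an FPKM of zero does not cause a division by zero) are all assumptions made up for the example, not values or settings from the actual data.

```python
import math

def log2_fold_change(fpkm_a, fpkm_b, pseudocount=1.0):
    """log2 of (condition B / condition A).

    The pseudocount is an assumption of this sketch; it keeps the
    ratio defined when either FPKM value is zero.
    """
    return math.log2((fpkm_b + pseudocount) / (fpkm_a + pseudocount))

def top_increases(fpkm_table, n):
    """Return the n isoform names with the largest log2 fold change."""
    ranked = sorted(
        fpkm_table,
        key=lambda name: log2_fold_change(*fpkm_table[name]),
        reverse=True,
    )
    return ranked[:n]

# Hypothetical FPKM values: name -> (condition A, condition B)
example_fpkm = {
    "isoform_A": (3.0, 15.0),
    "isoform_B": (7.0, 7.0),
    "isoform_C": (0.0, 1.0),
}

top_two = top_increases(example_fpkm, 2)
print(top_two)
```

With the real table, `n` would be 1000 and the resulting list of names is what then needs to be pulled out of the FASTA file.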
I now have a FASTA file generated by Trinity and I want to extract the isoforms I am interested in (the 1000 isoforms with the largest increase in expression between the 2 samples). I have the list of sequence names in Excel. Is there any way I can pick these out of the FASTA file and BLAST them against NCBI in one step?
So far I have been loading the FASTA file into CLCBio, using the search filter to find the sequences of interest, and then BLASTing the resulting lists. This means copying and pasting each isoform name from Excel into the CLCBio search filter one by one, which takes a long time.
Ideally I'm looking for a quick and simple step or an automated program that will do this for me. Alternatively, is there any way a program can read the names straight from the spreadsheet and find the sequences of interest? Copying and pasting 1000 sequence names one by one is taking too long.
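To make the request concrete, here is a minimal sketch of the kind of thing I have in mind, in plain Python. It assumes the names are exported from Excel as a CSV file with the isoform name in the first column, and that each name matches the first word after `>` in the FASTA header. The function names and the example IDs and sequences are made up for illustration.

```python
import csv
import io

def read_wanted_ids(handle, column=0):
    """Collect isoform names from a CSV export of the spreadsheet."""
    wanted = set()
    for row in csv.reader(handle):
        if len(row) > column and row[column].strip():
            wanted.add(row[column].strip())
    return wanted

def filter_fasta(fasta_handle, wanted, out_handle):
    """Copy records whose ID (first word after '>') is in `wanted`.

    Returns the set of IDs that were actually found, so that any
    names missing from the FASTA file can be spotted afterwards.
    """
    keep = False
    found = set()
    for line in fasta_handle:
        if line.startswith(">"):
            header = line[1:].split()
            seq_id = header[0] if header else ""
            keep = seq_id in wanted
            if keep:
                found.add(seq_id)
        if keep:
            out_handle.write(line)
    return found

# Small in-memory demonstration with made-up IDs and sequences.
demo_ids = io.StringIO("comp10_c0_seq1\ncomp25_c1_seq2\n")
demo_fasta = io.StringIO(
    ">comp10_c0_seq1 len=8\nACGTACGT\n"
    ">comp11_c0_seq1 len=4\nGGGG\n"
    ">comp25_c1_seq2 len=8\nTTTTAA\nCC\n"
)
demo_out = io.StringIO()
wanted_ids = read_wanted_ids(demo_ids)
found_ids = filter_fasta(demo_fasta, wanted_ids, demo_out)
print(demo_out.getvalue(), end="")
print("missing:", sorted(wanted_ids - found_ids))
```

With real files, the `io.StringIO` objects would be replaced by `open(...)` calls on the CSV export, the Trinity FASTA file and a new output file. The output would then be a single multi-sequence FASTA file containing only the isoforms of interest, which could be submitted to BLAST as one query instead of sequence by sequence.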
If anyone knows of a program or an easier way to do this, I would love to know.