Question: Extract Multiple Genes From Multiple Fasta Files
7.2 years ago, hosseinv wrote:

Hi everyone,

I was wondering if somebody could tell me how to extract a number of genes from a number of FASTA files (all containing the same set of genes) and concatenate them, producing one concatenated sequence per FASTA file?

I have an Excel sheet (or a text file) listing the start and end nucleotide positions of every gene.

I would like to do it using awk and grep in Unix.

Any help is appreciated. Cheers, Hossein


If I understand correctly, I think generating a BED file from your text file and then running bedtools getfasta on each FASTA file in a for loop may be worth a shot.
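A minimal sketch of the first step suggested here, turning the coordinate table into BED lines. It assumes the table holds 1-based, inclusive start/end positions (BED uses a 0-based start and an exclusive end), and the names `table_to_bed`, `geneA`, `geneB` and `sample1` are hypothetical placeholders, not taken from the original data.

```python
def table_to_bed(rows, seq_id):
    """Convert (gene, start, end) rows into BED lines.

    Assumes start/end are 1-based and inclusive, as is common in
    spreadsheets; BED wants a 0-based start and an exclusive end,
    so only the start is shifted by one.
    """
    bed_lines = []
    for gene, start, end in rows:
        bed_lines.append(f"{seq_id}\t{int(start) - 1}\t{int(end)}\t{gene}")
    return bed_lines


# Hypothetical coordinates for two genes on a sequence named "sample1"
genes = [("geneA", 1, 4), ("geneB", 7, 10)]
for line in table_to_bed(genes, "sample1"):
    print(line)
```

The first column has to match the sequence name in the FASTA header. The printed lines could be saved to a file such as genes.bed and passed to bedtools getfasta with its -fi (input FASTA), -bed (intervals) and -fo (output) options, once per FASTA file.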

Reply written 7.2 years ago by AndreiR260
7.2 years ago, Asaf wrote:

Hi Hossein, I think your best bet is Galaxy: you can do all sorts of FASTA file manipulations there without programming or scripting. Good luck.


Thanks for your reply, Asaf. The thing is, I am currently working on a small dataset and I want to use it as a trial run for my next, bigger dataset. So I don't really want to upload big data files to Galaxy.

Reply written 7.1 years ago by hosseinv

Then Perl or Python can give you the solution. In my opinion, if you haven't mastered awk, it's easier to learn Perl or Python and write these small scripts in one of them than to do it in awk.
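For illustration, a small Python script of this kind might look like the sketch below. It assumes 1-based, inclusive gene coordinates and one sequence per FASTA file; `parse_fasta`, `concat_genes` and the demo data are hypothetical names and values chosen for the example.

```python
def parse_fasta(lines):
    """Return {name: sequence} from an iterable of FASTA lines,
    joining sequences that are wrapped over several lines."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the name
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def concat_genes(sequence, coords):
    """Cut out each (start, end) region and join the pieces in order.

    Coordinates are assumed to be 1-based and inclusive.
    """
    return "".join(sequence[start - 1:end] for start, end in coords)


# Hypothetical in-memory example: one wrapped sequence, two genes
demo = [">sample1", "ACGTAC", "GTTTGA"]
coords = [(1, 4), (7, 10)]
for name, seq in parse_fasta(demo).items():
    print(f">{name}")
    print(concat_genes(seq, coords))
```

For real data, an open file handle can be passed to parse_fasta in place of the demo list, and the loop repeated over every FASTA file with the same coordinate list.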

Reply written 7.1 years ago by Asaf

Glad to hear that. I'll have to start learning Perl or Python. Could you recommend a webpage or textbook for learning Perl, please? I appreciate your help. Regards, Hossein

Reply written 7.0 years ago by hosseinv