Hi,
I keep getting a "Segmentation fault: 11" error when running MUSCLE to align multiple sequences from a FASTA file containing 112626 ESTs. Is the large number of sequences causing the problem?
I used this command:
$ ./muscle -in input.fa -out output.afa -maxiters 1 -diags1 -sv
Probably. Why in the world would you want to run an MSA on 112626 ESTs?
Most likely yes.
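To put a rough number on why the sequence count matters, here is a back-of-the-envelope sketch. It assumes the aligner computes one distance for every pair of sequences (as MUSCLE's initial k-mer distance step does) and stores each as a 4-byte value; the 4-byte figure is my assumption, and the real footprint depends on the MUSCLE version and options.

```shell
#!/usr/bin/env bash
# Rough estimate only: assumes one 4-byte distance per unordered sequence pair.
n=112626                          # number of ESTs, taken from the question
pairs=$(( n * (n - 1) / 2 ))      # number of unordered sequence pairs
bytes=$(( pairs * 4 ))            # assumed 4 bytes per stored distance
gib=$(( bytes / 1024 / 1024 / 1024 ))   # integer GiB, rounded down
echo "pairs=$pairs approx_GiB=$gib"
```

Under those assumptions the pairwise distances alone come to more than 20 GiB, before any alignment work starts, so running out of memory on a typical workstation would not be surprising.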
Are you trying to deduplicate this dataset? If that is the case, then programs like CD-HIT are more appropriate.
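If deduplication is the goal, a minimal sketch with cd-hit-est (the nucleotide version of CD-HIT) could look like the following. The 95% identity threshold and the output file name are assumptions for illustration only, not recommendations for this dataset; `-n 10` is a word size suitable for thresholds of 0.95 and above, `-M 0` removes the memory limit, and `-T 0` uses all available cores.

```shell
#!/usr/bin/env bash
input=input.fa           # input file name taken from the question
output=input.nr95.fa     # hypothetical output name
identity=0.95            # assumed clustering threshold; adjust as needed

# Only run if cd-hit-est is installed and the input file exists.
if command -v cd-hit-est >/dev/null 2>&1 && [ -f "$input" ]; then
    # Cluster sequences at the chosen identity and keep one representative each.
    cd-hit-est -i "$input" -o "$output" -c "$identity" -n 10 -M 0 -T 0
    # Compare sequence counts before and after clustering.
    grep -c '^>' "$input" "$output"
else
    echo "cd-hit-est or $input not found; nothing to do" >&2
fi
```

The representative sequences end up in the output FASTA, and cd-hit-est also writes a `.clstr` file next to it listing which sequences fell into each cluster.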
I am preparing an exome capture library from these 112K ESTs.
Can you clarify what that means? What are you hoping to do by aligning 112K sequences? Are they all related to each other? If not, there is no point in trying to do an MSA with them.
I have collected 320k ESTs from the NCBI database, 47209 ESTs from DDBJ, and 54K ESTs from the Alfalfa Genome index (AGED). I am aligning these ESTs together to generate a sequence that I will use as a reference for identifying SNPs in the alfalfa lines that I have. I am working on identifying genetic variants associated with self-incompatibility in alfalfa.
You are unlikely to get the results you want by following the strategy you describe here. I suggest taking a step back and thinking about why doing an MSA is not going to work in this case. Think about how you could use CD-HIT to get started.