Multiple Sequence Alignment using muscle
4.8 years ago

Hi,

I keep getting a segmentation fault 11 error when running muscle to align the sequences in a FASTA file containing 112626 ESTs. Is the large number of sequences causing the problem?

I used the command

$ ./muscle -in input.fa -out output.afa -maxiters 1 -diags1 -sv

Probably. Why in the world would you want to run an MSA on 112626 ESTs?
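
For a rough sense of scale, the number of sequence pairs grows quadratically with the number of input sequences. The sketch below is only back-of-the-envelope arithmetic: the 4 bytes per pair is an assumption made for illustration and does not describe muscle's actual memory layout, and the helper names `pair_count` and `count_fasta_records` are hypothetical.

```shell
#!/usr/bin/env bash
# Number of unordered sequence pairs among n sequences: n * (n - 1) / 2.
pair_count() {
    local n="$1"
    echo $(( n * (n - 1) / 2 ))
}

# Count FASTA records by counting header lines (lines starting with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

n=112626                      # sequence count quoted in the question
pairs=$(pair_count "$n")
bytes_per_pair=4              # assumption: one 4-byte value stored per pair
echo "pairs: $pairs"
echo "bytes: $(( pairs * bytes_per_pair ))"
```

To use the real count instead of the quoted one, `n=$(count_fasta_records input.fa)` would read it from the input file. Under the 4-byte assumption, 112626 sequences give roughly 6.3 billion pairs, or about 25 GB, which may well exceed the memory of a typical workstation.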


Most likely yes.

Are you trying to deduplicate this dataset? If that is the case then there are programs like CD-HIT which are more appropriate.
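
If deduplication is the goal, a clustering run might look like the sketch below. The file names and the 95% identity threshold are assumptions chosen for illustration, not values from this thread, and `cdhit_est_cmd` is a hypothetical helper that only prints the command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Build a cd-hit-est command line for clustering nucleotide sequences.
# -c 0.95 : sequence identity threshold (assumed value, tune for your data)
# -n 10   : word size suitable for thresholds in the 0.95-1.0 range
# -T 0    : use all available CPU threads
# -M 0    : no memory limit
cdhit_est_cmd() {
    local in_fa="$1" out_fa="$2"
    printf 'cd-hit-est -i %s -o %s -c 0.95 -n 10 -T 0 -M 0\n' "$in_fa" "$out_fa"
}

# Print the command for the hypothetical file names input.fa / input.nr95.fa.
cdhit_est_cmd input.fa input.nr95.fa
```

Piping the printed line to `sh` would run it, provided cd-hit-est is installed and on the `PATH`.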


I am preparing an exome capture library from these 112K ESTs.


Can you clarify what that means? What are you hoping to do by aligning 112K sequences? Are they all related to each other? Otherwise there is no point in trying to do MSA with them.


I have collected 320k ESTs from the NCBI database, 47209 ESTs from DDBJ and 54K ESTs from Alfalfa Genome index (AGED). I am aligning these ESTs together to generate a sequence that I will use as a reference for identifying SNPs in the alfalfa lines that I have. I am working on identifying genetic variants associated with self-incompatibility in alfalfa.


You are unlikely to get the results you want by following the strategy you describe here. I suggest taking a step back and thinking about why doing an MSA is not going to work in this case. Think about how you could use CD-HIT to get started.

4.8 years ago
h.mon 35k

You have been told repeatedly that your approach is flawed, but you insist on following it. Please reconsider and try a different approach: there are several papers describing SNP discovery from ESTs, so why don't you follow one of them? The general workflow is to assemble the ESTs (generally with CAP3), then map the ESTs to the assembly and detect polymorphisms. Here are some links with references:

ESTs and putative line-specific (broiler and layer) SNPs identified in genes expressed in Gallus gallus pituitary and hypothalamus

Mining SNPs From EST Databases

Identification and mapping of SNPs from ESTs in sunflower
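
The assemble, map and detect workflow described above could be sketched roughly as follows. Only CAP3 is named in this answer; bwa, samtools and bcftools are swapped in here as assumptions for the mapping and polymorphism-detection steps, and all file names are hypothetical. The function `est_snp_plan` only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Print (do not run) a hypothetical assemble -> map -> call plan
# for a pooled EST FASTA file.
est_snp_plan() {
    local ests="$1"          # pooled EST FASTA (hypothetical file name)
    local ref="est_ref.fa"   # hypothetical name for the assembled reference
    cat <<EOF
cap3 $ests > $ests.cap.log
cat $ests.cap.contigs $ests.cap.singlets > $ref
bwa index $ref
bwa mem $ref $ests | samtools sort -o ests_vs_ref.bam -
samtools index ests_vs_ref.bam
bcftools mpileup -f $ref ests_vs_ref.bam | bcftools call -mv -o est_snps.vcf
EOF
}

est_snp_plan ests.fa
```

CAP3 writes its contigs and singlets next to the input file with `.cap.contigs` and `.cap.singlets` suffixes; concatenating the two is one possible way to build a reference, not the only one. The tools and parameters used in the linked papers may differ, so treat this as a starting point rather than a validated pipeline.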


Thank you for the suggestions. I will consider changing my strategy.
