Question: Multiple Sequence Alignment using muscle
17 months ago, atitparajuli2018 wrote:

Hi,

I keep getting a "segmentation fault 11" error when running the muscle command to align the sequences in a FASTA file containing 112626 ESTs. Is the large number of sequences causing the problem?

I used the command

$ ./muscle -in input.fa -out output.afa -maxiters 1 -diags1 -sv
written 17 months ago by atitparajuli2018, modified 16 months ago by h.mon

Probably. Why in the world would you want to run an MSA on 112626 ESTs?

written 17 months ago by Asaf

Most likely yes.

Are you trying to deduplicate this dataset? If so, programs like CD-HIT are more appropriate.

written 17 months ago by genomax
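
For readers who want to try the CD-HIT suggestion, a minimal sketch follows. It assumes cd-hit-est (the nucleotide variant of CD-HIT) is installed and on the PATH. The function name dedup_ests, the file names, and the 0.95 identity threshold are illustrative choices, not values given in this thread. Setting DRY_RUN only prints the command, which is a convenient way to check it before starting a long run.

```shell
#!/bin/sh
# Sketch: cluster near-identical ESTs with cd-hit-est instead of aligning them all.
# dedup_ests, the file names and the 0.95 threshold are assumptions for illustration.

dedup_ests() {
    in_fa="$1"                 # input FASTA with all ESTs
    out_fa="$2"                # output FASTA with one representative per cluster
    identity="${3:-0.95}"      # sequence identity threshold (-c)

    # -n 10 : word size recommended for thresholds of 0.95 and above
    # -T 0  : use all available CPU threads
    # -M 0  : do not cap memory use
    if [ -n "${DRY_RUN:-}" ]; then
        # Dry run: print the command without executing it.
        echo "cd-hit-est -i $in_fa -o $out_fa -c $identity -n 10 -T 0 -M 0"
    else
        cd-hit-est -i "$in_fa" -o "$out_fa" -c "$identity" -n 10 -T 0 -M 0
    fi
}
```

A call such as `dedup_ests input.fa input.nr.fa` would write the representative sequences to input.nr.fa; cd-hit-est also writes a cluster listing to input.nr.fa.clstr.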

I am preparing an exome capture library from these 112K ESTs.

written 17 months ago by atitparajuli2018

Can you clarify what that means? What are you hoping to do by aligning 112K sequences? Are they all related to each other? If not, there is no point in trying to run an MSA on them.

written 17 months ago by genomax, modified 17 months ago

I have collected 320K ESTs from the NCBI database, 47209 ESTs from DDBJ and 54K ESTs from Alfalfa Genome index (AGED). I am aligning these ESTs together to generate a sequence that I will use as a reference for identifying SNPs in the alfalfa lines that I have. I am working on identifying genetic variants associated with self-incompatibility in alfalfa.

written 17 months ago by atitparajuli2018

You are unlikely to get the results you want by following the strategy you describe here. I suggest taking a step back and thinking about why doing an MSA is not going to work in this case. Think about how you could use CD-HIT to get started.

written 17 months ago by genomax

16 months ago, h.mon (Brazil) wrote:

You have been told repeatedly that your approach is flawed, but you insist on following it. Please reconsider and try a different approach: there are several papers describing SNP discovery from ESTs, so why don't you follow one of them? The general workflow is to assemble the ESTs (generally with CAP3), then map the ESTs to the assembly and detect polymorphisms. Here are some links with references:

ESTs and putative line-specific (broiler and layer) SNPs identified in genes expressed in Gallus gallus pituitary and hypothalamus

Mining SNPs From EST Databases

Identification and mapping of SNPs from ESTs in sunflower

written 16 months ago by h.mon
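
The assemble, map, and detect workflow described in the answer above could look roughly like the sketch below. Only CAP3 is named in the thread; bwa, samtools and bcftools are assumed here as one common choice for the mapping and polymorphism-detection steps, and est_snp_pipeline, run, and the output file names are hypothetical. With DRY_RUN set, the steps are printed but not executed.

```shell
#!/bin/sh
# Sketch of an EST-to-SNP workflow: assemble, map the ESTs back, call variants.
# CAP3 comes from the thread; bwa, samtools and bcftools are assumptions.
# Paths must not contain spaces, because each step is handed to `sh -c`.

run() {
    # Print the step, then execute it unless DRY_RUN is set.
    echo "+ $1"
    [ -n "${DRY_RUN:-}" ] || sh -c "$1"
}

est_snp_pipeline() {
    ests="$1"                    # input FASTA of ESTs (hypothetical name)
    ref="$ests.cap.contigs"      # contig file that CAP3 writes next to its input

    # 1. Assemble the ESTs into contigs.
    run "cap3 $ests > $ests.cap3.log" &&
    # 2. Index the contigs and map the ESTs back onto them.
    run "bwa index $ref" &&
    run "bwa mem $ref $ests | samtools sort -o $ests.bam -" &&
    run "samtools index $ests.bam" &&
    # 3. Call variants (-m: multiallelic caller, -v: variant sites only).
    run "bcftools mpileup -f $ref $ests.bam | bcftools call -mv -o $ests.vcf"
}
```

Running `DRY_RUN=1; est_snp_pipeline ests.fa` lists the five steps so they can be reviewed or adapted. The published protocols linked above may differ in tools and filtering, so treat this as a starting point only.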

Thank you for the suggestions. I will consider changing my strategy.

written 16 months ago by atitparajuli2018