Question: Multiple Sequence Alignment using muscle
atitparajuli2018 wrote, 8 weeks ago:

Hi,

I keep getting a segmentation fault 11 error when running the muscle command to align multiple sequences from a FASTA file containing 112626 ESTs. Is the large number of sequences causing the problem?

I used the command:

$ ./muscle -in input.fa -out output.afa -maxiters 1 -diags1 -sv
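
One way to check whether input size is the trigger is to rerun the same command on a small subset of the file. The helpers below are a minimal sketch for that purpose, assuming a plain FASTA file where every record starts with a ">" header line; the function names fasta_count and fasta_head are made up for illustration and are not part of MUSCLE.

```shell
# Count the records in a FASTA file (one per ">" header line).
fasta_count() {
  grep -c '^>' "$1"
}

# Print the first N records of a FASTA file, keeping multi-line sequences intact.
# Usage: fasta_head N file.fa
fasta_head() {
  awk -v n="$1" '/^>/ { if (++seen > n) exit } { print }' "$2"
}
```

For example, `fasta_head 1000 input.fa > subset.fa` followed by the same `./muscle` command with `-in subset.fa` would show whether a small input completes. If it does and the full file still crashes, input size is the likely cause.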
Tags: muscle alignment
modified 6 weeks ago by h.mon • written 8 weeks ago by atitparajuli2018

Probably. Why in the world would you want to run an MSA on 112626 ESTs?

written 8 weeks ago by Asaf

Most likely yes.

Are you trying to deduplicate this dataset? If so, programs like CD-HIT are more appropriate.
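
As a rough sketch of what that could look like for nucleotide sequences, the wrapper below builds a cd-hit-est call. The 0.95 identity threshold, the output name, the dedup_ests function and its DRY_RUN switch are all assumptions for illustration; the right threshold depends on how divergent the ESTs are.

```shell
# Cluster ESTs with cd-hit-est and keep one representative per cluster.
# Usage: dedup_ests input.fa output.fa [identity]
# Set DRY_RUN=1 to print the command instead of running it.
dedup_ests() (
  in=$1
  out=$2
  identity=${3:-0.95}   # assumed threshold, adjust for your data
  # -c  sequence identity threshold
  # -n  word length (10 suits thresholds of 0.95 and above)
  # -T  threads (0 = all available), -M memory limit in MB (0 = unlimited)
  set -- cd-hit-est -i "$in" -o "$out" -c "$identity" -n 10 -T 0 -M 0
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
)
```

Calling `dedup_ests input.fa ests_nr.fa` would write the representative sequences to ests_nr.fa and the cluster membership to ests_nr.fa.clstr.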

written 8 weeks ago by genomax

I am preparing an exome capture library from these 112K ESTs.

written 8 weeks ago by atitparajuli2018

Can you clarify what that means? What are you hoping to do by aligning 112K sequences? Are they all related to each other? Otherwise there is no point in trying to do MSA with them.

modified 8 weeks ago • written 8 weeks ago by genomax

I have collected 320K ESTs from the NCBI database, 47209 ESTs from DDBJ, and 54K ESTs from the Alfalfa Genome index (AGED). I am aligning these ESTs together to generate a sequence that I will use as a reference for identifying SNPs in the alfalfa lines that I have. I am working on identifying genetic variants associated with self-incompatibility in alfalfa.

written 8 weeks ago by atitparajuli2018

You are unlikely to get the results you want by following the strategy you describe here. I suggest taking a step back and thinking about why doing an MSA is not going to work in this case. Think about how you could use CD-HIT to get started.

written 8 weeks ago by genomax
h.mon (Brazil) wrote, 6 weeks ago:

You have been told repeatedly that your approach is flawed, but you insist on following it. Please reconsider and take a different approach: there are several papers describing SNP discovery from ESTs, so why not follow one of them? The general workflow is to assemble the ESTs (generally with CAP3), then map the ESTs back to the assembly and detect polymorphisms. Here are some links with references:

ESTs and putative line-specific (broiler and layer) SNPs identified in genes expressed in Gallus gallus pituitary and hypothalamus

Mining SNPs From EST Databases

Identification and mapping of SNPs from ESTs in sunflower
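
The assemble, map, and detect workflow described above could be sketched roughly as follows. Only CAP3 is named in this thread; using bwa, samtools and bcftools for the mapping and calling steps is an assumption, as are the function names and output file names, and the linked papers may use different tools. Ploidy and filtering are not handled here.

```shell
# List which of the given tools are not on PATH, one per line.
missing_tools() (
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || printf '%s\n' "$t"
  done
)

# Usage: est_snp_pipeline ests.fa [output_prefix]
est_snp_pipeline() (
  ests=$1
  prefix=${2:-est_asm}
  miss=$(missing_tools cap3 bwa samtools bcftools)
  if [ -n "$miss" ]; then
    echo "missing tools:" $miss >&2
    exit 1
  fi
  # 1. Assemble the ESTs; CAP3 writes <input>.cap.contigs and <input>.cap.singlets.
  cap3 "$ests" > "$prefix.cap3.log" || exit 1
  # 2. Use contigs plus singlets as the reference (an assumed choice).
  cat "$ests.cap.contigs" "$ests.cap.singlets" > "$prefix.ref.fa" || exit 1
  # 3. Map the original ESTs back to the assembly.
  bwa index "$prefix.ref.fa" || exit 1
  bwa mem "$prefix.ref.fa" "$ests" | samtools sort -o "$prefix.bam" - || exit 1
  samtools index "$prefix.bam" || exit 1
  # 4. Call variants from the alignments.
  bcftools mpileup -f "$prefix.ref.fa" "$prefix.bam" |
    bcftools call -mv -o "$prefix.vcf" || exit 1
)
```

Running `est_snp_pipeline ests.fa alfalfa` would leave the candidate variants in alfalfa.vcf, which still need filtering before use.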

written 6 weeks ago by h.mon

Thank you for the suggestions. I will consider changing my strategy.

written 5 weeks ago by atitparajuli2018