Question

Multiple Sequence Alignment using muscle

6.0 years ago

atitparajuli2018 • 0

Hi,

I keep getting a "Segmentation fault: 11" error while running the muscle command to align multiple sequences from a FASTA file containing 112626 ESTs. Is the large number of sequences causing the problem?

I used the command

$ ./muscle -in input.fa -out output.afa -maxiters 1 -diags1 -sv

alignment muscle • 2.7k views

updated 5.9 years ago by h.mon 35k • written 6.0 years ago by atitparajuli2018 • 0

Probably. Why in the world would you want to run an MSA on 112626 ESTs?
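A rough back-of-the-envelope sketch of why the input size alone could be a problem. It assumes, purely for illustration, that the aligner holds a full triangular pairwise distance matrix with 4-byte entries; this is not a statement about how muscle actually allocates memory, and the function names are hypothetical.

```python
def pairwise_comparisons(n_sequences: int) -> int:
    """Number of unordered sequence pairs among n sequences: n * (n - 1) / 2."""
    return n_sequences * (n_sequences - 1) // 2


def distance_matrix_gib(n_sequences: int, bytes_per_entry: int = 4) -> float:
    """Memory in GiB for a triangular distance matrix.

    bytes_per_entry=4 is an assumption (single-precision floats).
    """
    return pairwise_comparisons(n_sequences) * bytes_per_entry / 2**30


n = 112_626  # number of ESTs mentioned in the question
print(f"{pairwise_comparisons(n):,} sequence pairs")
print(f"{distance_matrix_gib(n):.1f} GiB for the distance matrix alone")
```

Under that assumption the matrix by itself would need tens of gigabytes, which is more than many workstations have available.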

6.0 years ago by Asaf 10k

Most likely yes.

Are you trying to deduplicate this dataset? If so, programs like CD-HIT are more appropriate.
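To illustrate the kind of redundancy removal meant here, below is a toy sketch of greedy incremental clustering, the general idea behind CD-HIT: sequences are processed longest first, and each one either joins an existing representative or becomes a new representative. The similarity measure (fraction of shared k-mers), the function names, and the example ESTs are all assumptions made for illustration; this is not CD-HIT's actual implementation and would not scale to 112K sequences.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(seqs: Dict[str, str], k: int = 8,
                   min_shared: float = 0.9) -> Dict[str, List[str]]:
    """Greedy incremental clustering, longest sequence first.

    Returns {representative_id: [member_ids, ...]}. A sequence joins the
    first representative that contains at least min_shared of its k-mers.
    """
    clusters: Dict[str, List[str]] = {}
    rep_kmers: Dict[str, Set[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        words = kmers(seqs[name].upper(), k)
        for rep, rep_words in rep_kmers.items():
            if words and len(words & rep_words) / len(words) >= min_shared:
                clusters[rep].append(name)
                break
        else:
            # No representative is similar enough: start a new cluster.
            clusters[name] = [name]
            rep_kmers[name] = words
    return clusters


# Hypothetical toy input: est2 is a fragment of est1, est3 is unrelated.
ests = {
    "est1": "ACGTTGCAAGGCTTAACCGGTTAGCA",
    "est2": "GTTGCAAGGCTTAACCGG",
    "est3": "TTTTTTTTTTCCCCCCCCCCGGGGGG",
}
clusters = greedy_cluster(ests)
print(clusters)
```

Keeping only the representatives (the dictionary keys) gives a non-redundant set, which is a far smaller and more meaningful input for any downstream step than the raw EST collection.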

6.0 years ago by GenoMax 151k

I am preparing an exome capture library from these 112K ESTs.

6.0 years ago by atitparajuli2018 • 0

Can you clarify what that means? What are you hoping to do by aligning 112K sequences? Are they all related to each other? Otherwise there is no point in trying to do MSA with them.

6.0 years ago by GenoMax 151k

I have collected 320k ESTs from the NCBI database, 47209 ESTs from DDBJ, and 54K ESTs from the Alfalfa Genome index (AGED). I am aligning these ESTs together to generate a sequence that I will use as a reference for identifying SNPs in the alfalfa lines that I have. I am working on identifying genetic variants associated with self-incompatibility in alfalfa.

6.0 years ago by atitparajuli2018 • 0

You are unlikely to get the results you want by following the strategy you describe here. I suggest taking a step back and thinking about why doing an MSA is not going to work in this case. Think about how you could use CD-HIT to get started.

6.0 years ago by GenoMax 151k

score 2 · Answer 1 · 2019-07-11

You have been told repeatedly that your approach is flawed, but you insist on following it. Please reconsider and take a different approach: there are several papers describing SNP prospecting from ESTs, so why not follow one of them? The general workflow is to assemble the ESTs (generally with CAP3), then map the ESTs back to the assembly and detect polymorphisms. Here are some links with references:

ESTs and putative line-specific (broiler and layer) SNPs identified in genes expressed in Gallus gallus pituitary and hypothalamus

Mining SNPs From EST Databases

Identification and mapping of SNPs from ESTs in sunflower
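As a rough illustration of the last step of the workflow described above (detecting polymorphisms once ESTs have been mapped to an assembled contig), here is a minimal sketch. The assembly and mapping themselves would be done with external tools and are not shown. The input format (equal-length strings in contig coordinates, with `-` where a read gives no base), the function name, and the rule of requiring at least two reads per allele are assumptions chosen for illustration, not the method of any of the papers listed.

```python
from collections import Counter
from typing import Dict, List, Tuple


def snp_candidates(aligned_reads: List[str],
                   min_allele_count: int = 2) -> List[Tuple[int, Dict[str, int]]]:
    """Return (position, allele_counts) for columns with two or more
    alleles that are each seen in at least min_allele_count reads.

    Alleles seen in fewer reads are ignored as possible sequencing errors.
    """
    results = []
    for pos, column in enumerate(zip(*aligned_reads)):
        counts = Counter(base for base in column if base in "ACGT")
        supported = {b: c for b, c in counts.items() if c >= min_allele_count}
        if len(supported) >= 2:
            results.append((pos, supported))
    return results


# Hypothetical reads already placed in contig coordinates.
reads = [
    "ACGTACGT--",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "--GAACGTAC",
    "ACGTACCTAC",
]
candidates = snp_candidates(reads)
print(candidates)
```

In this toy input, position 3 (0-based) is reported because both T and A are supported by multiple reads, while the single C at position 6 is treated as a likely sequencing error. Real EST data would also need quality scores and a check for paralogous sequences collapsed into one contig.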