Question: How to generate Artemis ACT comparison file of multiple sequence alignment
rosies (Australia) wrote 5.4 years ago:

Hi there,

I was wondering how you would generate a comparison file for a single multiple sequence alignment file to be used with Artemis ACT.

Thanks,

Rosie

msa comparison file artemis act • 6.8k views
modified 2.5 years ago by Joe • written 5.4 years ago by rosies
dadarotimi wrote 2.5 years ago:

It's a shame that the two known web-based resources (WebACT and DoubleACT) created for generating comparison files are no longer working. You can work around this if you have a computer with a Unix-like environment (for example macOS or Ubuntu Linux).

In a terminal, type the following commands:

sudo apt-get install ncbi-blast+ (to install BLAST+; apt-get is specific to Debian/Ubuntu, so on macOS BLAST+ has to be installed another way)

makeblastdb -in Genome1 -dbtype nucl

blastn -query Genome2 -db Genome1 -evalue 1 -task megablast -outfmt 6 > Genome1_Genome2.crunch

Make sure the assembled genomes you want to compare (Genome1 and Genome2 here) are in your working directory before you run these commands.

This link might also help: https://katholtlab.files.wordpress.com/2017/07/comparativegenomicstutorialv2.pdf
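
If you end up doing this for more than a couple of genomes, the two BLAST steps above can be wrapped in a few lines of Python. This is only a rough sketch: the function names and the way the output file is named are my own assumptions, and it expects makeblastdb and blastn to be on your PATH. It writes the result with blastn's -out option instead of a shell redirect.

```python
import subprocess
from pathlib import Path


def build_commands(reference, query):
    """Return the makeblastdb and blastn argument lists plus the output path.

    The output name (<reference stem>_<query stem>.crunch) is an
    illustrative convention, not something ACT requires.
    """
    out = f"{Path(reference).stem}_{Path(query).stem}.crunch"
    makedb = ["makeblastdb", "-in", str(reference), "-dbtype", "nucl"]
    blastn = [
        "blastn", "-query", str(query), "-db", str(reference),
        "-evalue", "1", "-task", "megablast", "-outfmt", "6",
        "-out", out,
    ]
    return makedb, blastn, out


def make_comparison(reference, query):
    """Run both steps and return the path of the comparison file."""
    makedb, blastn, out = build_commands(reference, query)
    subprocess.run(makedb, check=True)   # build the nucleotide database
    subprocess.run(blastn, check=True)   # write tabular hits to `out`
    return out
```

Calling make_comparison("Genome1.fasta", "Genome2.fasta") would then leave Genome1_Genome2.crunch in the working directory.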

written 2.5 years ago by dadarotimi
Joe (United Kingdom) wrote 2.5 years ago:

I wrote a script to do this some time ago as all the webservers are defunct.

An Artemis Comparison Tool file is simply a tabular blast output file.
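
To make "tabular blast output" a bit more concrete: each line of a default -outfmt 6 file has 12 tab-separated columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). A minimal sketch of reading one such line in Python follows; the Hit class and parse_hit function are names made up for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One row of default BLAST tabular output (-outfmt 6)."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_hit(line):
    """Split one tab-separated line into a typed Hit."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    return Hit(
        fields[0], fields[1], float(fields[2]),
        *map(int, fields[3:10]),          # length .. send
        float(fields[10]), float(fields[11]),
    )
```

This is handy for sanity-checking a comparison file before loading it, for instance to confirm that hits cover more than the first contig.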

NB, contrary to the title and tags of this question, ACT doesn't handle multiple sequence alignments. It will only do pairwise alignments, though you can visualise multiple pairs (e.g. A vs B, B vs C), and each pair will need its own comparison file.
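
As a small illustration of the "one file per pair" point, here is a sketch that lists which comparison files would be needed for genomes displayed in a fixed order. The helper names and the "_vs_" file naming are assumptions for the example, not anything ACT mandates.

```python
def adjacent_pairs(genomes):
    """Pair each genome with the next one in display order."""
    return list(zip(genomes, genomes[1:]))


def comparison_names(genomes):
    """Name one comparison file per adjacent pair (hypothetical naming)."""
    return [f"{a}_vs_{b}.crunch" for a, b in adjacent_pairs(genomes)]
```

For three genomes A, B and C this gives two files, one for A vs B and one for B vs C; there is no single file covering all three.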

modified 2.5 years ago • written 2.5 years ago by Joe

Hi all,

I tried to use this script to compare two fasta files that contain around 60 contigs each. When I load the result into ACT I only seem to get similarity information for the first contig. Both fasta files are from the same bacterial species, so I expected them to be pretty similar.

Do you have any advice on getting this to work? I just need to know if the two isolates are the same as they were isolated from the same person.

Thanks in advance!

Charlotte

written 12 months ago by c.e.chong

I actually have an updated version of this script which can do all the necessary concatenation for you: https://github.com/jrjhealey/Oread

It's still pretty raw, so it's not the easiest thing in the world to install at the minute.

modified 12 months ago • written 12 months ago by Joe

Thank you very much for your help! This makes so much sense now

written 12 months ago by c.e.chong

Hi Joe, thanks very much for making the Oread program. I'm having the same issue as Charlotte (above), with only the first node in the fasta files aligning properly. I downloaded Oread and ran my file alignment, but I'm still getting the same result. Is there another version of the program that does the concatenating of the nodes in my fasta file? So sorry to bother you! This is my first go at analysing WGS files.
Thanks in advance for your time and any pointers you might send my way!

written 11 months ago by marta

Oread itself should do the concatenation if it detects that it's necessary. The program is still not very mature though, so it's quite possible there are bugs or similar.

In essence all the tool does is:

  1. Read fasta 1, check if it's a multi-fasta
    • If it is, make a concatenated temporary file
  2. Read fasta 2, check if it's a multi-fasta
    • If it is, make a concatenated temporary file
  3. Take the input files (if single) or the temporary files, and create a BLAST tabular output (which is the ACT file).
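
To show roughly what steps 1 and 2 amount to, here is a minimal Python sketch. It is not Oread's actual code: count_records and prepare_input are names invented for this example, and the contigs are joined in file order with no reordering or spacer between them.

```python
import tempfile
from pathlib import Path


def count_records(fasta_path):
    """Count FASTA headers; more than one means a multi-fasta."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def prepare_input(fasta_path):
    """Return a single-record FASTA path: the original file if it is
    already single, otherwise a temporary concatenated copy."""
    if count_records(fasta_path) <= 1:
        return Path(fasta_path)
    name = Path(fasta_path).stem
    with open(fasta_path) as src, tempfile.NamedTemporaryFile(
        "w", suffix=".fasta", delete=False
    ) as tmp:
        tmp.write(f">{name}_concatenated\n")      # the only header
        for line in src:
            if line.strip() and not line.startswith(">"):
                tmp.write(line.strip() + "\n")    # sequence lines only
        return Path(tmp.name)
```

The paths returned for the two inputs are what would then be passed to BLAST in step 3. The temporary file is not deleted automatically, so the caller has to clean it up.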

If it still isn't cooperating, please feel free to open an Issue on github with your test data and I'll see if I can figure out what's happening.

modified 11 months ago • written 11 months ago by Joe

Hi, yeah this is a known issue and limitation of ACT. It cannot process multi-FASTA files on both sides of a comparison, i.e. at least one of the sequences needs to be contiguous.

IIRC, you can have one complete reference sequence while the other is a multi-FASTA, but they cannot both be multi-FASTAs.

The best way to get around this is to reorder your contigs relative to a known (ideally closed) reference (e.g. using progressiveMauve), then artificially concatenate the contigs together so there is only one FASTA header.

This won't affect the data/visualisation.
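
For the concatenation step, something along these lines would do. It is a sketch under assumptions: the order list is taken to come from whatever reordering you did beforehand, the function names are made up, and it does not reverse-complement any contigs, which a real reordering may require.

```python
def read_fasta(text):
    """Parse FASTA text into {record id: sequence}, keeping file order."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""   # id is the first word
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


def concatenate(records, order, header="concatenated", width=60):
    """Join contigs in the given order under a single FASTA header."""
    sequence = "".join(records[name] for name in order)
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{header}\n" + "\n".join(lines) + "\n"
```

Writing the returned string to a file gives a single-record FASTA that can be used both for the BLAST step and as the sequence loaded into ACT.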

written 12 months ago by Joe