Question: Split fasta file into chromosomes
3.5 years ago by
natasha100 wrote:


I have 105 bacterial isolates which are assembled into contigs. I also have a really good, fully assembled, reference genome, consisting of 2 chromosomes.

I would like to align each of my isolates to the reference genome, determine which contigs/sequences belong to which chromosome, and then split every isolate's FASTA file into 2 chromosomes.

I have been told that Mauve is good for this. However, I have 105 isolates and Mauve doesn't seem to be able to cope with this much data at once. I could align smaller groups to the reference genome at a time, but is there another way/tool to do this?


written 3.5 years ago by natasha100

Do you only want to split the reference into two chromosome files so you can use mauve on them separately?

written 3.5 years ago by genomax71k

No, I know how to split my reference into two chromosome files... I need to split all of my 105 isolates into two chromosome files.

written 3.5 years ago by natasha100

Based on what criteria? What format are the files currently in?

written 3.5 years ago by genomax71k

Fasta format.

I either want to split the contigs or just crudely split the fasta sequence, by aligning them to the reference genome.

written 3.5 years ago by natasha100

Have you tried running Mauve against the two chromosomes independently? Assuming there is no significant homology between the two chromosomes, that would let you identify which contigs from each isolate belong to which chromosome. You can then extract those contigs with a program called faSomeRecords from the Kent utilities. That may be a lot of Mauve runs, but it can work.

Another option is to try lastz, which was designed for aligning chromosome-sized sequences, to identify the contigs you need.
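
Whichever aligner you use, the assignment step could look something like the sketch below. This is only an illustration, not the output handling of Mauve or lastz: it assumes you have already reduced the alignments to hypothetical (contig ID, chromosome ID, aligned length) tuples, and the function names are made up for the example. Each contig goes to the chromosome it shares the most aligned bases with, and contigs with no hits are kept separately.

```python
from collections import defaultdict


def assign_contigs(hits):
    """Map each contig to the chromosome with the most aligned bases.

    hits: iterable of (contig_id, chromosome_id, aligned_length) tuples.
    This tuple layout is an assumption, not any aligner's native format.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for contig, chrom, length in hits:
        totals[contig][chrom] += length
    return {
        contig: max(by_chrom, key=by_chrom.get)
        for contig, by_chrom in totals.items()
    }


def read_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            # Use the first word of the header as the record ID.
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_chromosome(fasta_lines, assignment):
    """Group contigs by assigned chromosome; unaligned ones go to 'unassigned'."""
    groups = defaultdict(list)
    for name, seq in read_fasta(fasta_lines):
        groups[assignment.get(name, "unassigned")].append((name, seq))
    return dict(groups)
```

For one isolate you would open its FASTA file, pass the file handle as `fasta_lines`, and write each group out as its own FASTA file. Contigs that hit both chromosomes about equally are worth checking by hand rather than trusting the majority vote.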

written 3.5 years ago by genomax71k

Sorry, I'm new to this, but do you know how to use the file that Mauve gives (from 5 to 3) to work out which contigs belong to chromosome 1 and which to chromosome 2?

written 9 months ago by alucero0

Maybe Satsuma is a good option for your task. Satsuma is a tool that reliably aligns large and complex DNA sequences providing maximum sensitivity, specificity and speed. I've used it to align contigs against well assembled related genomes and sort the contigs according to their hit location on the reference.
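
The sorting step itself is simple once you have the hits in a table. A rough sketch, assuming hypothetical (contig ID, chromosome ID, reference start, aligned length) tuples that you have parsed from the aligner output yourself (this is not Satsuma's own file format, and `order_contigs` is just a name for the example): take each contig's longest hit and order the contigs along each chromosome by where that hit starts.

```python
from collections import defaultdict


def order_contigs(hits):
    """Order contigs along each chromosome by their longest hit.

    hits: iterable of (contig_id, chromosome_id, ref_start, aligned_length).
    Returns {chromosome_id: [contig_id, ...]} sorted by reference start.
    """
    best = {}
    for contig, chrom, start, length in hits:
        # Keep only the longest hit seen so far for each contig.
        if contig not in best or length > best[contig][2]:
            best[contig] = (chrom, start, length)

    layout = defaultdict(list)
    for contig, (chrom, start, _length) in best.items():
        layout[chrom].append((start, contig))

    return {
        chrom: [contig for _start, contig in sorted(items)]
        for chrom, items in layout.items()
    }
```

Using only the longest hit is a crude choice; a contig that spans a rearrangement will have several good hits and would need a closer look.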

written 3.5 years ago by iraun3.6k