Question: How can I separate mitochondrial sequences from chromosomal sequences?
zion22 wrote:

Hi, sorry to bother you, but I need your help.

I'd like to know how to separate mitochondrial sequences from chromosomal sequences. I have a FASTA file with all the sequences of the organism and a FASTA file of a reference mitogenome. I would like to create one file containing the mitogenome and another containing the chromosomes.

Thank you very much, and please excuse my ignorance; I am very new to this area. If you could tell me which steps to take and why, it would be very useful for me.

Thank you

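How many chromosomes does the organism have? If there aren't many, you can do this with a simple samtools faidx genome.fa chr1 chr2 chr3 ... > only_chromosomes.fa, where genome.fa stands for your FASTA file and chr1, chr2, ... are the chromosome names used in it.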
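If samtools is not at hand, the same kind of selection can be sketched in plain Python. This is only an illustration under assumptions: the sequence names (chr1, chr2, chrM) and the toy sequences are made up, and the functions read_fasta, split_by_name and format_fasta are hypothetical helpers, not part of any library.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_name(records, wanted):
    """Split records into (selected, rest) using the first word of each header."""
    selected, rest = [], []
    for header, seq in records:
        words = header.split()
        name = words[0] if words else ""
        (selected if name in wanted else rest).append((header, seq))
    return selected, rest


def format_fasta(records, width=60):
    """Return the records as FASTA text, wrapping sequences at `width` bases."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


# Toy input; in practice the lines would come from open("genome.fa").
fasta_text = """>chr1 example chromosome
ACGT
ACGT
>chrM mitochondrion
GGCC
>chr2
TTAA
"""

records = list(read_fasta(fasta_text.splitlines()))
chromosomes, others = split_by_name(records, {"chr1", "chr2"})
print(format_fasta(chromosomes), end="")
```

This only works when the sequence names are known in advance, which is usually true for a finished reference genome but not for a set of anonymous contigs.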

Can I do that with a contigs.fasta file?

Please elaborate on the data you have and how you obtained it.

Ah, sorry. I have my raw reads, and I have already assembled them. So should I work with the contigs, or do I have to work with the reads? The reference mitogenome file comes from NCBI.

h.mon wrote:

You should explain clearly what kind of raw data you have and how you assembled the draft genome. I will assume you only have Illumina paired-end reads and assembled the genome from this data alone, using some short-read genome assembler.

Your problem is not so simple. Unless you have a very high quality assembly (which I assume you don't), the contigs are probably highly fragmented, and even the mitochondrial genome may not have been assembled into a single contig.

Create a BLAST database from your assembly, then run BLAST with the reference mitogenome as the query and the draft genome as the database to find contigs with similarity to the reference mitogenome. Because your assembly is fragmented and may contain NUMTs (mitochondrial sequences inserted into the nuclear genome), you will get more hits than you would like. You can then either BLAST these contigs against NCBI nt directly, or first try to assemble them with CAP3 and then BLAST the result against nt. If you are lucky, you will end up with a single contig of the appropriate length that shows high similarity over its whole sequence.
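As a rough sketch of how the first round of hits could be screened, the Python below reads BLAST tabular output (the default 12 columns of -outfmt 6, with the mitogenome as query and the contigs as subject) and keeps contigs that are mostly covered by high-identity hits. Contigs with only a short matching stretch are left out, since those are the ones more likely to be NUMTs. Everything specific here is an assumption for illustration: the contig names, the toy hit lines, the contig_lengths dictionary, the 90% identity and 50% coverage cut-offs, and the helper functions themselves. It is a pre-filter, not a replacement for checking the candidates against nt.

```python
from collections import defaultdict


def covered_bases(intervals):
    """Count bases covered by 1-based inclusive (start, end) intervals, merging overlaps."""
    total, current_end = 0, 0
    for start, end in sorted(intervals):
        if end <= current_end:
            continue
        total += end - max(start - 1, current_end)
        current_end = end
    return total


def candidate_contigs(blast_lines, contig_lengths, min_identity=90.0, min_covered=0.5):
    """Return {contig: covered fraction} for contigs mostly covered by mitogenome hits.

    Expects the default -outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore. The thresholds are arbitrary
    starting points, not recommended values.
    """
    hits = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        contig, identity = fields[1], float(fields[2])
        sstart, send = int(fields[8]), int(fields[9])
        if identity >= min_identity:
            # Minus-strand hits report sstart > send, so normalise the order.
            hits[contig].append((min(sstart, send), max(sstart, send)))
    result = {}
    for contig, intervals in hits.items():
        fraction = covered_bases(intervals) / contig_lengths[contig]
        if fraction >= min_covered:
            result[contig] = fraction
    return result


# Made-up hits: contig_7 is almost entirely mitochondrial, while contig_42 is a
# long contig with one short match, as a NUMT might look.
blast_lines = [
    "mito_ref\tcontig_7\t98.50\t900\t13\t0\t1\t900\t1\t900\t0.0\t1500",
    "mito_ref\tcontig_7\t97.00\t500\t15\t0\t8000\t8499\t1400\t901\t0.0\t800",
    "mito_ref\tcontig_42\t92.00\t200\t16\t0\t3000\t3199\t5001\t5200\t1e-80\t300",
]
contig_lengths = {"contig_7": 1500, "contig_42": 50000}

candidates = candidate_contigs(blast_lines, contig_lengths)
for name, fraction in sorted(candidates.items()):
    print(name, round(fraction, 3))
```

The names that survive the filter are the contigs worth checking against nt or passing to CAP3; the remaining contigs can be treated as the nuclear set for the time being.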

