Question: How can I separate mitochondrial sequences from chromosomal sequences?
28 days ago, zion22 (0) wrote:

Hi, sorry to bother you, but I need your help.

I'd like to know how to separate mitochondrial sequences from chromosomal sequences. I have a FASTA file with all the sequences of the organism and a FASTA file of a reference mitogenome, and I would like to create one file containing the mitogenome and another containing the chromosomal sequences.

Thank you very much, and please excuse my ignorance; I am very new to this area. If you could tell me which steps I have to follow and why, it would be very useful for me.

Thank you

sequence genome • 144 views
modified 27 days ago by h.mon (24k) • written 28 days ago by zion22 (0)

How many chromosomes does the organism have? If there aren't that many, you can do this with a simple samtools faidx genome.fa chr1 chr2 chr3... > only_chromosomes.fa (where genome.fa stands for your FASTA file).
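If samtools is not at hand, the same kind of name-based split can be sketched in plain Python. This is only an illustration under assumptions: the function name split_fasta and the sequence names are made up here, and it assumes a standard FASTA file where the sequence name is the first word after ">".

```python
def split_fasta(lines, wanted):
    """Split FASTA lines into (selected, remaining).

    `lines` is any iterable of FASTA lines; `wanted` is a set of sequence
    names (the first word after '>'). Records whose name is in `wanted`
    go to `selected`, all other records go to `remaining`.
    """
    selected, remaining = [], []
    target = remaining  # lines before the first header (if any) end up here
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            target = selected if name in wanted else remaining
        target.append(line)
    return selected, remaining
```

To use it on real files, you could read the lines with open("genome.fa"), call split_fasta(handle, {"chr1", "chr2", "chr3"}), and write each returned list to its own file with "\n".join(...). The file name and chromosome names are placeholders.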

modified 27 days ago • written 27 days ago by Devon Ryan (88k)

Can I do that with a contigs.fasta file?

written 27 days ago by zion22 (0)

Please elaborate on the data you have and how you obtained it.

written 27 days ago by WouterDeCoster (37k)

Ah, sorry. I have my raw reads, and I have already assembled them. So should I work with the contigs, or would I have to work with the reads? The mitogenome reference file comes from NCBI.

written 27 days ago by zion22 (0)
27 days ago, h.mon (24k) wrote:

You should explain clearly what kind of raw data you have and how you assembled the draft genome. I will assume you only have Illumina paired-end reads and assembled the genome from just this data with some unspecified short-read genome assembler.

Your problem is not so simple: unless you have a very high-quality assembly (which I assume you don't), the contigs are probably highly fragmented, and even the mitochondrial genome may not have been assembled into a single contig.

Create a BLAST database from your assembly, then run BLAST with the reference mitogenome as the query and the draft genome as the database to find contigs with similarity to the reference mitogenome. Because the assembly is fragmented and nuclear copies of mitochondrial DNA (NUMTs) may be present, you will get more hits than you would like. You can then either BLAST these contigs against NCBI nt directly, or first try to assemble them with CAP3 and then BLAST the result against nt. If you are lucky, you will get just one contig with high similarity over its whole length and of the appropriate size.
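As a rough sketch of the first step, the Python below picks candidate mitochondrial contigs out of BLAST tabular output. It assumes the search was run with the default -outfmt 6 columns; the file names in the comments, the function name candidate_contigs, and both thresholds are illustrative assumptions, not recommended values. Summing alignment lengths is a crude measure (overlapping hits are counted twice), so treat the result as a shortlist to check against nt, not a final answer.

```python
from collections import defaultdict

# Assumed commands that would produce the input (file names are placeholders):
#   makeblastdb -in contigs.fasta -dbtype nucl -out contigs_db
#   blastn -query mito_ref.fasta -db contigs_db -outfmt 6 -out hits.tsv
#
# Default -outfmt 6 columns:
#   qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore


def candidate_contigs(blast_lines, min_identity=90.0, min_aligned=500):
    """Return {contig_id: summed alignment length} for likely mito contigs.

    Only hits with percent identity >= min_identity are counted, and only
    contigs whose summed alignment length reaches min_aligned are kept.
    Both thresholds are arbitrary illustrative defaults.
    """
    aligned = defaultdict(int)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        contig = fields[1]          # sseqid: the contig in the database
        identity = float(fields[2])  # pident
        length = int(fields[3])      # alignment length
        if identity >= min_identity:
            aligned[contig] += length
    return {c: n for c, n in aligned.items() if n >= min_aligned}
```

You could call it as candidate_contigs(open("hits.tsv")) and then pull the returned contig names out of the assembly for the follow-up BLAST against nt or the CAP3 step.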

written 27 days ago by h.mon (24k)