mauve for contigs
14 months ago
rthapa ▴ 50

Hi, I am new to genome assembly. My assembly produced 6 contigs. Do I need to merge all 6 contigs into one with a tool like Mauve, or can I proceed with only the longest contig for alignment against the reference sequence? I would appreciate any suggestions.

Thanks

genome mauve • 752 views
14 months ago
hugo.avila ▴ 250

If you are talking about a bacterial genome, I think I can help. It helps if you give more details, such as the species name and a QUAST report. If your organism usually has plasmids, it is possible that the large contig is the chromosome (if the organism has only one) and the small ones (<500 kbp) are plasmids. But it may also be that you have gaps in your genome; in that case you have to build a scaffold or try some in silico gap filling. Either way, use MUMmer, Mauve, or CONTIGuator to map your assembly to a reference genome and identify your contigs. Here are a few similar questions that may help you:

Question: Bacterial genome assembly for comparative genomic analysis

Question: how can I do the assembly of contigs
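As a first look before any mapping, the size-based guess above can be sketched in Python. This is only a rough sketch: the function names `contig_lengths` and `split_by_size` are made up for illustration, the 500 kbp cutoff is the rule of thumb mentioned above, and length alone does not prove that a contig is a plasmid; mapping to a reference is still needed.

```python
from io import StringIO


def contig_lengths(fasta_handle):
    """Return {contig_name: length} for a multi-FASTA stream."""
    lengths = {}
    name = None
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def split_by_size(lengths, plasmid_max_bp=500_000):
    """Split contig names into (large, small) by length alone.

    The threshold is an assumption taken from the rule of thumb in the
    answer, not a biological definition of a plasmid.
    """
    large = sorted(n for n, size in lengths.items() if size >= plasmid_max_bp)
    small = sorted(n for n, size in lengths.items() if size < plasmid_max_bp)
    return large, small


# Tiny in-memory demo; a real run would pass open("your_contigs.fasta").
demo = StringIO(">contig_1 demo\nACGTACGTAC\nACGT\n>contig_2\nACG\n")
demo_lengths = contig_lengths(demo)
demo_large, demo_small = split_by_size(demo_lengths, plasmid_max_bp=10)
print(demo_lengths, demo_large, demo_small)
```

With a real assembly, the default threshold would be used and the "small" list treated only as candidates to check against a reference or a plasmid database.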


Thank you for your suggestions. Yes, it is a bacterial genome from Erwinia spp. I don't have a QUAST report to share for now, but I have a question about QUAST. From the example command, it looks like the contigs should be in separate files, and it also seems that an annotation file is needed. I have a single file with all contigs. Do I need to put each contig in a separate file? Can QUAST be run without an annotation file?

./quast.py test_data/contigs_1.fasta \
test_data/contigs_2.fasta \
-r test_data/reference.fasta.gz \
-g test_data/genes.gff


You do not need separate files; just pass your contig file to QUAST. You do not need a reference either, but it is useful to have one:

./quast.py -r reference.fasta your_contigs.fasta


If you have problems running QUAST, you can use it through the browser.
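If you prefer to drive QUAST from a script, the command above can be assembled in Python along these lines. This is a minimal sketch: `build_quast_command` is a hypothetical helper, the file names are placeholders, and only the `-r` (reference) and `-o` (output directory) options are used; the annotation option is simply left out, since it is optional.

```python
import shlex


def build_quast_command(contigs, reference=None, output_dir=None,
                        quast="./quast.py"):
    """Build the argument list for a QUAST run on one multi-FASTA file.

    The reference is optional; without it QUAST still reports
    reference-free statistics such as N50 and total length.
    """
    cmd = [quast]
    if reference:
        cmd += ["-r", reference]
    if output_dir:
        cmd += ["-o", output_dir]
    cmd.append(contigs)
    return cmd


quast_cmd = build_quast_command("your_contigs.fasta",
                                reference="reference.fasta")
print(shlex.join(quast_cmd))
```

To actually launch the run, the list could be passed to `subprocess.run(quast_cmd, check=True)`, assuming `quast.py` is present in the working directory.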


Thank you for the suggestion; that is really helpful. I now have a QUAST report. How do we decide whether to build a scaffold from the contigs or just proceed with the largest contig? I checked the contigs and found that the smallest one is a plasmid.

Assembly    ea.contigs
# contigs (>= 0 bp) 5
# contigs (>= 1000 bp)  5
# contigs (>= 5000 bp)  5
# contigs (>= 10000 bp) 5
# contigs (>= 25000 bp) 4
# contigs (>= 50000 bp) 2
Total length (>= 0 bp)  3894539
Total length (>= 1000 bp)   3894539
Total length (>= 5000 bp)   3894539
Total length (>= 10000 bp)  3894539
Total length (>= 25000 bp)  3872667
Total length (>= 50000 bp)  3782610
# contigs   5
Largest contig  2238584
Total length    3894539
Reference length    3833832
GC (%)  53.58
Reference GC (%)    53.58
N50 2238584
NG50    2238584
N75 1544026
NG75    1544026
L50 1
LG50    1
L75 2
LG75    2
# misassemblies 11
# misassembled contigs  3
Misassembled contigs length 3804482
# local misassemblies   24
# scaffold gap ext. mis.    0
# scaffold gap loc. mis.    0
# unaligned mis. contigs    0
# unaligned contigs 1 + 3 part
Unaligned length    73700
Genome fraction (%) 98.952
Duplication ratio   1.007
# N's per 100 kbp   0.00
# mismatches per 100 kbp    125.92
# indels per 100 kbp    345.55
Largest alignment   885743
Total aligned length    3818763
NA50    736060
NGA50   736060
NA75    650581
NGA75   650581
LA50    3
LGA50   3
LA75    4
LGA75   4


Well, if you only want the chromosome, you must map the contigs to your reference and extract those that map to it. Probably only the largest contig will map. If more than the largest contig maps, you will have to build a chromosomal scaffold. Since you have so few contigs and they are of good quality, this will be very easy: just map the contigs onto the reference, as this person did. But do not throw away contigs that do not map; they are large and of good quality and may contain valuable data, such as resistance and virulence genes.
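The "extract those that map, keep the rest" step could be sketched like this once you have alignment coordinates from whichever aligner you use. Everything here is an assumption for illustration: the input is a plain dictionary of half-open `(start, end)` intervals on each contig (not the native output format of MUMmer or Mauve), `partition_contigs` is a made-up name, and the 50% cutoff is arbitrary.

```python
def covered_bases(intervals):
    """Count bases covered by possibly overlapping [start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def partition_contigs(lengths, alignments, min_fraction=0.5):
    """Split contigs into (mapped, unmapped) by aligned fraction.

    min_fraction is an arbitrary illustrative cutoff, not a standard.
    """
    mapped, unmapped = [], []
    for name, length in lengths.items():
        hits = alignments.get(name, [])
        fraction = covered_bases(hits) / length if length else 0.0
        (mapped if fraction >= min_fraction else unmapped).append(name)
    return mapped, unmapped


# Toy data: contig lengths and where each contig aligned to the reference.
toy_lengths = {"contig_1": 1000, "contig_2": 400, "contig_3": 200}
toy_alignments = {
    "contig_1": [(0, 600), (500, 900)],  # overlapping hits, 900 bases covered
    "contig_2": [(0, 100)],              # only a quarter of the contig aligns
}
mapped, unmapped = partition_contigs(toy_lengths, toy_alignments)
print("chromosomal candidates:", mapped)
print("keep and inspect separately:", unmapped)
```

The unmapped list is the set to keep and annotate separately rather than discard, since those contigs may be plasmids or carry genes absent from the reference.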