Question

Functional annotation from assembled contigs

7.8 years ago

h.l.wong ▴ 70

Hi all,

I am a newbie to metagenomics and often find it confusing to work out how to analyse my data. I have used the Illumina NextSeq (2 x 150) to sequence a microbial community.

I have used FastQC and Trimmomatic for quality control, and I have assembled the sequences using IDBA-UD. In IDBA-UD, I used the options --mink 20 and --maxk 100 for constructing the de Bruijn graph.

There are a lot of output files, namely contig-20.fa, contig-40.fa, ..., contig-100.fa, contig.fa and scaffold.fa. I would like to do functional annotation and maybe binning later.

Here are the questions:

Which file(s) should I use? I have the log file showing the statistics, but I don't know which criteria I should use to choose.
What programs do you suggest for functional annotation?
I intend to use MetaBAT for binning, but it needs a BAM file. How can I generate one?

Thanks for taking the time to read my question. If anything needs to be clarified, please let me know.

Cheers and many thanks

Alan

blast alignment Assembly • 3.6k views

ADD COMMENT • link updated 7.8 years ago by Asaf 10k • written 7.8 years ago by h.l.wong ▴ 70

score 0 · Answer 1 · 2017-01-17

7.8 years ago

Asaf 10k

Use scaffold.fa; make sure it's the last one created.
You can start with Prodigal for protein prediction; maybe others will have suggestions.
You can map the reads (your input) against the scaffolds using bwa, bowtie2, etc.
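
For the BAM question, here is a minimal Python sketch of the mapping step. It assumes bwa and samtools are installed and on the PATH; the function names, the read file names and the output name are placeholders for illustration, not anything from the original post.

```python
import subprocess


def mapping_commands(assembly, reads_1, reads_2, out_bam, threads=4):
    """Return argument lists for: bwa index, bwa mem, samtools sort, samtools index."""
    index_cmd = ["bwa", "index", assembly]
    map_cmd = ["bwa", "mem", "-t", str(threads), assembly, reads_1, reads_2]
    # The trailing "-" tells samtools sort to read alignments from standard input.
    sort_cmd = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    bam_index_cmd = ["samtools", "index", out_bam]
    return index_cmd, map_cmd, sort_cmd, bam_index_cmd


def run_mapping(assembly, reads_1, reads_2, out_bam, threads=4):
    """Index the assembly, map the paired reads and write a sorted, indexed BAM."""
    index_cmd, map_cmd, sort_cmd, bam_index_cmd = mapping_commands(
        assembly, reads_1, reads_2, out_bam, threads
    )
    subprocess.run(index_cmd, check=True)
    # Pipe the SAM output of bwa mem straight into samtools sort.
    mapper = subprocess.Popen(map_cmd, stdout=subprocess.PIPE)
    subprocess.run(sort_cmd, stdin=mapper.stdout, check=True)
    mapper.stdout.close()
    if mapper.wait() != 0:
        raise RuntimeError("bwa mem failed")
    subprocess.run(bam_index_cmd, check=True)


# Example call (hypothetical file names):
# run_mapping("scaffold.fa", "reads_R1.fastq", "reads_R2.fastq", "mapped.sorted.bam")
```

The sorted BAM is what a binning tool would then take as its coverage input; bowtie2 could be swapped in for the bwa steps.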

ADD COMMENT • link 7.8 years ago by Asaf 10k

Thank you Asaf.

I looked in the IDBA Google group, where someone suggested using contig.fa because the "Scaffold file gonna have lots of Ns (not useful for alignment)" https://groups.google.com/forum/#!topic/hku-idba/D8D46jDjXHE . She suggested that contig.fa is better for performing BLAST.

Is scaffold.fa better than contig.fa for binning?

Should I use contig.fa for annotation?

Cheers

Alan

ADD REPLY • link 7.8 years ago by h.l.wong ▴ 70

You can check how many N's you actually have in your data. I don't think it really matters for annotation (you'll get partial proteins anyway). For binning, though, scaffolds might be more useful.
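
One way to check is to count the N's directly in the FASTA file. This is a rough Python sketch; the function name `n_content` is made up for illustration, and it assumes an ordinary FASTA file with `>` header lines.

```python
def n_content(fasta_lines):
    """Return (total_bases, n_bases, n_per_100kbp) for FASTA-formatted lines."""
    total = 0
    n_bases = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        total += len(line)
        n_bases += line.upper().count("N")
    # Express the N content per 100 kbp of sequence.
    per_100kbp = 100000 * n_bases / total if total else 0.0
    return total, n_bases, per_100kbp


# Example call on the scaffold file:
# with open("scaffold.fa") as handle:
#     print(n_content(handle))
```

Running it on both contig.fa and scaffold.fa shows how many N's scaffolding actually introduced.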

ADD REPLY • link 7.8 years ago by Asaf 10k

Thanks again. How can I check? Is it in the log file? Also, how do you determine the quality of the assembly? I have an N50 of ~1000. Is that too low? If so, how can I improve the quality of the assembly?

ADD REPLY • link 7.8 years ago by h.l.wong ▴ 70

It's pretty low... An average protein-coding gene is about 1000 bp long, so about half of the assembly will contain fragmented genes. You can check the number of N's in the sequence itself, and you can also compare the N50 of the contigs to that of the scaffolds. You can try assembling with metaSPAdes 3.9.0; it should give better results.
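
To compare the N50 of contig.fa and scaffold.fa by hand, something like the following Python sketch would do. The helper names are made up for illustration, and it uses the usual N50 definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length.

```python
def sequence_lengths(fasta_lines):
    """Return the length of every sequence in FASTA-formatted lines."""
    lengths = []
    current = 0
    seen_header = False
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                lengths.append(current)  # close the previous record
            seen_header = True
            current = 0
        elif line:
            current += len(line)
    if seen_header:
        lengths.append(current)  # close the last record
    return lengths


def n50(lengths):
    """Return the N50 of a list of sequence lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Example call:
# with open("contig.fa") as handle:
#     print(n50(sequence_lengths(handle)))
```

Note that QUAST by default ignores contigs shorter than 500 bp, so its N50 can differ from a value computed over every sequence.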

ADD REPLY • link 7.8 years ago by Asaf 10k

Yeah that was just from the log file of IDBA-UD.

I just ran QUAST on scaffold.fa and it reports an N50 of 1942, while the N50 of contig.fa is 1540. These are the contigs generated with mink=20 and maxk=100.

I ran IDBA-UD again with mink=100 and maxk=121, and QUAST shows that the N50 of scaffold.fa rises to 3334. However, at the same time the number of contigs dropped about 10-fold (from 52k to 6.1k).

N's per 100 kbp ranged from 32 to 38 for scaffold.fa.

I will try running IDBA-UD again with mink=60 and maxk=124 to see what I get.

ADD REPLY • link 7.8 years ago by h.l.wong ▴ 70

I wouldn't recommend raising mink. I still suggest running SPAdes.

ADD REPLY • link 7.8 years ago by Asaf 10k

Thanks for all the suggestions.

ADD REPLY • link 7.8 years ago by h.l.wong ▴ 70

Hi Asaf,

Is it necessary to map my reads before I use prodigal for protein prediction?

Or is mapping reads only necessary for downstream binning?

Cheers

Alan

ADD REPLY • link 7.8 years ago by h.l.wong ▴ 70

score 0 · Answer 2 · 2017-01-20

7.8 years ago

Asaf 10k

Prodigal runs on the assembly itself, so you don't need to map the reads first. However, there are tools that run blastx against a database, and these can work directly on raw reads.
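
A minimal Python sketch of the Prodigal call, assuming prodigal is installed and on the PATH. The function names and output file names are placeholders for illustration. `-p meta` selects Prodigal's metagenomic mode and `-f gff` writes the gene coordinates as GFF.

```python
import subprocess


def prodigal_command(assembly, gff_out, proteins_out):
    """Return the argument list for a metagenomic-mode Prodigal run."""
    return [
        "prodigal",
        "-i", assembly,      # input assembly (contigs or scaffolds)
        "-p", "meta",        # metagenomic mode
        "-f", "gff",         # write gene coordinates in GFF format
        "-o", gff_out,       # gene coordinate output file
        "-a", proteins_out,  # predicted protein sequences (FASTA)
    ]


def run_prodigal(assembly, gff_out, proteins_out):
    """Run Prodigal and raise an error if it exits with a non-zero status."""
    subprocess.run(prodigal_command(assembly, gff_out, proteins_out), check=True)


# Example call (hypothetical output names):
# run_prodigal("scaffold.fa", "genes.gff", "proteins.faa")
```

The protein FASTA file is what you would then search against a functional database.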

ADD COMMENT • link 7.8 years ago by Asaf 10k

Hi Asaf,

After generating .gff files from prodigal, do you know of any programs that can visualise the files for further analysis?

Cheers

Alan

ADD REPLY • link 7.8 years ago by h.l.wong ▴ 70

A genome viewer? IGV, JBrowse and many more.

ADD REPLY • link 7.8 years ago by Asaf 10k