Question: Functional annotation from assembled contigs
h.l.wong60 wrote:

Hi all,

I am new to metagenomics and often find it confusing to work out how to analyse my data. I used an Illumina NextSeq (2 x 150 bp) to sequence a microbial community.

I used FastQC and Trimmomatic for quality control and assembled the reads with IDBA-UD, using the options mink=20 and maxk=100 to construct the de Bruijn graph.

There are many output files (contig-20.fa, contig-40.fa, ..., contig-100.fa, contig.fa and scaffold.fa). I would like to do functional annotation and maybe binning later.

Here are the questions:

  1. Which file(s) should I use? I have the log file showing the statistics, but I don't know what criteria to base the choice on.
  2. Which programs do you suggest for functional annotation?
  3. I intend to use MetaBAT for binning, but it needs a BAM file. How can I generate one?

Thanks for taking the time to read my question. If anything needs clarifying, please let me know.

Cheers and many thanks

Alan

blast alignment assembly • 2.0k views
ADD COMMENTlink modified 3.7 years ago by Asaf8.4k • written 3.7 years ago by h.l.wong60
Asaf8.4k wrote:
  1. Use scaffold.fa; make sure it's the last file created.
  2. You can start with Prodigal for gene (protein) prediction. Maybe others will have further suggestions.
  3. You can map the reads (your input) against the scaffolds using bwa, bowtie2, etc.
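
For point 3, here is a rough sketch of how the mapping step could be scripted to end up with the sorted, indexed BAM that a binner expects. It assumes `bwa` and `samtools` are installed and on your PATH; the file names in the usage note and the helper names `mapping_commands` and `run_mapping` are placeholders made up for this example, not anything IDBA-UD or MetaBAT produces.

```python
import shlex
import subprocess


def mapping_commands(assembly, reads_1, reads_2, out_bam, threads=4):
    """Return the shell command lines that map paired reads and write a sorted BAM."""
    q = shlex.quote  # protect paths that contain spaces or shell metacharacters
    return [
        # Build the BWA index for the assembly (needed once per assembly).
        f"bwa index {q(assembly)}",
        # Align the read pairs and pipe the SAM stream into a coordinate sort.
        f"bwa mem -t {threads} {q(assembly)} {q(reads_1)} {q(reads_2)} "
        f"| samtools sort -@ {threads} -o {q(out_bam)} -",
        # Index the sorted BAM so downstream tools can use random access.
        f"samtools index {q(out_bam)}",
    ]


def run_mapping(assembly, reads_1, reads_2, out_bam, threads=4):
    """Run the commands in order and stop at the first failure."""
    for command in mapping_commands(assembly, reads_1, reads_2, out_bam, threads):
        # pipefail makes the pipeline fail if bwa fails, not only samtools.
        subprocess.run(["bash", "-o", "pipefail", "-c", command], check=True)
```

Something like `run_mapping("scaffold.fa", "reads_1.fq.gz", "reads_2.fq.gz", "mapped.sorted.bam")` would then do the whole thing. Use the quality-trimmed reads that went into the assembly; bowtie2 would work just as well with its own index and alignment commands.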
ADD COMMENTlink written 3.7 years ago by Asaf8.4k

Thank you Asaf.

I asked in the IDBA Google group, and someone there suggested using contig.fa instead because "Scaffold file gonna have lots of Ns (not useful for alignment)" https://groups.google.com/forum/#!topic/hku-idba/D8D46jDjXHE . She suggested that contig.fa is better for running BLAST.

Is scaffold.fa better than contig.fa for binning?

Should I use contig.fa for annotation?

Cheers

Alan

ADD REPLYlink written 3.7 years ago by h.l.wong60

You can check how many N's you actually have in your data. I don't think it really matters for annotation (you'll get partial proteins anyway). For binning, though, scaffolds might be more useful.
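
If it helps, counting the N's takes only a few lines of plain Python. This is a minimal sketch with no dependencies; `read_fasta` and `n_content` are names made up for this example, and the "per 100 kbp" figure is simply N bases scaled to 100,000 bases of assembly.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n_content(path):
    """Return (total_bases, n_bases, n_per_100kbp) for a FASTA file."""
    total = n_bases = 0
    for _, seq in read_fasta(path):
        total += len(seq)
        n_bases += seq.upper().count("N")  # count both 'N' and 'n'
    per_100kbp = 100_000 * n_bases / total if total else 0.0
    return total, n_bases, per_100kbp
```

Calling `n_content("scaffold.fa")` and `n_content("contig.fa")` side by side shows how much gap sequence the scaffolding step introduced.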

ADD REPLYlink written 3.7 years ago by Asaf8.4k

Thanks again. How can I check? Is it in the log file? Also, how do you determine the quality of the assembly? I got an N50 of ~1000. Is that too low? If so, how can I improve the assembly?

ADD REPLYlink written 3.7 years ago by h.l.wong60

It's pretty low. An average bacterial gene is about 1000 bp long, so roughly half of the assembled bases sit in contigs too short to hold a complete gene, and many predicted proteins will be fragmented. You can count the N's in the sequences themselves, and you can also compare the N50 of the contigs with that of the scaffolds. You could try assembling with metaSPAdes 3.9.0; it should give better results.
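
For the N50 comparison, a small sketch in plain Python is below. It uses the usual definition (the length L such that sequences of length L or longer hold at least half of the total bases); `fasta_lengths` and `n50` are made-up helper names, and tools such as QUAST may apply a minimum-length cutoff first, so the numbers will not necessarily match theirs exactly.

```python
def fasta_lengths(path):
    """Return the length of every sequence in a FASTA file."""
    lengths, current = [], None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # this sequence crosses the halfway point
            return length
    return 0
```

Comparing `n50(fasta_lengths("contig.fa"))` with `n50(fasta_lengths("scaffold.fa"))` gives a quick feel for how much the scaffolding actually joined.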

ADD REPLYlink written 3.7 years ago by Asaf8.4k

Yeah, that was just from the IDBA-UD log file.

I just ran QUAST on scaffold.fa and it reports an N50 of 1942, while the N50 of contig.fa is 1540. These are the assemblies generated with mink=20 and maxk=100.

I ran IDBA-UD again with mink=100 and maxk=121, and QUAST shows that the N50 of scaffold.fa rises to 3334. However, the number of contigs dropped roughly 8.5-fold at the same time (from 52k to 6.1k).

N's per 100 kbp ranged from 32 to 38 for scaffold.fa.

I will try running IDBA again with mink=60 and maxk=124 to see what I can get.

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by h.l.wong60

I wouldn't recommend raising mink. I still suggest running SPAdes.

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by Asaf8.4k

Thanks for all the suggestions.

ADD REPLYlink written 3.7 years ago by h.l.wong60

Hi Asaf,

Is it necessary to map my reads before I use Prodigal for protein prediction?

Or is mapping reads only necessary for downstream binning?

Cheers

Alan

ADD REPLYlink written 3.7 years ago by h.l.wong60

Asaf8.4k wrote:

Prodigal needs only the assembly, so you don't have to map the reads before running it; the mapping is for binning. There are also tools that run blastx against a database, and those can work directly on raw reads.

ADD COMMENTlink written 3.7 years ago by Asaf8.4k

Hi Asaf,

After generating .gff files with Prodigal, do you know of any programs that can visualise them for further analysis?

Cheers

Alan

ADD REPLYlink written 3.7 years ago by h.l.wong60

A genome viewer? IGV, JBrowse and many more.
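
Before opening a viewer, it can also be handy to summarise the GFF in a few lines of Python. The sketch below assumes Prodigal's GFF output, where each CDS line carries a `partial=` attribute (`00` means both ends of the gene were found, while a `1` marks an end that runs off the edge of the contig); `summarise_gff` is a made-up name for this example.

```python
from collections import Counter


def summarise_gff(path):
    """Count CDS features per contig and tally Prodigal's 'partial' flags."""
    genes_per_contig = Counter()
    partial_flags = Counter()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header/comment lines and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9 or fields[2] != "CDS":
                continue  # keep only well-formed CDS records
            genes_per_contig[fields[0]] += 1
            # Column 9 holds key=value pairs separated by semicolons.
            attributes = dict(
                item.split("=", 1) for item in fields[8].split(";") if "=" in item
            )
            partial_flags[attributes.get("partial", "unknown")] += 1
    return genes_per_contig, partial_flags
```

With a low N50, the share of genes whose flag is not `00` gives a direct measure of how fragmented the predicted proteins are.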

ADD REPLYlink written 3.7 years ago by Asaf8.4k