Reference gene identification for contig ordering, when sample species is unknown
5.5 years ago
shuksi1984 ▴ 60

How do I choose a reference genome for ordering contigs after they have been assembled? In my case, I have no information about the clinical isolate of the microorganism. On what basis, then, do I choose a reference for contig ordering with Mauve? My sample is "ERR209055", downloaded from EBI, and the task is antimicrobial resistance gene identification.

next-gen assembly alignment • 1.1k views
5.5 years ago
gb ★ 2.2k

I checked ERR209055 and it is a human gut metagenome. So it is a mix of many different species, and therefore you can not choose a reference. Why do you need to order the contigs? After assembly you need to predict the open reading frames and BLAST them against an antimicrobial resistance gene database.


you can not choose a reference

In the above case, if I have a human sample with E. coli, how do I choose the reference?

you need to predict the open reading frames

How do I do that?


That depends on what your goal is, I think. If the sample contains only human and E. coli, you could use both the human and the E. coli genomes as references. You could also use only E. coli if you are not interested in the human genes. Personally, I think that if the goal is to find antimicrobial resistance genes, you don't have to worry too much about the human genes. But you said you had already done an assembly.

For predicting genes there are many tools: https://en.wikipedia.org/wiki/List_of_gene_prediction_software
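
To give a feel for what "predicting open reading frames" means, here is a minimal sketch of a naive six-frame ORF scan in plain Python. It is only an illustration, not what the dedicated tools do: it assumes ATG as the only start codon and the standard stop codons, ignores ORFs that run off the end of a contig, and the name `find_orfs` and the `min_len` default are made up for this example. For real data a dedicated gene predictor from the list above is the better choice.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 90) -> Iterator[Tuple[str, int, str]]:
    """Yield (strand, frame, orf) for every ATG...stop ORF of >= min_len nt.

    Assumption for illustration: ATG is the only start codon, and an ORF
    without a stop codon before the end of the sequence is skipped.
    The reported length includes the stop codon.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the first ATG of the ORF currently open
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        yield strand, frame, s[start:i + 3]
                    start = None  # close the ORF and look for the next start
```

You would call `find_orfs` on each assembled contig and pass the resulting sequences on to the database search.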


How do I choose raw data from ENA for finding antimicrobial resistance genes and for multilocus sequence typing? Should I select a human isolate containing a single microorganism, or one containing multiple microorganism species? I chose "SRR1060710" for both of the above tasks. Am I doing this right?


If you just want to test or to try to find those genes, it does not matter whether you choose data from a single microorganism or from multiple microorganisms (a metagenome). It helps if you know up front that the organism has shown antimicrobial resistance in other studies. Roughly, the steps are:

  1. Download the raw FASTQ files
  2. Do an assembly, de novo or reference-based
  3. Find the open reading frames (gene prediction)
  4. BLAST them against known antimicrobial resistance genes

Most resistance proteins consist of multiple domains. You will often find a gene that codes for only one domain, but multiple domains are needed for resistance.
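
One way to guard against such partial, single-domain hits is to filter the BLAST results on how much of the resistance gene is covered, not only on identity. Below is a rough sketch, assuming the search was run with tabular output in the custom column order `-outfmt "6 qseqid sseqid pident length slen evalue bitscore"`. The function names and the 90% identity / 80% coverage thresholds are illustrative assumptions, not recommended values. Coverage is approximated as alignment length divided by subject length, so gaps can inflate it slightly.

```python
from typing import Dict, Iterable, Iterator, List

# Assumed column order of the tabular BLAST output (see the lead-in above).
COLUMNS = ("qseqid", "sseqid", "pident", "length", "slen", "evalue", "bitscore")


def parse_hits(lines: Iterable[str]) -> Iterator[Dict]:
    """Parse tab-separated BLAST lines into dicts with typed values."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, line.split("\t")))
        yield {
            "qseqid": row["qseqid"],
            "sseqid": row["sseqid"],
            "pident": float(row["pident"]),
            "length": int(row["length"]),
            "slen": int(row["slen"]),
            "evalue": float(row["evalue"]),
            "bitscore": float(row["bitscore"]),
        }


def filter_hits(hits: Iterable[Dict],
                min_identity: float = 90.0,
                min_coverage: float = 0.8) -> List[Dict]:
    """Keep hits that are both similar enough and cover most of the gene.

    Both thresholds are illustrative assumptions; tune them for your data.
    """
    kept = []
    for hit in hits:
        # Approximate fraction of the database gene covered by the alignment.
        coverage = hit["length"] / hit["slen"]
        if hit["pident"] >= min_identity and coverage >= min_coverage:
            kept.append({**hit, "coverage": coverage})
    return kept
```

With an open file handle `fh` on the BLAST output, `filter_hits(parse_hits(fh))` returns only the hits that span most of a known resistance gene. A hit that aligns well to just a fraction of the gene, such as a single domain, is dropped.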
