Question: Best Approach To Find Non-Human Sequences In Human Ests
10.2 years ago by
Sydney, Australia
Several years ago, I was involved with a project to detect putative archaeal sequences in human sequence data. Unfortunately, as is often the case in academia, the database described in the publication was not maintained when I moved to a new job.

This type of study has perhaps been superseded by the Human Microbiome Project. However, I'm still interested in methods to detect potentially interesting "contaminant" sequences in public databases of sequences that are (supposedly) from a single organism.

Originally, we used BLAT to search the human EST database using archaeal genes (from complete genome sequences) as the query. My questions are:

  1. Was using these two databases the best approach, or are there better sources?
  2. Would you use BLAT today for this type of task, or a different tool?
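
For readers who want to reproduce the search described above, here is a minimal sketch of post-processing BLAT's default PSL output to keep only strong hits. It assumes a nucleotide-vs-nucleotide search with archaeal genes as the query and human ESTs as the target. The names `PslHit`, `parse_psl`, and `filter_hits` and the thresholds are hypothetical, chosen only for illustration, and the identity calculation is a simplification that ignores gaps.

```python
from typing import Iterable, Iterator, List, NamedTuple


class PslHit(NamedTuple):
    """One BLAT alignment, reduced to the fields used for filtering."""
    query: str             # archaeal gene name (PSL column 10, qName)
    target: str            # human EST name (PSL column 14, tName)
    identity: float        # matching bases / aligned bases, gaps ignored
    query_coverage: float  # fraction of the query spanned by the alignment


def parse_psl(lines: Iterable[str]) -> Iterator[PslHit]:
    """Yield hits from PSL text, skipping header and malformed lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Data rows have 21 tab-separated columns and start with an integer.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches = int(fields[0])
        mismatches = int(fields[1])
        rep_matches = int(fields[2])
        q_size = int(fields[10])
        q_start, q_end = int(fields[11]), int(fields[12])
        aligned = matches + mismatches + rep_matches
        if aligned == 0 or q_size == 0:
            continue
        yield PslHit(
            query=fields[9],
            target=fields[13],
            identity=(matches + rep_matches) / aligned,
            query_coverage=(q_end - q_start) / q_size,
        )


def filter_hits(hits: Iterable[PslHit],
                min_identity: float = 0.9,    # arbitrary illustrative cutoff
                min_coverage: float = 0.5) -> List[PslHit]:  # also arbitrary
    """Keep hits that meet both the identity and query-coverage cutoffs."""
    return [h for h in hits
            if h.identity >= min_identity and h.query_coverage >= min_coverage]
```

Typical use would be `filter_hits(parse_psl(open("hits.psl")))`, where `hits.psl` is a placeholder file name. Because ESTs are often shorter than full-length genes, the coverage cutoff in particular would need tuning for real data.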
@neilfws: Any luck finding an answer?

10.2 years ago by
Fred Fleche (4.3k)
Paris, France
CleanEST: a database of cleansed EST libraries

Interesting, although when I go to the download page on their website, there seem to be the same number of "pre-cleansed" human ESTs as "cleansed" ones.

9.2 years ago by
Martin A Hansen (3.0k)
I guess searching hgEST with genes from complete archaeal genomes is just fine. One could perhaps expand the set of archaeal genes by including genes from draft genomes, or any archaeal genes in GenBank. The genes could be simple predictions from Glimmer or Prodigal. ESTs and genes could also be pre-clustered before searching, to reduce redundancy.

For the second question, I would suggest UCLUST.
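
To illustrate what pre-clustering does, here is a toy sketch of greedy centroid-based clustering, the general family of methods UCLUST belongs to. This is not UCLUST's algorithm or its API: it swaps in a crude k-mer Jaccard similarity in place of real alignment-based identity, and the function names, `k`, and `threshold` are all hypothetical values picked for the example.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared k-mers divided by total distinct k-mers; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def greedy_cluster(seqs: Dict[str, str],
                   threshold: float = 0.5,  # arbitrary illustrative cutoff
                   k: int = 8) -> Dict[str, List[str]]:
    """Assign each sequence to the first centroid it resembles, else make it
    a new centroid. Returns a mapping of centroid name to member names."""
    # Process longest sequences first so each centroid is its cluster's longest.
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    centroids: Dict[str, Set[str]] = {}
    clusters: Dict[str, List[str]] = {}
    for name in order:
        profile = kmers(seqs[name], k)
        for centroid_name, centroid_profile in centroids.items():
            if jaccard(profile, centroid_profile) >= threshold:
                clusters[centroid_name].append(name)
                break
        else:
            # No existing centroid was similar enough: start a new cluster.
            centroids[name] = profile
            clusters[name] = [name]
    return clusters
```

After clustering, only the centroid sequences would be searched, which shrinks both the EST set and the gene set. For real data a dedicated tool is the better choice, since this sketch compares every sequence against every centroid and will not scale to millions of ESTs.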

