Question: How to obtain regions in a whole genome that do not align with any genes/proteins in a blast search?
mirza (India) wrote, 21 months ago:

Hi,

I have been given a genome sequence and asked to run a BLAST search against the whole nr database, then mark or extract the regions (sequences) of this genome that do not align with any genes/proteins in the database. How can I obtain the sequences from a whole genome that do not align with, or show homology to, any of the genes/proteins currently in the databases? What should my strategy be? Are there any tools available?

Michael Dondrup (Bergen, Norway) wrote, 21 months ago:

Do you want to look at the whole genomic sequence or only predicted genes? Anyway, I would:

  • Run BLAST (blastx for the whole genomic sequence, blastp for predicted protein-coding genes) with a sensible E-value cutoff, e.g. 1e-6 or 1e-10.
  • If you are looking at genes only, simply select those without hits, and you are done.
  • If you are looking for all genomic regions, extract the BLAST HSP coordinates as ranges on the genomic sequence (chr, start, end), e.g. in BED or GFF format. With the genome as the blastx query, these are the query coordinates of each HSP. This can be done with BioPerl (preferably) or from the tabular BLAST output.
  • Load the regions into bedtools or R and get all the gaps, that is, regions with zero coverage when compared to the chromosomes. If necessary, extend the chromosome length to the real sequence length, so that the stretch after the last hit is also counted as a gap.
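
For the "all genomic regions" route, the last two steps can be sketched in plain Python instead of bedtools or R. This is only a minimal sketch under some assumptions, not a tested pipeline: it assumes the default 12-column tabular BLAST output (`-outfmt 6`, where columns 7, 8 and 11 are qstart, qend and evalue), that the genome was the query, and that you supply the real sequence lengths yourself. The function names (`hits_to_intervals`, `merge`, `uncovered`) and the example hit lines are made up for illustration.

```python
from collections import defaultdict


def hits_to_intervals(lines, max_evalue=1e-6):
    """Collect 0-based, half-open query intervals from tabular BLAST lines.

    Assumes the default outfmt 6 column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    intervals = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (outfmt 7 style)
        fields = line.split("\t")
        qseqid = fields[0]
        qstart, qend = int(fields[6]), int(fields[7])
        if float(fields[10]) > max_evalue:
            continue  # hit is weaker than the chosen E-value cutoff
        # blastx reports qstart > qend for hits in reverse frames
        lo, hi = min(qstart, qend), max(qstart, qend)
        intervals[qseqid].append((lo - 1, hi))  # 1-based closed -> 0-based half-open
    return intervals


def merge(intervals):
    """Merge overlapping or adjacent intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered(intervals, seq_lengths):
    """Return the zero-coverage regions per sequence, up to its real length."""
    gaps = {}
    for seqid, length in seq_lengths.items():
        pos = 0
        gaps[seqid] = []
        for start, end in merge(intervals.get(seqid, [])):
            if start > pos:
                gaps[seqid].append((pos, start))
            pos = max(pos, end)
        if pos < length:
            gaps[seqid].append((pos, length))  # tail after the last hit
    return gaps


# Hypothetical hits on a 1000 bp sequence; the third one fails the cutoff.
example = [
    "chr1\tprotA\t90.0\t50\t5\t0\t101\t250\t1\t50\t1e-20\t100",
    "chr1\tprotB\t80.0\t40\t8\t0\t400\t281\t1\t40\t1e-12\t80",
    "chr1\tprotC\t30.0\t20\t14\t0\t600\t659\t1\t20\t0.5\t20",
]
gaps = uncovered(hits_to_intervals(example), {"chr1": 1000})
for seqid, regions in gaps.items():
    for start, end in regions:
        print(seqid, start, end, sep="\t")  # BED-style lines
```

The printed lines are BED-style (0-based start, exclusive end), so they could be passed to a tool that extracts sequences by interval. Sequences with no hits at all come out as a single gap covering their whole length, which is why the real lengths are needed.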

mirza replied, 21 months ago:

Thank you so much Michael. I have the whole genomic sequence right now.