Question: How to obtain regions in a whole genome that do not align with any genes/proteins in a blast search?
15 months ago by
mirza80
India
mirza80 wrote:

Hi,

I have been given a genome sequence and asked to run a BLAST search against the whole nr database, then mark or extract the regions (sequences) of this genome that do not align with any gene or protein in the database. How can I obtain the sequences in a whole genome that show no alignment or homology to any gene or protein currently in the databases? What should my strategy be? Are there any tools available?

15 months ago by
Bergen, Norway
Michael Dondrup44k wrote:

Do you want to look at the whole genomic sequence or only predicted genes? Anyway, I would:

  • Run BLAST with a sensible E-value cutoff, e.g. 1e-6 or 1e-10: blastx for the whole genomic sequence, blastp for predicted protein-coding genes.
  • If you are looking at genes only, simply select those without hits, and you are done.
  • If you are looking at all genomic regions, extract the BLAST HSP coordinates on the genome into ranges (chr, start, end), e.g. in BED or GFF format. With the genome as the blastx query these are the query coordinates, and reverse-strand hits are reported with start > end, so they need swapping. This can be done with BioPerl (preferably) or from the tabular BLAST output.
  • Load the ranges into BEDTools or R and get all the gaps, i.e. regions with zero coverage relative to the chromosomes. Make sure each chromosome extends to its real sequence length, so that the stretch after the last hit is not lost.
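
As a rough illustration of the last two steps, here is a minimal Python sketch. It rests on assumptions that are not in the thread: the genome was the blastx query, the output is the default 12-column tabular format (-outfmt 6), and the chromosome lengths are already known. The function names hsp_intervals and uncovered_regions, and the example rows, are made up for illustration. Gaps are reported BED-style (0-based start, exclusive end).

```python
from collections import defaultdict


def hsp_intervals(blast_lines, max_evalue=1e-6):
    """Collect 0-based, half-open HSP intervals per query sequence.

    Assumes default tabular columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    intervals = defaultdict(list)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        qseqid = fields[0]
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if evalue > max_evalue:
            continue  # hit is weaker than the chosen cutoff
        # Reverse-strand blastx hits have qstart > qend, so order them.
        lo, hi = min(qstart, qend), max(qstart, qend)
        intervals[qseqid].append((lo - 1, hi))  # 1-based closed -> 0-based half-open
    return intervals


def uncovered_regions(intervals, seq_lengths):
    """Return (seq_id, start, end) for every stretch with zero HSP coverage."""
    gaps = []
    for seq_id, length in seq_lengths.items():
        pos = 0  # end of the region covered so far
        for start, end in sorted(intervals.get(seq_id, [])):
            if start > pos:
                gaps.append((seq_id, pos, start))
            pos = max(pos, end)
        if pos < length:
            # Trailing gap up to the real end of the sequence.
            gaps.append((seq_id, pos, length))
    return gaps


# Hypothetical example rows, not real BLAST output.
example = [
    "chr1\tprotA\t90.0\t30\t3\t0\t11\t100\t1\t30\t1e-20\t80.0",
    "chr1\tprotB\t85.0\t20\t3\t0\t260\t201\t1\t20\t1e-12\t60.0",  # reverse strand
    "chr1\tprotC\t40.0\t10\t6\t0\t301\t330\t1\t10\t0.5\t20.0",    # above cutoff
]
lengths = {"chr1": 400, "chr2": 50}  # assumed known sequence lengths

gaps = uncovered_regions(hsp_intervals(example), lengths)
for seq_id, start, end in gaps:
    print(f"{seq_id}\t{start}\t{end}")
```

A sequence with no hits at all (chr2 here) comes out as a single gap spanning its full length, which is why the sequence lengths have to be supplied separately. If the HSPs are already in a BED file, bedtools complement with a genome file should give the same kind of result.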

Thank you so much Michael. I have the whole genomic sequence right now.

written 15 months ago by mirza80