How to obtain regions in a whole genome that do not align with any genes/proteins in a blast search?
4.6 years ago
mirza ▴ 140

Hi,

I have been given a genome sequence and asked to run a BLAST search against the whole nr database, then mark or extract the regions (sequences) of this genome that do not align with any gene or protein in the database. How can I obtain the sequences from a whole-genome sequence that do not align or show homology with any gene or protein currently in the databases? What should my strategy be? Are there any tools available?

whole genome alignment unmapped blast • 1.5k views
4.6 years ago

Do you want to look at the whole genomic sequence or only predicted genes? Anyway, I would:

  • Run BLAST (blastx for the whole genomic sequence, blastp for predicted protein-coding genes) with a sensible E-value cutoff, e.g. 1e-6 or 1e-10.
  • If you are looking at genes only, simply select those without hits, and you are done.
  • If you are looking at all genomic regions, extract the BLAST HSP coordinates as ranges on the genome (chr, start, end), e.g. in BED or GFF format. With the genome as the blastx query, these are the query coordinates of each HSP. This can be done with BioPerl (preferably) or from the tabular BLAST output.
  • Load the regions into bedtools or R and get all the gaps, that is, regions with zero coverage relative to the chromosomes. If necessary, set the chromosome lengths to the real sequence lengths so that uncovered chromosome ends are included.
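
The last two steps can also be sketched in plain Python, in case bedtools or R is not at hand. This is only a minimal illustration, not a tested pipeline. It assumes the genome was the blastx query and the output was written in the default 12-column tabular format (`-outfmt 6`), so columns 7 and 8 hold qstart and qend. The function names and the example rows are made up for illustration.

```python
from collections import defaultdict


def parse_hsps(lines):
    """Collect HSP ranges per query sequence from BLAST tabular lines."""
    hsps = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        qseqid, qstart, qend = cols[0], int(cols[6]), int(cols[7])
        # blastx reports minus-strand hits with qstart > qend, so normalise.
        start, end = min(qstart, qend), max(qstart, qend)
        # Convert 1-based inclusive to 0-based half-open, as in BED.
        hsps[qseqid].append((start - 1, end))
    return hsps


def uncovered(intervals, length):
    """Return the ranges in [0, length) not covered by any interval."""
    gaps = []
    pos = 0  # first position not yet known to be covered
    for start, end in sorted(intervals):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < length:
        gaps.append((pos, length))  # uncovered chromosome end
    return gaps


def gaps_as_bed(hsps, chrom_lengths):
    """List (chrom, start, end) rows for every region with zero coverage."""
    rows = []
    for chrom, length in chrom_lengths.items():
        for start, end in uncovered(hsps.get(chrom, []), length):
            rows.append((chrom, start, end))
    return rows


# Hypothetical example data: three HSPs on chr1, none on chr2.
example_lines = [
    "chr1\tP1\t90.0\t30\t0\t0\t101\t190\t1\t30\t1e-20\t100",
    "chr1\tP2\t85.0\t40\t0\t0\t400\t281\t1\t40\t1e-15\t80",
    "chr1\tP3\t80.0\t33\t0\t0\t151\t250\t1\t33\t1e-12\t70",
]
example_lengths = {"chr1": 500, "chr2": 100}

for chrom, start, end in gaps_as_bed(parse_hsps(example_lines), example_lengths):
    print(chrom, start, end, sep="\t")
```

The chromosome lengths have to come from the real sequences (for example from a FASTA index), otherwise the stretch after the last hit on each chromosome is lost. A sequence with no hits at all is reported as one gap spanning its full length.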

Thank you so much Michael. I have the whole genomic sequence right now.
