Question: How to find all human tRNA sequences using Bio.Entrez?
natasha.sernova wrote (2.3 years ago):

Dear all,

A simple Google search gave me the following:

http://gtrnadb.ucsc.edu/genomes/eukaryota/Hsapi19/hg19-tRNAs.fa

https://www.ncbi.nlm.nih.gov/genome/51

https://www.ncbi.nlm.nih.gov/gene/?term=tRNA%20AND%20human

But how can I do this programmatically in Python and find all human tRNA sequences using Bio.Entrez?

I’ve read the Biopython Cookbook and found the following example for the Arabidopsis thaliana chromosomes.

17.2.2 Annotated Chromosomes

Continuing from the previous example, let’s also show the tRNA genes. We’ll get their locations by parsing the GenBank files for the five Arabidopsis thaliana chromosomes. You’ll need to download these files from the NCBI FTP site ftp://ftp.ncbi.nlm.nih.gov/genomes/Arabidopsis_thaliana, and preserve the subdirectory names or edit the paths below:

from reportlab.lib.units import cm
from Bio import SeqIO
from Bio.Graphics import BasicChromosome

entries = [("Chr I", "CHR_I/NC_003070.gbk"),
       ("Chr II", "CHR_II/NC_003071.gbk"),
       ("Chr III", "CHR_III/NC_003074.gbk"),
       ("Chr IV", "CHR_IV/NC_003075.gbk"),
       ("Chr V", "CHR_V/NC_003076.gbk")]

max_len = 30432563 #Could compute this
telomere_length = 1000000 #For illustration

chr_diagram = BasicChromosome.Organism()
chr_diagram.page_size = (29.7*cm, 21*cm) #A4 landscape

for index, (name, filename) in enumerate(entries):
  record = SeqIO.read(filename,"genbank")
  length = len(record)
  features = [f for f in record.features if f.type=="tRNA"]
  #Record an Artemis style integer color in the feature's qualifiers,
  #1 = Black, 2 = Red, 3 = Green, 4 = blue, 5 =cyan, 6 = purple
  for f in features: f.qualifiers["color"] = [index+2]
  cur_chromosome = BasicChromosome.Chromosome(name)
  #Set the scale to the MAXIMUM length plus the two telomeres in bp,
  #want the same scale used on all five chromosomes so they can be
  #compared to each other
  cur_chromosome.scale_num = max_len + 2 * telomere_length

  #Add an opening telomere
  start = BasicChromosome.TelomereSegment()
  start.scale = telomere_length
  cur_chromosome.add(start)

  #Add a body - again using bp as the scale length here.
  body = BasicChromosome.AnnotatedChromosomeSegment(length, features)
  body.scale = length
  cur_chromosome.add(body)

  #Add a closing telomere
  end = BasicChromosome.TelomereSegment(inverted=True)
  end.scale = telomere_length
  cur_chromosome.add(end)

  #This chromosome is done
  chr_diagram.add(cur_chromosome)

chr_diagram.draw("tRNA_chrom.pdf", "Arabidopsis thaliana")

It might warn you about the labels being too close together - have a look at the forward strand (right hand side) of Chr I, but it should create a colorful PDF file.

Is this a good approach for the human genome? And how would I do it with Bio.Entrez?

Many thanks!

Natasha

Neilfws wrote (2.3 years ago):

First you need to figure out the Entrez query to return what you want. This search term gets human tRNA identifiers from the nucleotide database:

Homo sapiens[porgn] AND biomol_trna[PROP]

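Then you need to read the Bio.Entrez documentation and figure out how it implements ESearch and EFetch (Entrez.esearch and Entrez.efetch). The code you posted is not related to this task: it draws tRNA features from GenBank files that have already been downloaded, and it never queries Entrez.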
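As a rough illustration of the search-then-fetch approach described above, here is a minimal sketch that passes the search term to Entrez.esearch with the history server enabled and then downloads the hits as FASTA in batches with Entrez.efetch. The function names (batch_starts, download_fasta), the output filename, the batch size of 500, and the placeholder e-mail address are my own assumptions, not part of the original answer. Check how many records the query returns before downloading everything.

```python
from typing import Iterator, Tuple

# Search term taken from the answer above.
TERM = "Homo sapiens[porgn] AND biomol_trna[PROP]"


def batch_starts(total: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (retstart, retmax) pairs that together cover `total` records."""
    for start in range(0, total, batch_size):
        yield start, min(batch_size, total - start)


def download_fasta(term: str, email: str, out_path: str,
                   batch_size: int = 500) -> int:
    """Run ESearch with usehistory, then EFetch all hits as FASTA in batches.

    Returns the number of records reported by the search.
    """
    # Imported here so batch_starts can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address

    # ESearch: store the result set on the NCBI history server.
    handle = Entrez.esearch(db="nucleotide", term=term, usehistory="y")
    search = Entrez.read(handle)
    handle.close()
    total = int(search["Count"])

    # EFetch: pull the stored result set in slices of batch_size records.
    with open(out_path, "w") as out:
        for start, size in batch_starts(total, batch_size):
            handle = Entrez.efetch(
                db="nucleotide", rettype="fasta", retmode="text",
                retstart=start, retmax=size,
                webenv=search["WebEnv"], query_key=search["QueryKey"],
            )
            out.write(handle.read())
            handle.close()
    return total


if __name__ == "__main__":
    # Placeholder e-mail and filename; replace with your own.
    n = download_fasta(TERM, "you@example.org", "human_tRNA.fasta")
    print(f"Search reported {n} records")
```

The resulting FASTA file can then be read with Bio.SeqIO.parse(path, "fasta"). Using the history server (WebEnv and QueryKey) avoids sending a long list of identifiers back to NCBI with every fetch.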


Thank you very much!

(Reply written 2.3 years ago by natasha.sernova)