Question: How to extract all promoter regions in multi-fasta format from genome using GFF?
rimgubaev wrote (9 months ago):

Hi Everyone,

How can I extract promoter sequences (ca. 1000 bp upstream of the TSS) in multi-FASTA format from a genome (itself a multi-FASTA file of scaffolds) using the information in the corresponding GFF file? I've already tried the GFF-Ex tool, but it didn't help (it finished with errors). It is the tobacco genome (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/715/135/GCF_000715135.1_Ntab-TN90/).

Does anyone know of any other tools for this?

Thanks,
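For reference, the coordinate logic behind the question can be sketched in Python. This is a minimal, hypothetical sketch (`parse_attributes` and `promoter_windows` are illustrative names, not part of GFF-Ex or any tool in this thread). It assumes the 5' end of each `gene` record approximates the TSS, that the `ID` attribute is a usable name, and that scaffold lengths are available, for example from the first two columns of the `.fai` index that `samtools faidx` writes. Upstream means left of the start on the plus strand and right of the end on the minus strand, and windows are clipped at scaffold ends.

```python
def parse_attributes(field):
    """Parse a GFF3 column-9 string like 'ID=gene1;Name=LOC1' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def promoter_windows(gff_lines, seq_lengths, upstream=1000, feature="gene"):
    """Yield BED-style (chrom, start, end, name, score, strand) promoter windows.

    Assumption: the 5' end of each `feature` record approximates the TSS.
    GFF is 1-based and inclusive; the BED output is 0-based, end-exclusive.
    `seq_lengths` maps scaffold name -> length (e.g. from a .fai index).
    """
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        name = parse_attributes(cols[8]).get("ID", ".")
        if strand == "+":
            # Upstream of a plus-strand gene lies left of its start.
            bed_start = max(0, start - 1 - upstream)
            bed_end = start - 1
        elif strand == "-":
            # Upstream of a minus-strand gene lies right of its end.
            bed_start = end
            bed_end = min(seq_lengths[chrom], end + upstream)
        else:
            continue  # unstranded features have no defined upstream
        if bed_end > bed_start:  # drop empty windows at scaffold edges
            yield (chrom, bed_start, bed_end, name, 0, strand)


if __name__ == "__main__":
    demo_gff = [
        "scaf1\tRefSeq\tgene\t1500\t2000\t.\t+\t.\tID=gene1",
        "scaf1\tRefSeq\tgene\t3000\t3500\t.\t-\t.\tID=gene2",
    ]
    for row in promoter_windows(demo_gff, {"scaf1": 4000}):
        print(*row, sep="\t")
```

For the two demo genes this prints `scaf1 499 1499 gene1 0 +` and `scaf1 3500 4000 gene2 0 -` (tab-separated). Because the output is BED, it could be written to a file and passed to `bedtools getfasta -s`, which reverse-complements minus-strand intervals. Where transcripts are annotated, passing `feature="mRNA"` may follow individual TSSs more closely than `gene` records.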

Tags: promoter, gff, fasta, genome
rimgubaev wrote (4 months ago):

Finally, I solved this problem by combining samtools, bedtools, and a custom R script. The pipeline, wrapped in a bash script, is available here.
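The linked script is not reproduced in this thread. As a hypothetical illustration of the sequence-extraction step, which in a samtools/bedtools pipeline would usually be handled by `bedtools getfasta`, here is a pure-Python sketch that takes BED-style windows (chrom, start, end, name, score, strand) and returns a strand-aware multi-FASTA. `read_fasta` and `extract_promoters` are illustrative names. The sketch loads the whole genome into memory, which is impractical for a multi-gigabase genome such as tobacco; there an indexed tool (`samtools faidx`, `bedtools getfasta`) is the realistic choice.

```python
import io

# Complement table for A/C/G/T/N; other IUPAC codes pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Read a multi-FASTA stream into {record_id: sequence}.

    The record ID is the first whitespace-delimited word of the header.
    Everything is held in memory, so this only suits small inputs.
    """
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_promoters(seqs, windows):
    """Return multi-FASTA text for BED-style (chrom, start, end, name, score, strand) windows."""
    out = []
    for chrom, start, end, name, _score, strand in windows:
        sub = seqs[chrom][start:end]  # BED: 0-based start, exclusive end
        if strand == "-":
            sub = sub.translate(COMPLEMENT)[::-1]  # reverse complement
        out.append(f">{name}\n{sub}\n")
    return "".join(out)


if __name__ == "__main__":
    demo = io.StringIO(">scaf1 test\nACGTACGTAA\nCCGGTTAA\n")
    genome = read_fasta(demo)
    windows = [("scaf1", 0, 4, "g1", 0, "+"), ("scaf1", 8, 12, "g2", 0, "-")]
    print(extract_promoters(genome, windows), end="")
```

On the toy scaffold, the minus-strand window covers `AACC`, so its record is written as the reverse complement `GGTT`.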

shoujun.gu (Rockville/MD) wrote (9 months ago):
  1. Extract the gene IDs from the GFF file.
  2. Fetch the promoter sequences from BioMart using the gene IDs you extracted.
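Step 1 can be sketched in a few lines of Python. This is a minimal sketch that assumes GFF3 input with `key=value` attributes in column 9; `gene_ids` is a hypothetical helper, and which attribute (`ID`, `Name`, or another one) a BioMart query would accept depends on the database, so `key` is left as a parameter. As the replies point out, Ensembl BioMart does not carry Nicotiana tabacum, so this step alone does not yield the promoters here.

```python
def gene_ids(gff_lines, feature="gene", key="ID"):
    """Collect the `key` attribute of every `feature` record in GFF3 lines, in file order."""
    ids = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        # Column 9 holds semicolon-separated key=value pairs.
        for item in cols[8].split(";"):
            k, sep, v = item.partition("=")
            if sep and k.strip() == key:
                ids.append(v)
                break
    return ids


if __name__ == "__main__":
    demo = [
        "s\t.\tgene\t1\t10\t.\t+\t.\tID=gene1;Name=LOC1",
        "s\t.\tmRNA\t1\t10\t.\t+\t.\tID=rna1;Parent=gene1",
    ]
    print(gene_ids(demo), gene_ids(demo, key="Name"))
```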

genomax replied (9 months ago):

BioMart on Ensembl appears to have only the Nicotiana attenuata genome, not the one the OP likely wants.

rimgubaev replied (9 months ago):

genomax is right: there are no Nicotiana tabacum data on Ensembl.