Question

extract promoter sequence from whole genome

0


3.8 years ago

citronxu ▴ 20

Hi there,

I'm currently working with Brassica napus and would like to extract the promoter sequences of certain genes on a Linux platform.

First I looked through the Brassica napus databases (Ensembl and Genoscope) and found information on each predicted gene (cDNA and polypeptide sequences), but nothing on promoter regions.

Then I downloaded the whole genome (Brassica napus_v4.1_chromosomes_fa.gz from Genoscope), and I intend to retrieve promoter sequences from it. What I have is the gene positions (for example, which chromosome each gene is on and the start and end positions it spans) and the whole genome sequence. What commands can I use to jump to a gene's position with the flanking sequence also shown, so that I can copy the 1000 bp upstream of the start codon?

What I tried was reading the file with 'zless' combined with 'grep' and details of the genes (for instance, the chromosome number), but it did not work.

Any recommendations and suggestions are welcome.

Many thanks in advance!

sequence • 846 views


1


Extracting promoters is an inexact science.

Broadly speaking, if you want ~1000 bp upstream of every gene feature, you can do this quite easily with Biopython or similar. It would be easier to do this from a GenBank file or similar, where you don't need to know the feature coordinates a priori.

If you do have this information, it can still be done, but it requires being a bit more direct. Can you please show what your input data (the coordinates file) actually looks like?
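For the case where the coordinates are already known, here is a minimal sketch in plain Python (no Biopython needed). It assumes 1-based, inclusive gene coordinates and a strand of "+" or "-", as in GFF-style annotation; your coordinates file may use a different convention. The function names `read_fasta` and `upstream_region` are made up for illustration. Note that it takes the sequence upstream of the annotated gene start, which is not necessarily the start codon if the annotation includes a 5' UTR.

```python
import gzip

# Translation table used to reverse-complement minus-strand sequence.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(path):
    """Read a FASTA file (optionally gzipped) into a dict {sequence_id: sequence}.

    The sequence ID is the first whitespace-delimited word after '>'.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    sequences, name, chunks = {}, None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    sequences[name] = "".join(chunks)
                fields = line[1:].split()
                name, chunks = (fields[0] if fields else ""), []
            elif line:
                chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def upstream_region(chrom_seq, start, end, strand, length=1000):
    """Return up to `length` bases upstream of a gene.

    Assumes `start` and `end` are 1-based inclusive coordinates on the
    forward strand (start <= end). For a minus-strand gene, "upstream"
    lies after `end`, and the result is reverse-complemented so it reads
    5' to 3' relative to the gene. The region is truncated at the
    chromosome ends.
    """
    if strand == "+":
        lo = max(0, start - 1 - length)
        return chrom_seq[lo:start - 1]
    if strand == "-":
        hi = min(len(chrom_seq), end + length)
        return chrom_seq[end:hi].translate(_COMPLEMENT)[::-1]
    raise ValueError("strand must be '+' or '-'")
```

Usage would look something like `genome = read_fasta("your_genome.fa.gz")` followed by `upstream_region(genome["chrA01"], 12001, 14500, "+")`, where the chromosome name and coordinates are hypothetical and must match the headers in your FASTA file. Reading the whole genome into memory is fine for a one-off extraction; for repeated lookups an indexed approach would be faster.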

3.8 years ago by Joe 21k

0


Hi, thank you for the reply. I feel so sorry for my stupid question :/. After I saw your comment, I checked the NCBI database and found the exact promoter sequences recorded there...

3.8 years ago by citronxu ▴ 20