Hi there,
I'm currently working with Brassica napus and would like to extract the promoter sequences of certain genes on a Linux platform.
First I looked through the Brassica napus databases at Ensembl and Genoscope and found information on each predicted gene (cDNA and polypeptide sequences), but nothing on the promoter regions.
Then I downloaded the whole genome (Brassica napus_v4.1_chromosomes_fa.gz from Genoscope), intending to retrieve the promoter sequences from it. What I have is the position of each gene (for example, which chromosome it is on and which positions it spans) and the whole genome sequence. What commands can I use to jump to the position of a gene with its flanking sequence shown, so that I can copy the 1000 bp upstream of the start codon?
What I tried was reading the file with 'zless' and searching it with 'grep' using details about the genes (for instance, the chromosome number), but that did not work.
Any recommendations and suggestions are welcome.
Many thanks in advance!
Extracting promoters is an inexact science.
Broadly speaking, if you want ~1000 bp upstream of every gene feature, you can do this quite easily with Biopython or similar. It would be easier to do this from a GenBank file or a similar annotated format, where you don't need to know the feature coordinates a priori.
If you do have this information, it can still be done, but it requires a more direct approach. Can you please show what your input data (the coordinates file) actually looks like?
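In the meantime, here is a minimal sketch of the coordinate-based route in plain Python (no Biopython needed). It assumes your coordinates are 1-based and inclusive, as in GFF, and that you know the strand of each gene. The function names `read_fasta` and `upstream` are made up for illustration, and the first word of each FASTA header is assumed to be the chromosome name. It also simply takes the bases before the gene start, which is only a rough proxy for "upstream of the start codon" if your coordinates include a 5' UTR.

```python
import gzip

# Translation table used to reverse-complement minus-strand flanks
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def read_fasta(path):
    """Return {sequence_id: sequence} from a plain or gzipped FASTA file.

    Assumption: the sequence ID is the first whitespace-separated word
    of each header line.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    seqs, name, chunks = {}, None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                name, chunks = (line[1:].split() or [""])[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs

def upstream(chrom_seq, start, end, strand, length=1000):
    """Return up to `length` bases upstream of a gene.

    start/end are 1-based inclusive coordinates (assumed, GFF-style).
    For '-' strand genes the flank lies after `end` on the reference
    and is reverse-complemented so it reads 5'->3' toward the gene.
    The result is shorter than `length` near a chromosome end.
    """
    if strand == "+":
        return chrom_seq[max(0, start - 1 - length):start - 1]
    flank = chrom_seq[end:end + length]
    return flank.translate(_COMPLEMENT)[::-1]
```

You would call it as something like `upstream(genome["chrA01"], 12345, 15000, "+")`, where `genome` is the result of `read_fasta` and the chromosome name and coordinates are placeholders. Note that this loads the whole genome into memory; for a large genome, indexed tools such as `samtools faidx` or `bedtools getfasta` are the usual alternative.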
Hi, thank you for the reply. I feel so sorry for my stupid question :/ After I saw your comment, I checked the NCBI database and found the exact promoter sequences recorded there...