6.6 years ago
seta
Hi all,
I have a list of about 5000 genes (gene name and the corresponding Entrez Gene ID). I would like to extract the chromosome, genomic coordinates, feature type (promoter, gene, transcript, exon, CDS, UTR, start_codon, stop_codon) and genomic strand for each gene in this list. Could you please help me with this? Please share any tool or command that would do it.
Thanks
Try BioMart or the APIs of the major repositories.
Except for the promoter region, all of this information is available from a GTF file for your species. Given that you are working with human, download the current GTF, e.g. from GENCODE, and grep/zgrep for the respective gene names. From there, you can further subset for the features you want. Here is information about the GTF format. Example for the gene CEBPA (subset):
For the promoter region, I am not sure whether dedicated databases exist. For simplicity, using the 250 bp upstream of the first exon sounds like a reasonable approach to me.
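As a rough illustration of the grep-then-subset approach, here is a minimal shell sketch. It assumes a GENCODE-style GTF in which column 9 carries a gene_name "..."; attribute. The toy file, the gene names GENE_A and GENE_B, all coordinates, and the helper gene_features are made up for the example; with a real gzipped GTF, zgrep would take the place of grep.

```shell
#!/bin/sh
# Toy GTF with made-up coordinates (hypothetical genes GENE_A and GENE_B).
# A real run would point at a downloaded GTF instead.
gtf=$(mktemp)
trap 'rm -f "$gtf"' EXIT
printf 'chr1\ttoy\tgene\t1000\t5000\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n' > "$gtf"
printf 'chr1\ttoy\texon\t1000\t1200\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n' >> "$gtf"
printf 'chr1\ttoy\tCDS\t1100\t1200\t.\t+\t0\tgene_id "G1"; gene_name "GENE_A";\n' >> "$gtf"
printf 'chr2\ttoy\tgene\t300\t900\t.\t-\t.\tgene_id "G2"; gene_name "GENE_B";\n' >> "$gtf"

# Print chromosome, start, end, feature type and strand for one gene.
# $1 = GTF path, $2 = gene name (matched against the gene_name attribute).
gene_features() {
    grep -F "gene_name \"$2\";" "$1" |
        awk -F '\t' 'BEGIN { OFS = "\t" } { print $1, $4, $5, $3, $7 }'
}

# All records of GENE_A.
gene_features "$gtf" GENE_A

# Only the exon records (feature type is the fourth output column).
gene_features "$gtf" GENE_A | awk -F '\t' '$4 == "exon"'
```

Matching on the full gene_name "..."; string rather than the bare name avoids hits on other genes whose names merely contain the query.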
Thanks for your response. Could you please tell me if zgrep can get the gene list? Regarding the promoter region, I think about 1000 bp upstream of the transcription start site (TSS) is OK. Any suggestions?
I do not understand what you mean by 'get the gene list'. Please explain. 1 kb is quite big, too big for my taste. A typical ATAC-seq peak (open chromatin) at a promoter is about 500 bp. If you only go for the nucleosome-free region, it is about 200 bp. It depends on your goal, but if you plan to check for motif enrichment, better go for a smaller rather than a larger region.
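Whatever width is chosen, the window has to be placed on the correct side of the gene depending on its strand. A minimal sketch, assuming the TSS is taken from the "gene" records of a GENCODE-style GTF (start coordinate on the + strand, end coordinate on the - strand), which ignores alternative transcript start sites. The helper promoters, the toy file and all coordinates are made up for illustration; output coordinates are 1-based and inclusive, as in GTF.

```shell
#!/bin/sh
# Toy GTF with made-up coordinates (hypothetical genes GENE_A and GENE_B).
gtf=$(mktemp)
trap 'rm -f "$gtf"' EXIT
printf 'chr1\ttoy\tgene\t1000\t5000\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n' > "$gtf"
printf 'chr1\ttoy\texon\t1000\t1200\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n' >> "$gtf"
printf 'chr2\ttoy\tgene\t300\t900\t.\t-\t.\tgene_id "G2"; gene_name "GENE_B";\n' >> "$gtf"

# Print chromosome, window start, window end, strand and gene name.
# $1 = GTF path, $2 = window width in bp upstream of the TSS.
promoters() {
    awk -F '\t' -v w="$2" 'BEGIN { OFS = "\t" }
        $3 == "gene" {
            # Upstream means lower coordinates on +, higher coordinates on -.
            if ($7 == "+") { s = $4 - w; e = $4 - 1 }
            else           { s = $5 + 1; e = $5 + w }
            if (s < 1) s = 1                      # do not run off the chromosome start
            name = "NA"
            if (match($9, /gene_name "[^"]+"/))   # pull the name out of column 9
                name = substr($9, RSTART + 11, RLENGTH - 12)
            if (e >= s) print $1, s, e, $7, name
        }' "$1"
}

# 250 bp windows; pass 200, 500 or 1000 to try the other sizes discussed.
promoters "$gtf" 250
```

The window is not clipped at the chromosome end, since chromosome lengths are not in the GTF; that would need a separate sizes file.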
I have a gene list containing about 5000 gene names with Entrez Gene IDs. What I mean is: is it possible to get the required information for the whole gene list with zgrep at once, instead of typing just one gene name as you show in the example? Thank you for your point about the promoter region; I would like to examine the variants in this region.
Yes, grep (or its companion zgrep, which searches gzipped files) can search for multiple patterns, using the -E parameter, which enables extended regular expressions and thereby alternation such as 'A|B'. Please spend some quality time on learning the basics of the Unix command line, especially awk and grep. You'll find plenty of tutorials on the web. I understand that you would probably prefer a ready-to-use script, but depending on what you want to do with the output, you'll need further commands for subsequent filtering. Therefore it is very advisable to first get a background in the Unix tools. You'll need that all the time :-)
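To make the multi-pattern search concrete, here is a minimal sketch under a few assumptions: the gene names sit one per line in a plain text file, and the GTF is GENCODE-style with a gene_name "..."; attribute. For a handful of genes, -E with alternation is enough; for thousands, it is usually more practical to put the patterns in a file and use -f, here together with -F (fixed strings). The helper genes_from_list, the toy files and the gene names are hypothetical; with a gzipped GTF, zgrep would replace grep.

```shell
#!/bin/sh
# Toy GTF with made-up coordinates and a toy gene list (all names hypothetical).
gtf=$(mktemp)
genes=$(mktemp)
trap 'rm -f "$gtf" "$genes"' EXIT
printf 'chr1\ttoy\tgene\t1000\t5000\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n' > "$gtf"
printf 'chr1\ttoy\texon\t1000\t1200\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n' >> "$gtf"
printf 'chr1\ttoy\tgene\t7000\t8000\t.\t+\t.\tgene_id "G2"; gene_name "GENE_AB";\n' >> "$gtf"
printf 'chr3\ttoy\tgene\t100\t400\t.\t-\t.\tgene_id "G3"; gene_name "GENE_C";\n' >> "$gtf"
printf 'GENE_A\nGENE_C\n' > "$genes"

# A few genes: one extended regular expression with alternation.
grep -E 'gene_name "(GENE_A|GENE_C)";' "$gtf"

# Many genes: turn each name into an exact attribute string, then match them all.
# $1 = GTF path, $2 = file with one gene name per line.
genes_from_list() {
    patterns=$(mktemp)
    sed 's/.*/gene_name "&";/' "$2" > "$patterns"   # GENE_A -> gene_name "GENE_A";
    grep -F -f "$patterns" "$1"
    rm -f "$patterns"
}

genes_from_list "$gtf" "$genes"
```

Wrapping each name in the full attribute string keeps GENE_A from also pulling in GENE_AB, which a bare substring search would do. The output can then be piped into awk to select columns or feature types.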