Question: Common gene names list to BED with +/- TSS intervals
rbronste wrote:

What is the simplest way to take a set of common gene names and generate a BED interval file covering +/- 2kb around each gene's TSS? Thanks.

rna-seq • written 16 months ago by rbronste • modified 16 months ago by Alex Reynolds

Are the answers to your earlier question, "Table browser +/- 2Kb of TSS export", not applicable here? You had asked that question a while back.

modified 16 months ago • written 16 months ago by genomax

Yes, it's helpful, but I was looking for the best way to convert common gene names to RefSeq, UCSC, etc. identifiers outside of the Table Browser, which appears to be lacking in this respect.

written 16 months ago by rbronste

conversion method of common gene names to RefSeq or UCSC

That is a different question then. I would suggest taking a look at this file.

written 16 months ago by genomax

Alex Reynolds wrote:

If you have a file of mouse gene symbols called genes.txt, here's one way you might get 2kb proximal promoters of mouse genes for mm10 or GRCm38:

$ wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M18/gencode.vM18.annotation.gff3.gz \
    | gunzip --stdout - \
    | awk '$3 == "gene"' - \
    | convert2bed -i gff - \
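    | awk -v window=2000 -v OFS="\t" '
        # + strand: the TSS is the start coordinate, so upstream is [start - window, start)
        ($6 == "+") { print $1, ($2 - window), $2, $4, ".", $6, $7, $8, $9, $10 }
        # - strand: the TSS is the end coordinate, so upstream is [end, end + window)
        ($6 == "-") { print $1, $3, ($3 + window), $4, ".", $6, $7, $8, $9, $10 }' \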
    > gencode.vM18.promoters.bed

Then to filter them against the list of genes:

$ grep -w -F -i -f genes.txt gencode.vM18.promoters.bed > gencode.vM18.promoters.filtered.bed
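One caveat with a word-based grep: a hyphen counts as a word boundary, so a symbol such as Rps6 may also pull in Rps6-ps4, and a symbol can match text in attributes other than the gene name. The following is a minimal sketch of a stricter filter, assuming the tenth column holds GFF3 attributes with a `gene_name=` key (as in the GENCODE file above). The toy file names and records are hypothetical and only there so the example runs without a download.

```shell
# Toy inputs (hypothetical records) in the 10-column layout produced above.
printf 'Rps6\nactb\n' > genes.toy.txt
printf 'chr4\t100\t2100\tg1\t.\t-\tsrc\tgene\t.\tID=g1;gene_name=Rps6;level=2\n'      > promoters.toy.bed
printf 'chr1\t500\t2500\tg2\t.\t+\tsrc\tgene\t.\tID=g2;gene_name=Rps6-ps4;level=2\n' >> promoters.toy.bed
printf 'chr5\t900\t2900\tg3\t.\t-\tsrc\tgene\t.\tID=g3;gene_name=Actb;level=2\n'     >> promoters.toy.bed
printf 'chr6\t700\t2700\tg4\t.\t-\tsrc\tgene\t.\tID=g4;gene_name=Gapdh;level=2\n'    >> promoters.toy.bed

awk -F'\t' '
    # First file: load the wanted symbols, lower-cased for a case-insensitive lookup
    NR == FNR { if ($1 != "") wanted[tolower($1)] = 1; next }
    # Second file: extract gene_name from the attribute column and require an exact match
    {
        name = ""
        n = split($10, attrs, ";")
        for (i = 1; i <= n; i++)
            if (attrs[i] ~ /^gene_name=/) name = substr(attrs[i], 11)
        if (tolower(name) in wanted) print
    }' genes.toy.txt promoters.toy.bed > promoters.toy.filtered.bed
```

With real data you would substitute genes.txt and gencode.vM18.promoters.bed for the toy files. The sketch assumes genes.txt is non-empty and has one symbol per line.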
written 16 months ago by Alex Reynolds

Very late reply, but this only gets the 2kb upstream of the TSS. How can I get a window of 2kb on either side? Thanks!

written 4 days ago by rbronste

Modify both the start and the pseudo-stop?

$ wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M18/gencode.vM18.annotation.gff3.gz \
    | gunzip --stdout - \
    | awk '$3 == "gene"' - \
    | convert2bed -i gff - \
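    | awk -v window=2000 -v OFS="\t" '
        # + strand: center the window on the start coordinate (the TSS)
        ($6 == "+") { print $1, ($2 - window), ($2 + window), $4, ".", $6, $7, $8, $9, $10 }
        # - strand: center the window on the end coordinate (the TSS)
        ($6 == "-") { print $1, ($3 - window), ($3 + window), $4, ".", $6, $7, $8, $9, $10 }' \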
    | awk -vOFS="\t" '{ if ($2 < 0) { $2 = 0; } print $0; }' \
    > gencode.vM18.promoters.bed

I added a test that sets the start coordinate to zero if it is less than zero. You might instead filter these elements out if you require all elements to be 4kb windows centered on their TSSs.
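As a rough sketch of the filtering alternative, the awk step below drops any gene whose window would start before position zero instead of clamping it. It assumes the same 10-column layout as above. The toy records and file names are hypothetical, and the sketch does not check the upper end against chromosome lengths.

```shell
# Toy gene records (hypothetical) in the convert2bed-style 10-column layout.
printf 'chr1\t500\t9000\tgeneA\t.\t+\tsrc\tgene\t.\tgene_name=A\n'     > genes.toy.bed
printf 'chr1\t10000\t20000\tgeneB\t.\t+\tsrc\tgene\t.\tgene_name=B\n' >> genes.toy.bed
printf 'chr1\t30000\t40000\tgeneC\t.\t-\tsrc\tgene\t.\tgene_name=C\n' >> genes.toy.bed

awk -F'\t' -v window=2000 -v OFS="\t" '
    # Skip records without a usable strand
    $6 != "+" && $6 != "-" { next }
    {
        # TSS is the start on the + strand and the end on the - strand
        tss = ($6 == "+") ? $2 : $3
        # Keep only full-width windows; drop those that would start below zero
        if (tss - window >= 0)
            print $1, tss - window, tss + window, $4, ".", $6, $7, $8, $9, $10
    }' genes.toy.bed > windows.toy.bed
```

Here geneA (TSS at 500) is dropped, while geneB and geneC yield the windows 8000-12000 and 38000-42000.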

written 4 days ago by Alex Reynolds