Question: is there any online tool to know which transcription factors ( TFs) bind to the regulatory regions of a list of genes in zebrafish?
0
gravatar for rajasekargutha
8 months ago by
India
rajasekargutha20 wrote:

I have a list of 1900 genes of zebrafish. i would like to find out the which TFs bind to my list of genes of zebrafish. please send links

blast rna-seq next-gen gene genome • 594 views
ADD COMMENTlink modified 8 months ago by Alex Reynolds24k • written 8 months ago by rajasekargutha20
1
gravatar for genomax
8 months ago by
genomax51k
United States
genomax51k wrote:

Try Contra v3 server in "exploration" mode.

ADD COMMENTlink written 8 months ago by genomax51k

I have t search for every single gene, so it is not possible for my 1900 genes list

ADD REPLYlink written 8 months ago by rajasekargutha20
1
gravatar for Alex Reynolds
8 months ago by
Alex Reynolds24k
Seattle, WA USA
Alex Reynolds24k wrote:

I am assuming you have tools like wget and grep etc. and that you have installed BEDOPS. I am also assuming you are working with the zebrafish build danRer7, and that you have samtools-indexed FASTA files for this genome downloaded locally.

To build a BED file of intervals for your genes:

$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/danRer7/database/refGene.txt.gz | gunzip -c > danRer7.refGene.txt
$ grep -f genes.txt danRer7.refGene.txt | awk -v OFS="\t" '{ print $3,$5,$6,$13,$9,$4 }' | sort-bed - > danRer7.genes.bed

To generate TSS regions, separate by strand and get the start position of the gene:

$ awk -v OFS="\t" '($6=="+"){ print $1, $2, ($2+1), $4, $5, $6; }' danRer7.genes.bed > danRer7.genes.start.for.bed
$ awk -v OFS="\t" '($6=="-"){ print $1, $3, ($3+1), $4, $5, $6; }' danRer7.genes.bed > danRer7.genes.start.rev.bed

Then apply a 5kb asymmetric padding around the start positions (you can pick a different window to define the regulatory region upstream of the TSS):

$ bedops --range -5000:0 --everything danRer7.genes.start.for.bed > danRer7.genes.start.for.TSS5k.bed
$ bedops --range 0:5000 --everything danRer7.genes.start.rev.bed > danRer7.genes.start.rev.TSS5k.bed

Take the union of the two files:

$ bedops --everything danRer7.genes.start.for.TSS5k.bed danRer7.genes.start.rev.TSS5k.bed > danRer7.genes.TSS5k.bed

Separate out the padded TSSs per gene:

$ awk '{ print $0 >> "danRer7.TSS5k."$4".bed"; }' danRer7.genes.TSS5k.bed

Generate FASTA for each gene using the bed2faidx script. Here's how you would do this for one gene, say "LOC567192":

$ bed2faidx --fastaDir=/path/to/danRer7/indexed/fasta < danRer7.TSS5k.LOC567192.bed > danRer7.TSS5k.LOC567192.fa

You have ~1900 genes in genes.txt, so it is probably easier to use a loop:

$ while IFS= read -r gene; do echo "$gene"; bed2faidx --fastaDir=/path/to/danRer7/indexed/fasta < "danRer7.TSS5k.${gene}.bed" > "danRer7.TSS5k.${gene}.fa"; done < genes.txt

Once you have FASTA for each gene's padded TSSs, you can run MEME on the sequence information to get motif calls. You would then run TOMTOM to match what MEME discovers against entries in databases of published TF motifs.

The web interface for MEME and TOMTOM will only let you do one set of regions at a time. I would suggest you take one gene's sequence information and run it through these two tools to see what you get back, to get familiar with what these tools do.

Since you need to automate this, my advice is, once you know how MEME and TOMTOM work, you then download the source code and databases and build command-line versions of these tools that you can run locally. Then you know exactly what parameters go into your search. This is going beyond the scope of the answer here, but you can search Biostars and there should be answers that have already covered this topic.

ADD COMMENTlink written 8 months ago by Alex Reynolds24k

Nice comprehensive post!

ADD REPLYlink written 4 months ago by Seigfried40
0
gravatar for Alex Reynolds
8 months ago by
Alex Reynolds24k
Seattle, WA USA
Alex Reynolds24k wrote:

1) Use BEDOPS bedops to generate a BED file of regulatory regions of genes (like a 5kb region upstream of TSS, say).

2) Convert BED to FASTA with samtools and bed2faidx for your indexed reference genome.

3) Run the FASTA through MEME to find putative motifs.

4) Run the motif PWMs through TOMTOM to find highest ranking matches with known/published TF motifs (e.g., matches with JASPAR, TRANSFAC, UniPROT, etc.).

Search engines on the relevant terms will get you links to the tools mentioned here.

ADD COMMENTlink written 8 months ago by Alex Reynolds24k

Could you send some tutorial to get bed file of regulatory genes. Can this method give regulatory sequence bed file for my list of genes at a time. Do I need to work out for each gene at a time. Please, guide me how to find regulatory region of gene with bedops because i am not familiar to this.

ADD REPLYlink written 8 months ago by rajasekargutha20

What do you have for genes? Do you just have gene names, or annotations with intervals?

ADD REPLYlink written 8 months ago by Alex Reynolds24k

I have only Gene names only

ADD REPLYlink written 8 months ago by rajasekargutha20

Can you post your genes in a file somewhere public?

ADD REPLYlink written 8 months ago by Alex Reynolds24k

I will post. Where do you want me to post.

ADD REPLYlink written 8 months ago by rajasekargutha20

Not sure, up to you. Dropbox, Google Drive, whatever you have.

ADD REPLYlink written 8 months ago by Alex Reynolds24k

I don't have your email id to share through Dropbox

ADD REPLYlink written 8 months ago by rajasekargutha20

Upload to pastebin.

ADD REPLYlink written 8 months ago by genomax51k

I have uploaded in pastebin

ADD REPLYlink written 8 months ago by rajasekargutha20

You need to post the link for your list on pastebin here. Otherwise how can @Alex find the data.

ADD REPLYlink written 8 months ago by genomax51k

https://pastebin.com/rN7vw8n7 I have uploaded in this link

ADD REPLYlink written 8 months ago by rajasekargutha20

You don’t need my email address to put the items into a Dropbox Public folder and share them.

ADD REPLYlink written 8 months ago by Alex Reynolds24k
0
gravatar for ZhouK.R.
8 months ago by
ZhouK.R.0
Sun Yat-sen University, China
ZhouK.R.0 wrote:

You can use “Download” module of ChIPBase to get all genes regulated by transcription factors or histone modification marks in Zebrafish, and then filter the results with you gene list.

ADD COMMENTlink modified 8 months ago • written 8 months ago by ZhouK.R.0

link you have provided is not opening

ADD REPLYlink written 8 months ago by rajasekargutha20

The website is temporarily blocked by the firewall because of the secure policy. You can send your list of genes to my mailbox and I will do this analysis.

ADD REPLYlink written 8 months ago by ZhouK.R.0

Could you send your email id

ADD REPLYlink written 8 months ago by rajasekargutha20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1424 users visited in the last hour