Question

How can we detect those target genes?

4.0 years ago

elvira.carmart • 0

Hi everyone, we're a group of biotechnology students working on a genomics project. We have been asked to design an experiment to identify the target genes of a certain transcription factor. To do this, we have designed a ChIP-seq experiment combined with an RNA-seq experiment. However, we are also required to predict the target genes from the binding motif of our transcription factor, which we know from Arabidopsis [(C/T)TGAC(T/C)]. Do you have any idea what the best procedure for finding those genes would be?

TF RNA-Seq ChIP-Seq • 990 views

updated 4.0 years ago by Alex Reynolds • written 4.0 years ago by elvira.carmart

score 2 · Answer 1 · 2020-04-19

If you have a de novo (novel or unpublished) TF position weight matrix (PWM) or MEME file describing the probability of finding each base at each position, you can run it through a tool called TOMTOM to find the closest, most significant match among the TF motifs in a published database, such as the JASPAR plants collection.

Once you have a known TF pattern, you can look upstream of candidate target genes for sequences that closely match this pattern, say, within 5 kb upstream of each gene's TSS (transcription start site).

This window would be defined as the target gene's promoter or proximal promoter, i.e., the part of the genome where a TF would bind, to control transcription of the gene immediately downstream.
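If all you have is the consensus (C/T)TGAC(T/C), a crude first pass is an exact-match scan on both strands. The sketch below uses a made-up toy sequence and a hypothetical `hits` variable; it is not a substitute for a PWM-based search, just a way to see what "a sequence close to the pattern" looks like in practice. The reverse complement of [CT]TGAC[CT] is [AG]GTCA[AG], which is how minus-strand matches are found without reverse-complementing the sequence.

```shell
# Toy promoter sequence (made up for illustration, not real Arabidopsis sequence).
seq="AAGCTTGACTTTAGGTCAAGC"

# Slide a 6-bp window along the sequence and test both orientations of the motif.
# Output columns: 0-based start, end (exclusive), matched 6-mer, strand.
hits=$(printf '%s\n' "$seq" | awk '
  {
    s = toupper($0)
    for (i = 1; i + 5 <= length(s); i++) {
      k = substr(s, i, 6)
      if (k ~ /^[CT]TGAC[CT]$/) {
        print i - 1, i + 5, k, "+"
      } else if (k ~ /^[AG]GTCA[AG]$/) {
        print i - 1, i + 5, k, "-"
      }
    }
  }')

printf '%s\n' "$hits"
```

The coordinates are relative to the start of the scanned sequence, so they would need to be offset by the promoter's genomic start to map back onto the genome.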

You could get Arabidopsis gene annotations from a consortium, e.g.:

$ wget -qO- "https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_gff3/TAIR10_GFF3_genes.gff" | gff2bed - > Arabidopsis.genes.bed

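To make promoters from these genes (5 kb upstream of each TSS, with the start clamped at 0 so that genes near the beginning of a chromosome do not get negative coordinates), you could do something like:

$ awk -v FS="\t" -v OFS="\t" -v padding=5000 '{ if ($6 == "+") { $3 = $2; $2 = ($2 > padding) ? $2 - padding : 0; } else if ($6 == "-") { $2 = $3; $3 = $3 + padding; } print $0; }' Arabidopsis.genes.bed | sort-bed - > Arabidopsis.promoters.bed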
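To sanity-check the strand logic before running it on the full annotation, here is a small sketch on two made-up BED6 records (the IDs and coordinates are illustrative only). It assumes the strand sits in column 6, and it clamps the window start at 0; the window end is not clamped against chromosome length, which you may also want to handle.

```shell
# Two toy genes (hypothetical IDs and coordinates), one per strand, tab-separated BED6.
genes=$(printf 'Chr1\t3630\t5899\tGENE_A\t.\t+\nChr1\t6787\t9130\tGENE_B\t.\t-\n')

# Promoter = `padding` bp upstream of the TSS.
#   "+" strand: the TSS is the BED start, so the window ends there.
#   "-" strand: the TSS is the BED end, so the window starts there.
promoters=$(printf '%s\n' "$genes" | awk -v FS='\t' -v OFS='\t' -v padding=5000 '
  $6 == "+" { tss = $2 + 0; $2 = (tss > padding + 0) ? tss - padding : 0; $3 = tss }
  $6 == "-" { tss = $3 + 0; $2 = tss; $3 = tss + padding }
  { print }')

printf '%s\n' "$promoters"
```

GENE_A starts less than 5 kb from the chromosome start, so its window is truncated at 0 rather than going negative; GENE_B is on the minus strand, so its window extends to the right of the gene's end coordinate.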

You can extract the sequence information for these BED regions using a tool like bed2faidx.pl and samtools-indexed FASTA files for the TAIR10 assembly (or whatever build of Arabidopsis that you're currently working with).
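If you would rather call samtools directly, one possible route is to convert the BED intervals into samtools-style region strings. This is a sketch with made-up intervals; the file names in the comments are placeholders, and it assumes the chromosome names in the BED match the FASTA headers exactly. BED is 0-based and half-open, while `samtools faidx` regions are 1-based and inclusive, hence the `+ 1` on the start.

```shell
# Toy promoter intervals (hypothetical), tab-separated BED6.
promoters=$(printf 'Chr1\t0\t3630\tGENE_A\t.\t+\nChr1\t9130\t14130\tGENE_B\t.\t-\n')

# Convert BED (0-based, half-open) to chr:from-to (1-based, inclusive).
regions=$(printf '%s\n' "$promoters" | awk -F'\t' '{ printf "%s:%d-%d\n", $1, $2 + 1, $3 }')

printf '%s\n' "$regions"

# With samtools installed and an indexed genome FASTA (file names are placeholders):
#   printf '%s\n' "$regions" > promoter.regions.txt
#   samtools faidx TAIR10.fa -r promoter.regions.txt > promoters.fa
```

Note that this extracts the forward-strand sequence for every region, including minus-strand promoters, so the downstream motif search should scan both strands.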

Given promoter regions in BED format converted to promoter sequences in FASTA format, you could use the instructions here to do a FIMO search of your known TF against these promoter sequences:

https://bioinformatics.stackexchange.com/questions/2467/where-to-download-jaspar-tfbs-motif-bed-file/2491#2491

Replace the hg38 chromosome FASTA files (and adjust other arguments, accordingly) with your promoter FASTA.

You can also skip the TOMTOM step if you have a PWM/MEME table for your TF of interest. Just build promoter sequences and then use FIMO.
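If you only have the consensus from the question rather than a full matrix, one way to get something FIMO can read is to hand-write a minimal MEME-format file. The sketch below makes several assumptions that you should revisit: degenerate positions are split 50/50, the background is uniform (Arabidopsis is AT-rich, so a real background would differ), and the motif name `TF_consensus`, `nsites= 20`, and the file names are placeholders.

```shell
# Write a minimal MEME-format motif for (C/T)TGAC(T/C).
# Matrix columns are A C G T; each row is one motif position and sums to 1.
cat > tf_consensus.meme <<'EOF'
MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies
A 0.25 C 0.25 G 0.25 T 0.25

MOTIF TF_consensus
letter-probability matrix: alength= 4 w= 6 nsites= 20 E= 0
0.0 0.5 0.0 0.5
0.0 0.0 0.0 1.0
0.0 0.0 1.0 0.0
1.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 0.5 0.0 0.5
EOF

# With the MEME Suite installed (file names are placeholders):
#   fimo --oc fimo_out --thresh 1e-3 tf_consensus.meme promoters.fa
```

For a motif this short, even a perfect match has a p-value of roughly 4/4096, which is above FIMO's default cutoff of 1e-4, so you will probably need to relax `--thresh` as in the commented command to see any hits at all.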

score 1 · Answer 2 · 2020-04-20

I understand that you are a group of students, so maybe this answer does not fully apply to your situation, or at least is not what your instructor had in mind, but this is how one typically tackles ChIP-seq these days:

To quality-check a ChIP experiment, you should contact your local sequencing facility and ask for a shallow sequencing run. That means they will take your library and spike it into another run they are performing, at low concentration. This will give you around 1-2 million reads per sample, which is sufficient for QC. This is imho far superior to qPCR, as it 1) does not require a priori knowledge of any targets, 2) gives a global assessment of quality, and 3) is more feasible if you have many samples. In our facility this typically costs no more than 200-300€ for a few samples. I strongly recommend this over qPCR: in the best case, qPCR will confirm a couple of regions, but you will still have no idea about overall quality.

Your strategy is also risky because a motif is not sufficient for TF binding. It is something of a gamble: if you are unlucky, you run 10 qPCRs and have picked the 10 wrong regions (see point 1 again). There are thousands of motif occurrences in the genome where no factor binds, because TF binding also depends on local chromatin structure and the co-occurrence of other binding events. Do a shallow sequencing run.
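To put a rough number on how often the motif from the question, (C/T)TGAC(T/C), turns up by chance, here is a back-of-the-envelope sketch. The assumptions are mine and deliberately crude: uniform base frequencies, independent positions, and a genome size of about 120 Mb.

```shell
# Back-of-the-envelope chance expectation for (C/T)TGAC(T/C).
# Assumptions: uniform base frequencies, independent positions, ~120 Mb genome.
chance=$(awk 'BEGIN {
  p_strand = 0.5 * 0.25 * 0.25 * 0.25 * 0.25 * 0.5   # (C/T) T G A C (T/C)
  p_both   = 2 * p_strand                            # motif is not its own reverse complement
  printf "per-bp match probability (both strands): 1 in %d\n", 1 / p_both
  printf "expected matches per 5 kb promoter: %.1f\n", 5000 * p_both
  printf "expected matches in a 120 Mb genome: %d\n", 120e6 * p_both
}')

printf '%s\n' "$chance"
```

Under these assumptions nearly every 5 kb promoter is expected to carry several matches by chance alone, so the presence of the motif on its own says very little about binding.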