How can we detect those target genes?
4.0 years ago

Hi everyone, we're a group of biotechnology students doing a genomics project. We have been asked to carry out an experiment to find the target genes of a certain transcription factor. To do this, we have designed a ChIP-seq experiment combined with an RNA-seq experiment. However, we have also been asked to predict the target genes from the binding motif of our transcription factor, which we know from Arabidopsis: (C/T)TGAC(T/C). Do you have any idea what the best procedure for finding those genes is?

4.0 years ago

If you have a de novo ("unpublished" or novel) TF position weight matrix (PWM) or MEME-format motif file describing the probability of finding each base at each position, you can run it through a tool called TOMTOM to find the most similar, most significant known motif in a published database, such as the plant collection of JASPAR.

Once you have a known TF pattern, you can look upstream of target genes for sequences that are close to this pattern, for example within 5 kb upstream of each gene's TSS (transcription start site).

This window would be defined as the target gene's promoter (or proximal promoter), i.e., the part of the genome where a TF would bind to control transcription of the gene immediately downstream.
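The window arithmetic can be sketched on two made-up BED6 records (the gene names and coordinates are hypothetical, not TAIR10 genes, and `make_promoters` is an illustrative helper, not an existing tool). BED is 0-based and half-open, so for a "+" strand gene the upstream window ends at the gene's start, while for a "-" strand gene it begins at the gene's end.

```shell
#!/usr/bin/env bash
# Sketch: derive upstream promoter windows from BED6 gene records.
# make_promoters is a hypothetical helper; the input records are toy data.

make_promoters() {
  # $1 = window size in bp; reads BED6 on stdin, writes BED6 promoter windows to stdout
  awk -v FS="\t" -v OFS="\t" -v padding="$1" '
    $6 == "+" {                                  # upstream of a + strand gene lies left of its start
      $3 = $2
      $2 = ($2 > padding) ? $2 - padding : 0     # clamp so BED starts never go negative
      print; next
    }
    $6 == "-" {                                  # upstream of a - strand gene lies right of its end
      $2 = $3
      $3 = $3 + padding                          # may run past the chromosome end; clip with chromosome sizes if needed
      print; next
    }
    # records with any other strand value (e.g. ".") match neither rule and are dropped
  '
}

printf 'Chr1\t3000\t4500\tgeneA\t0\t+\nChr1\t20000\t22000\tgeneB\t0\t-\n' \
  | make_promoters 5000
```

The first toy gene starts only 3 kb into the chromosome, so its 5 kb window is clamped to start at 0 instead of going negative.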

You could get Arabidopsis gene annotations from a consortium such as TAIR, e.g.:

$ wget -qO- "https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_gff3/TAIR10_GFF3_genes.gff" | gff2bed - > Arabidopsis.genes.bed

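To make promoters from the gene records (the TAIR10 GFF3 file also contains mRNA, exon, CDS and other feature types, and gff2bed writes the feature type to column 8), you could do something like:

$ awk -v FS="\t" -v OFS="\t" -v padding=5000 '$8 == "gene" { if ($6 == "+") { $3 = $2; $2 = ($2 > padding) ? $2 - padding : 0; } else if ($6 == "-") { $2 = $3; $3 = $3 + padding; } else { next; } print $0; }' Arabidopsis.genes.bed | sort-bed - > Arabidopsis.promoters.bed

Upstream windows on the "+" strand are clamped at position 0; windows of "-" strand genes near the end of a chromosome may still need to be clipped against the chromosome sizes.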

You can extract the sequences for these BED regions using a tool like bed2faidx.pl together with a samtools-indexed FASTA file for the TAIR10 assembly (or whichever Arabidopsis build you are working with).
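To show what that extraction step does, including strand handling, here is a toy stand-in written in awk. It is a sketch only, not bed2faidx.pl or samtools: the helper name `extract_promoters` and the `name::chrom:start-end(strand)` header format are hypothetical choices, it loads the whole FASTA into memory, and it reverse-complements "-" strand windows so each sequence reads in the orientation of its gene.

```shell
#!/usr/bin/env bash
# Toy stand-in for bed2faidx.pl / samtools faidx (NOT either tool).
# Loads the whole FASTA into memory, so only use it on small files.

extract_promoters() {
  # $1 = FASTA file, $2 = BED6 file (0-based, half-open); prints one FASTA record per BED line
  awk -v FS="\t" '
    BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A" }
    FNR == NR {                                  # first file: the FASTA
      if ($0 ~ /^>/) { name = substr($0, 2); sub(/[ \t].*$/, "", name) }
      else seq[name] = seq[name] toupper($0)
      next
    }
    {                                            # second file: the BED6 regions
      s = substr(seq[$1], $2 + 1, $3 - $2)       # BED start is 0-based, awk substr is 1-based
      if ($6 == "-") {                           # reverse-complement minus-strand windows
        r = ""
        for (i = length(s); i > 0; i--) {
          b = substr(s, i, 1)
          r = r ((b in comp) ? comp[b] : "N")    # ambiguous bases become N
        }
        s = r
      }
      print ">" $4 "::" $1 ":" $2 "-" $3 "(" $6 ")"
      print s
    }' "$1" "$2"
}

# Usage on made-up toy data (hypothetical sequence and gene names)
tmp=$(mktemp -d)
printf '>Chr1 toy\nAACCGGTTAC\nGTTTGACCAA\n' > "$tmp/toy.fa"
printf 'Chr1\t0\t6\tgeneA\t0\t+\nChr1\t10\t16\tgeneB\t0\t-\n' > "$tmp/toy.bed"
extract_promoters "$tmp/toy.fa" "$tmp/toy.bed"
```

FIMO scans both strands by default, so reverse-complementing mainly matters if you want to read positions relative to the gene.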

Once the promoter regions have been converted from BED to FASTA, you could follow the instructions here to run a FIMO search for your known TF motif against these promoter sequences:

https://bioinformatics.stackexchange.com/questions/2467/where-to-download-jaspar-tfbs-motif-bed-file/2491#2491

Replace the hg38 chromosome FASTA files with your promoter FASTA file (and adjust the other arguments accordingly).
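As a crude sanity check before (or alongside) FIMO, an exact-match scan for the consensus shows how many candidate sites each promoter holds. This is a minimal sketch, not FIMO: it reports only exact matches to (C/T)TGAC(T/C) on both strands and computes no scores or p-values. The helper name `scan_consensus` is hypothetical, and minus-strand hits are reported by their leftmost coordinate on the given strand.

```shell
#!/usr/bin/env bash
# Sketch: exact-match scan for (C/T)TGAC(T/C) on both strands of each FASTA record.
# NOT a replacement for FIMO: no PWM, no scores, no p-values.

scan_consensus() {
  # reads FASTA on stdin; prints: sequence name, 1-based start, strand, matched site
  awk '
    function scan(s, re, strand,    off, rest) {
      off = 0; rest = s
      while (match(rest, re)) {
        print name "\t" (off + RSTART) "\t" strand "\t" substr(rest, RSTART, RLENGTH)
        off += RSTART                         # restart one base after the match start,
        rest = substr(rest, RSTART + 1)       # so overlapping sites are still reported
      }
    }
    function report() {
      scan(seq, "[CT]TGAC[CT]", "+")         # the consensus on the given strand
      scan(seq, "[AG]GTCA[AG]", "-")         # its reverse complement, i.e. the other strand
    }
    /^>/ { if (name != "") report(); name = substr($1, 2); seq = ""; next }
    { seq = seq toupper($0) }                 # join multi-line FASTA records
    END { if (name != "") report() }'
}

# Usage on a made-up promoter sequence
printf '>promA\nCCTTGACCAAAAGGTCAACC\n' | scan_consensus
```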

You can also skip the TOMTOM step if you already have a PWM or MEME-format motif for your TF of interest: just build the promoter sequences and then run FIMO.
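If all you have is the consensus, (C/T)TGAC(T/C) (YTGACY in IUPAC notation), one way to get something FIMO can read is to write a minimal MEME-format motif with equal probabilities at each degenerate position. This is a sketch under that assumption: the helper name `iupac_to_meme` is hypothetical, and `nsites= 20` is an arbitrary placeholder, since a consensus carries no site count. The MEME Suite's own iupac2meme utility performs this conversion as well.

```shell
#!/usr/bin/env bash
# Sketch: turn an IUPAC consensus into a minimal MEME-format motif (uniform background).
# Degenerate positions get equal probability for each base they allow.

iupac_to_meme() {
  # $1 = IUPAC consensus, e.g. YTGACY; prints the motif file to stdout
  awk -v cons="$1" 'BEGIN {
    # bases (column order A C G T) represented by each supported IUPAC code
    code["A"] = "A";  code["C"] = "C";  code["G"] = "G";  code["T"] = "T"
    code["R"] = "AG"; code["Y"] = "CT"; code["S"] = "CG"; code["W"] = "AT"
    code["K"] = "GT"; code["M"] = "AC"; code["N"] = "ACGT"
    split("A C G T", base, " ")
    w = length(cons)
    print "MEME version 4\n"
    print "ALPHABET= ACGT\n"
    print "strands: + -\n"
    print "Background letter frequencies"
    print "A 0.25 C 0.25 G 0.25 T 0.25\n"
    print "MOTIF " cons "\n"
    # nsites= 20 is an arbitrary placeholder for a consensus-derived motif
    print "letter-probability matrix: alength= 4 w= " w " nsites= 20 E= 0"
    for (i = 1; i <= w; i++) {
      c = toupper(substr(cons, i, 1))
      if (!(c in code)) { print "unsupported IUPAC code: " c > "/dev/stderr"; exit 1 }
      n = length(code[c])
      row = ""
      for (j = 1; j <= 4; j++)
        row = row sprintf(" %.6f", (index(code[c], base[j]) ? 1 / n : 0))
      print row
    }
  }'
}

iupac_to_meme YTGACY
```

Redirecting the output to a file (e.g. `iupac_to_meme YTGACY > motif.meme`) gives a motif file you could pass to FIMO in place of a JASPAR-derived one.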


Wow! It sounds really complicated for us, but we'll try. Initially, we had thought about running a BLAST search with the known A. thaliana target genes against the genome of Citrus clementina (our organism of interest). Does that make any sense? We only want those genes in order to validate the ChIP experiment by qPCR before sequencing.


You could do a BLAST search to find homologs in Citrus if all you want is a quick validation.

4.0 years ago
ATpoint 81k

I understand that you are a student group, so this answer may not fully apply to your situation, or at least may not be what your instructor had in mind, but this is how one typically tackles ChIP-seq these days:

To quality-check a ChIP experiment, you should contact your local sequencing facility and ask for a shallow sequencing run. That means they take your library and spike it, at low concentration, into another run they are performing anyway. This will give you roughly 1-2 million reads per sample, which is sufficient for QC. IMHO this is far superior to qPCR because it 1) does not require a priori knowledge of any targets, 2) gives a global assessment of quality, and 3) is more feasible if you have many samples. In our facility this typically costs no more than 200-300 € for a few samples. I strongly recommend this over qPCR: at best, qPCR will confirm a couple of regions but tell you nothing about overall quality.

Your strategy is also risky because a motif is not sufficient for TF binding. It is a gamble: if you are unlucky, you run 10 qPCRs and have picked 10 wrong regions (see point 1 again). There are thousands of motif occurrences in the genome where no factor binds, because TF binding also depends on local chromatin structure and on the co-occurrence of other binding events. Do a shallow sequencing run.
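A back-of-the-envelope estimate shows why motif matches alone are a gamble. Assuming uniform, independent bases (a simplification; real genomes, including Arabidopsis, are AT-rich, so treat the numbers as orders of magnitude), a given position matches YTGACY on one strand with probability (1/2)^2 * (1/4)^4 = 1/1024, and since the motif is not its own reverse complement (RGTCAR), both strands count. The helper name `expected_hits` is hypothetical, and 120 Mb is used only as a round figure close to the TAIR10 assembly size.

```shell
#!/usr/bin/env bash
# Sketch: expected number of chance YTGACY matches in random sequence of a given length.
# Assumes uniform, independent bases (p = 0.25 each); not a model of a real genome.

expected_hits() {
  # $1 = sequence length in bp; prints expected matches on both strands
  awk -v L="$1" 'BEGIN {
    w = 6
    p_strand = (2/4)^2 * (1/4)^4     # two 2-fold degenerate positions, four fixed: 1/1024
    # the motif is not its own reverse complement (RGTCAR), so both strands count
    printf "%.1f\n", 2 * p_strand * (L - w + 1)
  }'
}

expected_hits 5000        # one 5 kb promoter window: about 10 chance matches
expected_hits 120000000   # ~120 Mb genome: a few hundred thousand chance matches
```

So even a single 5 kb promoter window is expected to contain several matches by chance, far more than the number of sites a TF actually occupies.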


Thank you so much for your great explanation!
