Finding Regulatory Elements from Microarray Data

Ben-Zaken Zilberstein, C. and Geiger, D.
Department of Computer Science, Technion

e-mail:
chaya@cs.technion.ac.il
dang@cs.technion.ac.il

Microarrays yield enormous amounts of data about gene expression patterns in different cells of different species. The mechanism underlying a particular expression pattern, however, is usually not yet known. Tools that can reveal the regulatory elements involved in an anomalous gene expression pattern are therefore important for understanding the mechanisms underlying pathologies such as human cancer. Since our knowledge of the human genome is still incomplete, it is more convenient to develop such tools on a well-characterized system such as the yeast genome, as we did here. Regulatory elements involved in a given expression pattern can be revealed by comparing the promoters of a group of co-regulated genes to find common sequence patterns that distinguish this group from other genes. This can be done either by local multiple alignment of these promoters to detect common consensus sequences (Hughes et al., 2000) or by detecting over-represented oligonucleotide sequences (van Helden et al., 1998). Both methods detect new candidate regulatory sites as well as sites that have already been characterized. We developed a computational tool that reliably predicts short oligomers as regulatory elements by examining the promoters of a group of co-regulated genes.

Results

We used microarray data containing the expression levels of all yeast genes 15 minutes after rapamycin was added to the growth medium, a treatment that closely simulates starvation of the cell (Hardwick et al., 1999).
We grouped together the genes known to be involved in amino-acid metabolism that were also over-expressed at least two-fold 15 minutes after rapamycin was added. We hypothesized that the genes in this over-expressed group share common regulatory elements, and we searched the group for over-represented oligomers. To do so, we compared two frequencies for each candidate oligomer: its frequency in the promoters of the over-expressed genes, and its frequency in a control group of 1500 randomly chosen promoters. The higher the ratio of the first frequency to the second, the more likely it is that the oligomer has a biological role that may have caused the over-expression. To derive a cutoff for assessing the significance of these ratios, we ran a simulation that calculated the same ratio for a randomly chosen group of genes of the same size as the over-expressed group. We repeated this simulation 20 times and used the highest ratio obtained as the cutoff. Only putative over-represented oligomers with ratios much higher than the cutoff were selected as regulatory elements.
We identified the GCN4 binding site with an extremely high frequency ratio of 10.00, whereas the cutoff was 5.00. GCN4 is a master regulator of amino-acid metabolism and has already been reported to be central to the over-expression of amino-acid metabolism genes during cell starvation (Natarajan et al., 2001). All yeast genes well known to be regulated by GCN4 (e.g. HIS4, TRP4, ILV1, ILV2, ADE4, ARG1, ARG8, HIS3) were also over-expressed, as expected. Another regulatory element we isolated was the CPF1 binding site. CPF1 is involved in the regulation of methionine synthesis. All genes regulated by this element (e.g. PGK1, CYT1 and MET16) were also over-expressed, as expected.
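The frequency-ratio test and the simulated cutoff described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' tool. "Frequency" is taken here to mean the fraction of promoters that contain a word at least once on the forward strand; words seen in fewer than two group promoters are skipped; a word absent from the control is counted as seen once to avoid dividing by zero; and the cutoff is the highest ratio reached by any word across the 20 random groups. The function names (`word_sets`, `presence_counts`, `frequency_ratios`, `simulated_cutoff`, `over_represented`), the `margin` parameter and the synthetic toy data are all hypothetical.

```python
import random
from collections import Counter


def word_sets(promoters, k):
    """Distinct k-letter words in each promoter (forward strand only; an assumption)."""
    return [{p[i:i + k] for i in range(len(p) - k + 1)} for p in promoters]


def presence_counts(sets):
    """Number of promoters that contain each word at least once."""
    counts = Counter()
    for s in sets:
        counts.update(s)
    return counts


def frequency_ratios(group_sets, control_counts, n_control, min_hits=2):
    """Ratio of each word's frequency in the group to its frequency in the control.

    Hypothetical choices: words found in fewer than `min_hits` group promoters
    are skipped, and a word absent from the control is counted as seen once.
    """
    n_group = len(group_sets)
    ratios = {}
    for word, hits in presence_counts(group_sets).items():
        if hits < min_hits:
            continue
        f_group = hits / n_group
        f_control = max(control_counts[word], 1) / n_control
        ratios[word] = f_group / f_control
    return ratios


def simulated_cutoff(background_sets, control_counts, n_control, group_size,
                     runs=20, seed=0):
    """Highest ratio reached by any word in `runs` random groups of `group_size`."""
    rng = random.Random(seed)
    cutoff = 0.0
    for _ in range(runs):
        random_group = rng.sample(background_sets, group_size)
        ratios = frequency_ratios(random_group, control_counts, n_control)
        cutoff = max([cutoff, *ratios.values()])
    return cutoff


def over_represented(group, control, background, k=7, runs=20, margin=1.0, seed=0):
    """Return the simulated cutoff and the words whose ratio exceeds
    margin * cutoff, sorted from the highest ratio down."""
    control_counts = presence_counts(word_sets(control, k))
    n_control = len(control)
    cutoff = simulated_cutoff(word_sets(background, k), control_counts, n_control,
                              len(group), runs, seed)
    ratios = frequency_ratios(word_sets(group, k), control_counts, n_control)
    hits = [(w, r) for w, r in ratios.items() if r > margin * cutoff]
    hits.sort(key=lambda item: item[1], reverse=True)
    return cutoff, hits


# Toy usage on synthetic data (not the yeast data): random 150-bp "promoters",
# with a hypothetical 8-letter word planted once in every promoter of the group.
rng = random.Random(1)
PLANTED = "GATTACAA"


def random_promoter(length=150):
    return "".join(rng.choice("ACGT") for _ in range(length))


background = [random_promoter() for _ in range(500)]
control = rng.sample(background, 200)
group = []
for _ in range(15):
    p = random_promoter()
    pos = rng.randrange(len(p) - len(PLANTED) + 1)
    group.append(p[:pos] + PLANTED + p[pos + len(PLANTED):])

cutoff, hits = over_represented(group, control, background, k=8)
print(f"cutoff = {cutoff:.2f}")
for word, ratio in hits[:5]:
    print(word, round(ratio, 2))
```

Raising `margin` above 1 is one way to encode the "much higher than the cutoff" rule; the text does not say how much higher a ratio had to be. Words that partly overlap the planted word may also pass the cutoff in the toy run, since they are genuinely enriched in the group.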
We performed the same analysis on a group of genes that participate in carbohydrate metabolism and identified oligomers with ratios far above the cutoff (e.g. GGGGACT, GGGGG, GGGGC and CGCGC). We found no literature regarding these oligomers, so we propose them as candidate regulatory elements of carbohydrate metabolism during cell starvation.

Discussion

Our method succeeded in identifying several important regulatory elements involved in gene over-expression during cell starvation from microarray data. By suggesting some of the elements involved, it sheds light on the regulatory mechanism underlying this over-expression. We attribute these encouraging results to two aspects of the method. First, we focus on genes that respond quickly to a specific change (e.g. the addition of rapamycin to the growth medium), because a rapid response suggests that they are reacting to a relatively specific common cause. Such a common cause is likely to be associated with a small, over-represented set of regulatory elements, which is the situation in which our method is effective. Second, we use a statistical simulation to derive the cutoff value, which keeps the false-positive rate below 5% and makes the method's predictions reliable.
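The 5% figure can be made concrete with a short calculation. This is one possible reading, not necessarily the authors' own argument, and it rests on assumptions the text does not state: the cutoff c is the largest ratio over all oligomers and all 20 runs, and, when the real group carries no signal, its top ratio R_0 is exchangeable with the 20 simulated top ratios R_1, ..., R_20, with no ties. The symbols f_s(w) (frequency of oligomer w in group s, with s = 0 the real group) and f_control(w) (its frequency in the control promoters) are introduced here for illustration. Under these assumptions each of the 21 values is equally likely to be the largest:

```latex
\begin{align*}
R_s(w) &= \frac{f_s(w)}{f_{\mathrm{control}}(w)}, \qquad
R_s = \max_{w} R_s(w), \qquad
c = \max_{1 \le s \le 20} R_s, \\
\Pr(R_0 > c) &= \Pr\Bigl(R_0 = \max_{0 \le s \le 20} R_s\Bigr) = \frac{1}{21} \approx 0.048 < 0.05 .
\end{align*}
```

Requiring ratios much higher than the cutoff, as the method does, can only make this test stricter.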

References

[1] Hardwick J.S., Kuruvilla F.G., Tong J.K., Shamji A.F. and Schreiber S.L. Rapamycin-modulated transcription defines the subset of nutrient-sensitive signaling pathways directly controlled by the Tor proteins. Proc. Natl. Acad. Sci. USA 1999, 96.

[2] Hughes J.D., Estep P.W., Tavazoie S. and Church G.M. Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J. Mol. Biol. 2000, 296(5):1205-14.

[3] Natarajan K., Meyer M.R., Jackson B.M., Slade D., Roberts C., Hinnebusch A.G. and Marton M.J. Transcriptional profiling shows that Gcn4p is a master regulator of gene expression during amino acid starvation in yeast. Mol. Cell. Biol. 2001, 21(13):4347-68.

[4] van Helden J., Andre B. and Collado-Vides J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 1998, 281(5):827-42.