Structural Genomics:
From Gene to Structure to Function
Robinson College, Cambridge, 20-22 September 2000
Molecular Graphics and Modelling Society
www.mgms.org  UK Registered Charity #287750


Speakers' Abstracts

N.B. This will be updated as abstracts are received.


Structural Genomics: A New Role of Structural Biology for Functional Genomics

Sung-Hou Kim

Department of Chemistry and Lawrence Berkeley National Laboratory

University of California, Berkeley CA, 94720, U.S.A.

Analysis of several genomic sequences indicates that no known function can be assigned to a significant fraction of the genes. To infer functions for the products of these genes, additional information beyond sequence is needed. Since the molecular (biochemical and biophysical) function of a gene product is tightly coupled to its three-dimensional structure, finding the structure or its folding pattern may provide important insight into the molecular function of the gene product. That, in turn, may help in understanding its cellular function (genetic and physiological function: networks of many molecular functions) as well. We have started testing the premise that structure implies the molecular function of a protein of unknown function, using the gene products of a hyperthermophile, Methanococcus jannaschii. The results of the test will be reviewed for three "hypothetical" proteins, for which neither function nor structure was known, and for one protein whose cellular function had been inferred but whose molecular function is not known.


Completeness in Structural Genomics

John Moult

Center for Advanced Research in Biotechnology
9600 Gudelsky Dr.
Rockville, MD 20850 USA

[Abstract Not Yet Available]

Protein families of unknown structure in Pfam

Alex Bateman & Erik Sonnhammer

Center for Genomics Research, Karolinska Institutet, S-171 77 Stockholm, Sweden

The Pfam database contains a comprehensive collection of protein domain families defined by sequence homology. All Pfam families are linked to the PDB database by direct sequence comparison. It is thus relatively simple to use Pfam as a resource for creating a list of families of unknown 3D structure (1). I will discuss some of the issues with creating such a list, such as completeness, the differences one might expect from domain family definitions in Pfam compared to structure-based definitions, and how to annotate the list for prioritizing targets.

(1) Elofsson A, Sonnhammer EL (1999) A comparison of sequence and structure protein domain families as a basis for structural genomics. Bioinformatics, 15, 480-500.
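
The list-building step described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the family names, PDB identifiers and member counts below are invented toy data, and ranking by family size is just one possible way to prioritise targets, not the authors' method.

```python
from typing import Dict, List, Set, Tuple


def families_without_structure(
    pdb_links: Dict[str, Set[str]],
    member_counts: Dict[str, int],
) -> List[Tuple[str, int]]:
    """Return (family, member_count) for families with no PDB link, largest first."""
    unsolved = [
        (family, member_counts.get(family, 0))
        for family, pdb_ids in pdb_links.items()
        if not pdb_ids  # an empty set means no sequence in the family matches a PDB entry
    ]
    # Sort by descending family size, then by name so the order is deterministic.
    return sorted(unsolved, key=lambda item: (-item[1], item[0]))


# Hypothetical toy input: family -> PDB entries found by sequence comparison.
toy_links = {
    "FamA": {"1abc"},
    "FamB": set(),
    "FamC": set(),
    "FamD": {"2xyz", "3def"},
}
# Hypothetical toy input: family -> number of member sequences.
toy_counts = {"FamA": 120, "FamB": 45, "FamC": 300, "FamD": 12}

targets = families_without_structure(toy_links, toy_counts)
print(targets)
```

In practice the annotation used for ranking could be anything attached to a family (species distribution, predicted membrane regions, and so on); family size is used here only because it keeps the sketch short.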


The Berlin-based "Protein Structure Factory" Project

U. Heinemann (1,2), K.P. Hofmann (3), G. Illing (4), C. Lang (5), C. Maurer (6), H. Oschkinat (7,2), W. Saenger (2) and M. Schroedter (8)

1. Forschungsgruppe Kristallographie, Max-Delbrück-Centrum für Molekulare Medizin, Robert-Rössle-Str. 10, D-13125 Berlin, Germany
2. Institut für Chemie, Freie Universität, Takustr. 6, D-14195 Berlin, Germany
3. Institut für Medizinische Physik und Biophysik, Klinikum Charité der Humboldt-Universität, Ziegelstr. 5-9, D-10098 Berlin, Germany
4. BMBF-Leitprojekt Proteinstrukturfabrik, Heubnerweg 6, D-14059 Berlin, Germany
5. Fachgebiet Mikrobiologie und Genetik, Technische Universität, Gustav-Meyer-Allee 25, D-13335 Berlin, Germany
6. Ressourcenzentrum im DHGP, Heubnerweg 6, D-14059 Berlin, Germany
7. Forschungsinstitut für Molekulare Pharmakologie, Alfred-Kowalke-Str. 4, D-10315 Berlin, Germany
8. Alpha Bioverfahrenstechnik GmbH, Im Biotechnologiepark, D-14943 Luckenwalde, Germany

Structural genomics "aims at the determination of the 3D structure of all proteins" (1). This international initiative follows as a logical consequence from the various genomic sequencing projects and can be seen as a subspecialty of the emerging science of functional genomics. The basic idea behind structural genomics is to determine by X-ray crystallography or NMR spectroscopy protein structures representing all protein families present in the biosphere and thereby allowing the homology modelling of virtually every protein structure. In order to finish this project within reasonable time, methods for high-throughput structure analysis have to be developed. It is hoped that the availability in the near future of a comprehensive set of representative protein structures will have an important impact on biology and greatly accelerate rational drug development.

The Berlin-based "Protein Structure Factory" (PSF) contributes to the international structural genomics initiative. The PSF (2) is distinguished from other projects in this field by

References and Footnotes

1. http://www.structuralgenomics.org/main.html

2. U. Heinemann, J. Frevert, K.-P. Hofmann, G. Illing, H. Oschkinat, W. Saenger and R. Zettl in Genomics and Proteomics (S. Suhai, ed.) Kluwer Academic / Plenum Publishers, New York, pp. 179-189 (2000).


High Throughput Expression for Structural Analysis:- Pitfalls & Prospects

Owen Jenkins

Current methods to express and purify proteins for crystallography are routinely performed in relatively low throughput mode and on a protein-by-protein basis. The absolute goal of producing crystallographic grade material for a limited number of idiosyncratic target proteins allows for time-consuming, iterative approaches to expression, i.e. re-working of constructs, change of expression systems, and novel purification and refolding protocols, until successful crystallisation is ultimately achieved. To transform this process into a truly high throughput mode, for example 1000 proteins per year, requires a complete rethinking of the philosophy and methodology of expression. This presentation will attempt to address the issues and feasibility of truly high throughput expression and purification using current technologies and those which may be available in the near future.


Functional genomics using 3-D structures: from ORFANs to unknowns

Chantal Abergel, Vincent Monchois, Christian Cambillau*, J.-M. Claverie

Information Structurale et Génétiques et * Architecture et Fonction des Macromolécules Biologiques (URL: http://igs-server.cnrs-mrs.fr; * http://afmb.cnrs-mrs.fr)

31 Chemin Joseph Aiguier, 13402 Marseille Cedex 20, FRANCE

Newly sequenced microbial genomes routinely reveal up to 50% of genes without significant similarity to previously characterized genes, and thus without any functional attribute. Our hope is that an approach combining large-scale bioinformatics and 3-D structure determinations can shed some light on the function (and pharmaceutical relevance) of unknown "anonymous" genes. An overview of our projects and methods, as well as preliminary results on a pilot study of E. coli ORFAN genes will be presented.

Abergel C., Bouveret E., Claverie J.-M., Brown K., Rigal A., Lazdunski C. and Benedetti H. (1999) Structure of the Escherichia coli TolB protein determined by MAD at 1.95 Angstroms resolution. Structure 7: 1291-1300.

Alimi J.-Ph., Poirot O., Lopez F., and Claverie J.-M. (2000) Reverse Transcriptase-Polymerase Chain Reaction Validation of 25 "Orphan" Genes from Escherichia coli K-12 MG1655. Genome Res. 10: 959-966.

Cambillau C. and Claverie J.-M. (2000) Structural and Genomic Correlates of Hyperthermostability. J. Biol. Chem. (in press)

Ogata H., Audic S., Barbe V., Artiguenave F., Fournier P.-E., Raoult D. and Claverie J.-M. (2000) Selfish DNA in Protein Coding Genes of Rickettsia. Science (in press)


Protein modules - targets for structural genomics and beyond

Iain D Campbell

Department of Biochemistry, University of Oxford, South Parks Rd,

Oxford, OX1 3QU, UK

A protein module can be defined as a domain with a contiguous sequence that appears repeatedly in diverse proteins. Expanding module databases, identified by multiple sequence alignments and other data, are arising from sequencing projects (2,3,4). It is probable that about 4000 families will match components of nearly all the proteins in the various genomes. Protein modules thus appear to be ideal targets for a co-ordinated program in structural genomics. We have been determining module structures, using NMR, for a number of years (2) and current progress will be described. Since many module structures are already known, one can also consider how knowledge about module structure might be exploited in functional genomics. Illustrations of how this can be done will be given from ongoing studies of proteins from the extracellular matrix and connective tissue (4-8). Features where NMR has advantages, such as studies of ligand binding, module assembly and module dynamics, will be emphasised.

  1. Baron, M, Norman DG and Campbell ID (1991) Protein modules TIBS 16, 13-17.
  2. Campbell ID (1998) Modular architecture of cell-surface receptors Immunol. Rev 163 11-18.
  3. http://smart.embl-heidelberg.de/ ; http://www.sanger.ac.uk/Software/Pfam/
  4. Campbell ID & Downing AK (1998) NMR of modular proteins. Nature Struct. Biol. 5, 496-499
  5. Bocquier AA et al. (1999) Solution structure of a pair of modules from the gelatin-binding domain of fibronectin Structure 7, 1451-1460
  6. Penkett CJ et al. (2000) Identification of residues involved in the interaction of S. aureus fibronectin-binding protein with the 4F15F1 module pair of human fibronectin, using heteronuclear NMR Biochemistry 39 2887-2893
  7. Smith SP et al. (2000) Interface characterisation of the type II module pair from fibronectin. Biochemistry 39, 8374-8381.
  8. Wilkins MB et al. (2000) Drosophila dumpy is a gigantic extracellular protein required to maintain tension at epidermal cuticle attachment sites Current Biol. 10, 559-567

From X-ray maps to protein function

Tom Oldfield

Molecular Simulation Inc.
University of York, UK

Recent developments in recombinant DNA techniques, crystallisation protocols, X-ray data collection techniques and devices, and computing have led to a substantial increase in the speed and number of protein structure determinations in modern crystallographic laboratories. However, there still remain a number of key stages in the crystallographic process which limit the rate of structure determination. One of these is fitting electron density maps, either in the initial stages of tracing a chain into a new map, or in the manual rebuilding during refinement. It is also apparent that solving a protein structure is not enough: automated structural and functional classification is required to complete the process of protein structure determination. The electron density applications already available within QUANTA represent novel and effective tools for speeding up all aspects of map interpretation. The various modules (X-AUTOFIT, X-LIGAND, X-SOLVATE, X-BUILD and X-POWERFIT) have been developed in close collaboration with the large number of crystallographers working on projects in the Protein Group at York. These tools provide the crystallographer with automated CA-tracing, automated sequence assignment, automated model building, automated validation, automated ligand fitting, automated water fitting, structure classification and functional classification, all in a single program.


High Throughput X-ray Crystallography for Drug Discovery

Harren Jhoti

Astex Technology, UK

Astex is developing proprietary discovery platforms that will perform high throughput X-ray crystallography (HTX) to image target proteins at an unprecedented rate. A key component of HTX is a powerful new Internet-based software technology called AutoSolve which is able to determine crystal structures of protein/ligand complexes in a rapid and automated manner. Examples will be provided which highlight the performance of AutoSolve.


Underlying methodology for high-throughput structure determination

Victor S. Lamzin(1) and Anastassis Perrakis(2)

1. European Molecular Biology Laboratory (EMBL), Hamburg Outstation, c/o DESY, Notkestrasse 85, 22603 Hamburg, Germany
2. European Molecular Biology Laboratory (EMBL), Grenoble Outstation, c/o ILL, B.P. 156, 6 Rue Jules Horowitz, 38043 Grenoble CEDEX 9, France

The vast majority of the macromolecular three-dimensional structures are nowadays determined by X-ray crystallography and it is foreseen that this technique will play a key role and will be further explored for the needs of structural genomics projects. Currently, rapid structure production is impeded by the time requirements to carry out a crystallographic experiment that may take anything from hours to years. Developments in crystallographic methodology and availability of tools that would allow determination of the macromolecular structures in a real high-throughput manner have now become one of the central goals.
There is an acute need for the re-examination of the whole process of X-ray structure determination. Data collection, phasing, model building, refinement and validation are much more closely tied together than was generally believed, and should be considered as a single entity. The responsibility for construction of reliable macromolecular models will naturally shift from investigators to the developers of the underlying methodology.
The vast majority of the data will be recorded at synchrotron sources. The provision of on-site computational facilities directly linked with data collection will be the first step towards automation. Advances in molecular biology and availability of tuneable radiation which enabled the success of the MAD/SAD technology, as well as the rapidly growing database of macromolecular structures are essentially re-defining the concept of the crystallographic phase problem and the emphasis should now be moved to obtaining higher quality X-ray data and faster and more reliable structure determination.
One of the major challenges, which still remains, and a major bottleneck for high-throughput X-ray structure determination is the inspection of electron density maps, the construction of macromolecular models and their refinement. The ARP/wARP suite is being developed to address these problems and may already be in a position to promote progress in automating the steps of deriving a complete structural model. Given the X-ray data extending to a resolution of 2.3 Å or higher, the time required for building a protein structure can be shortened to a few CPU hours on inexpensive workstations.


Structures, Function, Weak Interactions, and NMR

Hartmut Oschkinat

NMR-supported Structural Biology, Forschungsinstitut fuer Molekulare Pharmakologie, Alfred-Kowalke-Str. 4, 10315 Berlin, Germany

Protein domains make up the 'structural code' of life that has now gained special attention in structural genomics. However, it is difficult to read it in terms of protein function, because individual protein folds may be used for a variety of different biological tasks. In this context, an attempt to judge activities of signalling domains is given with the examples of WW, PDZ and EVH1 domains, based on a combination of NMR and peptide library experiments. A strong component of structural genomics is the development of high-throughput technology for structure determination. Attempts to automate the NMR process within the Berlin project are outlined.


Structure-based Functional Genomics

Gaetano T. Montelione, Stephen Anderson, Daphne Palacios, Bonnie Dixon, Kristin Gunsalus, Yuanpeng Huang, Hunter Moseley, Daniel Monleon, Rajan Paranji, Parag Sahasrabudhe, G. V. T. Swapna, Roberto Tejero, Rong Xiao, and Deyou Zheng

Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, NJ 08854

Genome sequencing projects have already determined nearly complete genome sequences of several organisms, including human. The products of these genes are widely recognized as the next generation of therapeutics and targets for the development of pharmaceuticals. While identification of these genes is proceeding quickly, elucidation of their three-dimensional (3D) structures and biochemical functions lags far behind. In some cases, knowledge of 3D structures of proteins can provide important insights into evolutionary relationships that are not easily recognized by sequence alignment comparisons. Thus, structure determination by NMR or X-ray crystallography can sometimes provide key information regarding protein fold class, locations and clustering of conserved residues, and surface electrostatic field distributions that connect a protein sequence with potential biochemical functions. The resulting limited set of putative biochemical functions can then be tested by appropriate biochemical assays. We are developing technologies that will significantly accelerate the process of protein structure determination by X-ray crystallography and NMR. These include bioinformatics methods for parsing novel genes into domain encoding regions, high-level "multiplexed" protein expression systems, database structures for keeping track of reagents and project data, and NMR pulse sequences, data collection methods, and expert-system software for automated analysis of protein resonance assignments and 3D structures. The goal of this work is to develop a "high-throughput" process for structural analysis of novel gene products on a genomic scale and to apply this in the analysis of novel gene products identified in the genome sequencing projects.

Montelione, G. T.; Anderson, S. Nature Struct. Biol. 1999, 6: 11 - 12. Structural Genomics: Keystone for a human proteome project.

Moseley, H. N. B.; Montelione, G. T. Curr. Opin. Struct. Biol. 1999, 9: 635 - 641. Automated analysis of NMR assignments and structures for proteins.


RIKEN Structural Genomics Projects

Shigeyuki Yokoyama

Genomic Sciences Center, RIKEN Yokohama Institute
1-7-22 Suehiro-cho, Tsurumi, Yokohama 230-0045, Japan.

The RIKEN Institute has started the Structural Genomics Initiative to establish the relationship between the structures and functions of proteins encoded by prokaryote and eukaryote genomes. The RIKEN Structural Genomics Initiative includes the Structurome Project (Leader, S. Kuramitsu), which is a structural genomics project to determine crystal structures of as many proteins as possible from an extremely thermophilic eubacterium, Thermus thermophilus HB8, at RIKEN Harima Institute at SPring-8. The Protein Folds Project at the Genomic Sciences Center (GSC), RIKEN Yokohama Institute, is to analyze structures and functions of mouse/human and plant proteins expressed from the full-length cDNAs collected and sequenced by other groups of GSC. For this purpose, six 800-MHz and ten 600-MHz NMR instruments have been installed. In addition to two RIKEN beam lines at SPring-8, construction of new high-throughput beam lines is planned. Cell-free protein synthesis is the major method for protein sample preparation, with stable-isotope labeling for NMR and selenomethionine substitution for crystallography. A bioinformatics group headed by Y. Matsuo is also involved in the target selection and systematic analyses of the protein structures and functions.


Structural Proteomics in Prokaryote Systems

Aled M. Edwards(1,2), Akil Dharamsi(2), Masoud Vedadi (2), Dinesh Christendat (1), Adelinda Yee (1), Cheryl Arrowsmith (1,2)

1. Department of Medical Biophysics, Ontario Cancer Institute, University of Toronto, Canada
2. Integrative Proteomics, Suite 520, 100 College St, Toronto, Ontario, Canada

Understanding the biology of an organism will require a range of genomics and proteomics approaches. Structural proteomics is an important part of the general strategy. Our academic and commercial arms have been combining structural proteomics with proteome-wide protein-protein interaction studies and data mining approaches to try to understand the biology of an archaeon and a bacterial pathogen. Highlights from these projects will be discussed.


The European Macromolecular Structure Database (EMSD) and Structural Genomics

Henrick, K., Ionides, J., Keller, P., Irwin, J., Velankar, S. and Barton, G. J.

EMBL-European Bioinformatics Institute, Genome Campus, Hinxton, Cambs CB10 1SD, U.K.
Tel: +44 1223 494414 Fax: +44 1223 494496
Email: geoff@ebi.ac.uk WWW: http://barton.ebi.ac.uk

Approximately 250 new structures per month are deposited to the PDB (Protein Data Bank) collection; of these, 20% are deposited to and processed by the EMSD. Projects in structural genomics promise to increase dramatically the number of new structures deposited, many of which will be for proteins of unknown function. If these data are to be useful both to structural biologists and to the wider biological community, then it is essential that the data are saved in the public archives in a complete and accurate form and that the data are organised to allow complex questions to be answered with minimal effort. In this talk, I outline the work in progress at EBI that will allow fast and accurate deposition, sophisticated searches, and detailed cross-referencing with other EBI databases such as TrEMBL/SWISS-PROT and EnsEMBL genome annotation.


Structural genomics in the context of other genome research

Richard Durbin

Sanger Centre, Hinxton, UK

Structural genomics projects are getting going now, around ten years after the large scale sequencing programmes started, at a time when many other systematic functional genomics approaches are also being started. I will review some of the history of the development of genomic sequencing that might be relevant to large scale structure data collection, and consider the context of other genomic-scale efforts, exploring how they might influence the practice and prioritisation of structural genomics. Finally, the primary output of all these large scale projects is data that must be managed in databases and accessed computationally. I will discuss how I see the requirements for future informatics resources developing to make these data available to research biologists in a maximally useful form.


Analysis of gene expression; bridging the gap from sequence to function

Tom Freeman

Sanger Centre, Hinxton, UK

The potential to correlate the genetic makeup of an organism to its biological function is moving into a new era. This is primarily being driven by the acquisition of the sequence of all the genes by the large-scale cDNA and whole genome sequencing programmes. However even now, with approximately ninety percent of sequencing of the human genome completed, our knowledge of the transcriptome is still in its infancy. There remain great discrepancies in the estimates of the total number of mammalian genes, and the expression pattern of most genes and the function of the proteins they encode, is largely unknown.

Knowledge of where and when a gene is expressed can provide valuable insights into the function of the encoded protein. If the expression of a gene can be shown to be restricted to a given tissue or cell type, then the protein's function is highly likely to contribute to the specific physiology of that system. Knowing a gene's expression pattern and comparing it to that of others also allows for the association of the function of one gene with that of another, as genes involved in the same pathway or protein complex often exhibit highly similar expression profiles. Finally, expression profiling can now provide unparalleled insights into molecular mechanisms regulating biological systems. As the regulation of transcription is one of the primary controls of biochemical function, monitoring of the transcriptome during a change in the functional status of the system can, without any prior knowledge or hypothesis, reveal which genes may be regulating or underlying these changes. I will outline some of the approaches to the analysis of gene expression we have been using and discuss their utility in revealing new insights into gene function and the biology of complex systems.
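
The idea that genes in the same pathway or complex often share similar expression profiles can be illustrated with a simple similarity measure. The sketch below uses the Pearson correlation coefficient, which is one common choice and is an assumption here; the abstract does not say which measure the speaker uses. The gene names and expression values are invented toy data.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels of three genes across four tissues.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # rises with gene_a
gene_c = [4.0, 3.0, 2.0, 1.0]   # falls as gene_a rises

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
print(r_ab, r_ac)
```

A coefficient near +1 suggests co-regulated genes, while a value near -1 suggests opposite regulation; neither is evidence of shared function on its own.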


Further Developments Towards Reliable Genome-scale Fold Recognition

David T. Jones (1) & Caroline Hadley (2)

1. Institute for Cancer Genetics and Pharmacogenomics, Department of Biological Sciences, Brunel University, Uxbridge, Middlesex, U.K.
2. Department of Biological Sciences, University of Warwick, Coventry, U.K.

Protein fold recognition by threading has proven to be a very effective means for predicting protein tertiary structure from sequence, as witnessed by the number of successful threading predictions made in the various CASP prediction experiments (e.g. Jones, 1999a). Despite this success, the better fold recognition methods still often employ some degree of human expert intervention, which is clearly impractical if these methods are going to be applied to the annotation of uncharacterised genome sequences. We have already described a method for identifying distant homologues to known 3-D structures (Jones, 1999b) using a combination of traditional sequence profile alignments, a set of potentials similar to those used for full optimal sequence threading, and a neural network based expert system. This very quick approach to fold recognition, whilst not being capable of recognizing analogous fold relationships, is very successful in reliably recognizing homologous fold similarities. Recently we have been exploring further developments of this method to extend its range both towards more distant evolutionary relationships (using "structure-function fingerprints") and also towards analogous fold relationships using a new version of our well-established threading program (THREADER 3) and a recently developed post-processing step, again involving neural networks. Preliminary results from both these new approaches will be discussed.

Jones, D.T., Tress, M., Bryson, K. & Hadley, C. (1999a) Successful recognition of protein folds using threading methods biased by sequence similarity and predicted secondary structure. Proteins. S3, 104-111.

Jones, D.T. (1999b) GenTHREADER: An efficient and reliable protein fold recognition method for genomic sequences. J. Mol. Biol. 287, 797-815.


Exploiting Protein Structure in Genome Annotation

Michael Sternberg (1), Patrick Aloy (1,2), Francesc Xavier Aviles (2), Paul Bates (1), Lawrence Kelley (1), Robert MacCallum (1), Arne Mueller (1), Enrique Querol (2) & Mansoor Saqi (1).

(1) Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, 44 Lincoln's Inn Fields, London WC2A 3PX, England

(2) Institut de Biologia Fonamental and Departament de Bioquimica, Universitat Autonoma de Barcelona, Bellaterra 08193, Barcelona, Spain

A strategy to annotate the structure and function of protein coding regions in genomes will be described. We have completed an initial characterisation using standard programs such as PSIBLAST. Our plans are to use a method for fold recognition (3D-PSSM, Kelley et al, J. Mol. Biol. 2000, 299, 501-522) to identify remote homologies. Three-dimensional models for proteins will be constructed using our program 3D-JIGSAW (Bates & Sternberg, Proteins, 1999, suppl. 3, 47-54). Both these programs can be used via web servers (www.bmm.icnet.uk). A strategy will be outlined to facilitate the interpretation of protein function from structure. To begin to include a high level view of protein function in structure-based genome annotation, we have analysed the relationship between the conformation of a protein and its assignment to metabolic pathways.


The Evolution and Structural Anatomy of the Small Molecule Metabolic Pathways in Escherichia coli

Sarah A. Teichmann(1), Stuart C.G. Rison(2), Janet M. Thornton (1,2), Monica Riley(3) & Cyrus Chothia(4)

1. Department of Biochemistry and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK.
2. Department of Crystallography, Birkbeck College, Malet Street, London WC1E 7HX
3. Josephine Bay Paul Centre for Comparative Molecular Biology and Evolution, 7 MBL St., Woods Hole, MA 02543-1015, USA
4. MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB

For the first time, sufficient sequence, structure and functional data are available for a thorough examination of all the small molecule metabolic pathways of an organism in terms of the protein families of the enzymes. With information on the domain structure and evolutionary relationships of over three-quarters of the gene products in Escherichia coli metabolic pathways, we can determine the extent to which domains are duplicated within and across pathways and are combined to form multi-domain enzymes. We have examined which functional features are conserved in families of homologues and thus shed light on the evolution of pathways and enzymes.
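
The question of how far domains are duplicated within and across pathways can be framed as a simple counting exercise. The following sketch assumes a flat table of (enzyme, pathway, domain family) assignments; the table layout, the enzyme names and the family labels are all hypothetical and are not taken from the authors' data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Assignment = Tuple[str, str, str]  # (enzyme, pathway, domain family)


def summarise_duplication(
    assignments: Iterable[Assignment],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return families seen in several pathways, and (family, pathway) pairs
    where the family occurs in more than one enzyme of the same pathway."""
    pathways_by_family: Dict[str, Set[str]] = defaultdict(set)
    enzymes_by_family_pathway: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for enzyme, pathway, family in assignments:
        pathways_by_family[family].add(pathway)
        enzymes_by_family_pathway[(family, pathway)].add(enzyme)
    across = sorted(f for f, p in pathways_by_family.items() if len(p) > 1)
    within = sorted(k for k, e in enzymes_by_family_pathway.items() if len(e) > 1)
    return across, within


# Hypothetical toy table of domain assignments.
toy_assignments = [
    ("enzA", "glycolysis", "fam1"),
    ("enzB", "glycolysis", "fam1"),
    ("enzC", "TCA cycle", "fam1"),
    ("enzD", "TCA cycle", "fam2"),
]

across, within = summarise_duplication(toy_assignments)
print(across)
print(within)
```

A multi-domain enzyme is represented here simply as several rows sharing the same enzyme name, one row per domain family.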


What can structure tell us about bioinformatics?

Pete Artymiuk

University of Sheffield

Bioinformatics will be of immense value in guiding the formulation of strategy for structural genomics initiatives and for the attribution of putative functions to the structures of proteins of unknown function. However, our present bioinformatics tools are far from perfect and must also be refined in the light of new structures.


Evolution of function in protein superfamilies, from a structural perspective: implications for genome annotation

Annabel E. Todd (1), Christine A. Orengo (1) and Janet M. Thornton (1,2)

1. Department of Biochemistry and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
2. Department of Crystallography, Birkbeck College, Malet Street, London, WC1E 7HX

The recent growth in protein sequence and structural databases has revealed the functional diversity of many protein superfamilies. An understanding of how such diversity has evolved through sequence and structural changes is essential for the accurate functional annotation of the large number of uncharacterised gene products identified in genome sequencing projects. Given the large number of genes in the human genome, but a comparatively small number of folds, extensive combination, mixing and modulation of existing folds has occurred during evolution to generate the multitude of functions necessary to sustain life. With the first working draft of the human genome complete, and the sequencing of other multi-cellular organisms underway, a grasp of these evolutionary processes is required if we are to benefit from this wealth of data.

We have analysed how functional changes are implemented by modulation of sequence and structure with reference to 31 diverse enzyme superfamilies, and thus provide an overview of the mechanisms by which functional diversity has evolved. This has involved extensive reading of the literature combined with analyses of our own. Functional variation occurs mostly in more distantly related proteins (<40% sequence identity) and the structural data have been essential for understanding the molecular basis of observed functional differences. A large number of variations and peculiarities are observed, from the atomic level through to gross structural rearrangements. Using selected examples, we present the structural and functional attributes which are conserved within some superfamilies and those that differ, and what bearing, if any, these similarities and changes have on protein function. The implications of these observations for structural genomics projects will be discussed.


New developments concerning the Swiss-PdbViewer sequence to structure workbench

Nicolas Guex, Torsten Schwede, Alexander Diemand and Manuel Peitsch

GlaxoWellcome Experimental Research S.A.
16, chemin des Aulx, CH-1228 Plan-les-Ouates, Geneva, Switzerland

Initially Swiss-PdbViewer (SPDBV; http://www.expasy.ch/spdbv/) was developed as a protein viewer, running only on the Macintosh platform. Over time, it evolved toward a cross-platform program with the same functionality and interface on MacOS, Windows, IRIX and Linux.
As the program provides an interface to SWISS-MODEL as well as a large set of features (from basic display and measurement tools to computation of molecular surfaces, electrostatic potential, force field energy, 3D structure superposition, rotamer scanning, loop building, etc.), it has been widely adopted for teaching and routine work. However, one main limitation was the absence of a scripting language. Thus, so far it has not been possible to automate tasks with SPDBV.
Four options have been considered to overcome this:

The fourth was retained, and a complete interpreted language inspired by C and Perl syntax was developed using flex and yacc. The language supports variables, arrays, conditional branching, loops, access to external files, and to some extent subroutines. Several key features of the program are already accessible via scripting, in a "natural language" way, and more will be added in the future. We think that this option will allow more people to contribute than option 1 or 2 (as no intimate knowledge of internal data structures or of how to compile and link large projects is required). It should also be more appealing to developers than option 3, as it is more powerful and allows commands and functions to be added to the program. We hope to promote script sharing, in the first place through publication on the SPDBV mailing list and, if interest grows, by maintaining a web database of scripts with descriptions of their function and proper author credits.


Structural Genomics: Building a Structural Foundation for Biology

Jean-Denis Pedelacq

Los Alamos National Laboratory and The Consortium for Structural Genomics* Los Alamos, NM 87545

The high-throughput determination and analysis of protein structures across whole genomes is one of the most exciting challenges in life science. The genome projects are changing biology by providing the opportunity to improve our understanding of cells, in particular, and of life, in general. Los Alamos is part of an effort to plan and promote the field of structural genomics. Participants in the 14-institution Consortium have carried out a pilot structural genomics project based on proteins from the hyperthermophile Pyrobaculum aerophilum, and are beginning a larger project to determine and analyze structures of functionally important proteins from Mycobacterium tuberculosis. The lessons learned in this pilot project will be discussed.

*The Consortium for Structural Genomics: Thomas Alber, James Berger, University of California, Berkeley; Edward N. Baker, University of Auckland; Joel Berendzen, Min Park, Tom Terwilliger, Geoffrey Waldo, Los Alamos National Laboratory; James Bowie, David Eisenberg, Juli Feigon, Jeanne Perry, Todd Yeates, UCLA; Axel Brunger, Paul Adams, Lawrence Berkeley National Laboratory; William Jacobs, Albert Einstein College of Medicine; Bernhard Rupp, Lawrence Livermore National Laboratory; James Sacchettini, Texas A&M University; Se Won Suh, Seoul National University; Manfred Weiss, Institute of Molecular Biology, Jena; Matthias Wilmanns, Paul Tucker, Ehmke Pohl, EMBL-Hamburg; William Wood, University of Colorado, Boulder; Shigeyuki Yokoyama, RIKEN.


Crystallization for Structural Genomics: What We Have and What is Missing

Naomi E. Chayen

Biological Structure and Function Section, Division of Biomedical Sciences, Imperial College School of Medicine, London SW7 2AZ, UK

The subject of protein crystallization has gained a new strategic relevance in the next phase of the genome project, in which X-ray crystallography will play a major role. The ability to express, purify and crystallise large numbers of proteins will determine the success of structural genomics. Yet even in cases where expression and purification are well under way, one often gets stuck at the stage of attempting to produce high quality crystals. Automation is crucial for crystallization (as well as for the other phases of structural genomics) since screening of numerous potential conditions is the first step in the search for crystals. Major effort and resources are currently being invested (largely in the USA) in the automatic generation of high throughput crystallization trials. However, in spite of the ability to generate numerous trials and the manpower involved, so far only a small percentage of the proteins produced have led to structure determinations.

Some proteins will surely crystallise during the initial screening but many others are likely to yield microcrystals or low-ordered crystals. This is not really surprising because the conversion of such crystals into useful ones requires intellectual input and individualised optimisation techniques.

Dispensing crystallization trials automatically, especially for screening, is no longer a major problem. However, there are a number of issues which still need a lot of attention, for example: the large amount of manual preparation required prior to the actual dispensing; the cleaning of hundreds of syringes; and the viewing, follow-up and analysis of the results. These stages can and will be automated, but the most important part, the optimisation of the crystallization conditions for difficult cases, is harder to automate. It is only good methodology that has in the past solved seemingly intractable crystallization problems, yet the issue of improving crystallization methods has been somewhat neglected in the rush to automate everything.

This presentation will describe simple optimisation methods which have resulted in successful crystallization of proteins that could not be crystallised otherwise. These techniques have not yet been adapted as high throughput techniques, but they have the potential to become so. In combination with automated screening, the development of crystal optimisation methods will equip the genome project to deal with its awesome task.