Structural Genomics:
From Gene to Structure to Function
Robinson College, Cambridge, 20-22 September 2000
Molecular Graphics and Modelling Society
www.mgms.org  UK Registered Charity #287750


Poster Abstracts

The organisers apologise for errors introduced into the abstracts
following conversion from Word to HTML.


Determining function from structure: application to a Pyrobaculum aerophilum protein

Jean-Denis Pédelacq*, Elaine C. Liong*, Beom-Seop Rho*, Chang Y. Kim*, Kevin C. Menes*, Trevin Zyla*, Lisa Cornelius*, Min S. Park*, Sorel Fitz-Gibbon§, Jeffrey H. Miller§, Joel Berendzen*, and Tom Terwilliger*

*Los Alamos National Laboratory, Los Alamos, NM 87545

§UCLA, Los Angeles, CA 90095

With the increasing number of genomes sequenced, one of the most challenging hurdles is to decipher the information encoded in the protein sequences they contain. Sequence homology searches do not always provide all of the answers, as some proteins may not have retained sequence homology throughout evolution. By contrast, the function of a protein is closely linked to its three-dimensional structure. Building libraries of protein structures is therefore a matter of high priority. Los Alamos National Laboratory is part of a multi-laboratory effort to plan and promote the field of structural genomics. Along with researchers at UCLA, UC-Berkeley, Lawrence Livermore National Laboratory, Pacific Northwest National Laboratory, Lawrence Berkeley National Laboratory, Caltech, and the University of Auckland, we have carried out a structural genomics pilot project on the hyperthermophile Pyrobaculum aerophilum.

Over the past two years, 130 proteins from Pyrobaculum aerophilum have been produced by members of the consortium. We have recently purified and crystallized one of these proteins, for which no function was known. The crystal of this 243 amino acid residue protein belongs to the orthorhombic space group C2 with cell parameters a = 161.8 Å, b = 48.6 Å, c = 60.1 Å and two molecules in the asymmetric unit. A MAD experiment was performed on a single crystal at 100 K using the NSLS beamline X8C (Brookhaven, New York). Complete, highly redundant data were collected to a resolution of 2.1 Å. All selenium sites were found using the program SOLVE, and the initial electron density map was of good quality. The experimental phases were improved by subsequent cycles of solvent flattening in the new RESOLVE program. The function of this unknown protein was then investigated using structural homology searches.

This work has been supported by the Structural Biology Resource Program at LANL and the Campus Laboratory Collaboration program of the University of California.

This abstract will be presented as a talk.

Directed structural genomics

James H. Naismith

Centre for Biomolecular Sciences, BMS Laboratories, The North Haugh,

The University, St. Andrews, Fife Scotland, U.K., KY16 9ST

Tel: +44 1334-463792, Fax: +44 1334-467229

www: http://speedy.st-and.ac.uk/

Mycobacteria survive by assembling a complex coat. This coat is linked to the peptidoglycan by sugar molecules. We are determining the structures of the enzymes involved in coat assembly and synthesis. Thus far we have determined five new enzyme structures by MAD techniques, shedding light on the chemistry of this process. The genes for capsule, cell wall and LPS biosynthesis are grouped together in the bacterial chromosome, presenting an attractive target for a focused program in structural genomics. These proteins are all potential therapeutic targets, and a structural understanding will aid research in this area in addition to assisting annotation of function.


Structure-Based Functional Classification of Proteins for Structural Genomics

Michael A. Kennedy, John R. Cort, Aled M. Edwards, Cheryl H. Arrowsmith

Macromolecular Structure & Dynamics, Environmental Molecular Science Laboratory, Pacific Northwest National Laboratory, EMSL 2569 K8-98, Richland, WA 99352

phone 509-372-2168, fax 509-376-2303

email ma_kennedy@pnl.gov

As part of a pilot study in Structural Genomics, we have used nuclear magnetic resonance spectroscopy (NMR) to determine the solution-state structures of several "hypothetical" or "uncharacterized" proteins from various organisms. In each case, functional annotation of the protein prior to structural characterization was not possible from sequence analysis or comparison to sequences with known structure and function. We show several examples where knowledge of the protein structure immediately narrowed the related fold classes to one or two possibilities. We find that analyses of the functions represented in related fold classes frequently lead to hypotheses about the function of each protein that are easily testable using NMR methods.

In one case, MTH538 from Methanobacterium thermoautotrophicum (M. therm), a singleton in sequence space, was found to have a fold similar to flavodoxin and CheY. NMR mapping experiments indicated that MTH538 lacked flavin binding activity, ruling out a function related to a flavodoxin; however, it was found to have weak Mg2+ binding affinity. Collectively, sequence and structural analyses together with NMR mapping results indicated that MTH538 might represent a phosphorylation-independent receiver domain in a two-component signal transduction system.

In another example, the structure of MTH1175 from M. therm was determined and its fold was found to be similar to only one existing fold class according to CATH and SCOP, the RNaseH family. However, the fold of MTH1175 differs sufficiently from the RNaseH family that it may be designated as a novel fold. Unlike MTH538, MTH1175 is a member of a conserved family of proteins primarily of archaeal origin (COG1433), so the analysis of the structure and function of the protein can be carried out in the context of, and related to, the other members of the protein family from other organisms.

In a third example, YciH from E. coli was found to represent a new protein superfamily similar to the ferredoxin-like topology (CATH) or DcoH-like fold (SCOP). Consequently, YciH now falls under a new fold classification in SCOP, the eIF1-like fold. Since YciH falls into a very common fold class, represented by more than 20 related superfamilies in SCOP and CATH, the YciH example illustrates how functional classification can be limited in highly populated fold spaces. However, it offers an opportunity to explore ancestral relationships between related folds, including 1) convergent evolution, resulting in proteins with common structures but unrelated amino acid sequences, or 2) divergent evolution, resulting in expanded function for proteins originating from a common protein ancestor.

In many cases, the putative active site in proteins occurs in highly dynamic regions that are well suited for characterization by NMR methods and that might not be as easily characterized by X-ray crystallographic methods. Because NMR provides a capability for rapid mapping of ligand binding sites together with the ability to characterize dynamic regions of proteins, NMR should be considered a critical technology for post-Structural Genomics studies. The potential role that NMR can play in bridging the gap between Structural and Functional Genomics and its relationship to Proteomics will be discussed.


Structural Genomics at Structural GenomiX

Tom Peat

Structural GenomiX,

10505 Roselle Street, San Diego CA 92121, USA

Tel: 00 1 858 558 4850, Fax: 001 858 558 4859

Email: tom@stromix.com

Structural GenomiX (SGX), a San Diego-based start-up company, was founded to capitalize on the value of protein structure information in drug and compound discovery. The company is developing a high-throughput platform to support the determination of hundreds of novel protein structures via x-ray crystallography. This platform integrates advances in genomics, x-ray crystallography and bioinformatics. The company’s approach to structure determination is genomic: families of target genes from a range of organisms are input into the platform, raising the odds of success and generating valuable information about family relationships. The company utilizes leading x-ray crystallography techniques to ensure fast structure solution, including the routine incorporation of selenium into proteins, the use of a third-generation synchrotron (the Advanced Photon Source in Illinois), and the use of MAD phasing. Bioinformatics tools enable target selection based on real-time information, and annotation that adds value for customers engaged in drug and compound discovery.

SGX plans to make its protein structures available to pharmaceutical, biotechnology, agricultural and industrial customers through subscriptions to an annotated database and in strategic alliances. The speed and quantity of structures generated by the SGX process will enable customers to access protein structure earlier in the compound discovery process, changing target selection and bringing structure-based drug design into more universal applications.

The SGX process of target selection to structure determination is presented along with an overview of the current process as compared to what is being constructed for higher throughput in the near future. Recent progress towards scale up and automation is presented along with results in terms of structures completed to date.

This abstract will be presented as a talk.

A Structural Genomics Pilot Project Based on the Genome of Escherichia coli

M. Cygler, A. Matte, Y. Li, J. Schrag, J. Sivaraman, C. Smith, V. Sauvé,

R. Larocque and S. Raymond.

Biotechnology Research Institute, National Research Council of Canada, 6100 Royalmount Ave., Montréal, Québec, Canada H4P 2R2 and

Montréal Joint Centre for Structural Biology, Montréal, Québec, Canada

Several structural genomics pilot projects are now underway world-wide, with the eventual promise of high-throughput protein structure determination. An essential task within such projects is the development of effective methods to express, purify and crystallize large numbers of proteins efficiently. We have initiated a pilot scale project based on gene targets selected from the genome of E. coli. Thirty-six genes have been selected for cloning initially and most of these were successfully over-expressed as soluble proteins. Target genes have been cloned as N-terminal fusions with either glutathione-S-transferase or (His)6 tags. Initial affinity purification is achieved using glutathione Sepharose or Ni-NTA resins, followed by thrombin cleavage to remove the fusion tag. Further purification using conventional FPLC ion exchange or gel filtration chromatography is then performed. Using this approach, over 20 proteins have so far been purified to apparent homogeneity as assessed by SDS-PAGE. Purified proteins are further characterized for homogeneity and suitable solution properties using a combination of dynamic light scattering, electrophoretic methods, mass spectrometry and limited proteolysis. Purified protein samples are screened for initial crystallization conditions using a sparse-matrix approach. Following the progress of the various genes and storing all relevant experimental data required the development of a specialized database. Web-based software for displaying and searching the information from this local database and for cross-referencing with other genomics databases has been developed.

To date, some crystals have been obtained for more than half of the purified proteins. Of these, diffraction quality crystals were obtained for five proteins. Using SeMet substituted proteins we have at this time determined the structures of three of these proteins. Another group has reported the structure of the fourth protein in the meantime. We will present the statistics related to various steps of the process and summarize the current results.

This abstract will be presented as a talk.

GeneAtlas™ - An Automatic High-throughput Pipeline for Structure Prediction and Function Assignment for Genomic Sequences

Lisa Yan, Zhan-Yang Zhu, Azat Badredinov, David Kitson, Krzysztof Olszewski, David Edwards

Molecular Simulations, Inc.,

9685 Scranton Road, San Diego, CA 92121, USA

Email: lly@msi.com

With the vast number of protein sequences determined by the genome sequencing projects, there is an emerging need for high-throughput methods to predict the structures and assign the functions of these sequences. GeneAtlas is an automated, high-throughput pipeline for the prediction of protein structure and function using sequence similarity detection, homology modeling, and fold recognition methods. It uses PSI-BLAST and SeqFold to search for homologous structures in the PDB, and MODELER to build 3D models for the sequences based on the template structure. The quality of the 3D model is validated using the Profile-3D/verify score. The accepted model gives the correct fold and indicates the possible function of the genome sequence from the known template. Furthermore, a protein 3D structure contains much richer information about function than the sequence alone. Functional annotations based on the 3D model, using a suite of methods, give further details of the protein function which are crucial for target discovery, protein engineering, and inhibitor design. Using as a benchmark a "virtual" genome, a subset of PDB structures from the SCOP database consisting of protein structures with less than 40% sequence identity, we demonstrate that GeneAtlas, by building 3D models for genomic sequences, detects additional functional relationships compared with the widely used sequence searching method PSI-BLAST. The method was applied to 22 publicly available genomes, including C. elegans, D. melanogaster, S. cerevisiae, H. sapiens, A. thaliana, etc. The modeling results for a small genome, M. genitalium, and the comparison with PSI-BLAST and Hidden Markov Models (HMMer/pfam) on function assignment will be discussed.


Toward the crystal structure of a lectin-like natural killer cell activator receptor bound to its MHC class I ligand.

Susana Cristóbal, Ylva Lindqvist and Gunter Schneider

Molecular Structural Biology Group. Medical Biochemistry and Biophysics.

Karolinska Institute. Tomtebodavägen 6, 171 77 Stockholm. Sweden

Natural killer (NK) cell function is regulated by NK receptors that interact with MHC class I molecules on target cells. The murine NK receptor Ly-49D activates NK cells by interacting with H-2Dd through its C-type-lectin-like NK receptor domain. The activating Ly-49D receptor and the inhibitory Ly-49A receptor mediate opposing effects on NK cell cytotoxicity. The "missing self" hypothesis predicts that all NK cells express at least one inhibitory receptor for a self MHC I antigen, allowing NK cells to delete cells with lost or altered self MHC expression. Thus, the very existence of NK activating receptors specific for MHC class I remains somewhat perplexing, and the exact physiological role of these receptors has not been clarified. We are trying to co-crystallize the activating receptor with its ligand to address these questions by providing a structural basis for analysing the interaction underlying the activating function. We have already overexpressed and purified MHC class I (H-2Dd) and we are trying to obtain recombinant Ly-49D in a soluble form using a novel overexpression approach. In prokaryotes, the recently characterized TAT pathway can transport folded substrates across the inner membrane, and signal peptides specific for this pathway bear a twin-arginine motif. It has been demonstrated that a large, folded cytoplasmic domain of a non-TAT protein can be translocated by this machinery when fused to a TAT signal sequence (Cristóbal et al., 1999), and we are using this strategy to overexpress the Ly-49D receptor and improve its solubility.


Prediction of the 3D structure for proteins with tandem repeat sequences.

Andrey V. Kajava

Center for Molecular Modeling

CIT, National Institutes of Health,

Bldg 12A, 12 South Drive MSC 5626, Bethesda, MD 20892, USA

The genome sequencing projects have revealed a considerable number of protein sequences with tandem arrays of 10-30 residue long repeats. Despite the established functional importance of many such proteins, only a few of their 3D structures are known. The lack of structural information is explained by the fact that the large molecular weight and elongated shape of these molecules hamper X-ray and NMR studies. On the other hand, inspection of the known structures suggests that structural prediction of such proteins can be more reliable than prediction of aperiodic globular proteins. The prediction can be facilitated by assuming repetitive spatial arrangements within the tandem repeat sequence and by more reliably identifying structurally important residue positions in the repeats. Prediction and modeling of several proteins with 10 to 30-residue long repeats, such as leucine-rich repeat proteins, human involucrin, and bacterial surface-associated adhesins, are described. The modeled structures incorporate constraints from electron microscopy, circular dichroism and other indirect structural experiments. The approach used for prediction and modeling of these proteins can be applied to other proteins containing internal repeats and will lead to a valuable tool for structural bioinformatics.


Solution structure of the tyrosyl-tRNA synthetase C-terminal domain: a novel type of anticodon binding module.

A. Pintar [1], A. Prochnicka-Chalufour [1], V. Guez [2], C. Castagne [1],

H. Bedouelle [2], and M. Delepierre [1].

[1] Laboratoire de RMN (CNRS URA 2185)

[2] Unite de Biochimie Cellulaire, Institut Pasteur,

28 rue du Dr. Roux, 75724 Paris Cedex 15, France.

Tyrosyl-tRNA synthetase (TyrRS) is a homodimeric protein that catalyzes both the activation of the amino acid through its reaction with ATP and the transfer of the aminoacyl-adenylate to tRNA(Tyr). In Bacillus stearothermophilus, each subunit of TyrRS comprises two structural domains: an N-terminal domain (residues 1-319), whose crystal structure is known, and a C-terminal domain (residues 320-419), which appears disordered in the crystals. The binding site of one tRNA(Tyr) molecule encompasses both subunits of the TyrRS dimer. The folding state of the C-terminal fragment was characterized in solution by biophysical techniques and compared with that of full-length TyrRS. We present here the three-dimensional structure of a recombinant protein, TyrRS(de4), corresponding to the C-terminal domain of TyrRS, solved by heteronuclear NMR spectroscopy. We suggest that the disorder observed in the crystal structure is due to a flexible linker between the N- and C-terminal regions. The TyrRS(de4) structure exhibits a novel fold among the anticodon binding domains of aminoacyl-tRNA synthetases, around two thirds of which appears to be shared with the ribosomal protein S4 and a heat shock protein, HSP15. The common topology involves two α helices packed against an antiparallel β sheet. Of six basic residues identified by site-directed mutagenesis as essential for tRNA binding, four are clustered in this domain and are likely to interact with the anticodon arm of tRNA.


The Defining Characteristics of Immunoglobulin-like Proteins

A. E. Kister and I.M. Gelfand

Department of Mathematics, Rutgers University,

110 Frelinghuysen Rd. Piscataway, NJ, USA

Phone: (732) 445 34 78; Fax: (732) 445 55 30

akister@math.rutgers.edu

The main goal of this work is the analysis of the general relation between sequences of immunoglobulins and their three-dimensional structure, i.e. analysis of sequence features that are consistent with a structure. Our recent investigations showed that distantly related proteins (with no significant homology) that share one type of immunoglobulin fold also share a small set of residues at the same positions. This set of residues constitutes the defining characteristics of the immunoglobulin fold. Residues at each key position are chemically related and play approximately the same structural role in all proteins (residue-residue contacts, surface exposure and other features), and have almost identical coordinates in the coordinate system unified for the protein family or fold. The results of the energy calculations in the threading test showed that these residues play the decisive role in structure stability and that they are fully sufficient for Ig fold recognition.

Knowledge of the distinguishing characteristics allows one to compare sequences with low similarity (less than 20%) and, hence, to assign these sequences to the proper protein fold using only several key residues. In fact, it is not necessary to know all or almost all residues in a sequence, as is required by traditional tools such as BLAST, FASTA, and HMMs. Based on this analysis, a new method of protein classification was developed. The basic idea behind this method is that only residues at key positions are taken into account; all other residues are disregarded. We will present the results of the classification using the defining characteristics for the different superfamilies of the immunoglobulin folds.
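As a rough illustration of classifying by key positions only, the sketch below scores a sequence against per-fold "signatures" and ignores every other residue. It is not the authors' method: the signature positions, the allowed residue sets, the fold names and the threshold are all hypothetical, and the sequence is assumed to be already placed in the common numbering of the fold.

```python
# Hypothetical signatures: key position (0-based, common numbering) -> allowed residues.
# Positions, residue sets and fold names are invented for illustration only.
SIGNATURES = {
    "fold_A": {2: set("C"), 5: set("WFY"), 9: set("C")},
    "fold_B": {1: set("G"), 4: set("DE")},
}

def key_position_score(aligned_seq, signature):
    """Fraction of key positions whose residue belongs to the allowed set."""
    hits = sum(
        1
        for pos, allowed in signature.items()
        if pos < len(aligned_seq) and aligned_seq[pos] in allowed
    )
    return hits / len(signature)

def classify(aligned_seq, signatures, threshold=1.0):
    """Return the best-scoring fold name, or None if no fold reaches the threshold."""
    best_name, best_score = None, 0.0
    for name, signature in signatures.items():
        score = key_position_score(aligned_seq, signature)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

Residues outside the key positions never enter the score, which is the point of the approach: two sequences with very low overall identity can still receive the same assignment.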


Accurate formula for P-values of gapped local sequence and profile alignments

Richard Mott

University of Oxford Wellcome Trust Centre for Human Genetics

Roosevelt Drive, Oxford OX3 7BN, UK

We present a simple general approximation for the distribution of gapped local alignment scores, suitable for assessing the significance of comparisons between two protein sequences or between a sequence and a profile. The approximation takes account of the scoring scheme (i.e. gap penalty and substitution matrix or profile), sequence composition and length. Use of this formula means it is unnecessary to fit an extreme-value distribution to simulations or to the results of databank searches. The method is based on the theoretical ideas introduced by Mott and Tribe (1999, J. Comp. Biol. 6:91-112) and Mott (2000, J. Mol. Biol. 300:649-659). Extensive simulation studies show that score thresholds produced by the method are accurate to within ±5%, 95% of the time. Further details are available from http://www.well.ox.ac.uk/~rmott/ariadne.html
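For orientation, the sketch below shows how a P-value and a score threshold relate under the extreme-value (Gumbel) form conventionally used for local alignment scores, P(S >= x) = 1 - exp(-K m n exp(-lambda x)). This is the standard textbook form, not the approximation presented in this abstract; the function names and the example values of lam and K are illustrative assumptions, and in the gapped case such parameters would have to come from a formula like the one described here or from simulation.

```python
import math

def evd_pvalue(score, m, n, lam, K):
    """P(S >= score) for sequence lengths m and n under a Gumbel distribution."""
    expected = K * m * n * math.exp(-lam * score)
    # 1 - exp(-expected), computed stably for very small expected values
    return -math.expm1(-expected)

def score_threshold(p, m, n, lam, K):
    """Score whose P-value equals p (algebraic inverse of evd_pvalue)."""
    return math.log(K * m * n / (-math.log1p(-p))) / lam

# Illustrative (assumed) parameters, not values taken from the abstract
threshold = score_threshold(0.01, m=300, n=300, lam=0.27, K=0.04)
```

An error of a few percent in the threshold score translates into a multiplicative error in the P-value, because the score enters through an exponential.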


NMR Structure Determination and Ligand Screening of Hypothetical Proteins from Haemophilus influenzae

Lisa Parsons, Nicklas Bonander, Edward Eisenstein, and John Orban

Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Drive, Rockville, Maryland 20850 U.S.A.

As the number of genomes sequenced continues to increase, a recurring observation is that approximately 20-40% of the open reading frames in these genomes have no known function (e.g. Fleischmann et al., 1995). These ‘hypothetical’ proteins have no sequence homology with proteins of known function and typically have homologues in other organisms from bacteria to eukaryotes. Consequently, any information that can be obtained on these proteins will be important in understanding their role in a wide range of biological systems and will fill a large gap in knowing what is required for the viability of a free-living organism. The goal of this project is to obtain three-dimensional structures for soluble proteins in this hypothetical category and to use this structural information to narrow down potential biochemical functions, which can then be assayed by other methods. Examples from our current structural work on a number of H. influenzae (HI) proteins will be discussed together with results from small molecule ligand screening using NMR methods.

Supported by NIH P01 GM57890.

References

Fleischmann, R. D. et al. (1995). Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496-512.


Docking Large Proteins Using Spherical Polar Fourier Correlations

Russell S. Hamilton [1], David W. Ritchie [1,2], Graham J.L. Kemp [1]

[1] Department of Computing Science, University of Aberdeen, King's College,

Aberdeen, AB24 3UE

[2] Department of Molecular and Cell Biology, University of Aberdeen, Polwarth Building, Foresterhill, Aberdeen, AB25 2ZD

If we are to relate gene and protein structure to biological function, then it is necessary to develop good computational models of how large biomolecules might interact. There exist several algorithms for predicting how small ligands "dock" to proteins. However, these methods often require prior knowledge of the ligand binding site. In larger protein-protein complexes (where the interfacial surface area may range from about 800 to 1600 Å^2), efficient and accurate surface matching algorithms are required to identify feasible docking orientations. The most successful current macromolecular docking algorithms are usually based either on (a) geometric hashing [1], or (b) fast Fourier transform (FFT) techniques [2]. However, none of the existing methods is well suited to docking very large complexes such as the antibody Fab - haemagglutinin complex [4]. The geometric hashing algorithms are prone to a combinatorial increase in the number of features that need to be compared as the size of the molecules increases: none of the submissions to CASP2 used this method. The best single solution submitted used an FFT approach [3], but the haemagglutinin moiety had to be broken into several fragments and large (low resolution) grids had to be used for the problem to fit into main memory.

To address many of the limitations of the grid-based FFT approaches, we recently developed a new Fourier-like algorithm based on spherical polar Fourier correlations [5,6]. By itself, our spherical polar approach is also unsuitable for such problems because our radial functions fall off rapidly beyond about 30 Å from the chosen origin, hence molecular shapes larger than this are represented poorly. However, it is not necessary to rely on a single origin. We have developed an automatic method of generating multiple coordinate "centres", each of which may be used to capture an accurate representation of a local surface region. We can then perform high resolution angular docking searches over each surface patch. Taken together, these angular searches correspond to a full rigid-body search over the entire molecular surface. Selecting a suitable set of projection centres must be done with care to ensure full coverage of the molecular surface whilst minimising the amount of computation that must be performed. This new macromolecular docking algorithm is fully automated and good docking predictions can now be obtained for very large complexes.
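The trade-off between full surface coverage and the number of centres can be pictured as a covering problem. The sketch below is a generic greedy set-cover heuristic, offered only as an illustration and not as the authors' centre-generation method; the function name, the candidate list and the use of a hard 30 Å cut-off (taken from the fall-off distance quoted above) are assumptions.

```python
import math

def greedy_centres(surface_points, candidates, radius=30.0):
    """Pick candidate centres until every surface point lies within `radius` of one.

    surface_points, candidates: sequences of (x, y, z) tuples in angstroms.
    Greedy heuristic: always take the candidate covering the most uncovered points.
    """
    uncovered = set(range(len(surface_points)))
    chosen = []
    while uncovered:
        best, best_cover = None, set()
        for c in candidates:
            cover = {i for i in uncovered
                     if math.dist(c, surface_points[i]) <= radius}
            if len(cover) > len(best_cover):
                best, best_cover = c, cover
        if best is None:
            raise ValueError("some surface points are beyond every candidate centre")
        chosen.append(best)
        uncovered -= best_cover
    return chosen
```

Fewer centres mean fewer angular searches, so a heuristic of this kind directly controls the total amount of computation.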

[1] D. Fischer, S. L. Lin, H. L. Wolfson and R. Nussinov (1995) A Geometry Based Suite of Molecular Docking Processes, J. Mol. Biol. 248, 459-477.

[2] E. Katchalski-Katzir, I. Shariv, M. Eisenstein, A.A. Friesem and C. Aflalo (1992) Molecular surface recognition: Determination of geometric fit between proteins and their ligands by correlation techniques, Proc. Natl. Acad. Sci. 89 2195-2199.

[3] I. Vakser (1997) Evaluation of GRAMM low-resolution docking methodology on the Hemagglutinin-Antibody Complex, Proteins: Struct. Funct. Genet. Suppl. 1, 226-230.

[4] J. S. Dixon (1997) Evaluation of the CASP2 Docking Section, Proteins: Struct. Funct. Genet. Suppl. 1, 198-204.

[5] D. W. Ritchie and G. J. L. Kemp (2000) Protein docking using spherical polar Fourier correlations, Proteins: Struct. Funct. Genet. 39, 178-194.

[6] Hex download page: "http://www.biochem.abdn.ac.uk/hex/".


The BALSAMIC (Basic Active and Ligand binding SurfAce Matching through Image Comparison) project.

Steven J. Pickering(1), Andrew J. Bulpitt(2), Nick D. Efford(2),

Nicola D. Gold(1) and David R. Westhead(1).

1. School of Biochemistry and Molecular Biology, and 2. School of Computer Studies, University of Leeds, Leeds, W. Yorks. LS2 9JT.

e-mail: westhead@bmb.leeds.ac.uk

The relationship between biochemical function and protein structure is not straightforward. Sometimes function can be deduced directly from sequence and/or fold, but there are many examples of proteins with similar sequences and different functions, and examples of protein folds that support many unrelated functions. Even when proteins have related functions, small differences, for instance in enzyme specificity, are often only apparent when detailed structural information is available. Structural genomics initiatives are expected to generate structures for large numbers of proteins, the majority of which will be uncharacterised from a functional point of view. The aim of these initiatives is to provide structural information as a route to the determination of gene function. This will require the development of new bioinformatic tools able to predict functional characteristics from protein structure.

Many important biochemical processes, for example molecular interaction and recognition or catalysis, occur on or near protein surfaces. Surfaces with similar shape, chemical and physical properties are likely to perform similar functions. If a protein has an unknown function, a route to function prediction is therefore to locate similar surfaces in other proteins of better characterised function, and transfer the information. This process operates in direct analogy to the practice of predicting function from sequence by seeking to find related sequences of known function, for instance in a BLAST search.

Surface matching is a difficult computational problem, much studied in the field of computer vision. We report early progress in adapting methods from this field to the problem of protein surface matching. A multi-resolution approach views the surface in terms of differential properties, the curvedness and shape index, describing the degree of curvature and nature of the local surface shape (convex, concave, saddle) respectively. The surface matching algorithm uses a tree-structured search space where algorithmic efficiency is achieved through early pruning of branches corresponding to matches judged to be unreasonable by the above criteria. Preliminary results indicate that the method will be both efficient and useful. Future work will adapt the algorithms to include physico-chemical properties of the surface, with the aims of producing more reasonable matches from a biochemical point of view, and further increases in efficiency by increasing pruning of the search tree.
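The two differential descriptors named above have standard closed forms in terms of the principal curvatures k1 and k2 (Koenderink's shape index and curvedness). The sketch below computes them and uses them in a simple compatibility test of the kind that could prune a search tree early. The sign convention (which sign means convex), the tolerance values and the `compatible` helper are assumptions for illustration, not details taken from the project.

```python
import math

def shape_index(k1, k2):
    """Shape index in [-1, 1]; None at planar points where it is undefined."""
    kmax, kmin = max(k1, k2), min(k1, k2)
    if kmax == 0.0 and kmin == 0.0:
        return None
    # atan2 handles umbilic points (kmax == kmin) without dividing by zero
    return (2.0 / math.pi) * math.atan2(kmax + kmin, kmax - kmin)

def curvedness(k1, k2):
    """Magnitude of curvature, independent of the kind of shape."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2.0)

def compatible(p, q, shape_tol=0.2, curv_tol=0.5):
    """True if two surface points (k1, k2) are similar enough to keep matching."""
    sp, sq = shape_index(*p), shape_index(*q)
    if sp is None or sq is None:
        return sp is sq  # a planar point only matches another planar point
    return (abs(sp - sq) <= shape_tol
            and abs(curvedness(*p) - curvedness(*q)) <= curv_tol)
```

With this convention a symmetric saddle (k1 = -k2) has shape index 0, and the two spherical extremes have shape index +1 and -1, so shape type and degree of curvature are separated into two independent numbers.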

A database of surfaces for comparison will be generated and the algorithms will be used for similarity searches of this database. Other search algorithms will be implemented, including searches for user defined spatial arrangements of key functional residues. Ultimately these services will be made available on the World Wide Web, and for analysis of structural genomics data.


Ablation of cyclins in Xenopus oocytes by antisense oligonucleotides selected by hybridisation to scanning arrays

M. Sohail, H. Hochegger, A. Klotbucher, R. Guellec, T. Hunt and

E. M. Southern

University of Oxford, Department of Biochemistry,

South Parks Road, OX1 3QU, UK

Arrays of antisense oligonucleotides corresponding to the first 120 nucleotides each of the cyclins B1, B4 and B5 were fabricated on the surface of aminated polypropylene. The arrays were hybridised with the appropriate radiolabelled transcript to assess the ability of the immobilised oligonucleotides to form heteroduplexes with their targets. Oligonucleotides that produced strong heteroduplex yield, as well as those that showed little annealing, were assayed for their effect on translation of endogenous cyclin mRNAs in Xenopus egg extracts and their ability to promote cleavage of cyclin mRNAs in oocytes by RNase H. Excellent correlation was found between the antisense potency and the affinities of the oligonucleotides for cyclin transcripts as measured by the arrays, despite the complexity of the cellular environment.
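To make the array design concrete, the sketch below enumerates antisense DNA oligonucleotides complementary to successive windows of the first 120 nucleotides of a transcript. It is a simplified illustration: a fixed oligonucleotide length of 20 is an assumption (the abstract does not state the lengths used), and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}

def antisense_oligo(target):
    """DNA oligonucleotide (5'->3') complementary to an RNA or DNA target (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(target.upper()))

def scanning_oligos(transcript, region=120, length=20):
    """Return (start, oligo) for every window of `length` in the first `region` nt."""
    region_seq = transcript[:region]
    return [
        (start, antisense_oligo(region_seq[start:start + length]))
        for start in range(len(region_seq) - length + 1)
    ]
```

For a 120-nucleotide region and 20-mers this gives 101 overlapping oligonucleotides, each of which could then be ranked by its measured heteroduplex yield.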


Homology modelling of hyperthermophilic phosphoglycerate kinases

Gina Crowhurst and Jennifer Littlechild

Schools of Chemistry and Biological Sciences, University of Exeter, Stocker Rd., EX4 4QD, UK

The hyperthermophilic archaeon Sulfolobus solfataricus lives at temperatures of up to 87 °C, whilst Pyrococcus woesei survives temperatures in excess of 100 °C. The mechanisms by which these organisms stabilise their intracellular proteins have been studied by homology modelling, using phosphoglycerate kinase (PGK) as an example. The glycolytic enzyme PGK is well studied, with over 100 primary sequences currently available. The enzyme is normally a monomer; however, S. solfataricus PGK is tetrameric whilst P. woesei PGK is dimeric. Homology models of the S. solfataricus, P. woesei, Methanococcus bryantii and Haloarcula vallismortis PGKs have been generated and compared to the existing X-ray structures of the enzyme from Bacillus stearothermophilus, Thermotoga maritima and Saccharomyces cerevisiae. These models have provided insights into the mechanisms of stabilisation employed by hyperthermophiles and into the substrate and cofactor binding sites of archaeal PGKs. The modelled S. solfataricus PGK structure also reveals potential areas for hydrophobic subunit interaction. Examination of the archaeal PGK protein sequences has confirmed that the essential catalytic residues have been conserved. A possible gene duplication event evident in the S. solfataricus PGK has also been observed.


(beta/alpha)8-barrel enzymes and the evolution of function & pathways.

Richard R. Copley

EMBL, Meyerhofstrasse 1, 69012 Heidelberg, Germany

We provide statistically reliable sequence evidence indicating that at least 12 of 23 scop (beta/alpha)8 (TIM) barrel superfamilies share a common origin. This includes all but one of the known and predicted TIM barrels found in central metabolism. The statistical evidence is complemented by an examination of the details of molecular function and protein structure, with certain structural locations favouring catalytic residues even though the nature of their molecular function may change. The combined analysis of sequence, structure and function also enables us to propose a phylogeny of TIM barrels. Based on these data, we are able to examine differing theories of pathway and enzyme evolution, by mapping known TIM barrel folds to the pathways of central metabolism. The results favour widespread recruitment of enzymes between pathways, rather than a 'backwards evolution' model, and support the idea that modern proteins may have arisen from common ancestors that bound key metabolites.

This abstract will be presented as a talk.

No fold recognition method is always best! Results from studies of different fold recognition methods.

Arne Elofsson

Stockholm Bioinformatics Center, Stockholm University, 106 91 Stockholm

Here we report results from two recent studies of different fold recognition methods. In the first study we performed the first large (10000 pairs) test of alignment quality using several different alignment methods (local, global, profile alignment, hmmer, sam.t98, clustalW, sspsi) (Elofsson, 2000 submitted). We show that both evolutionary information and predicted secondary structure information improve the alignment quality. The best alignments are obtained from a method that combines a sequence profile obtained from PSI-BLAST with predicted secondary structures. In the second study, we present a novel, continuous approach aimed at the large-scale assessment of the performance of available fold-recognition servers (Bujnicki et al 2000 submitted). Six popular servers were investigated: PDB-Blast, FFAS, T98-lib, GenTHREADER, 3D-PSSM and INBGU. The assessment was carried out using as prediction targets a large number of selected protein structures released between October 1999 and April 2000. Overall, the servers were able to produce structurally similar models for one-half of the targets, but significantly accurate sequence-structure alignments were produced for only one-third of the targets. We further classified the targets into two sets: "easy" and "hard". We found that all servers were able to find the correct answer for the vast majority of the easy targets when a structurally similar fold was present in the server's fold libraries. However, among the hard targets, where standard methods such as PSI-BLAST fail, we found that the most sensitive fold-recognition servers were able to produce similar models for only 40% of the cases, half of which had a significantly accurate sequence-structure alignment. Unfortunately, the increased sensitivity of the fold-recognition servers over standard methods came at the cost of low specificity.

Probably the most interesting observation from these studies is that no single method always produces the best results (fold recognition or alignment). For instance, we show that for fold-related pairs almost twice as many good models can be created by taking the best result from any method as by using the single best method alone, and that each server had a number of cases with a correct assignment where the assignments of all the other servers were wrong. This emphasizes the benefits of considering more than one method in difficult prediction tasks. It also implies that fold recognition performance could be improved significantly if several methods could be combined without losing specificity.
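The comparison between the single best method and the pooled result of all methods can be made concrete with a small sketch. The server names and target identifiers below are invented toy data, and `coverage` is a hypothetical helper; nothing here reproduces the figures of the studies themselves.

```python
def coverage(correct_by_server):
    """Compare the best single server with the union over all servers.

    correct_by_server maps a server name to the set of targets for which
    that server produced a correct model.
    Returns (best server name, its count, count for the pooled servers).
    """
    best = max(correct_by_server, key=lambda s: len(correct_by_server[s]))
    pooled = set().union(*correct_by_server.values())
    return best, len(correct_by_server[best]), len(pooled)

# Toy example (invented data): each server solves some targets the others miss.
toy = {
    "server_a": {"t1", "t2", "t3"},
    "server_b": {"t3", "t4"},
    "server_c": {"t5"},
}
best_name, best_count, pooled_count = coverage(toy)
```

Note that the pooled count is an upper bound: it assumes the correct model can always be picked out among the candidates, which is exactly the specificity problem raised above.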

In conclusion, we would like to encourage all protein structure predictors to take advantage of the variety of methods available. In both these studies we have used novel methods to measure the quality of a model generated from a fold recognition method. We will also discuss the advantages of using these novel methods for measuring fold recognition capacity (Siew et al 2000, in press; Cristobal et al 2000, manuscript in preparation).

This abstract will be presented as a talk.

beta-Glucosyltransferase: Substrate Binding and metal site.

S. Moréra(a), L. Larivière(a), W. Rüger(b), P. Freemont(c)

a. LEBS, UPR 9063 CNRS, Bât. 34, 91198 Gif-sur-Yvette, France

b. Arbeitsgruppe Molekulare Genetik, Ruhr Universität, Bochum, Germany

c. MSFL, ICRF, 44 Lincoln’s Inn Fields, London WC2A 3PX, UK

beta-Glucosyltransferase (BGT) is a DNA-modifying enzyme encoded by bacteriophage T4 which catalyses the transfer of glucose from uridine diphosphoglucose (UDPG) to 5-hydroxymethylcytosine (HMC) in double-stranded DNA. The glucosylation of T4 phage DNA is part of a phage DNA protection system aimed at host nucleases. We previously reported the complete BGT co-crystal structure in the presence of UDPG [1], in which the glucose is missing due to BGT cleavage. This BGT structure has provided us with a basis for detailed modelling of DNA bound to BGT. Furthermore, using the structural similarity between the catalytic core of glycogen phosphorylase and BGT, we have been able to model the position of the missing glucose moiety from UDPG.

We now report two BGT-UDP-Mg2+ structures from crystals grown under the same conditions except for the concentration of magnesium ions. The crystal of BGT-UDP-Mg2+ grown at 20 mM diffracts to 2.5 Å resolution, while the crystal grown at 40 mM diffracts to 2 Å resolution. Both crystals belong to space group P212121 but their cell parameters are different. Both structures contain one magnesium ion in the UDPG binding site. The presence of a second Mg2+ ion far from the active site in the structure with 40 mM Mg2+ could explain the difference in crystal packing between these two structures. Here, we present the metal site of BGT and, from these two models, propose a role for Glu163 in the catalytic mechanism of BGT.

1. Moréra, S., Imberty, A., Aschke-Sonnenborn, U., Rüger, W. and Freemont, P. (1999). "T4 phage beta-glucosyltransferase: substrate binding and proposed catalytic mechanism." J. Mol. Biol. 292, 717-730


Prediction of Functional Sites in Proteins

Patrick Aloy§‡, Enrique Querol§, F. Xavier Avilés§ & Michael J.E. Sternberg‡

§ Institut de Biologia Fonamental. Universitat Autònoma de Barcelona, 08193 Bellaterra (Barcelona), Spain.

‡ Biomolecular Modelling Laboratory. Imperial Cancer Research Fund, Lincoln's Inn Fields, London WC2A 3PX, UK.

E-mail: patrick@luz.uab.es

An ultimate goal of genome analysis is to determine the biological function of all the gene products in a genome. Typically, prediction of function is based on sequence similarity with proteins of known function, but even in a simple organism with a small genome, like Mycoplasma genitalium, more than 40% of the ORFs have no homologous sequences in the databases. Moreover, the new initiatives in Structural Genomics will lead to the determination of the three-dimensional structure of proteins prior to knowledge of their function. All this has created the need for new methodologies for the analysis and prediction of protein function from sequence and structure. Here, we present a method to predict the location of functionally important sites in proteins.

A non-redundant database (<25% homology) of 107 protein chains was built using all the entries with detailed information on the functional residues in the Brookhaven protein databank (~1800). An exhaustive analysis of the residue type, secondary structure and solvent accessibility preferences of the active site residues was performed. The next step was to develop a fully automated method following the ideas given by the Evolutionary Trace method (Lichtarge et al., 1996). The standard database search method WU-BLAST (Altschul et al., 1990) was used to pick up all the proteins that matched our probe sequence and a multiple alignment was then carried out with the program CLUSTALW (Thompson et al., 1994). A hierarchical clustering algorithm was implemented to build up a phylogenetic tree and the consensus sequence was extracted for each branch (subfamily). If our probe protein, or at least one of its homologues, has a known structure, the consensus sequence was mapped onto the protein structure and techniques of double spatial clustering were applied in order to identify those residues that are not only conserved but also close in space. A sphere containing all the residues that clustered together was then built and defined as the Functional Site.
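The final step, grouping conserved residues that lie close together and enclosing them in a sphere, might be sketched as below. The abstract does not specify the "double spatial clustering" procedure, so this uses a plain single-linkage clustering with a distance cutoff as a stand-in; the function names, the cutoff and the centroid-based sphere are all assumptions made for illustration.

```python
import math
from itertools import combinations

def cluster_residues(coords, cutoff):
    """Single-linkage clustering: two residues share a cluster when a chain
    of neighbours, each within `cutoff` of the next, connects them.
    Returns clusters as lists of indices, largest first."""
    parent = list(range(len(coords)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(coords)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)

def enclosing_sphere(coords, members):
    """Sphere centred on the cluster centroid that contains every member."""
    pts = [coords[i] for i in members]
    centre = tuple(sum(axis) / len(pts) for axis in zip(*pts))
    radius = max(math.dist(centre, p) for p in pts)
    return centre, radius

# Invented coordinates (in Å) for four conserved residues; the last is isolated.
conserved = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (30.0, 30.0, 30.0)]
clusters = cluster_residues(conserved, cutoff=6.0)
site_centre, site_radius = enclosing_sphere(conserved, clusters[0])
```

The centroid-centred sphere is not the minimal enclosing sphere; it is simply the easiest sphere to compute that is guaranteed to contain the whole cluster.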

The results show that it is possible to predict Functional Sites automatically in a given protein, with 60% accuracy and 99% significance, when enough homologous sequences can be found in the databases and at least one of them has a known structure. The method also gives us a clue to identify cases of divergent evolution with still high homology levels. The information obtained from the functional sites was used to filter putative protein-protein and protein-DNA docking complexes generated by FTDock (Gabb et al., 1997), giving results similar to those obtained when experimental information from mutagenesis experiments is used.


Crystallographic Studies of a Flavoprotein from E. coli

A.L.Lovering

Protein Structure Function Group, School of Biosciences, University of Birmingham

A flavoprotein from E. coli with potential use in gene-directed enzyme prodrug therapy (GDEPT) has been purified and crystallized, with the eventual aim of protein engineering. X-ray diffraction data for three different crystal forms were collected and the structures solved by MAD, SAD and MR to a maximum resolution of 1.6 Å. The overall fold of the protein and the environment of the FMN cofactor with a substrate analogue are presented.


Homology Model of Glutamate Receptor

Indira H Shrivastava and Mark SP Sansom

Laboratory of Molecular Biophysics, Rex Richards Building

University of Oxford, South Parks Road, Oxford OX1 3QU, UK

Glutamate receptors, which are permeable to Na+, K+ and Ca2+, and potassium channels, which are selectively permeable to K+, are two important families of ion channels. The crystal structure of KcsA, a potassium channel, revealed the structure of the protein in the transmembrane region (Doyle et al, 1998). The glutamate receptor GluR0 from Synechocystis PCC 6803 binds glutamate and forms a potassium-selective channel (Chen et al, 1999). The transmembrane region of GluR0 was found to have a high sequence similarity to the KcsA protein from Streptomyces lividans, particularly in the selectivity filter region (TVGYG) and in the putative gating region. Hence, the KcsA structure was used as a template to develop a homology model for GluR0. The alignment was done such that the overlap is maximal between the selectivity filter region and the pore-lining residues of KcsA and GluR0. This model was then inserted in a fully solvated palmitoyl oleoyl phosphatidylcholine (POPC) lipid bilayer. Three K+ ions were placed in the model protein, two in the selectivity filter and one in the cavity, and a molecular dynamics simulation trajectory was generated for 1 ns. The model is seen to be stable over the period of 1 ns in a membrane environment, with no obvious collapse in any region. The ions were seen to enter into the intracellular side from the channel pore. The interactions of the ions in the selectivity filter and the cavity with the protein and water molecules are compared with the corresponding interactions in KcsA (Shrivastava and Sansom, 2000).

References

Doyle et al. (1998) Science 280:69-77

Chen et al. (1999) Nature 402:817-821

Shrivastava & Sansom (2000) Biophys. J. 78:557-570


Calcium binding by proteins

Koen Bossers

Centre for Molecular and Biomolecular Informatics

University of Nijmegen, The Netherlands

Many proteins have calcium ions as part of their structure. The number of oxygen atoms involved in binding this ion (the coordination number) can vary. The most frequently found coordination numbers are 6 and 7. With different coordination numbers, the calcium binding site adopts different geometrical conformations. In the case of a 7-coordination site, the geometry is a pentagonal bipyramid, while a 6-coordination site adopts an octahedral geometry. A bidentate ligand (one capable of providing two oxygen atoms for calcium binding; only glutamic acid and aspartic acid can do this) is essential for the pentagonal bipyramidal structure. Detailed statistical analysis and superposition of sites that adopt the same geometry provide valuable information for homology modelling and structure validation.
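As a rough illustration of how a coordination number could be read from a structure, the sketch below counts oxygen atoms within a distance cutoff of the calcium ion and maps the count to the geometries named above. The 2.8 Å cutoff and the function names are assumptions chosen for the example; the abstract does not state the criteria actually used.

```python
import math

def coordination_number(calcium, oxygens, cutoff=2.8):
    """Count oxygen atoms within `cutoff` Å of the calcium position.
    The default cutoff is an assumed value, not one taken from the abstract."""
    return sum(1 for o in oxygens if math.dist(calcium, o) <= cutoff)

def expected_geometry(n):
    """Geometry expected for the two most common coordination numbers."""
    return {6: "octahedral", 7: "pentagonal bipyramidal"}.get(n, "other")

# Idealised octahedral site: six oxygens at 2.4 Å along the axes,
# plus one distant oxygen that should not be counted.
ca = (0.0, 0.0, 0.0)
site = [(2.4, 0, 0), (-2.4, 0, 0), (0, 2.4, 0), (0, -2.4, 0),
        (0, 0, 2.4), (0, 0, -2.4), (5.0, 0, 0)]
n = coordination_number(ca, site)
geometry = expected_geometry(n)
```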


Experimental confirmation of the growth hormone/proteinase function as discovered by a threading approach of a novel gene

Cristina Mitsumori*, Henrik T. Yudate, Keiichi Nagai, Yasuhiko Masuho, Hisashi Koga

Helix Research Institute, Yana 1532-3, Kisarazu, Chiba 292-0812, Japan

Tel +81-438-52-3951 Fax +81-438-52-3952

* Correspondence: cris@hri.co.jp

Protein structure determination has revealed unexpected conservation and divergence of function both within and between families [Thornton et al, 1999], but it also shows that the function of novel genes can feasibly be predicted from the tertiary structure of the encoded proteins. We used THREADER [Jones et al, 1992] to analyze novel genes that had no sequence similarity to proteins with known functions. We focused on cytokine/growth factors because clones screened from a full-length cDNA library possessing signal peptides were available, and because studies of cytokines have revealed that they can be grouped into different structural families despite a lack of sequence similarity [Rozwarski et al, 1994].

The THREADER program lists many relevant Z scores, but in the final selection we used the score of the weighted sum of pairwise and solvation energies from the structure search (Z-13), and the score for shuffling of the sequence on the candidate using pairwise and solvation energies (Z-7).

Among the full-length cDNAs analyzed by THREADER the human lung type-I cell membrane-associated protein hT1a-2 (160 aa) [Ma et al, 1998] had a very significant Z-7 score to the human growth hormone (PDB-ID: 1huw, 191 aa) and very significant Z-13 and Z-7 scores for proteinase A (PDB-ID: 2sga, 181 aa).

The first step of the experimental strategy consisted of subcloning the hT1a-2 cDNA into a pCDNA3.1(-) Myc/His expression vector. The 30 kDa hT1a-2 protein expressed in mammalian cells was subsequently purified using a Ni column. The growth hormone activity of hT1a-2 was assayed using the Nb2 rat lymphoma cell line. Nb2 cannot grow in FBS-free medium, but exogenous application of hT1a-2 (700 ng/ml) induced a 60% increase in the growth rate, in comparison to FBS-treated Nb2 cells. Proteinase activity was tested using five peptide-4-methylcoumarin amide (MCA) substrates, and release of 7-amino-MCA was determined fluorometrically. The purified hT1a-2 was observed to cleave the factor Xa- and trypsin-specific substrate Boc-Ile-Glu-Gly-Arg-MCA.

The above results show that hT1a-2 possesses activity both as a growth hormone and as a proteinase, as has also been described for the growth factor from Spirometra mansonoides [Phares and Kubik, 1996]. In order to determine its main function in vivo, we used the phage display method and isolated a single phage expressing a 24-residue oligopeptide that specifically binds to the hT1a-2 protein. This oligopeptide exhibited 43% sequence identity to the signal peptide cleavage region of the human insulin-like growth factor precursor, indicating that the basic function of hT1a-2 in vivo is probably as a proteinase that cleaves specific sequences.

We have used a bioinformatics approach to prioritise which genes to analyze in the laboratory, and this is the first work confirming the function of hT1a-2 as estimated from annotation of the structures obtained by the threading approach.


A structural genomics pre-pilot project: study of 20 yeast ORFs.

Sophie Quevillon-Cheruel (1), Sylvie Auxillien (1), Michel Desmadril (1), Philippe Minard (1), Joël Janin (2)

1. Laboratoire de Modélisation et d’Ingénierie des Protéines – Orsay, France

Bernard LABEDAN - Robert AUFRERE - Gilles HENCKES

Institut de Génétique et Microbiologie – Orsay, France

2. Laboratoire d’Enzymologie et Biochimie Structurales - CNRS - Gif-sur-Yvette, France

 

Among the approximately 6200 ORFs of the yeast genome, we have selected 282 unique proteins and 202 families of paralogous proteins with sizes ranging from 100 to 500 residues. These sequences were also selected in such a way that they correspond to cytoplasmic proteins. From this first list of proteins, we have chosen 20 ORFs to develop an efficient and cheap method for their cloning, overexpression in E. coli, purification and crystallization. This method will subsequently be applied to the other selected ORFs.

The 20 ORFs were amplified by PCR using the S. cerevisiae genome as template. Various constructs have been tested, with a 6His tag added at either the C-terminus or the N-terminus of the protein. The PCR products were then cloned into either the pET9 or the pET29 vector.

The proteins were overexpressed in the methionine-auxotroph strain B834(DE3)pLysS, after induction by IPTG and using methionine. The optimized expression conditions and the solubility of the ORFs were tested. Under the selected conditions of expression induction, no significant difference was observed between the two vectors used: 1/3 of the ORFs are well expressed and soluble, 1/3 are expressed but not soluble and 1/3 are not expressed. The position of the His-tag affects neither the level of expression nor the solubility of the tested proteins.

The soluble proteins were purified by affinity chromatography (NiNTA) and gel filtration. The purity and integrity of the proteins were tested by SDS-PAGE and mass spectrometry. Crystallization conditions for these proteins are under development.

This abstract will be presented as a talk.

3D structure of mammalian thioredoxin reductase

Comparison of TRR with other nucleotide-disulfide oxidoreductases.

Tatyana Sandalova

Medical Biochemistry & Biophysics, Karolinska Institute, 17177 Stockholm, Sweden

Mammalian thioredoxin reductase (mTRR) is a member of the pyridine nucleotide-disulfide oxidoreductase (PNDO) family, which contains glutathione reductase (GR), lipoamide dehydrogenase (LAD), trypanothione reductase, mercuric ion reductase and some other enzymes. All of them are flavoproteins and contain an active disulfide, which is reduced by NAD(P)H and then transfers the reducing equivalents to a substrate.

Thioredoxin reductase catalyses the reduction of thioredoxin, a widely expressed 12-kDa protein that participates in many cellular processes. Reduced thioredoxin is a hydrogen donor for ribonucleotide reductase and some other enzymes; it also controls the thiol-disulfide redox balance and is involved in the regulation of various transcription factors. It has been shown that TRR is overexpressed in certain tumor cells and down-regulated in apoptotic cells. In addition, TRR participates in ascorbate recycling, is inhibited by auranofin (a therapeutic for rheumatoid arthritis), and is involved in the protection of the skin from UV radiation.

Surprisingly, mTRR is more similar to human GR than to bacterial or plant TRR when size, sequence and position of the active cysteines are compared. TRR has broad substrate specificity: in addition to thioredoxin, it catalyses the reduction of lipoic acid, protein disulfide isomerase and many other substrates, but not glutathione. Unlike all other PNDOs, mammalian TRR contains an essential selenocysteine residue. SeCys is the penultimate residue of a 16-amino-acid extension of mTRR that is absent in GR and LAD. Mutation of SeCys498→Cys greatly decreases the activity of TRR; however, it allows the protein to be overexpressed. The recombinant SeCys498→Cys mutant of rat TRR was crystallised as a complex with NADP+. The 3D structure of mTRR was solved and refined to 3 Å (R/Rfree = 23.0/29.5%). It is the first 3D structure of mTRR. Here, the comparison of the 3D structure of mTRR with those of other members of the nucleotide-disulfide oxidoreductase family is presented.

The 3D structure of human GR is the most similar to that of TRR: the alignment of the 3D structures shows that these two proteins have 153 identical residues at corresponding positions; the rmsd is 1.4 Å for 407 residues of one subunit, or 1.7 Å for 810 Cα atoms of the dimer. The TOP server found other similar proteins, all of which belong to the PNDO family: trypanothione reductase (1aog.pdb) with an rmsd of 1.4 Å for 410 residues (145 of them identical), and dihydrolipoamide dehydrogenase (1ebd.pdb) with an rmsd of 1.5 Å for 387 residues (108 of them identical). The main difference is the 16-residue extension at the C-terminus with the essential Cys497-SeCys498 pair. The extension is located at the position occupied by GSSG in GR; the distance between the essential residue His472 and CysB498 is about 6.5 Å. A detailed comparison of the active site structure of TRR with those of the other members of the PNDO family is presented.


One-step derivatisation in proteome analysis

Francesco Brancia (2), Simon J. Hubbard (1), Simon J. Gaskell (3) and Stephen G. Oliver (1)

1. University of Manchester, 2.205 Stopford Building, Oxford Road, M13 9XX

2. Michael Barber Centre for Mass Spectrometry, Department of Chemistry, UMIST, Sackville Street, Manchester

3. Department of Biomolecular Sciences, UMIST

The identification of individual protein species within the Saccharomyces cerevisiae proteome has been optimised by increasing the information obtained from mass spectral analysis through the chemical derivatisation of tryptic peptides. Matrix-assisted laser desorption ionisation time-of-flight mass spectrometry (MALDI-TOF-MS) is commonly employed to analyse samples obtained from tryptic digestion; this proteolytic enzyme produces digest fragments that are well suited to protonation in the mass spectrometer. Recent studies have shown the strong dominance of signals belonging to arginine-containing peptides in the peptide mass fingerprinting spectrum. This behaviour is readily explained by the different proton affinities of the two possible C-terminal residues, arginine and lysine. Conversion of lysine residues into homoarginine residues (guanidination) increases the number of peptides in the spectra by making the lysine-terminated peptides more detectable; this improvement in signal response provides additional data for searching the database. Novel bioinformatic tools have been developed to exploit the further information obtained with guanidination in conjunction with other chemical derivatisations.
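For database searching, the derivatisation shows up as a predictable mass shift. The sketch below computes the singly protonated monoisotopic mass of a peptide with and without guanidination, using standard monoisotopic residue masses and a shift of 42.0218 Da per lysine (net addition of CH2N2). It assumes that every lysine side chain, and nothing else, is converted; the function and constant names are illustrative and are not the authors' tools.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565          # added once per peptide for the free termini
PROTON = 1.007276          # charge carrier for the [M+H]+ ion
GUANIDINATION = 42.021798  # lysine -> homoarginine adds CH2N2

def mh_plus(peptide, guanidinated=False):
    """Monoisotopic [M+H]+ of a peptide; optionally shift every lysine
    (assumes complete conversion of lysine side chains only)."""
    mass = sum(MONO[aa] for aa in peptide) + WATER + PROTON
    if guanidinated:
        mass += GUANIDINATION * peptide.count("K")
    return mass

native = mh_plus("GK")
derivatised = mh_plus("GK", guanidinated=True)
```

A search tool could compare each observed peak against both the native and the shifted mass, so that the number of lysines in a matched peptide acts as an extra constraint.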


A Bayesian network model for protein fold and remote homologue recognition.

D.L. Wild (1), A. Raval (1) and Z. Ghahramani (2)

(1) Keck Graduate Institute of Applied Life Sciences, 535 Watson Drive, Claremont, CA 91711, USA.

david_wild@kgi.edu, alpan_raval@kgi.edu

(2) Gatsby Computational Neuroscience Unit, University College London, Queen's Square, London, UK

zoubin@gatsby.ucl.ac.uk

We describe Bayesian network models for protein folds and superfamilies which incorporate both primary sequence and structural information, with applications in the identification of remote homologues during the selection of potential targets for structure determination and in the classification of newly determined structures from structural genomics projects.

The Bayesian network approach is a framework which combines graphical representation and probability theory, which includes, as a special case, hidden Markov models (HMMs). Hidden Markov models trained on amino acid sequence or secondary structure data alone have been shown to have potential for addressing the problem of protein fold and superfamily classification. This poster describes a novel implementation of a Bayesian network which simultaneously learns amino acid sequence, secondary structure and residue accessibility for proteins of known three-dimensional structure. An awareness of the errors inherent in predicted secondary structure may be incorporated into the model by means of a confusion matrix. Training and validation data have been derived for a number of protein superfamilies from the Structural Classification of Proteins (SCOP) database. Results using posterior probability classification indicate that the Bayesian network performs better in classifying proteins of known structural superfamily than a hidden Markov model trained on amino acid sequences alone. These results will be compared to classifications obtained using predicted secondary structure and residue accessibility information, and to a Fisher kernel (Support Vector Machine) method of scoring.
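Posterior probability classification, as mentioned above, can be illustrated independently of the model that produces the likelihoods. The sketch below assumes that each superfamily model (Bayesian network or HMM) has already returned a log-likelihood for the query protein and applies Bayes' rule; the function name, the uniform default prior and the example labels are assumptions for illustration only.

```python
import math

def posterior(log_likelihoods, priors=None):
    """Posterior P(class | data) from per-class log-likelihoods.

    log_likelihoods: {class name: log P(data | class)}
    priors: optional {class name: P(class)}; uniform if omitted.
    """
    names = list(log_likelihoods)
    if priors is None:
        priors = {n: 1.0 / len(names) for n in names}
    scores = {n: log_likelihoods[n] + math.log(priors[n]) for n in names}
    top = max(scores.values())  # subtract the maximum for numerical stability
    norm = sum(math.exp(s - top) for s in scores.values())
    return {n: math.exp(s - top) / norm for n, s in scores.items()}

def classify(log_likelihoods, priors=None):
    """Return the class with the highest posterior probability."""
    post = posterior(log_likelihoods, priors)
    return max(post, key=post.get)

# Hypothetical log-likelihoods from two superfamily models.
example = {"superfamily_1": math.log(3.0), "superfamily_2": math.log(1.0)}
example_posterior = posterior(example)
```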


Crystal Structure of Tetradecameric Mycobacterium tuberculosis Chaperonin-10

Michael M. Roberts(a), Alun R. Coker(b), Anthony R.M. Coates(a) and Steve P. Wood(b)

(a) Department of Medical Microbiology, St. George’s Hospital Medical School, Cranmer Terrace, London SW17 0RE, UK; (b) Division of Biochemistry & Molecular Biology, School of Biological Sciences, University of Southampton, Bassett Crescent East, Southampton SO16 7PX, UK

Heptameric chaperonin 10 (cpn10) and tetradecameric chaperonin 60 (cpn60) interact to catalyse intracellular protein folding [1]. The crystal structure of Mycobacterium tuberculosis chaperonin 10 (Mtcpn10) has been solved to 2.8 Å resolution. The heptameric Mtcpn10 substructure is similar to the cpn10 structures of E. coli (GroES) [2] and Mycobacterium leprae [3]. Each Mtcpn10 subunit has a wedge-shaped β-barrel structure with a mobile loop from residues 17-35. A smaller loop from residues 51-56 at the other end of each subunit forms an acidic cluster of sidechains defining an 8 Å hole at the roof of the dome-shaped heptamer. The mobile loops extend from the base of the heptamer like a jellyfish. Two Mtcpn10 heptamers complex through these mobile loops to form a tetradecamer with 722 symmetry and a spherical cage-like structure. The hollow interior enclosed by the tetradecamer is lined with hydrophilic residues and is 30 Å perpendicular to and 60 Å along the seven-fold axis and could therefore encapsulate a small folded protein. Within this chamber, difference maps show electron density that matches mobile loop peptides of Mtcpn10 enclosed under the dome of each heptamer. This is confirmed by mass spectrometry, which reveals the peptides to be cleaved from Mtcpn10 on prolonged incubation in the crystallisation buffer. Furthermore, as determined by the enzyme active site searching programme TESS [4], the Glu52 and Asp53 sidechains at the roof of the dome match the stereochemistry of active site sidechains in N-acetylglucosaminidases which can cleave bacterial cell wall peptidoglycan. This implies a mechanism for Mtcpn10 secretion and could explain the significance of the Mtcpn10 mobile loop in bone resorption through the stimulation of osteoclast proliferation [5] and the stimulation of the T-cell response [6], since the mobile loop peptides would be transported by Mtcpn10 outside the cell for presentation to other cell receptors.

The existence of the tetradecamer has been confirmed in solution for both GroES and Mtcpn10 by dynamic light scattering. Therefore, other tetradecameric cpn10 structures may be biologically significant in vivo as a mechanism for transporting a folded protein out of the cpn60 cavity for association with another folded protein. For example, two Mtcpn10 heptamers encapsulating two folded subunits would complex to form a dimer from those subunits. The crystallisation conditions, data collection and molecular replacement solution with GroES have been described [7]. The Mtcpn10 model was refined with NCS restraints on the 14 subunits to an R-factor of 21.3% (Rfree = 25.3%) by simulated annealing, torsion angle and positional refinement in X-PLOR [8] and CNS [9] in between rounds of model-building with QUANTA97 (MSI) and SwissPdbViewer [10]. PROCHECK [11] shows 91% of residues in the allowed regions and an overall G-factor of 0.1.

1. Horwich, A.L., Weber-Ban, E.U. & Finley, D. (1999) Proc. Natl. Acad. Sci. USA 96, 11033-11040.

2. Hunt, J.F., Weaver, A.J., Landry, S.J., Gierasch, L. & Deisenhofer, J. (1996) Nature 379, 37-45.

3. Mande, S.C., Mehra, V., Bloom, B.R. & Hol, W.G. (1996) Science 271, 203-207.

4. Wallace, A.C., Borkakoti, N. & Thornton, J.M. (1997) Protein Science 6, 2308-2323.

5. Meghji, S. et al. (1997) J. Exp. Med. 186, 1241-1246.

6. Rosenkrands, I. et al. (1999) Infect. Immun. 67, 5552-5558.

7. Roberts, M.M., Coker, A.R., Fossati, G., Mascagni, P., Coates, A.R.M. & Wood, S.P. (1999) Acta Crystallogr D Biol. Crystallogr. 55, 910-914.

8. Brunger, A.T. (1988) J. Mol. Biol. 203, 803-816.

9. Brunger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve, R.W., Jiang, J.S., Kuszewski, J., Nilges, M., Pannu, N.S. et al. (1998) Acta Crystallogr. D Biol. Crystallogr. 54, 905-921.

10. Guex, N. & Peitsch, M.C. (1997) Electrophoresis 18, 2714-2723.

11. Laskowski, R.A., McArthur, M.W., Moss, D.S. & Thornton, J. (1993) J. Appl. Crystallogr. 26, 283-291.


Exploiting protein metal interactions to develop NMR approaches to structural genomics

Joanne C Ladds, Lesley K Machlachlan, Julia A Hubbard

Computational and Structural Sciences

SmithKline Beecham Pharmaceuticals R&D

New Frontiers Science Park (North)

3rd Avenue, Harlow, Essex, CM19 5AW

The goal of structural genomics is to understand the structure and function of proteins on a genomic scale. It is clear that many, and possibly even the majority, of proteins exist or are active in the cell in multi-protein complexes. Thus it is vital that approaches are developed to understand protein-protein interactions at a structural level in order to fully understand protein function.

Two major approaches to structure determination are NMR and X-ray crystallography. NMR has developed rapidly over the last few years and is no longer limited to proteins of 20 kDa. The use of isotopic labelling, improved instrumentation and novel pulse sequences initially increased this limit to approximately 30 kDa. Around this size, structure determination that depends on large amounts of nOe data runs into difficulty. Protein-protein interactions are also frequently difficult to study because of the dearth of nOes in the interaction sites. A range of distance- and orientation-dependent NMR parameters now promises to extend both the size and the resolution of the structures that can be determined, and to decrease dramatically the time needed to produce initial folds for proteins. These approaches will also improve the ability of NMR to probe protein recognition at a molecular level.

One important class of these new approaches uses NMR parameters that become available when a paramagnetic centre (either a metal ion or a nitroxide spin label) is attached to a protein. In cases where a native metal-binding site is present, metal ions with appropriate properties may be substituted for the native metal. So far this has been limited to either Fe (often via heme) or Ca binding sites. In theory it could also be possible to produce generic metal-binding sites using recombinant techniques. These approaches may also not be limited to the very strong metal-binding sites present in currently published studies.

In this work we present data on the use of paramagnetic lanthanides in a range of proteins with different affinities for metal ions to obtain distance and orientation data, and we probe how this can be extended, by selection of the appropriate metal ion, to understand protein-protein interactions.


Assigning Sequences to Pfam Domains by Comparison of Medlars Documents.

Benjamin J. Stapley & Michael J. E. Sternberg.

Biomolecular Modelling Lab, Imperial Cancer Research Fund. 44 Lincoln's Inn Field, London WC2A 3PX, U.K.

Functional annotation of proteins is an ever more pressing requirement for successful exploitation of genomic information. Here, we use textual information from Medlars to aid in the assignment of sequences to Pfam alignments. From a Pfam seed alignment, we extract the Medlars documents that are cited in the SwissProt entries of sequences in the alignment. 'Log Entropy' term weighting is applied and Pfam documents are generated by concatenation of the relevant Medlars documents. We then attempt to assign remote homologues, not included in the original alignments, to their respective Pfam alignments by measuring the cosine similarities of their cited Medlars documents to the Pfam documents. Successful assignment to one of a subset of 100 Pfam alignments is achieved with up to 40% accuracy (25% recall). In addition to aiding in the correct functional assignment of sequences, the generated Pfam documents allow textual information retrieval of Pfam domains with much higher recall. We have also applied the method to annotating Pfam alignments of indeterminate function.
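The assignment step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: it uses the common textbook form of log-entropy weighting (local weight log(1 + tf), global weight 1 + sum(p log p) / log n over the n family documents), and the function names, the tokenised inputs and the two toy "Pfam documents" are all hypothetical.

```python
import math
from collections import Counter


def log_entropy_model(family_docs):
    """Build log-entropy weighted vectors for each family document.

    family_docs: dict mapping a family name to a list of tokens
    (assumed here to be the concatenated, tokenised cited abstracts).
    Returns (vectors, global_weights).
    """
    n = len(family_docs)
    tf = {name: Counter(tokens) for name, tokens in family_docs.items()}
    gf = Counter()
    for counts in tf.values():
        gf.update(counts)

    # Global weight: 1 for a term confined to one document, 0 for a term
    # spread evenly over all documents.
    global_w = {}
    for term, total in gf.items():
        entropy = 0.0
        for counts in tf.values():
            c = counts.get(term, 0)
            if c:
                p = c / total
                entropy += p * math.log(p)
        global_w[term] = 1.0 + entropy / math.log(n) if n > 1 else 1.0

    vectors = {
        name: {t: math.log(1 + c) * global_w[t] for t, c in counts.items()}
        for name, counts in tf.items()
    }
    return vectors, global_w


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def assign(query_tokens, family_docs):
    """Return the best-scoring family for a query document, plus all scores."""
    vectors, global_w = log_entropy_model(family_docs)
    # Terms never seen in any family document carry no weight.
    query = {
        t: math.log(1 + c) * global_w[t]
        for t, c in Counter(query_tokens).items()
        if t in global_w
    }
    scores = {name: cosine(query, vec) for name, vec in vectors.items()}
    return max(scores, key=scores.get), scores
```

With this weighting, a word that occurs equally often in every family document (for example "protein") receives a global weight of zero and so cannot influence the assignment, which is the main reason to prefer it over raw term counts.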


 

Modelling of the structure and S1 specificity pocket of a potato leaf roll virus protease

Tomasz Cierpicki1, Jolanta Grembecka2, Filip Jeleń1, Marek Juszczuk3 and Jacek Otlewski1

1 Institute of Biochemistry and Molecular Biology, University of Wrocław

2 Institute of Organic Chemistry, Biochemistry and Biotechnology, Wrocław University of Technology

3 Department of Biochemistry and Molecular Biology, Institute of Biochemistry and Biophysics, PAS, Warsaw

The amino acid sequence of the 27 kDa domain of potato leaf roll virus protease (PLRVP) does not exhibit any detectable homology to known proteins deposited in the Protein Data Bank (PDB). A BLAST search of the non-redundant protein database revealed similarity to a few proteins described as serine proteases. Further analysis with the fold recognition server 3D-PSSM showed that PLRVP is most similar to chymotrypsin-like serine proteases. Sequence comparison of PLRVP with serine proteases revealed that their most similar regions lie close to the catalytic triad residues.

Modelling of the PLRVP scaffold was based on the structural similarity among chymotrypsin-like serine proteases, which inherently share little primary structure similarity. Because they also show very low sequence similarity to PLRVP, conventional homology modelling methods were of no use. We therefore used simulated annealing calculations based on structural restraints derived from five selected proteases of similar fold (ETA, neuropsin, 2A, SVCP and NS3). The set of restraints, generated for structurally conserved core residues, included upper and lower distance bounds between α-carbons, backbone dihedral angles and conserved hydrogen bonds.
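One simple way to derive upper and lower α-carbon distance bounds from several templates is sketched below. It is an assumption-laden illustration rather than the protocol actually used: it presumes the templates have already been reduced to the same list of structurally equivalent core residues, and the function name and the 0.5 Å tolerance (`margin`) are hypothetical.

```python
import math


def ca_distance_restraints(templates, margin=0.5):
    """Derive Cα-Cα distance bounds from structurally aligned templates.

    templates: list of templates, each a list of (x, y, z) Cα coordinates
    for the same structurally equivalent core positions, in the same order.
    margin: hypothetical tolerance (in Å) added on both sides of the
    observed range.
    Returns {(i, j): (lower, upper)} for every residue pair i < j.
    """
    n_res = len(templates[0])
    if any(len(t) != n_res for t in templates):
        raise ValueError("all templates must cover the same core positions")

    restraints = {}
    for i in range(n_res):
        for j in range(i + 1, n_res):
            # Distance between positions i and j in every template.
            observed = [math.dist(t[i], t[j]) for t in templates]
            lower = max(0.0, min(observed) - margin)
            upper = max(observed) + margin
            restraints[(i, j)] = (lower, upper)
    return restraints
```

Pairs whose distance is nearly identical in all templates yield tight bounds, whereas pairs in more variable regions are restrained only loosely, so the spread among templates translates directly into how strongly each restraint shapes the annealed model.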

The simulated annealing calculations were started from random conformations. Finally, the 10 structures with the lowest energies out of 30 were selected. The calculated model included the β-strands close to the catalytic His and Asp residues (strands 2, 3 and 6) and the second β-barrel (strands 1’, 2’, 3’, 4’, 5’ and 6’), which contains the catalytic Ser residue. The modelling of the S1 specificity pocket was based on additional restraints for the loop connecting strands 3’ and 4’, extracted from the ETA, ETB and Glu-SGP structures. The interactions of PLRVP with peptide ligands were modelled on the basis of the Glu-SGP-inhibitor and SGPB-OMTKY3 complexes.

Preliminary kinetic studies using Suc-Ala-Ala-Pro-Xaa-pNA showed low proteolytic activity with some specificity for P1 Leu. Our modelling studies indicate that the S1 specificity pocket of PLRVP is primarily built of hydrophobic residues: Phe, Leu and Thr. The hydrophobic S1 pocket prefers nonpolar residues (Leu), in agreement with the kinetic studies.


Crystal Structure of a Bacteriophage T7 Endonuclease I:

A Holliday Junction Resolving Enzyme

J.M. Hadden, M.A. Convery, A. Declais*, D.M.J. Lilley* and S.E.V. Phillips.

Astbury Centre for Structural Molecular Biology, School of Biochemistry and Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK.

*CRC Nucleic Acid Structure Group, Department of Biochemistry, University of Dundee, DD1 4HN, UK.

Genetic recombination is a fundamental process in the evolution of all living organisms. This process results in the exchange of sequences between DNA segments and plays a fundamental role in the production of new genetic variants. The four-way DNA (Holliday) junction is an important intermediate in the recombination process.

Bacteriophage T7 encodes a 149 amino acid residue protein, endonuclease I (endo I), which has been shown to bind and cleave four-way DNA junctions in vitro. A number of mutants of endo I have been isolated that bind, but do not cleave DNA junctions and these are particularly useful for studying the binding process.

We have solved the crystal structure of one such mutant (residues 12-149) to 2.1 Å resolution using selenomethionine substituted protein and the MAD technique. Unfortunately, the form of endo I used to grow crystals does not contain any methionine residues. For this reason we introduced a single methionine residue into the protein by site-directed mutagenesis (I92M, 1 methionine per 138 residues) and following substitution we were easily able to solve the protein structure.

The structure of the isolated protein shows that endo I is an unusual homodimer arranged in two domains. Each domain is composed of approximately 1/6 of the residues from one monomer and approximately 5/6 of the residues from the other monomer. The domains are connected by a small inter-domain bridge.

Details concerning the techniques used to solve the structure of the protein together with a full description of the protein topology will be presented.


Unconventional Crystallisation Techniques that have Produced High Quality Protein Crystals.

J.M. Hadden and S.E.V. Phillips.

Astbury Centre for Structural Molecular Biology, School of Biochemistry and Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK.

Unconventional crystallisation techniques have been used successfully to grow crystals of two proteins. It has not previously been possible to produce crystals of these proteins suitable for X-ray diffraction studies.

The first technique highlights how a slow drop in temperature has been used to induce nucleation in a microbatch experiment. Once the correct level of nucleation had been achieved further nucleation was prevented, and a suitable crystal growth rate was achieved, by a small elevation in temperature. Crystals that diffracted to beyond 1.8 Å have been produced using this technique. The structure of the protein has now been solved.

The second technique outlines the effect of varying the composition of grease/oil used to seal a vapour diffusion experiment. The effect of drop surface area to volume ratio was also investigated. By choosing the correct combination of crystallisation well sealing material and drop surface area to volume ratio, large single crystals of protein were produced. These crystals diffracted X-rays to 2.1 Å and the structure of the protein has now been successfully solved.


The structure of the transmembrane segment of Vpu from HIV-1: modelling and simulation studies.

Fischer W. B., Cordes F., Sansom M. S. P.

Laboratory of Molecular Biophysics, Oxford University, South Parks Road,

Oxford OX1 3QU, UK, e-mail: wolfgang@bioch.ox.ac.uk

The genome of the enveloped virus HIV-1 encodes Vpu, a small auxiliary phosphoprotein 81 amino acids in length. It is composed of an N-terminal hydrophobic transmembrane (TM) domain (amino acids 1 - 27) and a hydrophilic cytoplasmic domain of 54 amino acids. Vpu is not found in the envelope of the virus particle but is expressed in the membranes of subcellular compartments of the infected cell. Vpu has two major roles in the life cycle of the virus: (i) it controls the release/secretion of virus particles from the cell surface, and (ii) it mediates the degradation of the CD4 protein in the ER.

There is reasonable evidence that Vpu can form ion channels. Studies on Vpu expressed in Xenopus oocytes using the whole-cell voltage clamp technique revealed a cation-selective conductance. A synthetic peptide corresponding to the putative TM segment of Vpu also showed channel activity. NMR and FTIR spectroscopy show that the TM segment reconstituted in a lipid bilayer is predominantly α-helical. X-ray reflectivity data on Vpu-containing monolayers also indicate a helical structure.

Self-assembly is a characteristic feature of Vpu in vivo as well as in vitro. The exact number of subunits in the homo-oligomer is not yet known. We have generated 5 bundles, each consisting of 5 TM segments of Vpu (AIV A10 LVVAIIIAI V20 VWSIVIIE). Simulations of 2 ns were run for bundles obtained from a global molecular dynamics search protocol with restraints to experimental values. In one of these bundles all tryptophans pointed into the pore; in the other they pointed outwards. For comparison, bundles based on the same criteria for tryptophan orientation were created using a simulated annealing protocol combined with a short molecular dynamics simulation (SA/MD). In addition, a structure was generated on the assumption that hydrophilic residues face the pore. This last model preserves its bundle-like structure throughout the simulation and seems to be the model of choice for the proposed bundle structure.


Structure-based design of new strong inhibitors of leucine aminopeptidase

J. Grembecka(a), W.A. Sokalski(b), P. Kafarski(a)

(a) Institute of Organic Chemistry, Biochemistry and Biotechnology

(b) Molecular Modelling Laboratory, Institute of Physical and Theoretical Chemistry

Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 WROCŁAW, POLAND

The Ligand Design program (LUDI/Insight_97.0) was applied to design leucine aminopeptidase inhibitors, predict their activity and analyze their interactions with the enzyme. The investigation was based on the crystal structure of bovine lens leucine aminopeptidase (LAP, 1lcp) complexed with its inhibitor, the phosphonic acid analogue of leucine (LeuP), which served as the lead compound in our studies. The LUDI Link mode was used to obtain new inhibitors of the enzyme, designed by modification of the LeuP structure. More than 50 potential leucine aminopeptidase inhibitors were obtained, including the most potent aminophosphonic LAP inhibitors with experimentally known activity. Several of the newly designed inhibitors were synthesized and their activity towards the enzyme was measured. All of the tested compounds proved to be strong LAP inhibitors. Most of them are significantly more active than the previously known phosphorus-containing inhibitors of the enzyme. The most active among the tested amino acid analogues is the phosphonic analogue of homophenylalanine (Ki = 0.14 μM for the DL mixture), while the phosphinic analogue of leucylleucine (Ki = 0.11 μM for the mixture of 4 diastereomers) is the most active among the peptide analogues. Reasonable agreement between theoretical and experimental activities was observed for most of the studied inhibitors. Our results confirm that LUDI is a powerful tool for the design of enzyme inhibitors as well as for the prediction of their activity.

In addition, for inhibitor-active site interactions dominated by electrostatic effects, it is possible to improve binding energy estimates by using a more accurate description of the inhibitor charge distribution. For this purpose we applied another method, developed in our laboratory, which is based on ab initio calculations of the interaction energy in the ligand-receptor system. For several known LAP inhibitors that differ in the electronic structure of their functional groups, this gave more precise estimates of inhibitory activity than the LUDI scoring function.


Gearing Individual Optimisation Methods Towards High Throughput

Naomi E. Chayen & Emmanuel Saridakis

Biological Structure and Function Section, Division of Biomedical Sciences, Imperial College School of Medicine, London SW7 2AZ, UK

High-throughput screening crystallisation trials are already under way in several laboratories world-wide, but optimisation, which is the more difficult part, has yet to be adapted to cope with the volume of experiments required by the Genome Projects.

The first multiple experiments for both screening and optimisation were done as microbatch trials under oil [1]. This procedure lends itself to adaptation for high-throughput crystallisation.

The utilisation of oil has established a unique way of producing crystals, making the experiments more efficient and saving time and materials. Oil affects the accuracy, cleanliness and reproducibility of crystallisation experiments as well as providing a reliable environment for controlling nucleation and growth [2].

We have designed the following optimisation methods, which have resulted in production of better-ordered crystals compared to those grown by conventional methods:

This poster will present examples of successful crystallisation of several proteins as well as ways to automate and adapt all these methods to high-throughput applications.

[1] Chayen, N.E. et al. (1990) J. Appl. Cryst. 23, 297-302.
[2] Chayen, N.E. (1998) Acta Cryst. D54, 8-15.
[3] Chayen, N.E. (1997) Structure 5, 1269-1274.
[4] Saridakis, E.E.G. et al. (1994) Acta Cryst. D50, 293-297.
[5] Saridakis, E. and Chayen, N.E. (2000) Protein Science 9, 755-757.


Practical limits of function prediction

Damien Devos & Alfonso Valencia

Protein Design Group, CNB-CSIC

Cantoblanco, Madrid E-28049, Spain

Tel:+34 91 585 48 39 Fax:+34 91 585 45 06

Web: http://montblanc.cnb.uam.es/

The widening gap between sequences and functions has led to the practice of assigning a potential function to an uncharacterised protein based on sequence similarity with other proteins of experimentally investigated function. Even though the reliability of such homology-based functional assignments is not well characterized, they are common practice in whole-genome functional annotation. We propose here a systematic approach to the study of the margins of error in homology-based functional prediction, analysing the conservation of functional annotations in a large set of structural alignments. In particular, we analyze five aspects of protein function commonly used in genome annotation, namely: i) PDB header line; ii) Enzymatic function classification: EC code, the standard definition of the chemical nature of the enzymatic function; iii) Functional annotations in the form of keywords, describing the biochemical function such as the interactions with compounds, cofactors, substrates, regulators and other cellular components; iv) Classes of cellular function, capturing the main types of cellular activities in which proteins participate, e.g. "carbon compound metabolism" or "DNA biosynthesis"; and v) Conservation of the type of amino acid in the binding site, related to the binding activity of the protein and, in many cases, to the specificity of binding different substrates and cofactors. Screening the full range of sequence and functional similarities allows us to present an initial picture of the relation between sequence and functional similarity and, in particular, to derive a theoretical error rate for homology-based functional assignments (1). With those data, we estimate the theoretical error rates of predicted functions in different genomes.
Indeed, it is particularly interesting to consider the consequences of this study for whole-genome annotations carried out by automatic systems (2) and to compare the expected level of error with the different values published by different groups of expert annotators (3, 4, 5).
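The idea of an error rate as a function of sequence similarity can be illustrated with a small sketch. It is not the procedure of the cited work: it simply assumes a list of aligned protein pairs, each with a percent sequence identity and one annotation per protein (EC codes in the example), and reports, per identity bin, the fraction of pairs whose annotations differ. The function name, the bin width and the example data are hypothetical.

```python
from collections import defaultdict


def error_rate_by_identity(pairs, bin_width=10):
    """Fraction of aligned pairs with non-conserved annotation, per identity bin.

    pairs: iterable of (percent_identity, annotation_a, annotation_b).
    Returns {bin_lower_bound: error_rate}; a pair at 100% identity is
    placed in the top bin.
    """
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for identity, ann_a, ann_b in pairs:
        lower = min(int(identity // bin_width) * bin_width, 100 - bin_width)
        totals[lower] += 1
        if ann_a != ann_b:
            mismatches[lower] += 1
    # Transferring the annotation within a bin would be wrong this often.
    return {lower: mismatches[lower] / totals[lower] for lower in sorted(totals)}
```

Read this way, the value for a bin is the expected error of transferring an annotation between two proteins whose similarity falls in that bin; applying it to a genome then amounts to weighting each bin by the number of annotations transferred at that similarity level.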

1. Devos and Valencia. Proteins, in press

2. Andrade et al. Bioinformatics 1999; 15: 391-412

3. Brenner. Trends Genet. 1999; 15: 132-133

4. Galperin and Koonin. In Silico Biol. 1998; 1: 0007

5. Ouzounis et al. Mol. Microbiol. 1996; 20: 985-900

This abstract will be presented as a talk.

Inferring protein quaternary structure from X-ray crystallographic data

Hannes Ponstingl(1), Kim Henrick(1) and Janet M. Thornton(1,2)

(1) EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

(2) Biomolecular Structure and Modelling Unit, Biochemistry and Molecular Biology Department, University College London and Crystallography Department, Birkbeck College, London, United Kingdom.

The physiologically relevant macromolecular assembly of multimeric proteins can often not be derived without ambiguity from crystallographic studies of protein structure. An automatic tool is being developed to differentiate physiologically relevant intermolecular contacts from contacts that are artifacts of the crystalline state. We compare the performance of simple structural features in identifying the macromolecular assembly. The comparison is based on a non-redundant set of protein structures with known multimeric states prevalent in solution. Non-parametric statistical methods are used for error assessment.


Protein-protein interaction networks

Pazos F., Blaschke C., Oliveros J.C., and Valencia A.

Protein Design Group. CNB-CSIC. Cantoblanco, Madrid 28049. Spain

Tfn. +34-1-585 45 70, Fax. +34-1-585 45 06, email: valencia@cnb.uam.es

The increasing knowledge about individual protein components (genome sequences, structural genomics initiatives, high-throughput functional genomics) is making clear the need to integrate information at higher levels of complexity. Protein-protein interaction is the obvious next step in this direction. We present here three complementary computational efforts for the study of protein-protein interactions.

The first approach is based on the study of the patterns of variation in multiple sequence alignments. The rationale behind this approach is that proteins that have evolved to form specific molecular complexes will have accumulated compensatory substitutions during evolution, and that these can be detected in present-day protein families. We have previously demonstrated that the analysis of the patterns of variation is sufficient to single out the right inter-domain docking solution amongst many wrong alternatives in two-domain proteins (1) and tested the predictions of interacting regions in different experimental systems (2-3). The extension of this method to the detection of interacting partners in large collections of multiple sequence alignments shows quite promising results in terms of coverage, measured by the number of interactions predicted in complete genomes, and accuracy, defined as the quality of the predicted interactions when compared with known molecular complexes (4).
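A minimal sketch of how compensatory substitutions between two families might be scored is given below. It is a simplified stand-in, not the method of the cited papers: it assumes both alignments list the same species in the same row order, scores residue pairs by identity only (a real analysis would more likely use a residue similarity matrix), and all function names are hypothetical.

```python
import math
from itertools import combinations


def column_similarities(column):
    """Similarity (1.0 if identical, else 0.0) for every pair of rows in a column."""
    return [1.0 if a == b else 0.0 for a, b in combinations(column, 2)]


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def interprotein_correlations(aln_a, aln_b):
    """Correlate the variation of each column of family A with each column of B.

    aln_a, aln_b: lists of aligned sequences (strings); row k of both
    alignments is assumed to come from the same species.
    Returns {(i, j): r} for column i of A and column j of B.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("alignments must contain the same species")
    sims_a = [column_similarities(col) for col in zip(*aln_a)]
    sims_b = [column_similarities(col) for col in zip(*aln_b)]
    return {
        (i, j): pearson(sa, sb)
        for i, sa in enumerate(sims_a)
        for j, sb in enumerate(sims_b)
    }
```

Column pairs with a correlation close to 1 change in the same species at the same time, which is the signature expected of compensatory substitutions; a pair of families with many such columns would be a candidate interacting pair.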

The second approach is based on the application of text retrieval techniques (5-6) to the extraction of information about protein interactions directly from the scientific literature (Medline abstracts). Our current system is able to detect networks of functional interactions automatically, by identifying protein names and the actions linking them (7). We will discuss the results of applying the system to different complex biological systems.

Finally, the application of clustering techniques (8) and text retrieval methods (9) to the available expression array results opens a new avenue for the discovery of relations between genes. These relations can be considered complementary to the predicted and detected protein interactions, and the techniques are promising new technologies to be combined with other experimental approaches such as yeast two-hybrid systems.

1.- Pazos et al (1997) J. Mol. Biol. 272: 1-13

2.- Gässler et al (1998) Proc. Natl. Acad. Sci. USA. 95: 15229-15234.

3.- Azuma et al (1999) J. Mol. Biol. 289:1119-1130.

4.- Pazos et al (2000) submitted

5.- Andrade & Valencia (1997) ISMB 5: 25-32.

6.- Andrade & Valencia (1998) Bioinformatics 14: 600-607.

7.- Blaschke et al (1999) ISMB 7: 60-67.

8.- Herreros et al., (2000) submitted

9.- Blaschke et al., (2000) submitted


Drug Discovery: From Genes to Leads

Stanley R. Krystek & Jonathan S. Mason

Bristol-Myers Squibb Pharmaceutical Research Institute

Princeton, NJ 08543, USA

The integration of genomics information with drug discovery is expected to identify, in the next few years, thousands of novel protein targets. This presentation will describe how combining structural genomics methodologies and structure-based drug design can be used to prioritize drug discovery projects. The application of the following methods to disease targets allows for the rapid generation of potential lead compounds.


Molecular Basis of the Specificity requirements of Arginase and Agmatinase, Two Enzymes with a Common Evolutionary Origin: Homology Modelling and Site Directed Mutagenesis of Escherichia coli Agmatinase.

Mónica Salas*, Rolando Rodríguez#, Elena Uribe*, P. Herrera, Vasthi López* and Nelson Carvajal*

*Departamento de Biología Molecular, Facultad de Ciencias Biológicas, Universidad de Concepción, Chile

# CIGB, La Habana, Cuba and EMBL-Heidelberg, Germany

Arginase (EC 3.5.3.1) and agmatinase (EC 3.5.3.11) catalyse the production of urea from closely related substrates; agmatine is the product of decarboxylation of arginine by arginine decarboxylase. In addition, several highly conserved residues are detected in the amino acid sequences of these enzymes. For these reasons, they are considered members of the arginase family of proteins, thought to have diverged from a common evolutionary origin to reach their particular substrate specificities. The crystal structures of rat liver and Bacillus caldovelox arginases are available, and specific roles have been assigned to several active site residues, including His101, His126, His141 and Asp128 (numbered according to their positions in the rat liver sequence). These roles have also been validated by chemical modification and site-directed mutagenesis of the rat liver and human liver arginases. A critical role for His163 (His141 in rat liver arginase) has also been deduced from chemical modification and site-directed mutagenesis of Escherichia coli agmatinase.

At present, a crystal structure for agmatinase is not available. We have therefore used molecular modelling, with B. caldovelox arginase as the reference, to obtain a model for the structure of E. coli agmatinase. The model thus obtained gives an accurate description of the interaction of agmatinase with Mn2+ and of the existence of a binuclear metal center in the fully activated enzyme. It also suggests a significant role in agmatine binding for a loop that includes Y155, C159 and F161. Since this loop differs from the corresponding, larger loop in arginase, knowledge of these regions could explain the differences in specificity between arginase and agmatinase. To test the validity of these conclusions, site-directed mutagenesis was used to introduce changes in the agmatinase loop. By replacing these residues with the corresponding residues in the arginase sequence, the single mutants C159S, Y155N and F161N, the double mutants C159S/Y155N and Y155N/F161N and the triple mutant Y155N/C159S/F161N were constructed. Interestingly, alteration of the entire conformation of this region, as produced in the triple mutant, was required for total loss of agmatinase activity. The single mutant C159S and the double mutant C159S/Y155N were even more active than wild-type agmatinase (~ 2-4 fold), and the other species were about as active as the wild-type enzyme. In conclusion, our results emphasize the importance of one specific loop region in substrate recognition by agmatinase. For a better understanding of the significance of these loop regions for the specificity requirements of arginase and agmatinase, insertions are now being introduced into the agmatinase sequence.


In Silico Structural Analysis of Bacterial Virulence Factors

Kelly Paine & Darren Flower

Bioinformatics Group, Edward Jenner Institute for Vaccine Research,

Compton, Newbury, Berks, RG20 7NN, UK,

Tel – 01635 577954 Email – kelly.paine@jenner.ac.uk

Pathogens can be distinguished from their avirulent counterparts by the presence of specific gene clusters or "pathogenicity islands" that convey the virulence necessary for infection [1]. These can be acquired through lateral transfer between distantly related species in evolution, and are dubbed "virulence factors". Exotoxins secreted by such pathogens are an excellent example; any function inferred from the tertiary protein structure can be used in the development of new drugs.

For example, the recently published structure of an invasive Streptococcus pyogenes SpeB cysteine protease [2] revealed a hitherto unknown homology to the papain protease family. This gave a new insight into the mechanisms of virulence carried by the protease. An important human integrin-binding motif was also discovered, hinting that the exotoxin may have multiple functions. Most importantly, an invariant finger loop at residues 19-42 on the mature protease was identified as a potential therapeutic antibody-binding site.

Our focus is on virulence factors as potential candidate vaccines. The structure determination of prototypic virulence factors will facilitate the prediction of their potential antibody binding sites and the delineation of escape mutations, as well as allowing the design of new antibiotics and antimicrobial drugs. We have developed new approaches to the problem of selecting sequences for structural analysis and are currently applying them to our database of virulence factors. Dissimilarity searching algorithms, coupled to in-depth analyses of protein families using the PRINTS methodology [3], have been used to select candidates for structural analysis.

  1. Mecsas, J.J. and Strauss, E.J. Emerg Infect Dis 2(4):270-88, 1996. "Molecular mechanisms of bacterial virulence: type III secretion and pathogenicity islands."
  2. Kagawa, T.F, Cooney, J.C., Baker, H.M., McSweeney, S., Liu, M., Gubba, S., Musser, J.M and Baker, E.N. Proc Nat Acad Sci USA 97:2235-2240, 2000. "Crystal structure of the zymogen form of the group A Streptococcus virulence factor SpeB: An integrin-binding cysteine protease."
  3. Attwood, T.K., Croning, M.D.R., Flower, D.R., Lewis, A.P., Mabey, J.E., Scordis, P., Selley, J. and Wright, W. Nuc Acids Res 28(1), 225-227, 2000. "PRINTS-S: the database formerly known as PRINTS."


The identification of novel, putative metallo-β-lactamase-like metal binding domains & folds

Brian P. Clarke and Mike G. Tennant

SmithKline Beecham Pharmaceuticals, Computational & Structural Sciences,

New Frontiers Science Park (North), Third Avenue, Harlow, Essex, CM19 5AW, UK.

Using state-of-the-art sequence searching programs, we have identified a broad class of proteins which mediate a zinc-dependent hydrolytic reaction. Included in this superfamily are bacterial metallo-β-lactamase (MBL), glyoxylase-II, arylsulphatase, sdsA, CPSF, phnp and cpdP proteins. By examining the sequences and the known structures of MBL proteins in this family, we conclude that these proteins have arisen through divergent evolution while maintaining hydrolytic function, and that an archetypal protein fold exists for this family.


Designing sequence profiles from an all-atom force field

Alfonso Jaramillo, Stephany Hery, Lorenz Wernisch and Shoshana J. Wodak

Service de Conformation de Macromolecules Biologiques et de Bioinformatique, Universite Libre de Bruxelles, av F.D. Roosevelt 50 - CP 160/16, B-1050 Bruxelles, Belgium

Phone: 32-2-6505200, 6502013, FAX: 32-2-6488954

Email: alfonso@ucmb.ulb.ac.be

Understanding the mechanism of protein folding and the factors that govern the stability of the protein native state remains a major goal in molecular biology. "Which sequences are compatible with a given fold?" is another formulation of the same problem, also termed the inverse folding problem, which may have useful practical applications in de novo protein design. With the aim of answering this question, we implemented DESIGNER, a versatile procedure for selecting sequences that are compatible with a given backbone structure. The sequence selection is done by computing the folding free energy difference between the corresponding models for the folded and unfolded states. We use our interface to the CHARMM program, and the force field comprises all the classical non-bonded energy terms of CHARMM and an implicit solvation free energy term. We illustrate the application of DESIGNER to the design of core and surface residues in three proteins (the SH3 domain, protein G and ubiquitin) and to full designs of SH3 domain related proteins. In the core and surface designs, DESIGNER is shown to select sequences that are much more similar to the native sequence than those selected by any other available method. The full designs are used to evaluate the influence of backbone flexibility in protein design. DESIGNER is also shown to generate native-like sequence profiles, which offer the opportunity to investigate many interesting questions, including how the structure constrains the amino acid sequence.


Analysis of multiple gene expression responses to salt stress in the halophyte Mesembryanthemum crystallinum using microarray technology

João P. Maroco *(1), Christine B. Michalowski (2), M. A. Cushman (1), Hans J. Bohnert (2), David Galbraith(3) and John C. Cushman(1)

1. Dept. of Biochemistry and Molecular Biology. Oklahoma State University. Stillwater, OK. USA
2. Dept. of Biochemistry. The University of Arizona, Tucson, AZ, USA
3. Dept. of Plant Sciences. The University of Arizona, Tucson, AZ, USA
* Present address: Lab. Ecofisiologia Molecular. IBET-ITQB. Av. Republica. EAN. 2784-505 Oeiras. Portugal

Plant responses to environmental stresses are mediated through the coordinate action of many hundreds of genes, each with a distinct expression profile. Thus far, gene expression studies have been limited to one or a few genes at a time because of methodological limitations. Recently, microarray technology has made it possible to study the expression profiles of thousands of genes simultaneously. In this communication we report a microarray-based evaluation of multiple gene induction by salt stress using more than 1000 expressed sequence tags (ESTs) from the halophytic plant Mesembryanthemum crystallinum. We found that approximately 20% of genes associated with CAM metabolism and osmotic stress resistance exhibit induced (more than two-fold) expression, whereas a somewhat lower number of genes associated with CO2 fixation showed down-regulated patterns of expression. In addition, 63 new genes with no previously described function are up- or down-regulated by more than two-fold during salt stress. Determination of these expression patterns represents the first functional information about this set of anonymous ESTs. This analysis of expression profiles of known and unknown genes provides the first integrated assessment of coordinated gene expression patterns in a higher plant undergoing salt stress.
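The two-fold criterion mentioned above can be expressed as a short sketch. It is illustrative only and assumes already normalised signal intensities for each EST in control and salt-stressed samples; the function name, the EST identifiers and the handling of non-positive signals are hypothetical choices, not details taken from the study.

```python
def classify_expression(control, stressed, threshold=2.0):
    """Call each EST induced, repressed or unchanged by a fold-change cutoff.

    control, stressed: dicts mapping an EST identifier to a normalised
    signal intensity in unstressed and salt-stressed plants.
    threshold: fold change required for a call (two-fold by default).
    """
    calls = {}
    for est, ctrl_signal in control.items():
        stress_signal = stressed[est]
        if ctrl_signal <= 0 or stress_signal <= 0:
            # A ratio is meaningless without signal in both samples.
            calls[est] = "undetermined"
            continue
        ratio = stress_signal / ctrl_signal
        if ratio > threshold:
            calls[est] = "induced"
        elif ratio < 1.0 / threshold:
            calls[est] = "repressed"
        else:
            calls[est] = "unchanged"
    return calls
```

In practice a fixed ratio cutoff is usually combined with replicate measurements, since ratios computed from weak signals are noisy; the sketch leaves that out to keep the criterion itself visible.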


The uses of gels for protein crystallisation

J. Lopez-Jaramillo, J.M. Garcia-Ruiz, M.A. Hernandez-Hernandez, J.A. Gavira, Gonzalez-Ramirez, F. Otalora

Laboratorio de Estudios cristalograficos (CSIC-UGRA). Facultad de Ciencias, Campus Fuentenueva, Granada (SPAIN)

It is well known that removing convection from the crystallisation reactor yields crystals of higher quality. One way to achieve this, and to ensure a mass transport scenario governed by diffusion, is the use of gels or high-viscosity non-Newtonian fluids. We present here a new crystallisation technique, termed Gel Acupuncture MEthod (GAME), which is based on the counter-diffusion of protein and precipitating agent solutions and exploits the properties of gels.

The counter-diffusion arrangement allows a continuous range of crystallisation conditions to be screened in a single experiment consuming as little as 2 microliters of gelled protein solution. To fully exploit the advantage of counter-diffusion, it is mandatory to use a long protein chamber. It is then possible to obtain a sequence of precipitation patterns, starting with amorphous precipitation and finishing with large faceted crystals of the highest quality. Thus, our technique automatically finds the best crystallisation conditions and yields isolated crystals immobilised by the gel matrix.

We demonstrate that the experiments can be performed inside the same X-ray capillaries that will be used later for data collection, at both room temperature and cryogenic temperature, without any post-crystallisation manipulation (1). In addition to the improvement in crystal quality, this method has, among others, the following practical advantages:

1. Minimises the volume of protein solution (less than a drop volume)
2. No need for crystal mounting (i.e. no damage to crystals)
3. Easy transport to synchrotron facilities (crystals are inside the capillary and immobilised by the gel)
4. Crystals can be tested on the home X-ray source, and those of interest can be diffracted at a synchrotron under cryo conditions with no post-crystallisation manipulation

We will also present another application of gels to protein crystallography: the direct use of electrophoretic gels for screening crystallisation conditions (2). Crystals grown from gels after native electrophoresis and isoelectric focusing will be presented.

References
1. F.J. Lopez-Jaramillo, J.M. Garcia-Ruiz, J.A. Gavira, F. Otalora. Submitted to J. Appl. Cryst.
2. J.M. Garcia-Ruiz, M.A. Hernandez-Hernandez, F.J. Lopez-Jaramillo and B.Thomas. J. Crystal Growth, In press.


Monte Carlo envelopes for removing the uncertainty in the evolutionary trace method

Mark K Dean, Richard E Smith, Graham J G Upton (a), Paul D Scott (b) and Christopher A Reynolds

Department of Biological Sciences, University of Essex, Wivenhoe Park, Colchester, Essex, CO4 3SQ. (a) Department of Mathematics, (b) Department of Computer Sciences.

The evolutionary trace (ET) method (1) is potentially a very powerful method for studying protein-protein interactions. It involves determining the conserved and conserved-in-class residues in a multiple sequence alignment for a protein family with a common fold. The conserved-in-class residues are conserved within all the subgroups, defined by a dendritic tree, for a given partition identity cutoff (PIC) and residue position. The ET method involves plotting these ET residues onto a space-filling structure for increasing PIC values as long as the ET residues cluster. The arbitrary step in the process involves deciding when the ET residues cease to cluster and instead become distributed randomly over the surface of the protein. To remove this arbitrary step, a cluster score is calculated for the ET distribution at each PIC value and compared to the cluster scores for 99 equivalent random distributions. The ET analysis is therefore continued until the cluster score for the ET distribution is comparable to that for the random distributions. The performance of this new method is assessed through applications to a number of systems including G-protein coupled receptor dimers (2), heterotrimeric G-proteins (RGS4 - AlF4- - activated Gi1 complex), the Cyclin A - CDK2 complex and the Beta-trypsin - pancreatic trypsin inhibitor complex.
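
The comparison against 99 random distributions can be illustrated with a minimal Python sketch. The cluster score used here (a count of residue pairs within a distance cutoff), the 8 Å cutoff, the synthetic coordinates and all function names are assumptions made for illustration; the abstract does not specify the score actually used.

```python
import math
import random


def cluster_score(coords, cutoff=8.0):
    """Count residue pairs closer than `cutoff` (an assumed clustering measure)."""
    n = len(coords)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(coords[i], coords[j]) < cutoff
    )


def monte_carlo_test(surface_coords, et_indices, n_random=99, cutoff=8.0, seed=0):
    """Compare the ET cluster score with scores of equally sized random residue sets.

    Returns the observed score, the list of random scores, and the
    empirical p-value (n_ge + 1) / (n_random + 1), where n_ge is the number
    of random sets scoring at least as high as the observed set.
    """
    rng = random.Random(seed)
    observed = cluster_score([surface_coords[i] for i in et_indices], cutoff)
    random_scores = []
    for _ in range(n_random):
        picked = rng.sample(range(len(surface_coords)), len(et_indices))
        random_scores.append(
            cluster_score([surface_coords[i] for i in picked], cutoff)
        )
    n_ge = sum(1 for s in random_scores if s >= observed)
    p_value = (n_ge + 1) / (n_random + 1)
    return observed, random_scores, p_value


# Synthetic "surface": 200 points spaced 5 Å apart on a line (illustrative only).
surface = [(5.0 * k, 0.0, 0.0) for k in range(200)]
# Ten contiguous positions stand in for a tightly clustered set of ET residues.
observed, random_scores, p_value = monte_carlo_test(surface, list(range(10)))
```

With 99 random sets, an observed score that is not matched by any of them gives an empirical p-value of 1/100. Under this scheme the trace would be continued to higher PIC values only while the p-value stays small.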

1. Lichtarge, O.; Bourne, H. R.; Cohen, F. E. Proc.Natl.Acad.Sci.U.S.A (1996) 93, 7507-7511.
2. Gouldson, P. R.; Higgs, C.; Smith, R. E.; Dean, M. K.; Gkoutos, G.; Reynolds, C. A. Neuropsychopharmacology (2000) 24, Oct. issue, ~1 Sept.


Mapping disease-causing SNPs to Protein Structure

Carles Ferrer-Costa, Modesto Orozco, Xavier de la Cruz

Unitat de Modelatge Molecular i Bioinformatica, Departament de Bioquimica i Biologia Molecular, Facultat de Quimica, Universitat de Barcelona, c/ Marti i Franques, 1, 08028 Barcelona, Catalunya, Spain

It is well known that variations in the consensus sequence of a protein can cause dramatic alterations in its function, leading to disease. Our work is focused on describing, in structural terms, those single nucleotide polymorphisms (SNPs) that cause pathological effects in humans. To this end we analysed a set of human proteins for which disease-associated SNPs are known. Every pathological variant was described in terms of secondary structure, solvent accessibility and location in surface cavities. This was done mainly at two levels of structure: tertiary monomeric structure, based on the PDB monomer coordinates, and quaternary oligomeric structure, using the structure predictions from the PQS database. In addition, we analysed changes in physicochemical properties due to mutation according to their location in the structure. We studied free energy variations derived from the partition coefficients of the amino acids, and variations in secondary structure propensities. In this poster we show the results of this analysis and suggest some general characteristics of pathological mutations.


Sequence and structural analysis of the human MHC class III region

Ranjeeva D. Ranasinghe (1), Geoff J. Barton (2), Alan J. Bleasby (1), Jon C. Ison (1), John B. C. Findlay (3), Begoña Aguado (1) and R. D. Campbell (1)

(1)MRC Human Genome Mapping Project Resource Centre, Hinxton, Cambridge, CB10 1SB, UK
(2)European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
(3) School of Biochemistry and Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK

The complete sequence of the human Major Histocompatibility Complex (MHC) is known. The MHC is divided into three regions: class I, II and III. We focus on the characterization of fold by sequence analysis and homology modelling of the predicted proteins encoded by the genes located in the class III region. The class III region spans approximately 1.1 Mb on human chromosome 6p21.3 and has been predicted to contain 59 genes. Three sequence searching methods were applied to the MHC class III gene products: a standard sequence search using BLAST against a non-redundant database of sequences of known 3D structure (NRL3D), a search of domain profile databases (Pfam and SMART), and a fold recognition method (GenTHREADER). The conservation of predicted secondary structure relative to that of known structural families was also assessed.
A key requirement for current homology modelling techniques is that the protein being modelled must share high sequence similarity with a protein of known structure. One of the findings from this study is that MHC class III gene products show low (20%) sequence similarity to annotated sequences of known structure. Furthermore, the fold recognition analysis showed that the number of proteins whose fold can be correctly identified was low (36%). Moreover, in many cases (13%) where similarity was detected, it spanned only a short region of the query protein, from which no prediction could be made of its potential 3D fold or function. These results have highlighted the need for the development of new software tools for the alignment and detection of fold and function of remote protein homologues.
We will extend the method of protein signatures, which has been applied to characterize several families. We will apply our methods to the Ig superfamily and its various subgroups, as these are particularly important for the MHC class III proteins. We will construct a library of sparse signatures for each type of known Ig domain. The signatures will be derived from key residue positions taken from the literature, alignment and analysis of correlated mutations. The existing algorithm will be adapted for correlated mutational data and optimal parameters for handling of gaps and residue variability will be established. The usefulness of the library for detecting known immunoglobulin domains and for detecting new family members will be established. Further, we will test whether it is possible to generate signatures that are characteristic of distinct functional and molecular interaction properties. The end result will be a library of signatures that are diagnostic of structural and functional properties of the immunoglobulin domains.


BASIC: Bilaterally Amplified Sequence Information Comparison

Leszek Rychlewski, Janusz M Bujnicki

Bioinformatics Laboratory
International Institute of Molecular and Cell Biology
Warsaw, Poland
http://bioinfo.pl

Several features of a protein can be inferred based on sequence similarity or assumed homology with other proteins. These include 3D structure and general functional description. Sequence alignment is the most common approach used for the assertion of homology. The predictive power and utility of homology-based prediction methods increase with the continuously growing database of proteins with annotated structure or function. An additional increase in predictive power can be attributed to the improving accuracy and sensitivity of sequence comparison methods. Consideration of sequence information deduced from the family of proteins closely related to the query protein, as performed by PSI-Blast, enabled a dramatic boost in the predictive power of sequence comparison methods. The approach of amplifying sequence information by incorporating evolutionarily related sequences is being pursued further in a bilateral fashion. The BASIC program represents a prediction method utilizing the evolutionary information on both sides of the comparison: the query and the template. The current descendants of this approach, FFAS and ORFeus, are presented in this work.
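
The idea of using family information on both sides of the comparison can be illustrated with a small profile-profile alignment sketch in Python. The column score (a dot product of amino acid frequencies minus a baseline), the linear gap penalty, the parameter values and all function names are assumptions chosen for illustration; they are not the scoring scheme of BASIC, FFAS or ORFeus.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile(alignment):
    """Per-column amino acid frequencies for equal-length aligned sequences."""
    columns = []
    for col in zip(*alignment):
        counts = Counter(c for c in col if c in AMINO_ACIDS)  # gaps are ignored
        total = sum(counts.values()) or 1
        columns.append({aa: n / total for aa, n in counts.items()})
    return columns


def column_score(p, q, baseline=0.3):
    """Dot product of two frequency vectors, shifted so unrelated columns score negative."""
    return sum(f * q.get(aa, 0.0) for aa, f in p.items()) - baseline


def local_profile_score(prof_a, prof_b, gap=0.5):
    """Best Smith-Waterman local alignment score between two profiles."""
    best = 0.0
    prev = [0.0] * (len(prof_b) + 1)
    for p in prof_a:
        curr = [0.0]
        for j, q in enumerate(prof_b, start=1):
            s = max(
                0.0,
                prev[j - 1] + column_score(p, q),  # align the two columns
                prev[j] - gap,                     # gap in profile B
                curr[j - 1] - gap,                 # gap in profile A
            )
            curr.append(s)
            best = max(best, s)
        prev = curr
    return best


# Hypothetical toy families: a query family, a related template, an unrelated one.
query = profile(["ACDE", "ACDE"])
template = profile(["ACDE", "ACDQ"])
unrelated = profile(["WWWW", "WWWW"])
related_score = local_profile_score(query, template)
unrelated_score = local_profile_score(query, unrelated)
```

Both inputs are profiles rather than single sequences, so residue variability observed in the template family contributes to the score in the same way as variability in the query family.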