AI React 2020 Abstracts & Speaker Bios: AI 4 Scientific Discovery

Plenary Speakers

AI and Chemistry – Professor Pierre Baldi (University of California, Irvine)
This presentation will provide an overview of the unique challenges and opportunities that chemistry poses for artificial intelligence and machine learning, and the arsenal of deep learning methods available today to predict the properties of small molecules and the outcome of chemical reactions, among other problems. Two reaction outcome prediction systems, ReactionExplorer and ReactionPredictor, will be reviewed. Issues of data availability, and the interpretability and limitations of the methods, will also be discussed.

Bio: Pierre Baldi earned MS degrees in Mathematics and Psychology from the University of Paris, and a PhD in Mathematics from the California Institute of Technology. He is currently a Distinguished Professor in the Department of Computer Science, Director of the Institute for Genomics and Bioinformatics, and Associate Director of the Center for Machine Learning and Intelligent Systems at the University of California, Irvine. The long-term focus of his research is on understanding intelligence in brains and machines. He has made several contributions to the foundations of AI and Deep Learning and, with his group, to their applications in the Natural Sciences. In terms of chemical reaction prediction, his group has developed both a rule-based system (ReactionExplorer) and a machine-learning-based system (ReactionPredictor).

A Non-Deterministic Chemputer for Running Chemical Programs – Professor Lee Cronin (University of Glasgow)
How can new routes to target molecules be designed, discovered and then validated? How can we use unexpected discoveries to help plan reactions in real time? In this lecture I will explain how the use of an automated platform, with feedback control, can allow the dynamic planning and execution of the synthesis process using a Chemputer. Because the chemical-based ontology connects the abstraction of chemical synthesis with its real-time execution, we can explore deterministically defined process combinations in a non-deterministic manner. This means we can discover new approaches to target known molecules using closed-loop systems that explore reactivity within the constraints given by accessing a given target molecule.

References
[1] J. Granda, L. Donina, V. Dragone, D.-L. Long, L. Cronin ‘Controlling an organic synthesis robot with machine learning to search for new reactivity’, Nature, 2018, 559, 377-381.
[2] P. Kitson, G. Marie, J.-P. Francoia, S. Zalesskiy, R. Sigerson, J. S. Mathieson, L. Cronin ‘Digitization of multistep organic synthesis in reactionware for on-demand pharmaceuticals’, Science, 2018, 359, 314-319.
[3] S. Steiner, J. Wolf, S. Glatzel, A. Andreou, J. Granda, G. Keenan, T. Hinkley, G. Aragon-Camarasa, P. J. Kitson, D. Angelone, L. Cronin ‘Organic synthesis in a modular robotic system driven by a chemical programming language’, Science, 2019, 363, 144-152.
[4] P. S. Gromski, A. Henson, J. Granda, L. Cronin ‘How to explore chemical space using algorithms and automation’, Nat Rev Chem., 2019, 3, 119-128.

Bio: Leroy (Lee) Cronin FRSE was born in the UK in 1973 and was appointed Regius Professor of Chemistry at Glasgow in 2013, after being a professor (2009 & 2006) and reader in Glasgow since 2002. From 2000 to 2002 he was a lecturer at the University of Birmingham. Alexander von Humboldt research fellow (Uni. of Bielefeld); 1997-1999: Research fellow (Uni. of Edinburgh); 1997: Ph.D. Bio-Inorganic Chemistry, Uni. of York; 1994: BSc. Chemistry, First Class, Uni. of York. Prizes include 2019 Japan Society of Coordination Chemistry International Prize, 2018 ACS Inorganic Lectureship, 2018 RSC Interdisciplinary Prize, 2015 RSC Tilden Prize, 2013 BP/RSE Hutton Prize, 2012 RSC Corday Morgan, 2011, Election to the Royal Society of Edinburgh in 2009. His research has four main aims: 1) the construction of an artificial life form / work out how inorganic chemistry transitioned to biology / searching for new life forms; 2) the digitization of chemistry; 3) the use of artificial intelligence in chemistry including the construction of ‘wet’ chemical computers; and 4) the exploration of complexity and information in chemistry. He runs a team of around 60 people funded by grants from the UK EPSRC, US DARPA, Templeton, Google, BAe, JM and is developing both ‘open-source’ and commercial chemputers. See www.chemify.org

ASKCOS: data-driven chemical synthesis – Dr Connor Coley (MIT)
Laboratory automation can decrease the manual effort of synthesis, but determining how to synthesize a compound continues to require time and effort investment from expert chemists. To achieve fully autonomous chemical synthesis, one must have robust synthesis planning software that can propose fully-specified synthetic routes to target molecules. In this talk, I will provide an overview of our recent efforts to develop algorithms that can leverage historical reaction data to inform decision-making in small molecule pathway design. This includes algorithms for retrosynthesis, recommending reaction conditions, and predicting the products of chemical reactions. This is part of a larger effort to use artificial intelligence techniques to develop a quantitative understanding of chemical reactivity for the accelerated discovery of functional small molecules.

Bio: I’m generally interested in questions of how data and automation can be used to streamline discovery in the chemical sciences. I’ve recently been focused on identifying questions in chemistry and cheminformatics that are answerable by the data we have access to; another equally important problem, however, is how to efficiently conduct experiments to gather the information we would need to answer currently-unanswerable questions. My future research will work toward a new paradigm of computational assistance for molecular discovery through an interdisciplinary approach combining chemistry, chemical engineering, and computer science in close partnership with experts in application domains such as chemical biology and human health. My PhD work comprised two parallel tracks: experimental and data-driven/computational. The former includes the actual construction, control, and use of automated microfluidic reactor platforms for screening and optimizing chemical reactions; the latter includes the development of a data-driven synthesis planning program and in silico strategies for predicting the outcomes of organic reactions. For this work, I was named a DARPA Riser, one of C&EN’s Talented 12, and one of Forbes 30 under 30: Healthcare. One of the last products of my PhD was releasing the open source version of ASKCOS, a computer-aided synthesis planning program, which features heavily in our demonstration of AI-assisted robotic execution of flow reactions, written up in Science.

Invited Keynote Speakers

Computer-Assisted Design of Complex Organic Syntheses – 50 years on – Professor Peter Johnson (University of Leeds) [This talk is sponsored by CAS SciFinder]
In the 50 years since the publication of E J Corey’s seminal paper on this topic, a number of alternative approaches have appeared, but up to now the tools generated have not seen widespread acceptance by practitioners in synthetic organic chemistry. However, this may soon change since the past decade has seen an upsurge in alternative approaches to the problem, many of which use information derived from large reaction databases to generate retrosynthetic pathways backed up by literature precedent. Some of the specific challenges faced in this endeavour will be outlined together with some of the historical and current approaches to their solution.

Bio: After undergraduate and doctoral education at the University of Manchester, Peter Johnson was a NATO postdoctoral fellow in the synthetic chemistry group of Professor A Eschenmoser at the ETH in Zurich. Further postdoctoral studies at Nottingham University were followed by a faculty position at the University of North London. During this time, a period of study leave at Harvard University, working with E J Corey on the LHASA project, served as a wonderful introduction to chemoinformatics, and this has been a major component of his research ever since. In 1980 Peter Johnson joined the University of Leeds, where he is now Emeritus Professor of Chemistry. Chemoinformatics research in Leeds led to the development of a number of chemistry-based software applications: SPROUT for de-novo ligand design, CAESA for assessment of synthetic feasibility, CLiDE for extraction of chemical information from images, and most recently Chem21 ELN. In addition to his research in chemoinformatics, he also led a synthesis group which accomplished the total synthesis of a number of natural products including the sesquiterpene β-vetivone and the first synthesis of the alkaloid gelsemine. Peter Johnson has also founded a number of software companies, some of which were spinouts of the University research, including LHASA Ltd (a not-for-profit company which now employs 150 people), Orac, Synopsys, Simbiosys and Keymodule.

Gathering molecules: representations and machine learning with minimal data – Professor Jonathan Goodman (University of Cambridge)
We have information about only a tiny proportion of molecules of interest. Is it reasonable to extrapolate the information we have to help us understand the vast uncharted regions of molecular space? For example, how sure can we be that our structural diagrams relate to the structures of actual molecules? All analytical data has layers of interpretation. Do our models of chemical reactivity adequately describe the reactions that we have done and are planning? Are we ignoring key physical properties that control or change the chemical reactivity that we can calculate for abstractions of the real systems? Successes in spectral interpretation and reaction analysis demonstrate that confidence is not always misplaced.

Bio: Jonathan is a Professor of Chemistry at the University of Cambridge, and Director of Studies of Chemistry at Clare College, where he also serves as the Academic Dean. His research focuses on experimental and computational chemistry, analysing organic reaction mechanisms, interpreting analytical data and investigating computational chemical toxicology. He is also secretary of the Subcommittee on the IUPAC International Chemical Identifier, and has developed the Reaction-InChI (RInChI): an InChI-based identifier for chemical reactions.

Introduction to ML and Structured matrix methods for learning outliers – Professor Mahesan Niranjan (University of Southampton)
Much of the subject of Machine Learning, which has caught the imagination of a wide range of researchers in various topics, is about supervised and unsupervised learning problems. However, in several problems of practical interest, and in scientific discovery problems in particular, the interesting information to be extracted lies in outliers. Detecting these in a systematic manner, which comes under the topic of novelty/surprise/outlier/changepoint detection, often has to do with the need to circumvent the so-called curse of dimensionality. This talk is partly a tutorial introduction to recent developments in machine learning, leading to two methods that use regression and structured matrix approximation to detect outliers while avoiding the pitfall of modelling probability densities directly.

Bio: Mahesan Niranjan is a Professor of Electronics and Computer Science at Southampton. Prior to this, he worked at the University of Cambridge as a Lecturer in Information Engineering and at the University of Sheffield as a Professor of Computer Science, where he also served as Head of Computer Science and Dean of Engineering. Mahesan works in the area of machine learning, and his research interests are in the algorithmic and applied aspects of the subject. He has worked on a range of applications of machine learning and neural networks including speech and language processing, computer vision and computational finance. Currently, the major focus of his research is in computational biology. Alongside his duties at Southampton University, he also often travels to other international universities to present his research and teach intensive short courses in Machine Learning.

Applying AI to retrosynthesis in the wilderness – Mikołaj Sacha (Molecule.one / NYU)
Recent developments in artificial intelligence have led to a revolution in the technology industry. At the same time, even superhuman AI systems exhibit a very shallow understanding of the domains they are applied to. Should this concern us when applying such systems to automatic retrosynthesis? In this talk, we will try to convince you that despite a somewhat oversimplified understanding of chemistry, AI has the potential to revolutionize this field. We will describe our one-year journey that culminates in launching a product that successfully uses AI, and in particular deep learning, to deliver value to our customers.

Bio: Mikołaj Sacha works at Molecule.one, where he is a core member of the team developing machine learning methods for synthesis planning/reaction prediction. He graduated from Jagiellonian University with an M.Sc. in Computer Science (specialty: deep learning). Afterwards he gained practical ML/NLP experience while working for Polish and Swedish companies. His scientific interests include methods of optimization, meta-learning and generative models. He is also passionate about applying his skills to solving impactful problems in drug discovery.

Accurate excited states calculations on near term quantum computers – Jules Tilly (Rahko)
Calculating excited states is a non-trivial task with current computational chemistry methods. It finds applications in many areas, including reaction dynamics. Even though TDDFT and other methods are common practice, they still require a lot of expertise and proper benchmarking in order to obtain useful results. Quantum computers are expected to help overcome challenges for such calculations. Here we present a novel method for calculating excited states that is implementable on near-term quantum computers. We then discuss further extensions and how machine learning and quantum computing could be used for excited states and the prediction of reaction dynamics.

Bio: Jules specialises in developing NISQ quantum machine learning algorithms with a focus on Quantum Chemistry. He is a Quantum Research Scientist at Rahko and is currently completing his PhD at UCL under the supervision of Prof. J. Tennyson. Prior to this, Jules worked for 6+ years in financial services acting as regulatory and strategic advisor for several global investment banks. He holds degrees in Mathematics, Quantum Physics, Law, Economics, Finance and Public Policy.

Making sense of predicted routes: the use of data as evidence for predictions in SciFinder – Paul Peters (CAS)
As machine-learning approaches advance in synthesis planning and organic chemistry at large, the ability of chemists to rationalize predictions and to garner knowledge from the results is essential for the viability of the solutions as research tools. The content used for training models and algorithms is often the most effective means to provide the necessary insights. The linkage between predictions and the underlying data has been a guiding principle in the design of the retrosynthesis feature in SciFinder. Empirical data is used by the engine along with the predictive capabilities during the construction of synthetic routes, and it is further used to provide evidence for predicted steps. In this presentation we will demonstrate how access to the evidence and to analysis tools helps chemists make informed decisions when taking new ideas to the lab.

Bio: Paul Peters has worked at CAS for 25 years supporting customers in Europe. He has a BSc degree in chemical engineering from the Amsterdam University of Applied Sciences. In his current role as the Director of Customer Success Specialists he leads a global team of specialists who provide onsite or remote training and support to our customers or prospects. The training will be highly relevant to the customer situation, market segment and business needs of the users. His team includes direct reports in EMEA and the US and he coordinates the support given in Asia and Latin America. He has been closely involved in the development and testing of SciFinder-n and its retrosynthesis capabilities.

Intelligence from Data: Towards Prediction in Organometallic Catalysis – Dr Natalie Fey (University of Bristol)
Computational studies of homogeneous catalysis play an increasingly important role in furthering (and changing) our understanding of catalytic cycles and can help to guide the discovery and evaluation of new catalysts [1, 2]. While a truly “rational design” process remains out of reach, detailed mechanistic information from both experiment and computation can be combined successfully with suitable parameters characterising catalysts [3] and substrates to predict outcomes and guide screening [4]. The computational inputs to this process rely on large databases of parameters characterising ligand and complex properties in a range of different environments [5-8]. Such maps of catalyst space can be combined with experimental or calculated response data [7], as well as large-scale data analysis. Rather than pursuing a purely computational solution of in silico catalyst design and evaluation, an iterative process of mechanistic study, data analysis, prediction and experimentation can accommodate complicated mechanistic manifolds and lead to useful predictions for the discovery and design of suitable catalysts. In this presentation, I will use examples drawn from our recent work, including the early stages of our development of a reactivity database, to illustrate this approach.

Website: https://feygroupchem.wordpress.com/

References

[1] C. L. McMullin, N. Fey, J. N. Harvey, Dalton Trans., 43 (2014), 13545-13556.
[2] N. Fey, M. Garland, J. P. Hopewell, C. L. McMullin, S. Mastroianni, A. G. Orpen, P. G. Pringle, Angew. Chem. Int. Ed., 51 (2012), 118-122.
[3] D. J. Durand, N. Fey, Chem. Rev., 119 (2019), 6561-6594.
[4] J. Jover, N. Fey, Chem. Asian J., 9 (2014), 1714-1723.
[5] A. Lai, J. Clifton, P. L. Diaconescu, N. Fey, Chem. Commun., 55 (2019), 7021-7024.
[6] O. J. S. Pickup, I. Khazal, E. J. Smith, A. C. Whitwood, J. M. Lynam, K. Bolaky, T. C. King, B. W. Rawe, N. Fey, Organometallics, 33 (2014), 1751-1791.
[7] J. Jover, N. Fey, J. N. Harvey, G. C. Lloyd-Jones, A. G. Orpen, G. J. J. Owen-Smith, P. Murray, D. R. J. Hose, R. Osborne, M. Purdie, Organometallics, 29 (2010), 6245-6258.
[8] J. Jover, N. Fey, J. N. Harvey, G. C. Lloyd-Jones, A. G. Orpen, G. J. J. Owen-Smith, P. Murray, D. R. J. Hose, R. Osborne, M. Purdie, Organometallics, 31 (2012), 5302-5306.

Bio: I was born in Frechen, Germany, but have lived and worked in the UK for quite a while now. I obtained my BSc in Chemistry and Economics from Keele University (UK), and stayed on to work with Jim Howell and Paul Yates towards a PhD (completed in 2001). After postdoctoral research with Rob Deeth at the University of Warwick until 2003, I worked as a postdoc on projects with Guy Orpen, Jeremy Harvey and Guy Lloyd-Jones at the University of Bristol before gaining an EPSRC Advanced Research Fellowship (October 2007). My independent research at Bristol is in computational inorganic chemistry and involves mechanistic studies of catalysis and the development of knowledge bases. I was appointed to a temporary lectureship in 2015, made permanent in 2017, and promoted to senior lecturer in 2018. I’m the programme director for Chemistry with Scientific Computing and the Deputy Director of Bristol Scientific Computing.

Chemistry ontologies and artificial intelligence – Dr Colin Batchelor (Royal Society of Chemistry)
Distributional semantics, as exemplified by embeddings such as GloVe and fastText, and the sort of formal semantics encoded in an ontology are complementary approaches to answering the question of what a text is about. In this talk I describe how they can inform each other, and also the open resources that the Royal Society of Chemistry is making available in this and related fields.

Bio: Colin Batchelor is a Senior Data Scientist at the Royal Society of Chemistry. After doctoral work in theoretical chemistry with Mark Child at Oxford he was a technical editor at the RSC before a succession of roles working on, amongst other things, natural language processing, cheminformatics, machine learning and ontologies. He is currently the chair of the InChI working group on organometallic and coordination structures.

UDM – a community-driven data format for the exchange of comprehensive reaction information – Dr Jarek Tomczak (Pistoia Alliance)
The first reaction databases were developed in the early 1980s and electronic laboratory notebooks have been in use in chemistry for almost 20 years. However, we still do not have a well-defined way of capturing and exchanging information about chemical reactions and rely on imprecise or vendor-specific data formats. Without a common language and structure shared by all users to describe experiments or predictions, data integration is unnecessarily expensive, and a significant part of published data has not been readily available for processing or analysis. The Unified Data Model (UDM) project delivers a solution to this problem. It is a collaborative effort of vendors and life science organizations to create an open, extendable and freely available reference model and data format for the exchange of information about compound synthesis and testing. Run under the umbrella of the Pistoia Alliance, the project team has recently published a third, stable release of the UDM data format. The presentation will provide the rationale behind the UDM model, its structure, applications and relevance for data analysis.

Bio: Jarek Tomczak is the technical lead for the Unified Data Model run under the umbrella of the Pistoia Alliance. He is the founder of Informatics Unlimited, a technology company developing solutions for biomedical imaging and cheminformatics. Jarek obtained his PhD in chemistry from the University of Wroclaw, Poland followed by post-doctoral research at the Computer-Chemistry-Centre in Erlangen, Germany. Before starting his own company, Jarek worked for Aventis (now Sanofi) and Accelrys (BIOVIA).

A Structured Recipe Based Approach in Process Research and Development – Dr William Maton (Janssen, Pharmaceutical companies of Johnson and Johnson) and Dr Allyson McIntyre (AstraZeneca)
To understand the chemical space to develop a chemical process, a tremendous amount of data (including experimental procedures, in situ PAT profiles, process observations, etc.) is required. Within the process research and development area there is an ambition to generate better quality data from every experiment and enable that data to be searchable and retrievable. Having the ability to collect, search and share data will increase the understanding of the chemical space and drive the processes forward. To accelerate data-driven decisions and fulfill more aggressive timelines, scientists need the ability to capture their data in a more structured and consistent way that underpins the scientific data rationale. But historically, our scientists have documented their experiments however they see fit, which means that they might call things by different names, misspell words, or abbreviate. Additionally, different people describe things with different levels of detail, and a standard way of working is almost impossible to implement if a less structured methodology is used for documentation. That’s why we are implementing a structured recipe-based approach as the foundation to contextualize continuous data and enable data comparison from lab to plant. This approach is based on the S88 process model, an international standard that introduces a common language for process descriptions and brings context to the data. It separates the recipe, or how a process is described, from the equipment used to execute the process. So, it drives standardization, not just of terminology but of the parameters that need to be collected to define a recipe. We have defined a standard recipe vocabulary through internal and external collaboration across functions of development, the clinical supply chain and commercial manufacturing of Janssen and AstraZeneca, supporting the different phases of development.
The structured recipe-based approach will be the foundation to generate process context that can be extracted, enable recipe exchanges between the Electronic Lab Notebook and the Automated Lab Reactor, automate data visualization, analysis and reporting, and support Tech Transfer, enabling the organization to gain further understanding of its chemical processes more efficiently.

William Bio: William joined Janssen in 2017 as associate director in Small Molecule Pharmaceutical Development, involved in people and project management. In his previous career at Bial – Portela located in Porto (Portugal), William worked as senior process chemist for the process research and development of intermediates and APIs at different stages of development from pre-NME to commercial. He was involved in the technology transfers to CMOs and CROs. This experience gave him a complete overview of drug development from discovery to product launch. Before that, William worked for a CRO (Aptuit srl), “the dark side of the pharma force”, where he developed innovative synthetic routes for generic product companies. William started his career at GSK Verona as senior scientist in chemical development, involved in process development from early phase to phase 2b. William holds a PhD in Organic Chemistry from the Université René Descartes, Paris 5, France, where he studied in the group of Professor Yves Le Merrer.

Allyson Bio: I am passionate about maximising understanding of our chemical processes. I am excited about what can be achieved through use of technology, process understanding tools and data management.

From mechanisms to reaction selectivity – Professor Per-Ola Norrby (AstraZeneca)
Forward reaction prediction has wide applications, from general scoring of retrosynthesis pathways to selection of appropriate reaction conditions. Reaction selectivity prediction using quantum chemical methods is well established, but the accuracy/cost ratio is not favorable, especially when multiple pathways must be considered. We have explored several alternatives with improved accuracy and/or lowered cost, still within a strong mechanistic framework and using DFT as an essential component. Examples that will be highlighted include development of reaction-specific Q2MM force fields for selectivity predictions, and different ways of combining quantum chemical calculations with machine learning for regioselectivity predictions in C-H activation.

Bio: Broadly experienced organic chemist (theoretical), working largely in the interface with computational, physical, and inorganic chemistry, and life science. Three decades of academic experience, ended as professor in organic synthesis in Gothenburg. Since 2013, moved to AstraZeneca to become a principal scientist focused on all aspects of chemical reactivity, from retrosynthesis to degradation.

Automated mining of a database of 9.2M reactions from the patent literature, and its application to synthesis planning – Dr Roger Sayle (NextMove Software)
NextMove Software’s Pistachio is a reaction database containing over 9 million reaction instances, consisting of over 2.9M unique reactions. A distinguishing feature of this data set, compared to Elsevier’s Reaxys or InfoChem’s SPRESI, is that it is entirely automatically extracted from the patent literature, and involves no manual curation. The majority of reactions are extracted by text mining of the full text of United States Patent and Trademark Office (USPTO) and European Patent Office (EPO) applications and grants, with the remaining 2.3M reactions extracted from the ChemDraw sketches provided by the USPTO since 2001. This talk describes the text mining and sketch processing technologies used to prepare Pistachio, with special mention of the challenges of extracting quantities and yields, interpreting inorganic reagents, representing mixtures, and capturing the operational steps and each step’s conditions/duration. Finally, the talk will conclude with some of the insights gained from analyzing and classifying the reactions in Pistachio, and how these may influence retrosynthetic analysis and machine-learning approaches to chemical design.

Bio: Roger Sayle gained his Ph.D. in Computer Science from the University of Edinburgh, Scotland. Before starting NextMove Software, Roger worked at Glaxo-Wellcome, Metaphorics LLC and OpenEye Scientific Software.

The Semantic Laboratory – Dr Samantha Kanza (University of Southampton)
The use of semantic web technologies within the laboratory has slowly gained momentum over the last twenty years. Researchers have realised that these technologies are key to dealing with large volumes of data, and that they enable better organisation of scientific documents and practices. However, we still have a long way to go before these technologies realise their true potential, and this is as much a human endeavour as a technological one. This talk will discuss the affordances of the semantic web, demonstrating where it can be used across the entire scientific research process; it will also note some lessons learned over the last twenty years, and provide some recommendations for the future.

Bio: Samantha Kanza is an Enterprise Fellow at the University of Southampton. She completed her MEng in Computer Science at the University of Southampton and then worked for BAE Systems Applied Intelligence for a year before returning to do an iPhD in Web Science (in Computer Science and Chemistry), which focused on Semantic Tagging of Scientific Documents and Electronic Lab Notebooks. She was awarded her PhD in April 2018. Samantha works in the interdisciplinary research area of applying computer science techniques to the scientific domain, specifically through the use of semantic web technologies and artificial intelligence. Her research includes looking at electronic lab notebooks and smart laboratories, to improve the digitization and knowledge management of the scientific record using semantic web technologies; and using IoT devices in the laboratory. She has also worked on a number of interdisciplinary Semantic Web projects in different domains, including agriculture, chemistry and the social sciences.

Integrating AI with Robust Automated Chemistry: AI Driven Route Design and Automated Reaction & Route Validation – Dr Mario Latendresse (SRI Biosciences)
Mario Latendresse, Peter Madrid, Markus Krummenacker, Jeremiah Malerich, Peter Karp, Nathan Collins. SRI International, 333 Ravenswood Avenue, Menlo Park, California 94025, USA.
The future of computer-assisted synthetic route programs will mix literature-based, experimentally verified and computer-generated reactions. These different sources create networks of tens of millions of reactions, which makes efficient searching for synthesis routes a challenge. Recent AI-based computer-assisted synthesis programs for chemistry often use transformation rules to generate reactions. These rules are either created manually or programmatically from reaction examples extracted from databases. In the latter case, the number of rules can become large (more than tens of thousands), which is an issue for their efficient application in planning novel routes. We present an approach to limit the number of rules created by recursively merging similar rules until a sufficient number of reaction examples is covered by each rule. We present a new general search algorithm, used by the synthetic route planning program SynRoute™, that can efficiently find multiple optimal routes to drug-like target molecules. Moreover, SynRoute™ enables design of experiments such that routes can be rapidly and experimentally validated on a high-throughput reaction screening and optimization platform called SynJet™, which uses inkjet technology to print multicomponent reactions on the order of a reaction a second. Multistep routes can be screened and optimized in a few hours and then scaled by either batch or flow methods, with full reaction condition data capture that enables more efficient reaction prediction in future SynRoute™ searches. Examples of routes from SynRoute™ that have been performed on automated chemistry platforms will be presented, demonstrating the potential for a continuously improving, data-driven synthetic planning platform.

Bio: Mario Latendresse, Ph.D., designs and implements computational tools in bioinformatics and chemoinformatics. Latendresse has created several software tools for tasks such as computing atom mappings for biochemical reactions, finding optimized metabolic routes to engineer metabolic pathways, helping develop models for organisms based on flux balance analysis, finding optimized chemical routes to synthesize molecules, and applying machine learning in chemoinformatics. Prior to joining SRI, Latendresse was a researcher for the Science and Technology Advancement Team at Fleet Numerical Meteorology and Oceanography Center and taught computer science at two universities in Montréal. Latendresse has published in several domains, including mathematics, functional languages, bioinformatics, chemoinformatics, and computer security. He holds a doctorate in computer science from the Université de Montréal, Québec, Canada.

Retrosynthesis via Machine Learning – Dr Marwin Segler (Benevolent.ai)
The introduction of modern machine learning (ML) has triggered a paradigm shift in automated synthesis planning [1]. There are two main reasons for this. First, ML makes it possible to tackle the laborious hand-coding of transformations and chemical knowledge by learning from large reaction datasets, eventually enabling self-improving systems. Second, and perhaps even more importantly, ML provides a rigorous metrics framework, which makes it possible to benchmark and thus improve retrosynthetic systems. Surprisingly, in the 60-year history of automated synthesis planning, examples of systematic evaluation beyond anecdotal evidence are rare. In our talk we will highlight recent advances in automated synthesis planning [2,4]. We will critically discuss the advantages and limitations of the currently applied models, such as manually encoded [7], neural-symbolic [1,3,5], and seq2seq-based models [6], and how to address them, and give an outlook on open challenges, such as the need for developing novel chemical reactions [8].

[1] a) Segler, M., et al., Chem. Eur. J. 2017, 23, 5966. b) Segler, M. et al., ICLR WS 2017
[2] Segler, M. et al. Nature 2018, 555, 604.
[3] Coley, C., et al. ACS Cent. Sci. 2017, 1237.
[4] Coley, C., et al. Science 2019, 365.6453.
[5] Dai, H., et al. NeurIPS 2019
[6] Liu, B., et al. ACS Cent. Sci. 2017, 1103.
[7] a) Corey, E. et al. Science 1969; b) S. Szymkuć et al. Angew. Chem. Int. Ed. 2016, 5904
[8] Segler, M., et al., Chem. Eur. J. 2017, 23, 6118.

Bio: Marwin Segler studied Chemistry in Muenster and Madrid. He is currently a Lead Scientist at BenevolentAI. Segler pioneered modern machine learning for automated retrosynthesis and de novo molecular design.

Evolutionary computing strategies and feedback control for directed execution and optimisation of chemical reactions – Professor Harris Makatsoris (King’s College London)
Evolving a chemical system towards a desired property within a fitness landscape is a very attractive strategy for experimental reaction optimisation. It allows efficient exploration of complex parameter spaces that may contain multiple maxima or minima, without any detailed knowledge of the structure of the space. Unlike other approaches, it does not require collecting the information needed to calculate gradients. Recently reported experimental design approaches, although they have demonstrated good performance, employ search strategies along a single steepest ascent or descent pathway, in some cases requiring gradient calculation. This prevents them from discovering better designs within complex spaces: because they only explore around a single extremum, they very quickly become confined (trapped) within a particular region of the space. In contrast, evolutionary approaches avoid this as they sample points from across the whole search space. Furthermore, evolutionary strategies are robust and resilient to experimental and measurement errors and can be applied in manual or fully automated experimental scenarios. However, automated platforms rely on feedback mechanisms that require the integration of Process Analytical Tools (PAT) and methodologies to pre-process the data from observables before determining the experimental design for the next iteration. This talk demonstrates the application of these techniques with the use of a fully automated flow system.
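The gradient-free, population-based search described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two variables (temperature and residence time), their bounds, the made-up two-peak `response` function standing in for a measured output, and the simple elitist scheme itself, which is not the speaker's algorithm. Measurement noise and PAT pre-processing are left out.

```python
import math
import random

# Hypothetical search space: (temperature in °C, residence time in minutes).
BOUNDS = [(20.0, 150.0), (1.0, 20.0)]


def response(temp_c, time_min):
    """Made-up response surface with a local peak (0.6) and a global peak (1.0)."""
    local = 0.6 * math.exp(-(((temp_c - 60) / 10) ** 2 + ((time_min - 5) / 2) ** 2))
    best = 1.0 * math.exp(-(((temp_c - 120) / 10) ** 2 + ((time_min - 15) / 2) ** 2))
    return local + best


def mutate(point, bounds, rng, step=0.1):
    """Perturb each variable with Gaussian noise, clipped to its bounds."""
    child = []
    for value, (lo, hi) in zip(point, bounds):
        value += rng.gauss(0.0, step * (hi - lo))
        child.append(min(hi, max(lo, value)))
    return child


def evolve(fitness, bounds, pop_size=20, n_parents=5, generations=30, seed=0):
    """Elitist evolutionary search; uses only fitness values, never gradients."""
    rng = random.Random(seed)
    # The initial population is sampled from across the whole search space.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        ranked = sorted(population, key=lambda p: fitness(*p), reverse=True)
        history.append(fitness(*ranked[0]))
        parents = ranked[:n_parents]  # survivors are kept unchanged (elitism)
        children = [mutate(rng.choice(parents), bounds, rng)
                    for _ in range(pop_size - n_parents)]
        population = parents + children
    best = max(population, key=lambda p: fitness(*p))
    history.append(fitness(*best))
    return best, history


best_point, best_history = evolve(response, BOUNDS)
print("best conditions:", best_point, "response:", best_history[-1])
```

Because the parents are carried over unchanged, the best response never decreases from one generation to the next. In an automated platform, each call to `response` would be replaced by running an experiment and reading the pre-processed analytical signal.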

Bio: Harris is a Professor of Sustainable Manufacturing Systems at King’s College London. He is also a Director at Centillion Technology Limited, and the Principal Investigator of the Directed Assembly Network. He is on the board of two overseas companies, has 3 patents to his name and has produced over 70 research outputs to date. He is highly experienced in process engineering, manufacturing systems, and commercialising technology and has spent the last 10 years developing AI controlled flow reactor systems for functioning materials production. His research focuses on AI both from an experimental and computational perspective and he has experience in leading multidisciplinary teams in large research projects.

Computational Modelling at the Interface of Physical Organic and Supramolecular Chemistry – Professor Fernanda Duarte (University of Oxford)
Recent advances in both experimental and computational techniques make this an exciting yet challenging time for chemistry. Current computational methods enable chemists to interrogate chemical processes at the molecular level. Despite these advances, several challenges remain when exploring unusual reactivity and/or targeting novel catalysts. Among them are i) the accurate description of both electronic and energetic properties, i.e. obtaining the right answers for the right reasons, ii) the efficient modelling of structurally dynamic systems, and iii) the efficient evaluation of novel catalysts. In this talk, I will discuss our ongoing efforts to build systematic protocols for predicting reactivity and catalysis in different molecular processes. Our approach focuses on a detailed understanding of the fundamental chemistry behind these reactions and the development of efficient workflows to automate further explorations. I will first present examples in the area of physical organic chemistry, where our models have allowed us to understand the stability and reactivity of strained molecules.[1] Secondly, I will discuss the effect of non-covalent interactions and flexibility on biomimetic catalysis, and our recent computational developments on the design of such systems.[2,3]

[1] A.J. Sterling, R.C. Smith, E. Anderson, F. Duarte. Straining to react: delocalization drives the stability and omniphilicity of [1.1.1]propellane. ChemRxiv 2019. Preprint.
[2] R. L. Spicer, A. Stergiou, T. A. Young, F. Duarte, M. D. Symes, P. J. Lusby. Host-Guest Induced Electron Transfer Triggers Radical-Cation Catalysis. J. Am. Chem. Soc., 2020, ASAP. DOI: 10.1021/jacs.9b11273.
[3] T. A. Young, V. Martí-Centelles, J. Wang, P. J. Lusby, F. Duarte. Rationalizing the Activity of an “Artificial Diels-Alderase”: Establishing Efficient and Accurate Protocols for Calculating Supramolecular Catalysis. J. Am. Chem. Soc., 2020, 142, 3, 1300.

Bio: Fernanda obtained her PhD in Chemistry from the Pontificia Universidad Católica de Chile (PUC) in 2012. During this time, she was awarded a Fulbright Fellowship at Duke University (USA). After completing her PhD, Fernanda undertook postdoctoral work at Uppsala University (Sweden) with Prof Lynn Kamerlin. In 2015 she moved to the University of Oxford with a Royal Society Newton Fellowship, which she accepted in lieu of an offer for a Marie Curie Career Grant. In January 2017, Fernanda joined the School of Chemistry with a Chancellor’s Fellowship. In October 2018 she returned to Oxford to take up her first faculty appointment as Associate Professor of Computational Organic Chemistry. She currently directs an interdisciplinary and vibrant research team working at the interface of computational, organic and supramolecular chemistry.

Retrosynthetic Software for practicing chemists: Novel and efficient in silico pathway design validated at the bench – Dr Hugo Viana (Merck)
In a continuously evolving landscape of in silico chemical intelligence and machine learning, computer assisted synthetic planning has come to the forefront of discussion in the cheminformatics space. Herein, we describe the use of SYNTHIA™, a retrosynthetic design software (now exclusively in the Merck family of tools) in drug discovery, industrial, and academic laboratories all over the world. As a product of over 15 years of research, this unique tool is poised to not only get better with time, but also revolutionize the way chemists approach designing pathways to their complex targets. SYNTHIA™’s unique approach to building our expert database of known reactions by hand coding each transformation has allowed this tool to become a bench chemist’s ally by ‘learning’ chemistry much like a chemist would themselves, and suggesting diverse pathways towards their targets, thus generating ideas and providing cost effective routes based on each user’s unique needs. After only a year on the market, SYNTHIA™ is already transforming the way chemists design molecules for drug discovery and beyond.

Bio: Hugo Matos Viana is a Portuguese chemistry specialist based in the UK since 2016. As of 2017 Hugo is Merck’s Chemistry Research Technology Specialist for UK and Ireland. Before this position, Hugo was leading the R&D team working for a Chemistry company involved in the development of new and more efficient chemical oxygen generators. Hugo holds a PhD in Organic Chemistry from the Universities of Évora (Portugal)/Max Planck Institute for Colloids and Interfaces (Germany), where he was involved in investigating the synthesis of new compounds to treat neurodegenerative disorders. He also holds an MSc in Medicinal Chemistry from Minho University (Portugal). Hugo enjoys motorcycles and has particular interest in topics like astrophysics and the universe.

Submitted Speakers

Reproducibility in Chemistry – Dr Mark Warne (DeepMatter)
Synthesis optimization and reproducibility in organic synthesis remain critical issues for both industry and academia in a chemical world where there is an unrelenting focus on productivity, coupled with the need to be safe and sustainable. Machine Learning and Artificial Intelligence are held out as a panacea, but ultimately how confident are we in the reliability of the underlying data being used with these algorithmic techniques, and are we really understanding the whole picture relying on manually reported analytical endpoints? We describe how with comprehensive data, a synthesis prediction tool can continuously improve.

Bio: Mark Warne was appointed as Chief Executive Officer of DeepMatter Group plc on 2 July 2018. Mark, who joined DeepMatter as a Non-Executive Director in September 2015, also served as its Executive Chairman between April 2017 and July 2018. Mark is widely recognised in the UK and International life sciences sector, having spent almost 10 years at IP Group Plc, a leading intellectual property commercialisation company, where he led the Healthcare team. He managed a portfolio of £330m of net assets in 2016/2017 and represented IP Group on the boards of both listed and private companies. In 2018, concurrent with the integration of Touchstone Innovations into IP Group, Mark became a Partner in the Life Sciences division. He joined IP Group from pre-clinical drug discovery CRO, Exelgen, where he was Managing Director. Mark spent eight years at Exelgen (formerly Tripos Discovery Research) where he also held positions in licensing and strategic affairs, project management and research. He has a PhD in Computational Chemistry, an MSc in Colloid Science and a BSc in Chemistry, all from the University of Bristol. Mark is a Chartered Chemist and member of the Royal Society of Chemistry. He serves as a non-executive director on the boards of Open Orphan plc and Ixico plc.

What is the importance of false reactions for efficient data-driven retrosynthetic analysis? – Dr Quentin Perron (IKTOS)
In order to predict retrosynthetic routes of organic compounds with a purely data-driven approach, one has to overcome the lack of “false” reactions reported in the scientific literature (i.e. publications and patents). Indeed, accessible chemical data consist almost exclusively of working reactions. Counterintuitively, non-working or “false” reactions are equally important for training a purely data-driven algorithm on a retrosynthetic planning task. Reaction scope, chemo- and regioselectivity, as well as functional group orthogonality and chemical hazards, are part of the chemical knowledge hidden in these negative results. Hence, generating or accessing relevant “false” reactions is key to overcoming this challenge. A “community enriched” open access platform would pave the way for feeding the algorithms with the appropriate knowledge and would make it possible to circumvent this issue. The present lecture will cover all these aspects via a concrete case study on Spaya, an open access retrosynthetic tool.

Bio: Quentin Perron is a medicinal chemist by training. He holds a PhD in organometallic chemistry from the University of Geneva. During his post-doc fellowship at UCLA he worked on the total synthesis of Brasilicardin A, a complex natural molecule known for having a potent immunosuppressive activity. After working as a medicinal chemist in CNS indications at Laboratoires Servier, he switched to data science and chemoinformatics at Quinten, a company specialized in data science services. In 2016, with his business partners Yann Gaston-Mathé and Nicolas Do Huu, he co-founded Iktos, a start-up company developing AI technologies for new drug design. He is now the CSO of the company.

Combining artificial intelligence with structured high quality data in chemistry – delivering outstanding predictive chemistry applications – Dr Abhinav Kumar (Reaxys)
Reaxys and its predecessors Beilstein, Gmelin and Patent Chemistry have served the chemistry community well with scrutinized, high quality chemical information over the last 150 years. Reaxys is typically used by bench chemists whose focus is on synthetic chemistry; however, Reaxys Medicinal Chemistry, a companion of Reaxys, is also becoming an essential tool for medicinal chemists in drug discovery workflows. With advancements in machine learning technologies and improvements in computing power, new applications are envisioned, and some are already in place, which leverage the vast amount of Reaxys chemistry data readily available in a structured machine-readable format. Our two posters and the related digital demos discuss new ways to provide developers, data scientists and computational chemists with access to this huge knowledge base and present an application in the field of predictive retrosynthesis: Reaxys Retrosynthesis Engine (Pending AI).

Bio: I am the Head of Chemistry Solutions (Reaxys and Reaxys Medicinal Chemistry) at Elsevier. In my current role I work with researchers and pharmaceutical companies to provide innovative decision support tools for small molecule drug discovery and cheminformatics solutions for better utilisation of ‘big data’. I have previously worked in Life Sciences strategy consulting at Monitor Deloitte. I have extensive experience in advising on strategy development for leading UK and global life sciences firms. My specific areas of expertise include new models for drug development, digital strategy for R&D and commercial strategy for pharmaceutical companies and contract manufacturing service providers in the life science sector. I hold a doctorate in Pharmacy from King’s College London and have received Young Investigator Awards from 4 different international organizations within the respiratory medicine domain for my research, including the Pat Burnell Young Investigator Award.

Reaction prediction in process chemistry with hybrid mechanistic and machine learning models – Dr Kjell Jorner (AstraZeneca)
Predicting the outcome of reactions in process chemistry with mechanistic models based on quantum chemistry presents a series of advantages over purely knowledge-based approaches. These advantages include quantitative measures of reactivity, detailed understanding of the mechanism, structural factors controlling reactivity as well as the effect of catalysts or reagents. On the other hand, the applicability of current methods is limited by the sometimes poor accuracy of density functional theory and implicit solvent models as well as incomplete treatment of dynamic effects. Consequently, chemical accuracy with errors below 1 kcal/mol can only be reached in fortuitous cases. One way forward could be correcting the computed values using machine learning with experimental data. To investigate this approach, we construct reaction-class specific models where the computed activation energies are used as input for a machine-learning model together with descriptors of the reactants. We then train the model on high-quality kinetic data from the scientific literature and predict absolute activation energies. The descriptors are chosen to be information rich with respect to chemical reactivity and include properties such as local electrophilicity, local nucleophilicity and steric effects. Our proof-of-principle model of nucleophilic aromatic substitution can predict absolute reactivity, chemo- and regioselectivity, and the effect of solvent and catalysts. This information can be used by the synthetic chemist to assess the risk of steps in long synthetic routes. It also provides detailed understanding of the mechanism, which can be used together with synthetic and mechanistic experiments to guide development of improved reaction conditions.
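The idea of feeding a computed activation energy plus reactant descriptors into a model trained on experimental values can be sketched as follows. This is a hedged illustration only: the abstract does not say which machine-learning model is used, so ordinary linear least squares is swapped in as a stand-in, and the data, descriptor names (`electrophilicity`, `steric`) and error model are synthetic assumptions, not results from the talk.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_least_squares(rows, targets):
    """Fit weights via the normal equations (X^T X) w = X^T y."""
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve_linear(xtx, xty)


def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


# Synthetic stand-in data (kcal/mol); none of these numbers come from the talk.
rng = random.Random(0)
rows, e_dft_all, e_exp_all = [], [], []
for _ in range(40):
    e_exp = rng.uniform(15.0, 30.0)           # "experimental" barrier
    electrophilicity = rng.uniform(0.0, 2.0)  # hypothetical descriptor
    steric = rng.uniform(0.0, 1.0)            # hypothetical descriptor
    # "Computed" barrier with a systematic, descriptor-dependent error.
    e_dft = 1.2 * e_exp - 3.0 + 2.0 * electrophilicity + rng.gauss(0.0, 0.5)
    rows.append([1.0, e_dft, electrophilicity, steric])  # 1.0 is the intercept
    e_dft_all.append(e_dft)
    e_exp_all.append(e_exp)

weights = fit_least_squares(rows, e_exp_all)
corrected = [sum(w * x for w, x in zip(weights, row)) for row in rows]
rmse_raw = rmse(e_dft_all, e_exp_all)
rmse_corrected = rmse(corrected, e_exp_all)
print("RMSE of raw computed barriers:", rmse_raw)
print("RMSE after learned correction:", rmse_corrected)
```

Since the uncorrected computed barrier is itself one of the candidate linear models, the fitted correction cannot have a larger training-set RMSE than the raw values. A realistic assessment would of course use held-out reactions.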

Bio: Kjell Jorner is a Postdoctoral Fellow at AstraZeneca UK, where he works on predicting the outcome of chemical reactions of pharmaceutical interest. A computational organic chemist by training, he completed his PhD in 2018 at Uppsala University, Sweden, with a thesis on the implications of aromaticity in organic photochemistry. His current research focuses on the combination of traditional mechanistic modelling using fast quantum-chemical methods and machine learning for predictive methods with high accuracy.

Data-driven exploration of the catalytic reductive amination reaction – Dr Benjamin Deadman (Imperial College – ROAR)
There are recognised deficiencies in the current pool of reaction data which are hindering the progress of synthetic chemistry towards becoming a predictive science. Over a century of synthesis has yielded a vast library of reactions but there is a pressing need for more comprehensive data sets which include negative results, multiple time-point reaction data, and interoperable synthesis procedures. The Centre for Rapid Online Analysis of Reactions (ROAR) is a new facility which brings together high-throughput (HT) batch and flow reactor platforms, in-situ analytic technologies, and automation expertise to enable data-centric research in synthesis. In this talk we will present an exploration of the catalytic reductive amination reaction using the full capabilities of the ROAR facility. In our approach HT robotic batch reactor platforms are utilised to screen a range of heterogeneous catalysts for activity and selectivity in reductive amination reactions. Active catalysts discovered during the screening are subsequently transferred to parallel batch reactors and packed bed flow reactors, with automated sampling capabilities and Design of Experiments (DoE) approaches utilised to optimise reactions. HT techniques are then employed to rapidly map out the relationship between substrate scope and reaction conditions. Automated batch and flow reactors are employed throughout the process to provide a high level of consistency in synthetic manipulations, and to provide a complete and machine-readable record. The reductive amination reaction is the first system to be studied comprehensively in ROAR, but we will apply these techniques more widely in our ambition to develop quality data sets for other chemical transformations. The advanced capabilities of the ROAR facility are available, through open calls for proposals, to researchers with an interest in the data-rich exploration of chemical synthesis.

Bio: Ben is the Facility Manager of the Centre for Rapid Online Analysis of Reactions (ROAR). This new, state-of-the-art facility provides the UK Dial-A-Molecule community with the tools and protocols needed to perform data-rich experimentation in synthesis. He was previously a Research Associate (9/2015 to 9/2017) working with Dr. King Kuok (Mimi) Hii and Prof. Klaus Hellgardt on the application of electrochemically generated oxidants in organic synthesis. Benjamin received an MSc from the University of Waikato (New Zealand) before moving to the University of Cambridge (UK) as a Commonwealth Scholar in 2009. After completing his PhD under the supervision of Prof. Steven Ley in 2013 he moved to University College Cork (Ireland) to work with Prof. Anita Maguire as a Postdoctoral Research Associate of the Synthesis and Solid State Pharmaceutical Centre (SSPC).

Machine-Assisted Flow Chemistry for Organic Synthesis – Dr Christopher A Hone (Research Centre Pharmaceutical Engineering (RCPE))
In this talk, efforts made in our laboratory to use automation and computational methods for the development of flow chemistry processes will be highlighted. In particular, the coupling of a modular microreactor platform with real-time analysis by IR and NMR, and online UPLC will be discussed. This integrated platform will be demonstrated for the efficient optimisation of a multistep organometallic transformation without the need for human intervention. A further case study involving the generation of process models for an aerobic oxidation operating within a segmented flow regime will be examined. The process models generated were underpinned with a residence time distribution (RTD) study and computational fluid dynamics (CFD) simulation. The application of the models for identification of the optimal operating conditions will be considered.

Bio: Christopher studied for his Master’s degree in Chemistry at the University of Southampton. As part of his degree, Chris worked within the robotic synthesis team within discovery chemistry at Syngenta. Subsequently, he moved to the Institute of Process Research and Development (iPRD) at the University of Leeds for his doctoral studies, working under the supervision of Dr Richard Bourne, Prof Frans Muller and Prof Steve Marsden. His thesis focused on the development of kinetic models through the utilization of continuous flow reactors. On completion of his PhD, Chris moved to Graz, Austria, for a postdoctoral position in collaboration with AstraZeneca within the research group of Prof C. Oliver Kappe. He is now a Principal Scientist, working under the scientific leadership of Prof Oliver Kappe, within the Center for Continuous Flow Synthesis and Processing (CCFlow) at the Research Center for Pharmaceutical Engineering (RCPE). RCPE is a non-profit spin-out company from the universities in Graz. Chris’s main research focus is on the development of multiphase flow processes, particularly in the use of gases for organic synthesis. He also is interested in the development of automated flow systems and process integration. Chris works closely with industrial partners Lonza, AstraZeneca, Janssen and Process Systems Enterprise for the transfer of processes from the laboratory scale to commercial implementation.

Predictive models for assessing reaction conditions – Dr Timur Madzhidov (Kazan Federal University)
One of the key challenges in the development of a synthesis strategy is the selection of optimal reaction conditions that provide the necessary regio- and stereoselectivity, together with a high yield of the target compound. In our studies, we focused on hydrogenation reactions, which are widely used in synthetic chemistry (e.g., reduction of different groups, hydrogenolysis, coupled with other processes, etc.). The reaction yield varies as a function of catalyst, solvent, pressure, and temperature. To tackle this problem, we developed several tools based on the Condensed Graph of Reaction approach [1] implemented in the CGRtools library [2]. A dataset of 400K hydrogenation reactions from the Reaxys database was used. A special workflow for reaction and condition curation was developed and implemented [3]. An expert system that automatically extracts knowledge on the reactivity of certain chemical groups under given conditions from a database was developed. The approach was compared to the manually assembled Greene’s Reactivity Charts describing protective group reactivity. It was found that many annotations were not supported by existing data. We show how one can extract more detailed information on protective group reactivity using the developed tool. The system helps synthetic chemists to rationally, but manually, find optimal conditions for particular reactions. Prediction of optimal conditions using machine learning is a difficult task, complicated by the absence of negative results, the possibility that a reaction can be carried out under several conditions, and the uncertainty of false negatives (if a predicted condition does not coincide with the experimental one, it does not generally mean that the prediction is wrong). Two neural network models were developed. The first one (model 1) was used to rank reaction conditions according to their applicability to a particular transformation. External validation shows that the experimental condition is found among the top-10 predicted conditions in 84% of cases, which is significantly better than the null model and the nearest neighbor method. The approach was validated experimentally. We demonstrated that even for concurrent reactions, the condition leading to the desired selectivity was correctly predicted in 83% of cases. Model 1 cannot be applied in cases when too many different conditions are associated with one reaction. To treat this problem, a model based on a Conditional Variational Autoencoder of special architecture was developed. On the external test set it performs similarly to model 1.
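The "found in the top-10 predicted conditions" figure is a top-k hit rate, which can be computed as in the short sketch below. The condition labels and rankings are invented for illustration and the function name `top_k_hit_rate` is hypothetical. Each reaction is given a set of reported conditions, reflecting the point that one reaction may work under several conditions.

```python
def top_k_hit_rate(ranked_predictions, reported_conditions, k=10):
    """Fraction of reactions with at least one reported condition in the top k."""
    hits = 0
    for ranked, reported in zip(ranked_predictions, reported_conditions):
        if any(condition in reported for condition in ranked[:k]):
            hits += 1
    return hits / len(reported_conditions)


# Toy data: condition labels "A", "B", "C" stand in for catalyst/solvent/T/p sets.
ranked_predictions = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["C", "B", "A"],
    ["A", "C", "B"],
]
reported_conditions = [{"B"}, {"C"}, {"C", "B"}, {"A"}]

print(top_k_hit_rate(ranked_predictions, reported_conditions, k=1))  # 0.5
print(top_k_hit_rate(ranked_predictions, reported_conditions, k=2))  # 0.75
```

A miss under this metric only means the prediction differs from what was reported. As the abstract notes, it does not show that the predicted condition would fail in the laboratory.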

Research was supported by the Russian Science Foundation grant 19-73-10137.

[1] Varnek, A. et al. J. Comput. Aided Mol. Des. 2005, 19, 693.
[2] R.I. Nugmanov, et al. J. Chem. Inf. Model. 2019, 59, 2516.
[3] Lin, A. et al. J. Chem. Inf. Model. 2016, 56, 2140.

Bio: Senior researcher and assistant professor at the A.M. Butlerov Institute of Chemistry of Kazan Federal University. PhD in Organic Chemistry. Head of the chemoinformatics group in the Chemoinformatics and Molecular Modeling Lab at Kazan Federal University. Creator and vice-director of the Master Program in Chemoinformatics and Molecular Modeling of Kazan Federal University. The scope of research interests includes reaction informatics, adaptation of machine learning for chemical tasks, databasing and big chemistry data processing, and development of novel algorithms for chemoinformatics.

Encoding solvents and product outcomes to improve reaction prediction systems – Dr Ella M Gale (University of Bristol)
Recently, machine learning (a.k.a. artificial intelligence) algorithms have demonstrated success at retrosynthesis and lead generation; however, these algorithms are rarely tested in the real world. Synthetic outcomes are hugely dependent on the conditions used, such as the choice of solvent or temperature, yet most retrosynthetic algorithms do not encode this information into the input data. The output data is usually encoded in a binary way, as to whether a particular desired output chemical is present or not, with some workers proposing that if a product is naively possible (i.e. from the structure) but not the main output, it is a negative example (these are useful for training generative neural networks). We propose expanding the complexity of both input and output data to make these algorithms more practically useful. The solvents are input with a one-hot code referring to a database of around 600 popular and commercially relevant solvents, and temperature is input as being within ranges. The reaction output products are coded in a trinary way: the value 2 if the product is present and either the top-most reported product or in a high enough yield to be synthetically useful, 1 if the product is present but in a low yield, and 0 if the product is naively possible (i.e. from the structure) but not seen in the lab. This method allows us to use negative examples, but does not conflate those which are merely low yield under one set of conditions with those that are not seen due to the chemistry. The encoding of conditions allows us to guide chemists as to which conditions will be good for their syntheses. The Technology Enhanced Chemical Synthesis CDT recently set up at the University of Bristol has access to a ChemSpeed automatic synthesis robot and design of experiments (DoE) software for optimization of reactions. The output from our expanded retrosynthesis algorithms can be used to guide the inputs to DoE programs, allowing for the fast optimization of reaction conditions (and as we know the physico-chemical properties of the solvents in the database, we can use this information to search for greener and cheaper alternatives to test). In the future, we intend to use this set-up to train our students in industry-standard techniques and test different retrosynthesis algorithms against each other and human experts under fair real-world conditions (i.e. after a small amount of DoE optimization).
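The input and output encodings described in this abstract can be sketched in a few lines. The four-entry solvent list stands in for the roughly 600-solvent database, and the temperature ranges, the 50% "synthetically useful" yield threshold and all function names are assumptions made for illustration. The abstract does not specify any of them.

```python
# Stand-in for the ~600-entry solvent database mentioned in the abstract.
SOLVENTS = ["water", "methanol", "toluene", "THF"]

# Hypothetical temperature ranges in °C, each treated as [low, high).
TEMP_BINS = [(-80.0, 0.0), (0.0, 40.0), (40.0, 100.0), (100.0, 200.0)]


def one_hot(index, size):
    """Vector of zeros with a single 1 at the given index."""
    return [1 if i == index else 0 for i in range(size)]


def temperature_bin(temp_c):
    """Index of the range containing temp_c."""
    for i, (low, high) in enumerate(TEMP_BINS):
        if low <= temp_c < high:
            return i
    raise ValueError(f"temperature {temp_c} is outside the encoded ranges")


def encode_conditions(solvent, temp_c):
    """Concatenate the one-hot solvent code and the one-hot temperature range."""
    solvent_code = one_hot(SOLVENTS.index(solvent), len(SOLVENTS))
    temp_code = one_hot(temperature_bin(temp_c), len(TEMP_BINS))
    return solvent_code + temp_code


def encode_outcome(observed, yield_pct, is_top_product, useful_yield=50.0):
    """Trinary label: 2 = useful or top-most product, 1 = low yield, 0 = not seen."""
    if not observed:
        return 0
    if is_top_product or yield_pct >= useful_yield:
        return 2
    return 1


print(encode_conditions("toluene", 25.0))  # [0, 0, 1, 0, 0, 1, 0, 0]
print(encode_outcome(True, 80.0, False))   # 2
print(encode_outcome(True, 5.0, False))    # 1
print(encode_outcome(False, 0.0, False))   # 0
```

Keeping labels 1 and 0 separate is what lets a model distinguish a product that forms poorly under one set of conditions from one that is never observed.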

Bio: Ella Gale is currently the Machine Learning subject specialist attached to the Technology Enhanced Chemical Synthesis CDT in the School of Chemistry at the University of Bristol. Her current responsibilities include training the CDT students in machine learning, data science, statistics and design of experiments, providing data science and machine learning support to the chemistry department generally and researching machine learning techniques for retrosynthesis and de novo drug design. Dr Gale has over ten years of experience working across artificial intelligence, computer science and chemistry.