This event was the third of the AI3SD Autumn Seminar Series, which ran from October 2021 to December 2021. The seminar was hosted online via a Zoom webinar; its theme was Data Science 4 Chemistry, and it consisted of two talks on the subject. Below are the videos of the talks and the speaker biographies. The full playlist of this seminar can be found here.
Statistics Are a Girl’s Best Friend: Expanding the Mechanistic Study Toolbox with Data Science – Dr Anat Milo
Anat Milo received her BSc/BA in Chemistry and Humanities from the Hebrew University of Jerusalem in 2001, her MSc from UPMC Paris in 2004 with Bernold Hasenknopf, and her PhD from the Weizmann Institute of Science in 2011 with Ronny Neumann. Her postdoctoral studies at the University of Utah with Matthew Sigman focused on developing physical organic descriptors and data analysis approaches for chemical reactions. At the end of 2015 she returned to Israel to join the Department of Chemistry at Ben-Gurion University of the Negev, where her research group develops experimental, statistical, and computational strategies for identifying molecular design principles in catalysis with a particular focus on stabilizing and intercepting reactive intermediates by second sphere interactions.
Data management: at the root of high-throughput experimentation – Dr Nessa Carson
Nessa Carson was born in Warrington, England. She received her MChem degree from Oxford University, before completing postgraduate studies in catalysis and organic methodology at the University of Illinois at Urbana-Champaign. She started in industry as a synthetic chemist for AMRI, then moved within the company to run the high-throughput automation facility on behalf of Eli Lilly in Windlesham, working across both the discovery and process chemistry arenas. She then worked in process development using automation at Pfizer. Nessa started at Syngenta in 2020, working in automation, reaction optimization, and data management. She maintains a website of useful chemistry resources, https://supersciencegrl.co.uk.
Q & A
Q1: Are all your synthesis reactions done at room temperature and atmospheric pressure? If so, does this limit what you can produce?
They are absolutely not, and certainly not at room temperature. I tend to run my reactions in aluminium blocks rather than the plastic plates that you’ll see biologists using, so that really helps. Cooling is actually surprisingly difficult sometimes, but you certainly can cool aluminium blocks as well as heat them. As for pressure, you have to have specialist kit to run high throughput at high pressure; it’s doable but I think there are still challenges because most of the time it’s literally a box – a box of gas that goes around your plates. If you’re running truly high throughput, like 96-well, you also have the extra concern of potential solvent overspill between different reactions. Both of these things are very, very doable.
Q2: What do you use to destroy the chemicals after you are done with an experiment?
When it comes to quenching reactions, honestly, I don’t really do workups like aqueous workups unless I have to. Most of the time I will add a solvent to every vial in a plate, sample from that, and inject directly into the LC-MS just because it’s so much faster. I will only not do that if there’s a very good reason not to. The advantage of very small scale is that hazards are often mitigated, so you pretty much can quench things in the same way as you would in a lab, but quickly with a multi-channel pipette. If you need to add bleach or water or whatever it is, you have that advantage of being able to quench very quickly simply because of the small scale.
Q3: In each pie chart on Slide 12, what do the four items (colours) in the pie chart indicate and what does the size of the circle indicate?
Each colour refers to a different component in the reaction mixture. So, green is good essentially: here, green is the desired product. In general, I like to stay consistent so our colleagues who look at this would immediately understand that a row full of green is good because I always make green good. These charts show relative amounts in the LC-MS analysis, in this case, the UV peak area at a certain wavelength. The size of the circle is pretty much a reaction profile, so if there are a lot of impurities not accounted for, the circle is small. So, a small pie chart is a messy reaction; a large pie chart is a clean reaction, basically.
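As a rough illustration of the chart logic described in this answer, the Python sketch below turns UV peak areas into slice fractions and a circle size. The function name pie_summary, the example peak areas, and the choice to scale the radius with the square root of the accounted-for fraction are all assumptions made for illustration; the talk does not say how the circle size is actually calculated.

```python
from math import sqrt


def pie_summary(tracked_areas, total_area, max_radius=1.0):
    """Return (slice fractions, radius) for one well's pie chart.

    tracked_areas: UV peak areas for the components given a colour.
    total_area: summed UV peak area of every peak in the chromatogram.
    """
    accounted = sum(tracked_areas.values())
    if total_area <= 0 or accounted <= 0:
        raise ValueError("peak areas must be positive")
    if accounted > total_area:
        raise ValueError("tracked areas cannot exceed the total area")
    # Slices show each tracked component relative to the other tracked ones.
    slices = {name: area / accounted for name, area in tracked_areas.items()}
    # Assumption: circle area scales with the fraction of signal accounted
    # for, so the radius scales with the square root of that fraction.
    radius = max_radius * sqrt(accounted / total_area)
    return slices, radius


# Hypothetical well: 80% of the UV signal belongs to tracked components.
well = {"product": 60.0, "starting material": 15.0, "side product": 5.0}
slices, radius = pie_summary(well, total_area=100.0)
print(slices, round(radius, 3))
```

With this convention, a well whose chromatogram is dominated by unassigned impurity peaks gets a small circle, and a clean reaction gets a circle close to max_radius.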
Q4: What is a “self-optimising reaction”?
Having been asked that question, I now realise maybe I should call it a “self-optimising experiment”; maybe “self-optimising reaction” is wrong. Self-optimising experiments – I will make an effort to say that from now on. I suppose I mean something that automation can optimise essentially by itself or partly by itself. So, you would probably feed it a parameter space to start with. In fact, in all cases in the literature at the moment, such as Bayesian optimisation, a relatively small parameter space is always provided to start with, even if this isn’t explicitly stated. But it is something that automation can essentially just run: it might start off exploring chemical space broadly, and then the software would generate a statistical model with an objective to maximise what appears as green in these pie charts and keep sampling, working towards that, probably with some kind of machine learning.
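The loop described in this answer can be sketched in a few lines of Python. This is only a toy under stated assumptions: simulated_yield is a made-up response surface standing in for a real automated run and LC-MS readout, the parameter space (temperature and reagent equivalents) is hypothetical, and the surrogate is a simple inverse-distance-weighted average with an exploration bonus, swapped in for the Bayesian optimisation mentioned in the answer to keep the example dependency-free.

```python
import random


def simulated_yield(temp_c, equiv):
    """Hypothetical stand-in for running one well and reading product UV area %."""
    return max(0.0, 90.0 - 0.02 * (temp_c - 70) ** 2 - 40.0 * (equiv - 1.5) ** 2)


# A relatively small parameter space supplied up front (assumed values).
space = [(t, e) for t in range(20, 121, 10) for e in (1.0, 1.25, 1.5, 1.75, 2.0)]


def predict(point, observed):
    """Distance-weighted yield estimate plus a bonus for unexplored regions."""
    t, e = point
    weighted, nearest = [], float("inf")
    for (t2, e2), y in observed.items():
        d = ((t - t2) / 100.0) ** 2 + (e - e2) ** 2  # scaled squared distance
        nearest = min(nearest, d)
        weighted.append((1.0 / (d + 1e-6), y))
    mean = sum(w * y for w, y in weighted) / sum(w for w, _ in weighted)
    return mean + 20.0 * nearest ** 0.5


def self_optimise(n_initial=5, n_rounds=15, seed=0):
    rng = random.Random(seed)
    # Start by exploring the space broadly with a random sample.
    observed = {p: simulated_yield(*p) for p in rng.sample(space, n_initial)}
    for _ in range(n_rounds):
        # Pick the untested condition the surrogate scores highest, then run it.
        untested = [p for p in space if p not in observed]
        nxt = max(untested, key=lambda p: predict(p, observed))
        observed[nxt] = simulated_yield(*nxt)
    best = max(observed, key=observed.get)
    return best, observed[best], len(observed)


best, best_yield, n_runs = self_optimise()
print(best, round(best_yield, 1), n_runs)
```

In a real setting the call to simulated_yield would be replaced by dispensing, running, and analysing a well, and the surrogate would more likely be a proper statistical model such as a Gaussian process.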
Q5: Is all of your HTE data stored in a database and is that data FAIR?
I would like it to be FAIR. I push very hard for things to be as FAIR as possible when it comes to data. I go on about it a lot and I think people are probably sick of me going on about it. It’s incredibly important that we have that FAIR storage model, at least within the company. And it’s definitely getting a lot better; I always try to ensure that at least my own data are FAIR. As for storing in a database, this is evolving, for me at least. It’s definitely getting better and better over time. Many people define our understanding of what kind of data standards we want and how they will work, not just for chemists but everybody from IT to biology to other people who might have need for this later on: formulation, etc. So, yes, it is stored somewhat sensibly right now, but I think we should always be making improvements on this.
JF: I really like the way you’ve integrated the data management, the actual running, the experiments, the people, the issues around it for a real lab working here. Lots of questions I’d like to ask, but let’s just focus on that. Your high throughput essentially is parallel because you’ve got the well plates and then you repeat it. Obviously, there’s a serial element to this as well, especially when you’re running, say, different conditions. Have you looked at any of the designs, like the sequential updating of the Design of Experiments, so that as you’ve got some data coming in from some parts of your matrix, you decide which things to do next?
That would be nice, but basically the answer is not right now for me. You see so many impressive things in the literature, particularly around Bayesian optimisation and that kind of thing. I think there will be a lot more space in industry for this kind of computer-guided self-optimisation in the future, although I also like generating a large amount of data very quickly, so I believe 96-well plates are here to stay too.
Q6: When you put a seed in a small well with dirt, after you are done, how do you destroy the chemicals?
Good question – I honestly don’t know! It’s of course important not to let chemicals that are not fully tested for environmental safety be released, so I would guess treated plants are dealt with similarly to toxic chemicals.