Computer-Assisted Organic Synthesis 9780841203945, 9780841204621, 0-8412-0394-6

Content: LHASA : logic and heuristics applied to synthetic analysis / David A. Pensak and E.J. Corey -- Computer program

458 63 3MB

English Pages 242 Year 1977

Report DMCA / Copyright

DOWNLOAD FILE

Polecaj historie

Computer-Assisted Organic Synthesis
 9780841203945, 9780841204621, 0-8412-0394-6

Citation preview

Computer-Assisted Organic Synthesis W. Todd Wipke, EDITOR

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.fw001

University of California, Santa Cruz W. Jeffrey Howe, EDITOR The Upjohn Company

A symposium cosponsored by the Division of Chemical Information and the Division of Computers in Chemistry at the Centennial Meeting of the American Chemical Society, New York, N.Y., April 7-8, 1976.

61

ACS SYMPOSIUM SERIES

AMERICAN

CHEMICAL

SOCIETY

WASHINGTON, D. C. 1977

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.fw001

Library of CongressCIPData Computer-assisted organic synthesis. (ACS symposium series; 61 ISSN 0097-6156) Includes bibliographical references and index. 1. Chemistry, Organic—Synthesis—Data processing— Congresses. I. Wipke, W. Todd, 1940. II. Howe, William Jeffrey, 1946. III. American Chemical Society. D i vision of Chemical Information. IV. American Chemical Society. Division of Computers in Chemistry. V . American Chemical Society. VI. Series: American Chemical Society. ACS symposium series; 61. QD262.C54 ISBN 0-8412-0394-6

547'.2'0285 ACSMC8

77-13629 61 1-239

Copyright © 1977 American Chemical Society All Rights Reserved. N o part of this book may be reproduced or transmitted in any form or by any means—graphic, electronic, including photocopying, recording, taping, or information storage and retrieval systems—without written permission from the American Chemical Society.

PRINTED IN THE UNITED STATES OF AMERICA

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

ACS Symposium Series

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.fw001

Robert F. Gould, Editor

Advisory Board Donald G. Crosby Jeremiah P. Freeman E. Desmond Goddard Robert A. Hofstader John L. Margrave Nina I. McClelland John B. Pfeiffer Joseph V. Rodricks Alan C. Sartorelli Raymond B. Seymour Roy L. Whistler Aaron Wold

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.fw001

FOREWORD The ACS S Y M P O S I U M SERIES was founded in 1974 to provide a medium for publishing symposia quickly in book form. The format of the SERIES parallels that of the continuing A D V A N C E S I N C H E M I S T R Y SERIES except that in order to save time the papers are not typeset but are reproduced as they are submitted by the authors in camera-ready form. As a further means of saving time, the papers are not edited or reviewed except by the symposium chairman, who becomes editor of the book. Papers published in the ACS S Y M P O S I U M SERIES are original contributions not published elsewhere in whole or major part and include reports of research as well as reviews since symposia may embrace botfe types of presentation.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.pr001

PREFACE Hphe application of digital computers to the design and study of organic syntheses has stimulated the interest of chemists and computer scientists alike. The field not only promises practical benefits in synthesis planning and chemical education but also contributes to many areas of theoretical interest such as synthetic strategies, reaction indexing, structure representation, substructural perception, computer graphics, molecular modelling, and artificial intelligence. Many concepts and techniques now being used in structure elucidation, structure-activity, and other chemical information systems werefirstdeveloped in the computer synthesis planning field. Since 1969 when thefirstpaper in this area appeared, several groups have pursued approaches to computer synthesis planning. The symposium from which this volume derives was the first meeting of all the major research groups in this field. The papers in this volume describe the state of the art of computer synthesis as viewed by the major research groups working in the area. The editors acknowledge the Petroleum Research Fund for partial support of the symposium through a travel grant to Ivar Ugi and thank Cynthia O'Donohue for her help as program chairman. W. TODD WIPKE W. JEFFREY HOWE

Santa Cruz, August 1977

CA

Kalamazoo, MI

vii In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1 LHASA—Logic and Heuristics Applied to Synthetic Analysis DAVID A. PENSAK

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

Central Research and Develop. Dept., Ε. I. du Pont de Nemours and Co., Wilmington, Del. 19898 E. J. COREY Dept. of Chemistry, Harvard University, Cambridge, Mass. 02138

Despite the wealth o f knowledge about various chemical r e a c t i o n s , there e x i s t s no formal framework of i n t e r r e l a t i o n s h i p s t o guide the chemistint h e synthesis o f even moderately complex molecules. The LHASA (Logic and H e u r i s t i c s Applied t o Synthetic A n a l y s i s ) p r o j e c t is an attempt t o c o d i f y and organize the techniques used in organic synthesis. One important aspect o f the p r o j e c t has been t h e w r i t i n g o f a general purpose computer program which will a i d the laboratory chemist and will employ both the b a s i c and more complex techniques f o r s y n t h e t i c design as e l u c i d a t e d by this study. The program (hereafter also c a l l e d LHASA) is intended t o propose a v a r i e t y o f s y n t h e t i c routes t o whatever molecule it is given. The responsibility f o r final e v a l u a t i o n o f the merit o f the routes lies with the chemist. The program is t o be an adjunct t o the laboratory chemist as much as any analytical t o o l . Since LHASA is incapable o f proposing any routes the chemist could not have thought o f by himself, i . e . , it does not suggest new reactions that have never been tried, there needs t o be some justification for the massive e f f o r t involved in writing the program. It is well known that humanity and creativity b r i n g w i t h them c e r t a i n unavoidable shortsightedness and p r e j u ­ d i c e s . There will be particular reactions with which 1 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

2

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

each chemist i s most f a m i l i a r and i t i s to these that he w i l l f i r s t look to f i n d h i s s y n t h e t i c r o u t e ( s ) . I t i s p r e c i s e l y t h i s f a i l u r e to consider a l l p o s s i b l e routes t h a t makes a program l i k e LHASA both u s e f u l and necessary. Computers are w e l l known f o r t h e i r a b i l i t y to perform rote tasks a great number of times without complaint. Examination of p o t e n t i a l s y n t h e t i c path­ ways may be broken down i n t o s u f f i c i e n t l y small steps as to be amenable to computer implementation. How should one go about designing a synthesis? One of the most b a s i c techniques i s to work back­ wards, the f i n a l product or t a r g e t molecule being the u l t i m a t e g o a l . This t a r g e t i s examined to f i n d any or a l l compounds which can be transformed i n t o i t i n a s i n g l e chemical step. Each of these precursors may then be s i m i l a r l y analyzed u n t i l s a t i s f a c t o r y s t a r t i n g m a t e r i a l s are obtained. This method of a n a l y s i s i s c a l l e d r e t r o s y n t h e t i c (or, e q u i v a l e n t l y , a n t i t h e t i c ) . A s t r u c t u r a l m o d i f i c a t i o n that i s being performed i n the r e t r o s y n t h e t i c d i r e c t i o n i s c a l l e d a transform and i s g r a p h i c a l l y depicted as a double arrow.

CH

3

When applied i n i t s most general form, r e t r o ­ s y n t h e t i c a n a l y s i s could be applied to every pre­ cursor of the t a r g e t molecule and then, i n t u r n , to each of the new s t r u c t u r e s . The g r a p h i c a l represen­ t a t i o n of such an a n a l y s i s i s c a l l e d a synthesis t r e e . (A complex example i s shown below). I t i s worthwhile to note that s t r u c t u r e s tend to be l e s s complex the f u r t h e r away they are from the t a r g e t molecule and no c o n s t r a i n t s are placed on the choice of r e a c t i o n s . The s t a r t i n g m a t e r i a l s are the l a s t to be generated, thereby maintaining f l e x i b i l i t y i n the choice of route u n t i l the end of the a n a l y s i s .

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1.

PENSAK A N D COREY

LHASA

3

WRITE-THRU

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

MAKE

KILL

SAVE

RESTORE

TYPE NODE

-LINEAGE

-FAMILY

GET NODE

-LINEAGE

-FAMILY

SCOPE1

PROCESS

RESTART

MENU2

-ALL

Of considerable importance t o the way a synthesis i s analyzed i s the d e t a i l e d plan o f execution. The ordering o f the a d d i t i o n (or removal) o f i n d i v i d u a l f u n c t i o n a l i t i e s , the manipulation o f stereocenters, and the c l o s u r e o f rings can be o f c r u c i a l importance i n terms o f i n t e r f e r e n c e s o r competing r e a c t i o n s . The procedures f o r choosing the sequence i n which these d i s c r e t e steps are performed are c a l l e d s t r a t e g i e s . At present there are three broad categories o f s y n t h e t i c strategy that LHASA i s capable o f employing. They are 1) Opportunistic o r F u n c t i o n a l Group Based Strategies 2) S t r a t e g i c Bond Disconnections f o r P o l y c y c l i c Targets 3) S t r a t e g i e s Based on S t r u c t u r a l Features

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

4

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

a) Appendages b) Rings (small, common, medium-large) c) Masked F u n c t i o n a l i t y The aim o f the LHASA p r o j e c t has been the c r e a t i o n o f a computer program which employs the s t r a t e g i e s gleaned from the study o f s y n t h e t i c design. Such a program now e x i s t s though i t i s c o n t i n u a l l y undergoing m o d i f i c a t i o n and expansion as new s t r a t e ­ gies are e l u c i d a t e d and implemented. The remainder o f t h i s paper describes the o v e r a l l o r g a n i z a t i o n o f the LHASA program and the implementation o f these s t r a t e ­ g i e s . P a r t i c u l a r a t t e n t i o n i s paid t o those aspects of LHASA which are o f s p e c i a l i n t e r e s t t o s y n t h e t i c chemists o r t o computer s c i e n t i s t s working i n chemical areas. ORGANIZATION OF LHASA The LHASA program i s exceedingly complex - about 400 subroutines, 30,000 l i n e s o f FORTRAN code and a data base o f over 600 common chemical r e a c t i o n s . To describe i t i n d e t a i l i s w e l l beyond the scope o f t h i s paper. A general overview i s relevant as i t puts the f u n c t i o n s o f the data base i n a reasonable perspective. Figure 1 shows a g l o b a l view o f LHASA. 1

•See f o r example Corey, E. J . , W. J . Howe and D. A. Pensak, J . Amer. Chem. S o c , 2& 7724 (1974). Corey, E. J . , Quart. Rev. Chem. S o c , 25, 455 ( 1 9 7 1 ) · Corey, E. J . , W. T. Wipke, R. D. Cramer I I I and W. J . Howe, J . Amer. Chem. S o c , £ 4 , 421 (1972). Corey, E. J . , W. L. Jorgensen, J . Amer. Chem. S o c , 28,

189 (1976).

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1.

PENSAK AND COREY

5

LHASA

PERCEPTION

GRAPHICS

\

EXECUTIVE

CHEMISTRY PACKAGES

STRUCTURE

TRANSFORM EVALUATION

MANIPULATION

CHEMISTRY DATA BASE

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

Figure 1 Sample recognition questions

GRAPHICS There i s no doubt that the part o f LHASA which makes the most immediate impact on chemists i s the graphics. He may draw i n the s t r u c t u r e that he i s i n t e r e s t e d i n analyzing using standard chemical con­ ventions and a l l communication from the program t o him i s v i a s t r u c t u r a l diagrams. This manner o f i n t e r f a c ­ i n g w i t h the chemist-user was chosen because i t has been shown t h a t the rate a t which he can a s s i m i l a t e chemical data i s maximized i f i t i s i n the n o t a t i o n w i t h which he i s most f a m i l i a r . As the chemist s i t s f a c i n g the CRT (cathode ray tube) with which he communicates w i t h LHASA, on h i s r i g h t i s a d i g i t i z i n g data t a b l e t . This i s a device which measures the two dimensional coordinates o f t h e s t y l u s o r pen which i s used f o r i n p u t t i n g s t r u c t u r e s . As the s t y l u s i s moved along the surface o f the t a b l e t , LHASA i s t o l d by the t a b l e t where the s t y l u s touched the surface and where i t had moved t o when i t was l i f t e d . A l i n e i s displayed on the CRT between these two p o i n t s e x a c t l y l i k e a bond being drawn on a sheet o f paper. The LHASA graphics routines recognize t h a t two atoms are required t o make t h i s bond and makes appropriate i n t e r n a l e n t r i e s i n the program data base. Conforming t o standard s t r u c t u r e conventions, an atom

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

6

COMPUTER-ASSISTED ORGANIC SYNTHESIS

i s carbon unless otherwise i n d i c a t e d and s u f f i c i e n t hydrogens are assumed to f i l l out valence* In short, LHASA graphics l e t the chemist enter the s t r u c t u r e exactly as he would draw i t on a sheet o f paper.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

When a m u l t i p l e bond i s d e s i r e d , i t i s necessary only t o t r a c e over the s i n g l e bond one (or two) a d d i t i o n a l times. LHASA responds by redrawing the bond as double (or t r i p l e ) . I n d i c a t i n g stereochemistry i s e s s e n t i a l l y the same, the appropriate i n d i c a t o r (wedged o r dotted) i s chosen and the d e s i r e d bond i s traced w i t h the s t y l u s (see below). STORE

END

REPLAY

SCAN

SC0PE2

PROCESS

ORAM

MOVE

DELETE

WIPE

STEREO

MENU2

H UP

O

N

C

OOMN

P

S

LEFT

F

C

RIGHT

L

B SHALL

R

I

X

-

SINGLE-CRT



ic

ATOMNO

A

Ε BONONO

Considerable e f f o r t was expended t o insure that no a r t i s t i c t a l e n t i s required i n s t r u c t u r a l i n p u t . A reasonable amount o f inaccuracy i s permitted i n p o i n t ­ ing t o an atom o r bond. LHASA determines what was intended and acts a c c o r d i n g l y . There i s no need t o worry about consistency o r p r e c i s i o n o f bonds o r angles. With the one exception o f i n d i c a t i n g c i s trans isomerism around double bonds, i t makes no d i f f e r e n c e how d i s t o r t e d the s t r u c t u r e i s i n p u t . LHASA works s o l e l y from information about connec­ t i v i t y . This f l e x i b i l i t y i s q u i t e u s e f u l f o r drawing

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1.

PENSAK AND COREY

7

LHASA

s t r u c t u r e s i n a p a r t i c u l a r conformation. The program w i l l process i t c o r r e c t l y and d i s p l a y a l l o f f s p r i n g i n the same conformation.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

The only other times t h a t the chemist must com­ municate w i t h LHASA are when d e c i s i o n s are to be made about which s t r u c t u r e i s to be analyzed and what method i s to be sed. I n a l l cases a t a b l e o f choices (or menu) i s displayed and the chemist merely p o i n t s to the one (or ones) that he wishes. A sample menu i s shown below. SINGLE GROUP

GROUP PAIR

DEBUG

FULL SEARCH

FULL SEARCH

TREE

BOND MODE

EXIT

BOND MODE

NARROW MODE SUBGOALS MANUAL MODE

KEY SUBSTRUCTURES

SEQUENTIAL FGI DIELS-ALDER ISOLATED STRAT BOND

APPENDAGE CHEMISTRY ROBINSON k+2 RING APPNDG ONLY

DISCONNECTIVE

ROBINSON X-3 BRANCH APPNDG ONLY

RECONNECTIVE UNMASKING

SMALL RINGS PERCEPTION ONLY

STEREOSPECIFIC C=C

At no time i s he forced to l e a r n any s p e c i a l command formats or memorize l i s t s o f o p t i o n s . The LHASA graphics and c o n t r o l s t r u c t u r e s were s p e c i f i c a l l y designed to be as easy and n a t u r a l f o r the chemist t o use as p o s s i b l e . PERCEPTION Inherent i n a s t r u c t u r a l diagram i s a wealth o f information - r i n g s , f u n c t i o n a l groups, stereo­ chemistry, e t c . To be as e f f e c t i v e as p o s s i b l e , LHASA must recognize a l l o f these and u t i l i z e them i n i t s planning processes. The procedures by which these are c a l l e d p e r c e p t i o n . By v i r t u e of having very s o p h i s t i c a t e d perception r o u t i n e s , LHASA can avoid f o r c i n g the chemist t o input

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

8

COMPUTER-ASSISTED ORGANIC SYNTHESIS

any a r t i f i c i a l (to him) i n f o r m a t i o n . An unexpected adjunct o f t h i s has been a guarantee of perceptual completeness. For example, consider the f o l l o w i n g s t r u c t u r e (the non-indole p o r t i o n o f the a l k a l o i d ajmaline).

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

HO

There are three six-membered r i n g s , one five-membered r i n g and one seven-membered r i n g yet few chemists perceived a l l of them. I f the s t r u c t u r e were redrawn as HO

a d i f f e r e n t , though s t i l l incomplete set of r i n g s i s recognized by the human. The point of t h i s example i s that LHASA must perceive r i n g s s o l e l y on the b a s i s of c o n n e c t i v i t y , not how the s t r u c t u r e i s drawn. The program would be useless i f i t missed syntheses based on c y c l i c substructures because the chemist had f a i l e d to i n d i c a t e a l l the r i n g s i n the molecule. RING PERCEPTION Many researchers have attacked the problem of f i n d i n g the set of c y c l e s i n a graph. T h e i r work 2

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1.

PENSAK AND COREY

LHASA

9

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

has p r i m a r i l y been d i r e c t e d towards i d e n t i f y i n g the smallest set rings o f rings i n the network. For chem­ i c a l purposes, t h i s i s not s u f f i c i e n t however. F o r example, the s t r u c t u r e below

i s best synthesized by the D i e l s Alder a d d i t i o n shown, but the six-membered r i n g formed i s not part of the minimal c y c l i c b a s i s o f the molecule. I t i s necessary, t h e r e f o r e , t o redefine our problem as that of f i n d i n g the set o f cycles i n a graph which are o f chemical s i g n i f i c a n c e . For s y n t h e t i c purposes, rings must be s p l i t i n t o two classes - r e a l and pseudo. For each bond i n a molecule, the smallest r i n g c o n t a i n i n g that bond i s c a l l e d a r e a l r i n g . I t i s q u i t e p o s s i b l e that the number o f r e a l rings w i l l be greater than the c y c l i c order of the molecule (/ bonds - / atoms + 1). Pseudo r i n g s are the pairwise envelopes o f r e a l rings with the r e s t r i c t i o n that the s i z e o f the envelope be seven or l e s s . Real rings are u s e f u l because the chemistry of a bond i s best r e f l e c t e d by the s i z e o f the s m a l l ­ est r i n g c o n t a i n i n g i t - f o r example, the f u s i o n bond i n the s t r u c t u r e below.

2 See f o r example Paton, K., Commun. Assoc. Comput. Mach., 12, 514 (1969).

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

10

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Pseudo rings are u s e f u l because they are the r i n g s which are o f t e n formed i n the c o n s t r u c t i o n o f bridged molecules. STRATEGIC BONDS - CYCLIC

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

There are u s u a l l y c e r t a i n bonds i n a molecule whose disconnection i n the r e t r o s y n t h e t i c d i r e c t i o n leads t o a s i g n i f i c a n t s i m p l i f i c a t i o n o f the c y c l i c s t r u c t u r e . These are termed s t r a t e g i c bonds. Since these have been described i n d e t a i l elsewhere, we s h a l l consider them here only b r i e f l y . ^ The f i r s t premise o f s t r a t e g i c bonds i s that the chemical a c t i v i t y o f a bond i s a d i r e c t f u n c t i o n o f the s i z e o f the smallest r i n g c o n t a i n i n g i t . This leads t o the requirement that a s t r a t e g i c bond must be i n a r i n g o f f i v e , s i x , o r seven members and not i n or exo t o a c y c l o p r o p y l r i n g . A s t r a t e g i c bond must also be i n the r i n g ( i f any) with the maximum number of bridges on i t . This insures t h a t i t s disconnection w i l l s i m p l i f y the c y c l i c network as much as p o s s i b l e . The s t r u c t u r e below shows the power o f t h i s h e u r i s t i c . 0

HO

5 Corey, E. J . , W. J . Howe, H. W. Orf, D. A. Pensak and G. A. Petersson, J . Amer. Chem. Soc, 2L 6ll6 (1975).

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1.

PENSAK A N D COREY

11

LHASA

Other r e s t r i c t i o n s on s t r a t e g i c bonds prevent them from being aromatic and from l e a v i n g c h i r a l side chains. These requirements are also based on current chemical technique. FUNCTIONAL GROUP PERCEPTION

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

F u n c t i o n a l i t y i n molecules i s w e l l defined. There i s no argument about whether a group i s a ketone o r not. The problem i n LHASA i s t o perceive a l l groups and j u x t a p o s i t i o n s o f groups which can be chemically meaningful. To t h i s end, a context dependent grammar has been developed t o unambiguously represent the p h y s i c a l domain o f a group and the s i t e ( s ) at which i t can be expected t o r e a c t . This grammar defines which atoms i n the group are considered o r i g i n s . Each f u n c t i o n a l group i s c h a r a c t e r i z e d by at l e a s t one carbon atom which i s i t s attachment t o the r e s t o f the molecule. This i s c a l l e d an o r i g i n atom and i t i s around these o r i g i n s t h a t many o f the data t a b l e s i n LHASA are organized. As an example, an alpha-beta unsaturated ketone has two o l e f i n i c p o s i t i o n s w i t h s i g n i f i c a n t l y d i f f e r e n t a f f i n i t y t o e l e c t r o p h i l i c reagents. To consider the double bond as one group w i t h constant r e a c t i v i t y i s an unreasonable s i m p l i f i c a t i o n , but r e c o g n i z i n g i t as two o r i g i n atoms each w i t h o l e f i n i c character makes s u i t a b l e d i f f e r e n t i a t i o n p o s s i b l e . To accomplish t h i s r e c o g n i t i o n e f f i c i e n t l y , a t a b l e d r i v e n approach was chosen since the types o f groups t o be recognized change from time t o time, depending on the needs o f those using the program ( s i x t y - f o u r d i f f e r e n t groups are c u r r e n t l y recognized by LHASA, see Table 1). This c o n s i s t s o f a s e r i e s o f questions which can be answered w i t h e i t h e r a y e s o r "no" · Depending on the answer, the type o f the next question t o be asked i s s p e c i f i c a l l y determined. F o r n

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

,!

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

Table 1 KETONE ALDEHYDE ACID ESTER AMIDE*1 AMIDE*2 AMIDE*? ISOCYANATE ACID*HALIDE THIOESTER AMINE*} AZIRIDINE AMINE*2 AMINE*1 NITROSO DIAZO HALOAMINE HYDRAZONE OXIME IMINE THIOCYANATE ISOCYANIDE NITRILE AZO HYDROXYLAMINE NITRO AMINEOXIDE THIOL EPISULFIDE SULFIDE SULFOXIDE

SULFONE C*SULFONATE LACTAM PHOSPHINE PHOSPHONATE EPOXIDE ETHER PEROXIDE ALCOHOL NITRITE 0*SULFONATE FLUORIDE CHLORIDE BROMIDE IODIDE DIHALIDE TRIHALIDE ACETYLENE OLEFIN HYDRATE HEMIKETAL KETAL HEMIACETAL ACETAL AZIDE DISULFIDE ALLENE LACTONE VINYLW VINYLD ESTERX AMIDZ

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1.

PENSAK A N D COREY

LHASA

example, suppose a carbon-nitrogen t r i p l e bond has been found the questions would look, l i k e C=N

?

yes >

isocyanide

yes

-C ? Ino

thiocyanate

yes

c-s-•CsNf

?

no v

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

nitrile

The a c t u a l data t a b l e which d r i v e s t h i s r e c o g n i t i o n process i s reproduced below. A22 A25 A24 A25 A26 A27 A28 A29

AJO A31 A32 A3 3 A34 A35 A3 6 A37 A38 A39

A40 A4l A42 A43

A44 A45 A46 A47

A48 A49

A50 A51 A52 A53

A54 A55

A56 A57 A58

A59 A60

A6l A62 A63 A64 A65

LOC A 2 3 NULL LOC A 2 5 NULL LOC A 2 7 LOC A 2 5 LOC A29 LOC A30 LOC A31 NULL LOC A33 LOC A34 LOC A35 LOC A36 NULL LOC A38 LOC A 3 9 LOC A40 LOC A4l NULL NULL LOC A44 NULL LOC A46 LOC A 4 7 LOC A48 NULL LOC A50 LOC A51 LOC A52 LOC A 5 3 LOC A54 LOC A55 LOC A56 NULL LOC A 5 8 LOC A59 LOC A 6 0 LOC A6l LOC A62 LOC A63 NULL LOC A65 NULL

LOC A24 NULL LOC A 2 6 NULL LOC A 2 8 LOC A 2 8 LOC A32 NULL LOC A31 NULL LOC A33 LOC A43 LOC Α35· LOC A37 NULL NULL LOC A 3 9 LOC A40 LOC A42 NULL NULL LOC A 4 5 NULL LOC A66 LOC A 4 7 LOC A 4 9 NULL LOC A 5 7 NULL LOC A52 LOC A 5 3 LOC A56 LOC A55 LOC A56 NULL NULL LOC A59 LOC A 6 0 LOC A64 LOC A62 LOC A63 NULL LOC A63 NULL

SHIFT + IF CARBON*COUNT IS TWO IDENTIFIED AS KETONE IF HYDROGEN*COUNT IS TWO IDENTIFIED AS ALDEHYDE IF HYDROGEN*COUNT IS ONE IF CARBON*COUNT IS ONE SEARCH FOR C**N SHIFT + SEARCH FOR C*N NONORIGIN ENTRY IDENTIFIED AS ISOCYANATE ENTRY BOND*SHARED SEARCH FOR C*0 BOND*SHARED SHIFT + IF HYDROGEN*COUNT IS ONE IDENTIFIED AS ACID SEARCH FOR C*0 SHIFT + NONORIGIN BOND*SHARED SHIFT + IF IN RING OF ANY SIZE IDENTIFIED AS LACTONE IDENTIFIED AS ESTER SEARCH FOR C*X IDENTIFIED AS ACID*HALIDE SEARCH FOR C*N BOND*SHARED SHIFT + IF HYDROGEN*COUNT IS TWO IDENTIFIED AS AM3DE*1 SHIFT + IF IN RING OF ANY SIZE SHIFT + SEARCH FOR C*N SHIFT + NONORIGIN SHIFT + BOND*SHARED SEARCH FOR C*N SHIFT + NONORIGIN BOND*SHARED IDENTIFIED AS LACTAM IDENTIFIED AS LACTAM BOND*SHARED SHIFT + NONORIGIN SHIFT + SEARCH FOR C*N BOND*SHARED SHIFT + NONORIGIN IDENTIFIED AS AMIDE*3 IF HYDROGEN*COUNT IS ONE IDENTIFIED AS AMIDE*2

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

14

COMPUTER-ASSISTED ORGANIC SYNTHESIS

FUNCTIONAL GROUP REACTIVITY

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

Frequently, during the experimental r e a l i z a t i o n of a s y n t h e t i c plan, c e r t a i n f u n c t i o n a l groups w i l l i n t e r f e r e w i t h the performance o f desired r e a c t i o n s . When t h i s happens, i t becomes necessary t o protect the offending group ( r e v e r s i b l y modify i t t o some other f u n c t i o n a l i t y that i s s t a b l e t o t h e r e a c t i o n c o n d i t i o n s ) . The extension o f computer a s s i s t e d synthetic analysis to sophisticated levels necessi­ t a t e s the d e t e c t i o n o f p o s s i b l e i n t e r f e r e n c e s . Such s i t u a t i o n s must be presented t o t h e chemist i n a g e n e r a l l y u s e f u l manner. This problem has been attacked i n LHASA by the separation o f f u n c t i o n a l group i n t o d i f f e r e n t c l a s s e s based on t h e i r e l e c t r o n i c and s t e r i c environment. At the same time a l i b r a r y o f standard reagents (cur­ r e n t l y 60) has been prepared c o n t a i n i n g the s t a b i l i t y of each o f the c l a s s e s o f f u n c t i o n a l groups t o each reagent. By t h i s mechanism the program can decide whether groups o f an i d e n t i c a l o r s i m i l a r type w i l l i n t e r f e r e w i t h the transform. F o r example, i n the s t r u c t u r e below i t i s p o s s i b l e t o s e l e c t i v e l y hydrogenate bond A i n the presence o f Β

From a computational point o f view, f u n c t i o n a l group r e a c t i v i t y i s s t r a i g h t f o r w a r d . Associated w i t h each group o r i g i n i s a number which unambiguously defines the environment o f the o r i g i n . These i n c l u d e s t e r i c hindrance and a c c e s s i b i l i t y , s t r a i n , and e l e c t r o n i c environment. I t i s important t o note that a group can be i n s e v e r a l subclasses simultaneously, a l l o f which must be encoded i n t o the one number. This number i s used t o assign r e a c t i v i t y l e v e l s t o each o r i g i n r e l a t i v e t o each reagent.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1.

PENSAK AND COREY

15

LHASA

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

At the time of attempted r e a c t i o n execution, the program reads the d e s i r e d conditions from the data base. The r e a c t i v i t y l e v e l s of a l l n o n - p a r t i c i p a t i n g groups are examined. I f a l l are l e s s r e a c t i v e than the p a r t i c i p a t i n g group(s), then the transform i s allowed to proceed. I f t h i s i s not the case, the offending group(s) i s examined to see i f i t i s g e n e r a l l y p r o t e c t a b l e . I f i t i s then a s o l i d r e c ­ tangle i s drawn around the group as the transform i s displayed to the chemist. Unstable, unprotectable groups are g r a p h i c a l l y i n d i c a t e d by a dashed box around t h e i r bonds.

LHASA does not t r y to assign s p e c i f i c p r o t e c t i n g groups. There i s j u s t so much chemical d e t a i l that would have to be programmed that the i n t e r a c t i v e aspect of LHASA would be severely degraded. An addi­ t i o n a l problem e n t a i l s evaluating when i n the syn­ t h e t i c route i t would be best to protect and then deprotect the group(s). The program c u r r e n t l y deals w i t h the synthesis t r e e on a node by node b a s i s . A g l o b a l o p t i m i z a t i o n of the i n d i v i d u a l steps i n the t r e e i s one a d d i t i o n a l l e v e l of s o p h i s t i c a t i o n which has not yet been attempted. APPENDAGE BASED STRATEGIES The vast majority of m u l t i s t e p syntheses i n v o l v e e i t h e r the disconnection, reconnection, or m o d i f i c a ­ t i o n of what are l o o s e l y c a l l e d 'appendages . One p a r t i c u l a r l y u s e f u l r e t r o s y n t h e t i c strategy c o n s i s t s of fragmenting a r i n g and then disconnecting the r e s u l t i n g appendages, as shown below. 1

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

16

COMPUTER-ASSISTED ORGANIC SYNTHESIS

.0

S i m i l a r l y , s t r a t e g i e s i n v o l v i n g reconnection of appendages are e x c e p t i o n a l l y u s e f u l i n s t e r e o s p e c i f i c s y n t h e s i s . These reconnections have a l s o proven valuable i n the synthesis of medium s i z e r i n g s .

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

?T

5

OH I t i s important to note t h a t a l l stereochemistry i n these examples i s perceived by LHASA and used i n i t s strategies. There are two c l a s s e s o f appendages - r i n g appendages and branch appendages. A r i n g appendage i s a group o f atoms attached t o r i n g t h a t i s not i n a r i n g i t s e l f . A branch appendage may only o r i g i n a t e on a non-ring atom and must have three o r more attachments other than hydrogens. Non-terminal o l e f i n s and acetylenes are a l s o considered as o r i g i n s o f branch appendages f o r chemical reasons. Signifi­ cant i n the use of appendages i s the c o m b i n a t o r i a l problem o f determining i d e n t i c a l i t y o f appendages. This has been solved q u i t e e l e g a n t l y by Jorgensen. Appendage based s t r a t e g i e s may be d i v i d e d i n t o disconnective and reconnect!ve. The l a t t e r may be f u r t h e r p a r t i t i o n e d i n t o r i n g appendage - r i n g Corey, E. J . , W. L. Jorgensen, J . Amer. Chem. S o c , 28, 189 (1976).

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1.

PENSAK AND COREY

LHASA

17

appendage, r i n g appendage-ring, and a c y c l i c recon­ n e c t i o n s . LHASA c u r r e n t l y knows about twenty d i f f e r e n t c l a s s e s of reconnective transforms - a small subset of the group p a i r chemistry data base (vide i n f r a ) . When i n a mode where these transforms are s p e c i f i c a l l y being executed, they are empowered to make s e v e r a l small s t r u c t u r a l m o d i f i c a t i o n s to achieve the d e s i r e d reattachment. THE CHEMISTRY PACKAGES OF LHASA

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

F u n c t i o n a l Group Based Transforms As we have already seen, LHASA has a wide v a r i e t y of s t r a t e g i e s which i t can employ, e i t h e r of i t s own v o l i t i o n or by d i r e c t i v e of the chemist-user. In order t o f a c i l i t a t e the use of these s t r a t e g i e s , the chemical data base i n LHASA i s broken down i n t o s e v e r a l separate c a t e g o r i e s , two group transforms, one group transforms, f u n c t i o n a l group interchange, f u n c t i o n a l group a d d i t i o n and r i n g o r i e n t e d t r a n s ­ forms. This s e c t i o n w i l l describe each of these and g i v e b r i e f examples of t h e i r use. Two group transforms are keyed s p e c i f i c a l l y by two f u n c t i o n a l groups w i t h a path of predetermined l e n g t h between them. Examples of these are shown below.

One group transforms are s i m i l a r but are keyed by one s p e c i f i c group w i t h an a s s o c i a t e d path (not as

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

18

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

chemically meaningful as above). below.

Examples are shown

The world of chemistry would indeed be rosy i f there were always p r e c i s e matches against t h i s data base. Unfortunately, t h i s i s not o f t e n the case. Frequently, one of the group ( i n a two group s i t u a ­ t i o n ) does match, but the other one does not. I f the i n c o r r e c t group could r e a d i l y be converted i n t o a matching group, then the transform would become acceptable. As i n the A l d o l Condensation, i f the molecular fragment present were the ether, the match would not be found, yet the e t h e r i f i c a t i o n of the a l c o h o l can o f t e n be q u i t e s t r a i g h t f o r w a r d . I f the

«0^00 °^o^OO ^o*00 performance of the A l d o l i s considered a chemical ' g o a l , then the conversion of the ether to the a l c o h o l i s a 'subgoal . In t h i s case, the subgoal c o n s i s t e d of modifying a group or F u n c t i o n a l Group Interchange (FGI). 1

1

A more complicated case would e x i s t i f the second group necessary to key the transform was t o t a l l y absent. In the example below, the only f u n c t i o n a l substructure capable of keying a t r a n s ­ form would be the o l e f i n . To perform the A l d o l

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1.

PENSAK AND COREY

19

LHASA

CO-XO-,00 transform i t would be necessary to add ( i n the r e t r o s y n t h e t i c d i r e c t i o n ) the a l c o h o l group. Such subgoals are c a l l e d F u n c t i o n a l Group A d d i t i o n (FGA). Obviously one does not want to always introduce a l l p o s s i b l e f u n c t i o n a l groups at a l l a v a i l a b l e p o s i t i o n s o r do i n d i s c r i m i n a n t group conversions without some guiding purpose or s t r a t e g y . As such, FGI's and FGA s are only executed i n response to a request from a higher l e v e l chemistry package.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

1

Subgoal requests can be combined and mixed according to the s i t u a t i o n . Next to be added i s FGI then INTRO since not a l l groups may be INTRO ed. 1

Ring Oriented Transforms I t was recognized t h a t of great s i g n i f i c a n c e to LHASA type analyses was the i n c l u s i o n of chemistry packages whose sole purpose was the c o n s t r u c t i o n of r i n g s . These transforms could not and should not be keyed by the presence or absence of any p a r t i c u l a r f u n c t i o n a l i t y . Since they had s p e c i f i c long range goals, they were given considerable power i n the type and number of subgoals that they could request. This i s i n c o n t r a s t to the two group or one group chem­ i s t r i e s where only one FGI or FGA could be performed before the f i n a l d i s c o n n e c t i o n . Four r i n g forming transforms have been con­ sidered at length by the LHASA development group - the D i e l s A l d e r a d d i t i o n , the Robinson Annelation, the Simmons-Smith r e a c t i o n , and i o d o - l a c t o n i z a t i o n . The f i r s t three of these have been f u l l y implemented i n LHASA and the f o u r t h i s completely flow charted and awaits only coding i n t o the chemistry data base language.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

20

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

A l l r i n g chemistry t a b l e s are organized i n t o what i s c a l l e d b i n a r y search t r e e s . Queries are posed about the existence o f c e r t a i n s t r u c t u r a l f e a t u r e s . Each o f these questions i s answerable w i t h a yes o r a no. Based on the answer one o f two d i f f e r e n t f o l l o w up questions i s s e l e c t e d . Embedded w i t h i n the t a b l e may be requests f o r subgoals, e i t h e r those already i n the FGI o r FGA t a b l e o r f o r s p e c i a l r e a c t i o n s which are needed only f o r these transforms and are not o f general s y n t h e t i c i n t e r e s t . The f i r s t step i n implementation o f a r i n g transform i s the preparation o f a chemical flow c h a r t . This defines a l l the questions about the s t r u c t u r e and describes i n a graphic r e p r e s e n t a t i o n the s y n t h e t i c steps t h a t w i l l be taken. I t i s q u i t e s t r a i g h t f o r w a r d f o r a chemist having no f a m i l i a r i t y w i t h LHASA t o read and make use o f these c h a r t s . A number o f graduate students and p o s t d o c t o r a l f e l l o w s i n the Corey group at,Harvard U n i v e r s i t y made s i g n i f i c a n t input t o the chemistry i n the t a b l e s without ever having t o worry about the computer implementation. The example below shows some o f the s y n t h e t i c routes generated by the D i e l s A l d e r transform f o r the i n d i c a t e d precursor. I t i s important t o note that while some o f the chemistry may look somewhat naive, i t can be q u i t e thought provoking.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

PENSAK A N D COREY

LHASA

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

1.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

21

22

COMPUTER-ASSISTED ORGANIC SYNTHESIS

I t i s c l e a r that designing a synthesis o f a r i n g with so many stereocenters presents a formidable challenge f o r most s y n t h e t i c chemists. I t i s f a i r t o say that the r i n g transforms are a g e n e r a l i z a t i o n o f the concepts derived f o r the group o r i e n t e d c h e m i s t r i e s . Work i s c u r r e n t l y underway t o g e n e r a l i z e t h i s s t i l l f u r t h e r , t o permit generation o f a r b i t r a r i l y complex molecular patterns, always s p e c i f i a b l e i n a n o t a t i o n e a s i l y readable by the chemist.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

CHMTRN - CHEMICAL DATA BASE LANGUAGE The chemical transforms are the heart and s o u l o f LHASA. Without good chemistry i n the data base, a l l the s o p h i s t i c a t e d perception would be e s s e n t i a l l y use­ l e s s . The f i r s t requirement i n the design o f the data base was that i t be m o d i f i a b l e without having t o recompile any other part o f the program. The second requirement was that i t require no knowledge o f FOKIRAN or how LHASA i s organized on a subroutine by sub­ r o u t i n e b a s i s . The t h i r d requirement was that the data base be e a s i l y readable by chemists with no t r a i n i n g i n LHASA and m o d i f i a b l e a f t e r only a l i t t l e i n t r o d u c t i o n t o the language. To meet these c o n d i t i o n s a s p e c i a l chemical pro­ gramming language CHMTRN (Chemical T r a n s l a t o r ) ^ was developed. By use o f a s p e c i a l assembler - TBLTRN (Table T r a n s l a t o r , w r i t t e n by Dr. Donald E. B a r t h ) , i t was p o s s i b l e t o convert the CHMTRN t a b l e s i n t o s p e c i a l l y encoded FORTRAN BLOCK DATA statements which could be loaded with LHASA o r read i n at run time. The b a s i c approach o f CHMTRN i s that there are keywords ( c u r r e n t l y s e v e r a l hundred) that have

I f t h i s name c o n f l i c t s o r d u p l i c a t e s that o f some other chemical program, I apologize. The d u p l i c a ­ tion i s unintentional.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1.

P E N S A K AND

COREY

LHASA

23

s p e c i f i c numerical values assigned to them. A l l of these keywords that are typed on a s i n g l e l i n e i n the data base are l o g i c a l l y 'or' ed together i n t o one computer work. (The t r a n s l a t i o n and combination are handled by TBLTRN). Each such l i n e i s c a l l e d a q u a l i f i e r as i t l i m i t s or modifies the scope of the transform.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

LHASA contains an i n t e r p r e t e r c a l l e d EVLTRN (Evaluate Transform) which decodes the b i t patterns and performs the requested queries about the current s t r u c t u r e or performs a s p e c i f i e d operation. As an example, consider a l i n e from the t a b l e s which says SUBTRACT 20 FOR EACH PRIMARY HALIDE ALPHA TO CARB0N*1 OFFPATH ONRING. This q u a l i f i e r decrements the base r a t i n g of the transform f o r each primary h a l i d e that i s on a c y c l i c carbon which i s not a part of the path keying the transform. From t h i s example, i t i s p o s s i b l e to see how densely the chemical data i s packed (one q u a l i f i e r takes up only one computer word - 32 b i t s ) . There i s a t a r g e t to be searched f o r (the h a l i d e ) , a domain or l o c a t i o n to which the search i s r e s t r i c t e d (alpha to the carbon but on a r i n g and o f f the path), and an i t e r a t i o n command i n d i c a t i n g that the operation (the s u b t r a c t i o n ) i s to be performed f o r each occurrence. CHMTRN has s e v e r a l other c o n s t r u c t i o n s worthy of mention. The f i r s t i s the a b i l i t y to make m o d i f i c a ­ t i o n s to the s t r u c t u r e according to r e s u l t s of q u a l i f i e r e v a l u a t i o n s . One can say, f o r example, ATTACH AN ALCOHOL TO CARB0N*2 CIS TO CARB0N*4 This command also shows j u s t one case where s t e r e o ­ chemical considerations can be i n c l u d e d . Complete block s t r u c t u r i n g (as i n P L / l or ALGOL) has been incorporated. This i s u s e f u l where a com­ plex s e r i e s of queries should be a p p l i e d i n c e r t a i n

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

24

COMPUTER-ASSISTED ORGANIC

SYNTHESIS

repeatable circumstances, f o r example, FOR EACH KETONE ANYWHERE DO BEGIN

END.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

A l l q u a l i f i e r s between the BEGIN and END are executed f o r each KETONE. These may be nested t o any d e s i r e d depth. S i m i l a r l y IF-THEN-ELSE c o n s t r u c t i o n s are a l s o allowed. Software subroutines have a l s o been implemented i n CHMTRN/EVXTRN. Suppose there i s one group o f q u a l i f i e r s which needs o f t e n be applied t o d i f f e r e n t l o c a t i o n s i n the molecule at varying times. This can be handled by the c o n s t r u c t i o n CALL FGIW AT CARB0N*3 AND B0ND*2 In the subroutine FGIW these arguments are addressable as SPECIFIED*ATOM and SPECIFIED*BOND. I t i s p e r m i s s i ­ b l e t o apply stereochemical c o n s t r a i n t s t o arguments at the time o f execution o f the CALL. There i s no p r a c t i c a l l i m i t on the depth o f subroutine c a l l s . Subroutines may a l s o r e t u r n a value t o i n d i c a t e whether o r not they succeeded i n the task they were assigned t o do. The Robinson Annelation transform has received d e t a i l e d examination by the LHASA group. One sub­ r o u t i n e i n the t a b l e s p e c i f i c a l l y checks t o see i f there i s any f u n c t i o n a l i t y alpha t o the ketone i n the cyclohexane and i f there i s , remove i t by exchanging i t f o r something non-offensive. This subroutine i s reproduced below as an example o f the CHMTRN language.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1.

P E N S A K A N D COREY

LHASA

25

...THIS SUBROUTINE IS CALLED TO CLEAR AWAY ANY UNDESIRABLE FUNCTIONALITY ...ALPHA TO A KETONE ON THE RING ALPHCHK IF NO HYDROGEN ON THE SPECIFIED ATOM THEN GO TO 19 IF THERE IS NOT A WITHDRAWING GROUP ON THE SPECIFIED ATOM THEN GO TO l8 IF THE SPECIFIED ATOM IS THE SAME AS CARB0N*2 THEN RETURN SUCCESS IF BOND*5 IS A FUSION*BOND THEN RETURN SUCCESS IF THERE IS NOT A NITRO ON THE SPECIFIED ATOM THEN GO TO l8 EXCHANGE THE GROUP FOR AN AMINE IF SUCCESSFUL THEN GO TO J2 OTHERWISE RETURN FAIL 18 IF THERE IS A HALIDE ON THE SPECIFIED ATOM THEN GO TO J2 IF THERE IS A KETONE ON THE SPECIFIED ATOM THEN GO TO J2 IF THERE IS A WITHDRAWING GROUP ON THE SPECIFIED ATOM THEN GO TO J2 IF THERE IS A DONATING GROUP ON THE SPECIFIED ATOM THEN GO TO J2 IF THERE IS AN OLEFIN ALPHA TO THE SPECIFIED ATOM THEN GO TO J2 IF THERE IS A FUNCTIONAL GROUP ON THE SPECIFIED ATOM THEN RETURN FAIL IF THERE IS NOT A FUNCTIONAL GROUP ALPHA TO THE SPECIFIED ATOM THEN RETURN SUCCESS EXCHANGE THE GROUP FOR A WITHDRAWING GROUP IF SUCCESSFUL THEN GO TO J2 OTHERWISE RETURN FAIL

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

19

J2

IF THE SPECIFIED ATOM IS A QUATERNARY^ENTER THEN RETURN FAIL IF THERE IS AN ALCOHOL ON THE SPECIFIED ATOM THEN GO TO J2 IF THERE IS A HALIDE ON THE SPECIFIED ATOM THEN GO TO J2 IF THERE IS NOT AN ETHER ON THE SPECIFIED ATOM THEN RETURN FAIL CALL DET THE GROUP AT THE SPECIFIED ATOM AND GO TO RET

Two examples o f how these constructs are a p p l i e d together w i l l demonstrate t h e i r u t i l i t y and f l e x i ­ b i l i t y . A number o f r e a c t i o n s , such as Michael a d d i t i o n depend on the conformation o f the i n t e r ­ mediate enolate f o r t h e i r s p e c i f i c i t y . I t i s p o s s i b l e to make i n i t i a l queries about the s t r u c t u r e , generate the enolate, ask about i t , then generate the f i n a l precursor and ask questions about i t . At each stage of t h i s process, i t i s p o s s i b l e t o detect a f a t a l c o n d i t i o n and terminate e v a l u a t i o n o f t h e transform. This same language i s being used t o s u c c e s s f u l l y c a l c u l a t e p r e f e r r e d conformations o f cyclohexanes f o r e v a l u a t i o n o f r e g i o s p e c i f i c i t y and i n f u n c t i o n a l group reactivity analysis. ORGANIZATION OF THE DATA BASE The one group and two group and the subgoal t a b l e s are queried very f r e q u e n t l y during a t y p i c a l a n a l y s i s s e s s i o n . A data s t r u c t u r e has been developed which i s extremely e f f i c i e n t f o r these searches - the r e t r i e v a l time being independent o f e i t h e r the s i z e o f the data t a b l e o r the number o f s u c c e s s f u l h i t s i n the t a b l e . Because o f the general a p p l i c a b i l i t y o f t h i s

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

26

COMPUTER-ASSISTED

ORGANIC SYNTHESIS

technique, we s h a l l describe i t i n more d e t a i l - using the two group t a b l e s as an example*

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

At present there are s i x t y - f o u r d i f f e r e n t func­ t i o n a l groups which are capable o f keying a transform i n some manner. This can e i t h e r be as the f i r s t key­ ing group o r the second group ( f o r example, both a ketone and an o l e f i n key the A l d o l Condensation)

The keying mechanism must point t o the a p p l i c a b l e transform regardless o f the o r d e r i n g o f the groups and i t must a l s o handle s i t u a t i o n s where the keying groups are the same. The f i r s t element i n the r e p r e s e n t a t i o n i s a ' d i r e c t o r y s e t - a Boolean set w i t h a b i t on f o r each group t h a t can p a r t i c i p a t e i n any transform a t the d e s i r e d path l e n g t h . I f the group i s not marked i n t h i s set, then there i s no need t o f u r t h e r i n t e r r o ­ gate the t a b l e - there w i l l not be any acceptable entries· 1

For each group t h a t does p a r t i c i p a t e i n t r a n s ­ forms, there are two a d d i t i o n a l multi-word s e t s . A b i t i s on f o r each transform i n which the group takes part - i n the f i r s t set i f i t i s the f i r s t keying group and i n the second i f i t i s the second. L o g i c a l l y 'AND'ing the f i r s t group ε f i r s t set w i t h the second group's second set y i e l d s a set w i t h b i t s on f o r only these transforms. These b i t p o s i t i o n s are used as indexes i n t o a t a b l e o f addresses o f the q u a l i f i e r s for those a p p l i c a b l e transforms. While i t sounds r a t h e r complicated, i t r e a l l y i s not. What has been done i s t o generate an e x t e r n a l addressing s t r u c t u r e at assembly time. 1

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1.

PENSAK A N D COREY

27

LHASA 2 word*



Address o f Path 0 D i r e c t o r y Address o f Transform Address Table Count o f S p e c i a l Sets f o r t h i s Table Address o f Path 1 D i r e c t o r y

Group P a r t i c i p a t i o n S e t Symmetrical Trans fori Re connective Transforms Special s t r a t e g i c Directory

Subgoal Trans forms Simplifying

Sets

Trans fori

Disconnective Transforms 1st group as GROUP»1 1st group as GROUP*2

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

2nd group as GROUP*1 2nd group as GROUP*2 Group-transform sets

f

Transform Address Table

y

F i r s t Transform's

Qualifiers

X \y

Second Transform's

Qualifiers

Q u a l i f i eQrusa l i f i e r s TRemaining h i r d Transform's Group P a r t i c i p a t i o n S e t

The f i g u r e above shows the o v e r a l l s t r u c t u r e of the t a b l e . I t should be noted that i f you wish t o r e s ­ t r i c t your search t o , f o r example, those transforms which break carbon-carbon bonds a l l that i s necessary i s t o define, at assembly time, a set t o i n d i c a t e t h i s c h a r a c t e r i s t i c and i n d i c a t e which transforms are a p p l i c a b l e . At run time, AND ing t h i s set with otherwise allowed transforms applies the r e s t r i c t i o n i n p a r a l l e l . This technique o f generating an e x t e r n a l addressing s t r u c t u r e when coupled w i t h Boolean opera­ t i o n s i s a q u i t e powerful and u s e f u l technique. ,

,

HOW DO YOU ADD TO THE DATA BASE The c r i t i c i s m has o f t e n been l e v e l l e d at LHASA that i t takes considerable time t o add t o the data

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

28

COMPUTER-ASSISTED

ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

base. The D i e l s Alder t a b l e , f o r example, took: almost s i x man-months t o prepare and debug. This s e c t i o n w i l l describe the process o f b u i l d i n g a s o p h i s t i c a t e d data t a b l e l i k e the D i e l s Alder o r the Robinson Annelation and demonstrate that the s o p h i s t i c a t i o n o f the r e s u l t s obtained i s a d i r e c t f u n c t i o n o f the exhaustiveness and s p e c i f i c i t y o f the t a b l e s . ( I t i s worthwhile t o point out, however, that there are o f t e n times when naive chemistry proposed by LHASA i n s i t u a ­ t i o n s where i t was not o r i g i n a l l y envisioned, has turned out t o be e x c e p t i o n a l l y i n t e r e s t i n g . ) A l l the r i n g transform packages i n LHASA employ binary search techniques. This means that a l l s t r u c ­ t u r a l questions are t o be answered w i t h a yes o r a no. Preparation o f the sequence o f questions r e l a t i n g t o s t r a i g h t f o r w a r d chemical s i t u a t i o n s poses no r e a l problems. I t i s the i d e n t i f i c a t i o n and r e s o l u t i o n o f the e x t r a o r d i n a r y cases t h a t are d i f f i c u l t . For example, i n a Robinson disconnection f o r t h e sequence below, the geminal dimethyl s u b s t i t u t i o n i s a f o r m i ­ dable problem.

0

I t i s up t o the chemist designing the t a b l e s t o f i r s t perceive that t h i s s i t u a t i o n might occur. Second, decide whether he wishes t o have the t a b l e s salvage the d i f f i c u l t s i t u a t i o n and i f he does, he has t o manually determine what kind o f chemistry should be attempted. The above example i s a c l e a r b l a c k o r white s i t u a t i o n . Unless the dimethyl s u b s t i t u e n t i s r e ­ moved, the transform j u s t cannot proceed. The grey areas cause j u s t as much o f a dilemma f o r the chemist. In M a r s h a l l ' s synthesis o f isonootkatone two p o s s i b l e stereoisomers could have r e s u l t e d .

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

1.

PENSAK A N D COREY

29

LHASA

à P r i o r i p r e d i c t i o n o f the stereochemical course o f a r e a c t i o n , even knowing the three dimensional s t r u c t u r e of the reagents i s q u i t e d i f f i c u l t , i f not impossible. When preparing a s e c t i o n o f the t a b l e d e a l i n g w i t h such an ambiguity the chemist i s faced w i t h two a l t e r n a t i v e s , disregard stereochemistry e n t i r e l y (and make sure that LHASA does not imply that any stereochemistry i s being s p e c i f i e d ) o r go i n t o the l a b o r a t o r y and run an experiment. The r e c o g n i t i o n o f t h i s s i t u a t i o n can o f t e n get the t a b l e w r i t i n g chemist t h i n k i n g and has sometimes even suggested s p e c i f i c reactions that should be run. As we have been adding t o the data base at Du Pont (to the one and two group t a b l e s ) , the ques­ t i o n has o f t e n been r a i s e d how much d e t a i l should we go i n t o i n the q u a l i f i e r s ? " This i s somewhat o f a dilemma, many o f the i n d u s t r i a l reactions we are d e a l i n g with have only been considered f o r a l i m i t e d number o f substrates. I t i s not c l e a r whether q u a l i f i e r s should be incorporated t h a t r e s t r i c t the transforms t o only those cases where i t i s known t o work o r whether only those which are known t o f a i l should be s p e c i f i c a l l y excluded. Both ways take an immense amount o f l i t e r a t u r e work t o do c o n s i s t e n t l y . M

We a l l know that butadiene can be dimerized under c a t a l y t i c c o n d i t i o n s t o a wealth o f d i f f e r e n t products. A d d i t i o n o f a l k y l substituents changes the product mix and introduces a v a r i e t y o f d i f f e r e n t stereoisomers as w e l l . What happens i f we put f u n c t i o n a l groups on butadiene. Do a l l the reactions s t i l l proceed - do you get any new ones, etc.? We do not know and s e r i ­ ous doubt whether many experiments have ever been run

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

30

COMPUTER-ASSISTED

ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

on such systems. The second a l t e r n a t i v e above has been chosen - q u a l i f i e r s are being used only t o exclude r e a c t i o n s known not t o work. As such, a l o t of naive chemistry comes out o f our v e r s i o n LHASA, some o f i t e x c e p t i o n a l l y poor but at l e a s t the analyses are reasonably comprehensive. In summary, adding simple r e a c t i o n s t o LHASA i s simple. I n c o r p o r a t i n g s o p h i s t i c a t e d r e a c t i o n s can be as complicated as you wish t o make i t . (Work i s c u r r e n t l y underway t o prepare a general package o f subgoal transforms which w i l l serve t o remove i n t e r ­ ferences - r e l i e v i n g the chemist from having t o work them out separately f o r each super transform.) CONCLUSION Why i s Du Pont i n t e r e s t e d i n LHASA? The program was c l e a r l y designed f o r c a r b o c y c l i c n a t u r a l products synthesis i n mind - an area i n which the Company has only l i m i t e d i n t e r e s t . - Our attempt t o add i n d u s t r i a l s y n t h e t i c chem­ i s t r y t o LHASA i s f o r c i n g us t o organize our thoughts along l i n e s heretofore not done. We are being required t o look a t our r e a c t i o n s i n terms o f t h e i r known and unknown g e n e r a l i t y . This i n and o f i t s e l f i s highly b e n e f i c i a l . - Our pharmaceutical and agrichemical chemists have been using the n a t u r a l products aspects o f LHASA to generate new ideas, o f t e n not o f i n d u s t r i a l syn­ t h e t i c merit but c e r t a i n l y o f i n t e r e s t when l o o k i n g f o r ways t o make d e r i v a t i v e s , e s p e c i a l l y commonality of routes. - L a s t l y , we have already seen t h a t some o f our i n d u s t r i a l knowledge i s t u r n i n g out t o be u s e f u l t o organic synthesis i n other areas. For example, very few chemists o u t s i d e o f those who are a c t u a l l y using i t d a i l y are aware t h a t , given s u i t a b l e c a t a l y s t s , butadiene can dimerize i n t o the compound below - a

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1.

P E N S A K A N D COREY

LHASA

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

q u i t e a t t r a c t i v e (and inexpensive) f o r prostaglandin s y n t h e s i s .

J.

31

starting material

Our hope i s that LHASA w i l l help us t o insure that we have considered a l l reasonable routes t o our major products. ACKNOWLEDGMENTS The LHASA p r o j e c t has been i n existence since 1968. During the years a number o f e x t r a o r d i n a t e l y able graduate students and post d o c t o r a l f e l l o w s came t o work w i t h Professor Corey, almost a l l laboratory s y n t h e t i c chemists. Whether i t i s the i n f e c t i o u s enthusiasm f o r the p r o j e c t o r j u s t an enjoyment o f using the computer as a research t o o l , not one alumnus of the LHASA p r o j e c t has abandoned h i s involvement w i t h computers and returned t o bench chemistry. They are (with current l o c a t i o n s ) W. D. W. R. W. G.

J. Howe - Upjohn E. Barth - Harvard Business School L. Jorgensen - Purdue D. Cramer - Smith K l i n e and French Todd Wipke - Santa Cruz A. Petersson - Wesleyan W. Vinson - Harvard H. W. Orf - Harvard

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

32

COMPUTER-ASSISTED

ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch001

ABSTRACT Design o f complex organic syntheses is a task w e l l s u i t e d t o computer implementation. For a molecule o f moderate s i z e the number o f p o t e n t i a l s y n t h e t i c pathways is extremely l a r g e . Furthermore the number o f u s e f u l l a b o r a t o r y r e a c t i o n s is growing e x p l o s i v e l y . The LHASA program is a tool for syn­ t h e t i c chemists to aid in choosing the most reasonable routes t o any d e s i r e d molecule without exhaustive enumeration. The b a s i c s t r u c t u r e o f the program and the chemistry it employs are discussed g i v i n g s p e c i a l c o n s i d e r a t i o n t o the s t r a t e g i e s employed in s e l e c t i o n o f routes.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

2 Computer Programs for the Deductive Solution of Chemical Problems on the Basis of a Mathematical Model of Chemistry

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

JOSEF BRANDT, JOSEF FRIEDRICH, JOHANN GASTEIGER, CLEMENS JOCHUM, WOLFGANG SCHUBERT, and IVAR UGI Organisch-Chemisches Institut, Technische Universität München, Arcisstr. 21, D-8 München-2, Germany

1.

Introduction

1.1

Computers and P r o g r e s s in Chemistry

In t h e p a s t decades t h e r e has been d r a m a t i c p r o g r e s s in c h e m i s t r y . The c o n c e p t u a l and theoretical approaches t o c h e m i s t r y have been decisively shaped by quantum mechanics. The r e s e a r c h t o p i c s and t e c h n i q u e s o f c h e m i s t r y have been changed in a p r o f o u n d manner by the advent o f modern s e p a r a t i o n and purification methods as w e l l as advances in structural d e t e r m i n a t i o n by t h e v a r i o u s types o f s p e c t r o s c o p y and X-ray crystallography. The use o f computers in c h e m i s t r y has a l s o p l a y e d an essential r o l e in this c o n t e x t . However, it is s a f e t o p r e d i c t t h a t t h e major c o n t r i b u t i o n s o f computers t o c h e m i s t r y a r e still t o be expected i n the future. Computers a r e a l r e a d y an important t o o l o f c h e m i s t r y . C o m p u t e r - a s s i s t e d documentation, the collection and e v a l u a t i o n o f e x p e r i m e n t a l d a t a , and quantum m e c h a n i c a l c a l c u l a t i o n s o f m o l e c u l a r p r o p e r t i e s predominate. The importance o f t h e solution o f c h e m i c a l problems such as the d e s i g n o f syntheses w i t h t h e aid o f computers is not y e t w i d e l y r e c o g n i z e d , but will have f a r - r e a c h i n g consequences and may w e l l l e a d t o r a t h e r fundamental changes in the activity o f c h e m i s t s . In

33 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

34

COMPUTER-ASSISTED

ORGANIC SYNTHESIS

p a r t i c u l a r , the computer programs f o r the d e d u c t i v e s o l u t i o n o f c h e m i c a l p r o b l e m s on t h e b a s i s o f l o g i c a l s t r u c t u r e models and m a t h e m a t i c a l r e p r e s e n t a t i o n o f c h e m i s t r y have g r e a t p o t e n t i a l f o r the f u t u r e .

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

1.2

The S o l u t i o n o f C h e m i c a l P r o b l e m s on t h e o f T h e o r e t i c a l P h y s i c s and M a t h e m a t i c s

Basis

The m o l e c u l a r s y s t e m s , t h e s u b j e c t o f c h e m i s t r y , consist o f a r a t h e r s m a l l number o f b u i l d i n g b l o c k s , n a m e l y t h e c a . 100 c h e m i c a l e l e m e n t s w h i c h a r e c o m b i n e d a c c o r d i n g to r u l e s d e r i v e d from m a t h e m a t i c a l l y s t a t e d p h y s i c a l p r i n c i p l e s . The r e p r e s e n t a t i o n o f chemistry i n m a t h e m a t i c a l t e r m s seems t h e r e f o r e q u i t e n a t u r a l . T h i s has not yet been u t i l i z e d to the f u l l c o n c e i v a b l e extent. The e n e r g y h y p e r s u r f a c e d e s c r i b e s t h e e n e r g y o f c h e m i c a l s y s t e m s w i t h a g i v e n s e t o f atoms as a f u n c t i o n o f t h e s p a t i a l d i s t r i b u t i o n o f t h e a t o m i c n u c l e i . I f one c o u l d compute t h e c o m p l e t e e n e r g y h y p e r s u r f a c e f o r any s e t o f atoms and a n a l y s e t h e c o r r e s p o n d i n g d a t a i n a s u i t a b l e m a n n e r , one c o u l d p r e d i c t m o s t o f t h e relevant p r o p e r t i e s o f m o l e c u l a r systems w i t h the methods o f t h e o r e t i c a l p h y s i c s and m a t h e m a t i c s . T h i s , however, i s not f e a s i b l e i n f u t u r e , because the computational extremely l a r g e , even i n the case molecular systems.

the f o r e s e e a b l e e f f o r t w o u l d be of rather small

F o r t h e s o l u t i o n o f many c h e m i c a l p r o b l e m s t h e t h e o r e t i c a l t r e a t m e n t o f a few s e l e c t e d p o i n t s a n d t h e i r vicinity on t h e e n e r g y h y p e r s u r f a c e w o u l d s u f f i c e . Y e t , i n many c a s e s , o n e d o e s n o t know w h i c h p o i n t s a r e t h e r e l e v a n t o n e s . F o r c h e m i s t r y i t w o u l d be r a t h e r u s e f u l t o have a method f o r p r o v i d i n g a s u r v e y o f t h o s e p o i n t s and pathways on an e n e r g y h y p e r s u r f a c e which are e s s e n t i a l f o r the s o l u t i o n of a g i v e n problem, w i t h o u t t h e n e c e s s i t y o f a quantum m e c h a n i c a l t r e a t ment o f w i d e a r e a s o f a n e n e r g y h y p e r s u r f a c e . In t h e p r e s e n t p a p e r a s i m p l e m a t h e m a t i c a l model o f t h e c h e m i s t r y o f a g i v e n s e t o f atoms i s p r e s e n t e d w h i c h a f f o r d s p r e c i s e l y t h e l a t t e r , a n d may s e r v e a s a b a s i s o f computer programs f o r the d e d u c t i v e solution of a great v a r i e t y of chemical problems.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

2.

BRANDT

E T

AL.

Computer

Programs



The C h e m i s t r y

2.1

Chemical Equivalence

for Chemical

of a Fixed

Set o f

Problems

35

Atoms

Classes

Any p r o g r e s s i n c h e m i s t r y may b e i n t e r p r e t e d a s t h e r e c o g n i t i o n o f new e q u i v a l e n c e r e l a t i o n s a n d c l a s s e s . U n t i l r e c e n t l y , a u n i v e r s a l model o f c h e m i s t r y could n o t b e d e f i n e d , b e c a u s e we l a c k e d t h e c o n c e p t s f o r a mathematical treatment of the c o n s t i t u t i o n a l aspect of c h e m i s t r y . T h e r e f o r e i t i s n e c e s s a r y t o c o n s i d e r any c o n s t i t u t i o n a l l y d i f f e r e n t m o l e c u l a r systems as e q u i v a l e n t r e g a r d l e s s o f t h e number o f m o l e c u l e s w h i c h they c o n t a i n , i f , i n p r i n c i p l e they are i n t e r c o n v e r t i b l e through chemical r e a c t i o n s .

2.2

Isomeric

Ensembles

of Molecules

The s e t o f a l l m o l e c u l e s c a n be p a r t i t i o n e d into e q u i v a l e n c e c l a s s e s w h o s e members h a v e a l l t h e same molecular formula, i . e . the equivalence classes of isomers. Isomerism i s here the r e l e v a n t equivalence relation. Extending the concept o f isomerism from molecules t o e n s e m b l e s o f m o l e c u l e s (EM) l e a d s t o a m a t h e m a t i c a l m o d e l o f c o n s t i t u t i o n a l c h e m i s t r y ^ . A n EM i s d e s c r i b e d by i t s l i s t o f t h e m o l e c u l a r s p e c i e s i n t h e f o r m o f s u i t a b l e c h e m i c a l f o r m u l a s . F o r a n EM o n e c a n d e f i n e two t y p e s o f e m p i r i c a l e l e m e n t a r y f o r m u l a s , o n o n e hand t h e e n s e m b l e f o r m u l a , on t h e o t h e r hand a p a r t i t i o n e d e m p i r i c a l ensemble formula which c o n s i s t s o f the m o l e c u l a r formulas o f t h e m o l e c u l e s i n t h e EM. 1

L e t A b e a s e t o f a t o m s . T h e n a l l E M ( A ) , i . e . a l l EM w h i c h c a n b e made f r o m A , h a v e t h e same e n s e m b l e f o r m u l a . T h e p a r t i t i o n e d e m p i r i c a l f o r m u l a o f a n EM i s a p a r t i t i o n i n g { < A > , . . . < A >} o f A i n w h i c h e a c h i s t h e f o r m u l a o f a m o l e c u l e . Accordingly, a n E M ( A ; c o n s i s t s o f o n e o r more m o l e c u l e s , w h i c h a r e o b t a i n e d f r o m A , i f e a c h atom o f A i s u s e d e x a c t l y o n c e . t

S i n c e i s o m e r i c m o l e c u l e s may d i f f e r constitutionally and s t e r e o c h e m i c a l ^ , a p a r t i t i o n e d e n s e m b l e f o r m u l a g e n e r a l l y c o r r e s p o n d s t o more t h a n o n e E M ( A ) . T h e c o n s t i t u t i o n a l f o r m u l a o f a n EM(A) c o n t a i n s t h e c o n s t i t u t i o n a l formulas of a l l molecules i n that EM(A). An F I E M ( A ) , t h e f a m i l y o f i s o m e r i c e n s e m b l e s o f m o l e c u l e s o f an atom s e t A i s t h e c o l l e c t i o n o f a l l

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED

36

E M ( A ) . An F I E M i s underlying set of

g i v e n by t h e atoms A .

empirical

ORGANIC SYNTHESIS

formula

of

the

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

A c h e m i c a l r e a c t i o n or a sequence of c h e m i c a l r e a c t i o n s , i s t h e t r a n s f o r m a t i o n o f a n FM i n t o a n i s o m e r i c E M . T h u s t h e w h o l e c h e m i s t r y o f a s e t o f atoms A i s g i v e n by t h e EM(A) and t h e i r i n t e r c o n v e r s i o n s w i t h i n t h e FIEM(A). An e n e r g y h y p e r s u r f a c e d e s c r i b e s a l l c h e m i c a l s y s t e m s w h i c h c o n t a i n a g i v e n s e t o f a t o m s . An F I E M ( A ) o f s t a b l e EM(A) c o r r e s p o n d s t o a f a m i l y o f e n e r g y m i n i m a on t h e e n e r g y h y p e r s u r f a c e o f t h e a t o m s e t A . S e q u e n c e s o f EM(A) w h i c h a r e c h e m i c a l l y i n t e r c o n v e r t e d correspond t o pathways on t h e e n e r g y h y p e r s u r f a c e o f A .

3.

BE-Matrices

In m o l e c u l a r systems t h e a t o m i c c o r e s a r e h e l d t o g e t h e r by v a l e n c e e l e c t r o n s w h i c h o c c u p y m o l e c u l a r o r b i t a l s i n v o l v i n g t w o , o r more a t o m s . A c o v a l e n t bond i s a p a i r o f v a l e n c e e l e c t r o n s i n a m o l e c u l a r o r b i t a l a b o u t two c o r e s . G e n e r a l l y , the chemical c o n s t i t u t i o n of a m o l e c u l a r system i s d e s ­ c r i b e d by i t s c o v a l e n t l y b o u n d p a i r s o f a t o m i c c o r e s . The d i s t r i b u t i o n o f t h o s e v a l e n c e e l e c t r o n s w h i c h a r e n o t c o n t a i n e d i n c o v a l e n t c h e m i c a l b o n d s c a n be i n c l u d e d i n the d e s c r i p t i o n of a chemical c o n s t i t u t i o n . A c h e m i c a l c o n s t i t u t i o n i s u s u a l l y r e p r e s e n t e d by a c o n s t i t u t i o n a l f o r m u l a i n which the c h e m i c a l element symbols c o r r e s p o n d to the r e s p e c t i v e atomic c o r e s , the c o v a l e n t b o n d s a r e i n d i c a t e d by l i n e s b e t w e e n t h e e l e m e n t s y m b o l s , a n d t h e f r e e v a l e n c e e l e c t r o n s by d o t s at the atomic symbols. The c h e m i c a l c o n s t i t u t i o n o f m o l e c u l e s h a s a l s o b e e n r e p r e s e n t e d by v a r i o u s t y p e s o f m a t r i c e s . F o r t h e present purpose the BE-matrices are p a r t i c u l a r l y suitable. An η χ η

BE-matrix

Β =

\

(13)

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

2.

BRANDT

ET AL.

Computer

Programs

for Chemical

Problems

37

r e f e r s t o a m o l e c u l e o r EM w h i c h c o n t a i n s a n a t o m s e t A = { A ^ , . . . , A } w i t h η i n d e x e d a t o m s . T h e i - t h row a n d i - t h column of A.

of a BE-Matrix

An o f f - d i a g o n a l

entry

b..

belongs

to the i - t h

atom

i n t h e i - t h row a n d j - t h

-J-J

column

is

the formal

covalent

bond

order

between

the

atoms

A. and Α . . S i n c e t h i s i m p l i e s a l s o a bond from ι J A. to A . we h a v e b . . = b . . . T h u s B E - m a t r i c e s a r e J ι ^-J J ι symmetric. Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

3

T h e i - t h d i a g o n a l e n t r y o f a B E - m a t r i x i s t h e number f r e e valence e l e c t r o n s which belong to A^. T h e number a t o m A^ i s

of valence e l e c t r o n s which belong g i v e n b y t h e r o w - c o l u m n sum

to the

T h e c r o s s sum b . o v e r a d i a g o n a l e n t r y c o m p r i z e s e n t r i e s i n t h e i - t h row a n d c o l u m n s a n d i s e q u a l b.

=

2

b

i

- b

i

±

of

all to

.

T h e c r o s s sum b . i s t h e number o f v a l e n c e which occupy the v a l e n c e o r b i t a l s o f A^.

electrons

A t a b l e which c o n t a i n s f o r a l l ^ c h e m i c a l elements the a l l o w a b l e c o m b i n a t i o n s o f b^, b^, c o o r d i n a t i o n numbers and b o n d o r d e r s a f f o r d s a q u i c k c h e c k w h e t h e r a B E matrix represents a valence chemically stable molecular system. The

sum

S

=

£

b

i j

o v e r a l l e n t r i e s o f a B E - m a t r i x i s t h e t o t a l number v a l e n c e e l e c t r o n s i n t h e E M . F o r a l l EM o f a n F I E M t h i s number i s t h e s a m e .

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

of

COMPUTER-ASSISTED

38

Unless the some r u l e , represented equivalent the atomic indices.

ORGANIC SYNTHESIS

i n d e x i n g o f t h e atoms i s f i x e d a c c o r d i n g t o a g i v e n chemical c o n s t i t u t i o n i s not only by one B E - m a t r i x , b u t t h e r e a r e up t o η ! B E - m a t r i c e s w h i c h d i f f e r by p e r m u t a t i o n s of i n d i c e s and t h e r e s p e c t i v e row/column

Any B E - m a t r i c e s Β a n d B w h i c h d i f f e r o n l y b y row/ c o l u m n p e r m u t a t i o n s d e s c r i b e t h e same E M . T h e t r a n s ­ f o r m a t i o n o f a n η χ η B E - m a t r i x Β by row/column p e r m u t a t i o n i n t o an e q u i v a l e n t B E - m a t r i x B c a n be a c h i e v e d by a n η χ η p e r m u t a t i o n m a t r i x Ρ a n d i t s i n v e r s e , P^, a c c o r d i n g t o f

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

f

B

t

_

p

t

.

. ρ

B

F o r v a r i o u s p u r p o s e s , s u c h a s d o c u m e n t a t i o n , one n e e d s an u n a m b i g u o u s c o r r e s p o n d e n c e b e t w e e n a n EM a n d i t s B E - m a t r i x . T h e r e f o r e , we h a v e d e v e l o p e d p r o c e d u r e s for t h e a s s i g n m e n t o f a t o m i c i n d i c e s w h i c h l e a d t o an canonical BE-matrix. Among t h e e q u i v a l e n t of several molecules

B E - m a t r i x o f a n EM w h i c h c o n s i s t s i>**«> > there exist BE-matrices

M

M

m

i n b l o c k f o r m , s u c h t h a t e a c h b l o c k r e p r e s e n t s one c o n t i g u o u s m o l e c u l e . The b l o c k f o r m o f B E - m a t r i c e s u s e f u l f o r t h e s e p a r a t i o n o f m o l e c u l e s o f EM w h o s e B E - m a t r i c e s have b e e n g e n e r a t e d by a c o m p u t e r .

4.

The R e p r e s e n t a t i o n R-Matrices

of

Electron

Relocation

is

by

A c h e m i c a l r e a c t i o n i s t h e c o n v e r s i o n o f a n EM i n t o i s o m e r i c EM by r e l o c a t i o n o f v a l e n c e e l e c t r o n s .

an

S i n c e t h e t o t a l number o f v a l e n c e e l e c t r o n s S d o e s n o t c h a n g e i n a c h e m i c a l r e a c t i o n , i t must be t h e same f o r t h e i n i t i a l EM(B) and t h e f i n a l EM(E) o f a c h e m i c a l r e a c t i o n . I t f o l l o w s t h a t t h e sum o f e n t r i e s o f B E m a t r i c e s must b e i n v a r i a n t u n d e r t h o s e B E - m a t r i x t r a n s f o r m a t i o n s Β -> Ε w h i c h r e p r e s e n t c h e m i c a l r e a c t i o n s EM(B) + EM(E). L e t Β and Ε be t h e B E - m a t r i x o f t h e s t a r t i n g EM(B) and t h e f i n a l E M ( E ) o f a c h e m i c a l r e a c t i o n . T h e n a n Rm a t r i x ( r e a c t i o n m a t r i x ) i s d e f i n e d by t h e transfor-

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

2.

Computer

BRANDT E T A L .

The

sum o f

the

entries

Programs

for Chemical

r..

an R - m a t r i x

of

Problems

39

is

because

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

s

Σ

e..

=

Σ .. b

r

Σ b..

T h e m a t r i x R = Ε - B must b e s y m m e t r i c b e c a u s e m a t r i c e s Β a n d Ε a r e s y m m e t r i c by d e f i n i t i o n .

the

BE-

T h e m a t r i x R = -R i s t h e i n v e r s e o f R. T h e t r a n s f o r m a ­ t i o n Ε + R = Β r e p r e s e n t s t h e r e a c t i o n E M ( Ε ) -> E M ( Β ) , i . e . t h e r e t r o - r e a c t i o n o f EM(B) EM(E) . An R - t r a n s f o r m a t i o n Β + R = Ε r e p r e s e n t s a c h e m i c a l r e a c t i o n i f , and o n l y i f i t obeys t h e m a t h e m a t i c a l f i t t i n g condition e.. = b.. + r.. > 0 IJ IJ i J f o r a l l e n t r i e s b e c a u s e , by d e f i n i t i o n , a B E - m a t r i x rrust have n o n - n e g a t i v e e n t r i e s . F u r t h e r m o r e , the r e s u l t Ε o f an R - t r a n s f o r m a t i o n o f a B E - m a t r i x Β o f a s t a b l e EM(B) w i l l represent a s t a b l e EM(E), i f the e n t r i e s e . . of Ε have v a l u e s t h a t are p e r m i s s i b l e f o r atoms A . and A . ( " c h e m i c a l f i t t i n g " ) .

the

respective

A c c o r d i n g l y , the p o s i t i v e e n t r i e s b . . of a given BE m a t r i x Β may b e u s e d t o d e t e r m i n e the negative e n t r i e s of mathematically f i t t i n g R-matrices. The p o s i t i v e e n t r i e s must be s e l e c t e d t o y i e l d a n R - m a t r i x 1

with can of

. .

r

i j

= 0.

Moreover,

the

J

entries

of

an

be s e l e c t e d t o meet t h e v a l e n c e c h e m i c a l t h e c h e m i c a l e l e m e n t s i n Ε = Β + R.

R-matrix restrictions

T h u s i t i s p o s s i b l e t o f i n d f o r a g i v e n EM a l l o f i t s c h e m i c a l r e a c t i o n s a n d t h e i r p r o d u c t s by t h e t r a n s ­ f o r m a t i o n p r o p e r t i e s o f the B E - m a t r i c e s , and t h e v a l e n c e c h e m i c a l c o n s t r a i n t s o f t h e l a t t e r . The a p p l i ­ c a t i o n o f a l l m a t h e m a t i c a l l y and c h e m i c a l l y f i t t i n g R-matrices to a BE-matrix generates the BE-matrices o f the whole FIEM.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

40

COMPUTER-ASSISTED

4.2

R-Categories

The t r a n s f o r m a t i o n o f a B E - m a t r i x by corresponds to a chemical r e a c t i o n . The

off-diagonal

the

number

and

a negative

electrons entries Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

ORGANIC SYNTHESIS

the

r..

broken

atoms

bonds

entries

covalent

diagonal

= r..

-LJ

covalent i i numbers

of

negative

A^

are

entry lose.

the

J -L

Α.-Α.,

r..

bonds r^ The

= r..

how

positive of

the

positive

R-matrix

indicate

between

tells

numbers

and the

a fitting

A.

and

many

A.,

free

off-diagonal newly

made

diagonal

entries

to the i n c r e a s e s i n f r e e e l e c t r o n atoms Α . . ι G e n e r a l l y , a n R - m a t r i x d o e s n o t o n l y f i t o n e , b u t many B E - m a t r i c e s . A c c o r d i n g l y , an R - m a t r i x does not o n l y r e p r e s e n t an i n d i v i d u a l c h e m i c a l r e a c t i o n , b u t a whole c a t e g o r y o f r e a c t i o n s w h i c h h a v e i n common t h e e l e c t r o n r e l o c a t i o n p a t t e r n r e p r e s e n t e d by t h e R - m a t r i x ) . An R - c a t e g o r y i s an e q u i v a l e n c e c l a s s o f c h e m i c a l r e a c t i o n s w h i c h h a v e i n common t h e same e l e c t r o n r e l o c a t i o n p a t t e r n and c e r t a i n f e a t u r e s o f t h e p a r t i c i p a t i n g bond s y s t e m s . The row/column p e r m u t a t i o n e q u i v a l e n c e o f B E - m a t r i c e s i m p l i e s t h a t R - m a t r i c e s r e p r e s e n t t h e same r e a c t i o n s when t h e y a r e i n t e r c o n v e r t e d b y row/column r

c

o

r

r

e

s

P ° ^ at the n


R l - t y p e z> R 2 - t y p e 1)

Any

two

R-matrices

R-category

if

their

R

1

and R

2

belong

irreducible

to

forms

the

. . . etc .

same

R ° i and

R

0

2

d i f f e r o n l y by r o w / c o l u m n p e r m u t a t i o n s . 2) Any two r e a c t i o n s b e l o n g t o t h e same R A - t y p e , i f t h e y b e l o n g t o t h e same R - c a t e g o r y , a n d i f t h e i r n o n - z e r o e n t r i e s r e f e r t o t h e same c o m p o n e n t s o f a n a s s o c i a t e d v e c t o r of atoms. 3) Any two r e a c t i o n s o f t h e same R A - t y p e b e l o n g t o t h e same R B - t y p e , i f t h e y h a v e t h e same p o s i t i v e entries

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

2.

BRANDT

ET

Computer

AL.

Programs

for Chemical

Problems

45

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

i n the a f f e c t e d B E - m a t r i c e s , where t h e R - m a t r i c e s have n o n - z e r o entries. 4) Any two r e a c t i o n s o f t h e same R B - t y p e b e l o n g t o t h e same R l - t y p e , i f t h o s e e n t r i e s o f t h e B E - m a t r i c e s whose e n t r i e s a r e a f f e c t e d by t h e R - m a t r i x r e f e r t o t h e same c o m p o n e n t s o f t h e a t o m v e c t o r . 5) R 2 - , R 3 - j . . . t y p e s may b e d e f i n e d i n a n a l o g y t o t h e R l - t y p e s by f u r t h e r c l a s s i f y i n g a c c o r d i n g t o t h e second, t h i r d e t c . sphere of n e i g h b o r i n g atoms. T h i s h i e r a r c h i c c l a s s i f i c a t i o n o f c h e m i c a l r e a c t i o n s by t h e i r R- a n d B E - m a t r i c e s may n o t o n l y s e r v e a s a means o f f o r m a l o r d e r i n g o f r e a c t i o n s and as a b a s i s o f d o c u m e n t a t i o n s y s t e m s , b u t can a l s o s e r v e as a d e v i c e in the systematic c o m p u t e r - a s s i s t e d deductive search f o r new c h e m i c a l r e a c t i o n s , by a n a l g o r i t h m w h i c h f i n d s all o f t h e m a t h e m a t i c a l l y and c h e m i c a l l y f i t t i n g p a i r s ( Β , E) o f B E - m a t r i c e s f o r a r e p r e s e n t a t i o n R - m a t r i x o f an R - c a t e g o r y .

G e o m e t r i c and G r o u p - T h e o r e t i c a l t u t i o n a l Chemistry

5.

Aspects

of

Consti-

The g e o m e t r i c and g r o u p t h e o r e t i c a l a s p e c t s o f t h e B E and R - m a t r i c e s a r e i m p o r t a n t f o r t h e s o l u t i o n o f c h e m ­ i c a l problems. 1)

. in will

S i n c e these a s p e c t s have been d i s c u s s e d elsewhere d e t a i l , a b r i e f o u t l i n e of the e s s e n t i a l f e a t u r e s suffice here.

The B E - a n d t h e R - m a t r i c e s b e l o n g t o S ( n ) , t h e additive group of a l l η χ η symmetric m a t r i c e s w i t h i n t e g r a l e n t r i e s . L e t P . . be a n η χ η m a t r i x i n w h i c h a l l entries

are

zero,

except

a

"one"

in

the

υ ί

ϋ Ι ί

= 1,

S(n)

whose r a n k

(i,

j)-position.

Then {

is

P

ij

+

1

V' "^'

a basis

of

n(n

the -

η

}

group

1)

ρ

+ η = n(n

2

is

+ 1) 2

We d e n o t e

by

points

an e u c l i d e a n

of

..., η}

Z

s

the

group

whose

elements

s-dimensional

are

the

space R , s

lattice

the

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

46

COMPUTER-ASSISTED

ORGANIC SYNTHESIS

group o p e r a t i o n b e i n g v e c t o r a d d i t i o n . Then, Z , i s a f r e e a b e l i a n group of rank s w i t h ( 1 , 0 . . . 0 ) , ( 0 , 1 , 0 . . . 0 ) ( 0 , . . . , 0 , 1 ) a s a b a s i s . Two f r e e a b e l i a n g r o u p s a r e i s o m o r p h i c i f , a n d o n l y i f , t h e y h a v e t h e same rank. Therefore, Z a n d S ( n ) a r e i s o m o r p h i c . The i m ­ b e d d i n g o f zs i n R i s a geometric i n t e r p r e t a t i o n of S(n). s

s

s

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

The B E - m a t r i c e s a r e c o n t a i n e d i n B ( n ) < S ( n ) , t h e s e t o f a l l η χ η symmetric m a t r i c e s w i t h n o n - n e g a t i v e integral e n t r i e s . The R - m a t r i c e s b e l o n g t o R ( n ) S ( n ) , t h e g r o u p o f a l l η χ η s y m m e t r i c i n t e g r a l m a t r i c e s whose sum o f entries is zero. In the mapping π : Β ( η ) R , π | B(n) | i s v i s u a l i z e d as a cone i n R w i t h t h e v e r t e x a t t h e o r i g i n , and π | R ( η ) | l i e s on a l i n e a r s u b s p a c e g o i n g t h r o u g h t h e o r i g i n and h a v i n g no o t h e r p o i n t i n common w i t h π | Β ( η ) | . s

s

The

mapping P:B(n) b

matrices dean The of

)}

2 1

...,

yields

an FIEM i n

R

n 2

b

l

n

;

b

2 1

,

t> , n l

an i m b e d d i n g o f ,

the

BE-

an n - d i m e n s i o n a l

eucli-

2

space. entries

a vector

nates of

Β of

-> { ( b

of

the

b..

of

Β c a n be

in

R

, or

a point

BE-matrix

n

P(B)

c o n s i d e r e d as t h e

also

in

R

n 2

as t h e

.

cartesian

We c a l l

P(B)

the

component? coordi­ BE-point

B. 2

n S i m i l a r l y , an R - m a t r i x R r e p r e s e n t s a v e c t o r i n R . A c h e m i c a l r e a c t i o n w h i c h i s r e p r e s e n t e d by t h e t r a n s ­ f o r m a t i o n Β + R = Ε c a n be g e o m e t r i c a l l y interpreted by t h e v e c t o r R f r o m t h e B E - p o i n t P(B) t o t h e B E - p o i n t P(E). The

sum o f D(B,E)

the = g

absolute |b±J

-

values e..|

=

of

the

entries

of

R,

£.|r..|

i s e q u a l t o t w i c e t h e number o f v a l e n c e e l e c t r o n s t h a t p a r t i c i p a t e i n t h e r e a c t i o n EM(B) E M ( E ) . We c a l l D ( B , E ) t h e c h e m i c a l d i s t a n c e between P(B) and P ( E ) , o r EM(B) and E M ( Ε ) , r e s p e c t i v e l y . Note t h a t c h e m i c a l distances r e f e r to R , and t o R . n

s

The o r i g i n o f t h e c o o r d i n a t e s y s t e m i n R corresponds t o t h e z e r o m a t r i x . S i n c e t h e sum o f e n t r i e s S = r i j i j > n

b

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

2.

BRANDT E T A L .

Computer

Programs

for Chemical

Problems

47

t h e number o f v a l e n c e e l e c t r o n s , i s t h e same f o r a l l B E - m a t r i c e s o f an F I E M , a n d F I E M i s mapped o n t o a l a t t i c e of points with non-negative i n t e g r a l coordinates in R , l y i n g on a s e g m e n t o f a " h y p e r s h e r e s u r f a c e " whose r a d i u s i s S = D ( B , 0 ) . n

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

T h i s r e m a r k a b l e t o p o l o g i c a l o r d e r o f t h e FIEM i s not only t h e o r e t i c a l l y a p p e a l i n g , but a l s o d i d a c t i c a l l y u s e f u l , because t h i s order r e v e a l s w e l l - d e f i n e d u n i v e r s a l l o g i c a l s t r u c t u r e s i n t h e immense w e a l t h o f individual facts. In p a r t i c u l a r , the o r d e r p r o v i d e s a b a s i s f o r t h e computer-oriented r e p r e s e n t a t i o n of molecular systems, and a l l o w s t o f o r m u l a t e c h e m i c a l f a c t s and p r o b l e m s i n a manner w h i c h i s w e l l - s u i t e d f o r t h e i r m a n i p u l a t i o n by c o m p u t e r s . 6.

MATCHEM

The m a t h e m a t i c a l m o d e l o f c o n s t i t u t i o n a l c h e m i s t r y which has been d e s c r i b e d i n the p r e c e e d i n g s e c t i o n s can be u s e d as a b a s i s f o r a m o d u l a r s y s t e m o f c o m p u t e r programs f o r the d e d u c t i v e s o l u t i o n of c h e m i c a l problems. I n i t i a l l y , t h e m a t h e m a t i c a l m o d e l was u t i l i z e d f o r C I C L O P S ^ , a p i l o t program f o r s y n t h e t i c d e s i g n . The i n s i g h t t h a t t h e m a t h e m a t i c a l m o d e l may s e r v e f o r the c o m p u t e r - a s s i s t e d s o l u t i o n o f a wide v a r i e t y of c h e m i c a l p r o b l e m s l e d t o t h e d e s i g n o f MATCHEM, a m o d u l a r s y s t e m o f c o m p u t e r p r o g r a m s , whose i n d i v i d u a l p a r t s may b e c o m b i n e d i n d i f f e r e n t ways t o s u i t d i f f e r e n t p u r p o s e s . The o r i g i n a l s y n t h e t i c d e s i g n p r o g r a m C I C L O P S was m o d i f i e d v i a t h e i n t e r m e d i a t e s t a g e MATSYN t o y i e l d t h e p r e s e n t s y n t h e t i c d e s i g n p r o g r a m EROS ( E l a b o r a t i o n o f R e a c t i o n s f o r O r g a n i c S y n t h e s i s ) w h i c h s e r v e s a s i m i l a r p u r p o s e as SECS ) and t h e o t h e r r e a c t i o n l i b r a r y o r i e n t e d s y n t h e t i c d e s i g n programs LHASA \ SYNC HEM °) , e t c . I n c o n t r a s t t o t h e p i l o t p r o g r a m C I C L O P S , i n EROS t h e m o d u l a r t y p e s t r u c t u r e i s emphasized more, i n o r d e r to enable the c o m b i n a t i o n w i t h o t h e r p a r t s o f t h e MATCHEM s y s t e m . 6

7

T h e s y n t h e t i c d e s i g n p r o g r a m EROS a n d i t s p r e d e c e s s o r s g e n e r a t e a t r e e o f B E - m a t r i c e s s t a r t i n g f r o m one B E matrix which r e f e r s to the s y n t h e t i c t a r g e t . This t r e e may b e i n t e r p r e t e d a s a t r e e o f s y n t h e t i c p a t h w a y s . If, however, the i n i t i a l m a t r i x does not r e p r e s e n t a

American Chemfeaf Society Library 1155 16th St. N. W. In Computer-Assisted Organic Synthesis; Wipke, W., el al.; Washington, D. C. 20031 ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED

48

ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

s y n t h e t i c t a r g e t and i t s b y - p r o d u c t s , but a g i v e n c h e m i c a l compound i n c o m b i n a t i o n w i t h a l i s t o f p o t e n ­ t i a l r e a c t a n t s , the generated t r e e of B E - m a t r i c e s can be i n t e r p r e t e d i n t e r m s o f s e q u e n c e s o f p r o d u c t s o b t a i n a b l e f r o m t h e i n i t i a l r e a c t a n t s . In t h i s f o r m t h e p r o g r a m f i n d s new u s e s f o r c h e m i c a l c o m p o u n d s , e . g . industrial by-products, or permits the p r e d i c t i o n of t h e c o n c e i v a b l e f a t e o f a c h e m i c a l compound i n t h e environment. The m a t h e m a t i c a l m o d e l a f f o r d s a l s o e n t i r e l y different a p p r o a c h e s t o c h e m i c a l p r o b l e m s . I f t h e i n i t i a l and f i n a l e n s e m b l e o f m o l e c u l e s , EM(B) a n d E M ( Ε ) , o f a chemical r e a c t i o n , or a sequence of r e a c t i o n s are k n o w n , o r a t t a i n e d by a n e d u c a t e d g u e s s , s u c h a s a comparison of substructures ( s e e s e c t i o n 5.1.3) t h e d i f f e r e n c e of the c o r r e s p o n d i n g B E - m a t r i c e s E-B = R c a n be a n a l y z e d t o y i e l d t h e p a t h w a y s o f individual s t e p s w h i c h l e a d f r o m EM(B) t o E M ( E ) . T h e s e p a t h w a y s may be s e q u e n c e s o f i n t e r m e d i a t e s i n a r e a c t i o n mechanism, or a l s o sequences of s y n t h e t i c inter­ m e d i a t e s . The p r e c o n d i t i o n f o r t h i s a p p r o a c h i s t h a t t h e i n d i c e s o f t h e a t o m s i n EM(B) a n d E M ( Ε ) a r e a p p r o p r i a t e l y a s s i g n e d . The r e q u i r e d a s s i g n m e n t o f a t o m i c i n d i c e s i s d i s c u s s e d i n s e c t i o n 5.3.1). A l l o f t h e s e a p p l i c a t i o n s o f the m a t h e m a t i c a l m o d e l , as w e l l as t h e s e a r c h o f B E - m a t r i x p a i r s ( Β , E) w h i c h f i t a g i v e n R - m a t r i x w i l l be c o n t a i n e d i n MATCKEM. I n t h e now f o l l o w i n g s e c t i o n s some r e c e n t l y p a r t s o f MATCHEM w i l l b e p r e s e n t e d .

6.1

Manipulation

6.1.1 C a n o n i c a l

of

implemented

BE-Matrices

Ordering

A p r e c o n d i t i o n f o r an e f f i c i e n t m a n i p u l a t i o n o f B E m a t r i c e s i n the computer i s a c a n o n i c a l i n d e x i n g o f t h e atoms i n a m o l e c u l e . In o r d e r t o g e n e r a t e a u n i q u e n u m b e r i n g we u s e t h e c o n n e c t i v i t y m a t r i x o f t h e m o l e c u l a r g r a p h and t h e l a b e l s a l r e a d y a s s i g n e d t o its v e r t i c e s , i . e . the c h e m i c a l symbols. An atom k i s c o n s i d e r e d t h e j - t h n e i g h b o r o f i i f j i s t h e m i n i m a l number o f b o n d s w h i c h s e p a r a t e i a n d k. All a t o m s k w h i c h meet t h i s c o n d i t i o n a r e c a l l e d t h e j - t h n e i g h b o r h o o d o f i . The z e r o t h n e i g h b o r h o o d o f i i s a t o m i i t s e l f . We d e f i n e a n a t o m i c d e s c r i p t o r

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

2.

as

BRANDT E T A L .

Computer

Programs

for Chemical

Problems

49

follows

11 a 1 2 a., is J neighborhood

where

descending

the

...|a22..•a31l 32 a

atomic

and a l l

order.

The

a.,

number for

of

the

operator

11

atom k i n same j

are

[" means

the

j-th

put

in

concatenation.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

The d . a r e c o n s t r u c t e d i n s u c h a manner t h a t a l l j - t h n e i g h b o r h o o d s a r e a l i g n e d . T h i s i s a c h i e v e d by i n s e r t i n g dummy n e i g h b o r s w i t h a t o m i c number o f z e r o i f necessary. T h e s e q u e n c i n g o f a t o m s i s d o n e by p u t t i n g t h e a t o m i c d e s c r i p t o r s i n d e s c e n d i n g o r d e r . Atoms w h i c h c a n n o t b e a s s i g n e d a u n i q u e number by t h i s p r o c e d u r e a r e c o n s t i t u t i o n a l l y e q u i v a l e n t . A f i n a l numbering of the a t o m s i s r e a c h e d by i n t r o d u c i n g a r b i t r a r y information. One a t o m i s p i c k e d o u t o f t h e h i g h e s t r a n k i n g g r o u p o f c o n s t i t u t i o n a l l y e q u i v a l e n t atoms and i t s a t o m i c number s e t h i g h e r t h a n t h o s e o f t h e o t h e r a t o m s . The a t o m i c d e s c r i p t o r s a r e m o d i f i e d a c c o r d i n g l y and s o r t e d a g a i n . The g r o u p o f c o n s t i t u t i o n a l l y e q u i v a l e n t atoms i s thus s p l i t up. If a unique numbering i s s t i l l not p o s s i b l e the m o d i f i c a t i o n of the atomic d e s c r i p t o r s i s r e v e r s e d a n d a n o t h e r a t o m o u t o f t h e now h i g h e s t r a n k i n g g r o u p o f e q u i v a l e n t atoms i s p i c k e d , and t h e p r o c e d u r e j u s t d e s c r i b e d i s r e p e a t e d u n t i l a l l atoms a r e a s s i g n e d a u n i q u e number. I t s h o u l d be n o t e d t h a t i t d o e s n o t m a t t e r w h i c h a t o m i s p i c k e d o u t o f a g r o u p o f a t o m s w h i c h c a n n o t be a s s i g n e d a u n i q u e number i n t h e f i r s t p l a c e b e c a u s e t h e s e atoms a r e i n f a c t c o n s t i t u t i o n a l l y equivalent. I n t h i s a s p e c t o u r new i n d e x i n g r o u t i n e d i f f e r s f r o m o u r p r e v i o u s a p p r o a c h ^ . Our new p r o c e d u r e d o e s n o t have t h e drawbacks o f o t h e r methods ) known, like i n s t a b i l i t y of the c a l c u l a t i o n s , o s c i l l a t o r y behaviour, n o n c o n v e r g e n c e and i n d e t e r m i n a n c y . 9

6.1.2

Direct

Access

to

Structures

and

Substructures

The B E - m a t r i x , t o g e t h e r w i t h i t s a s s o c i a t e d v e c t o r s (atoms, s t e r e o c h e m i c a l p a r i t y b i t s ) c o n t a i n s a l l s t r u c t u r a l i n f o r m a t i o n about a m o l e c u l e . In a v a r i e t y o f a p p l i c a t i o n s , s u c h i n f o r m a t i o n has t o be s t o r e d and

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

50

COMPUTER-ASSISTED

a c c e s s e d r a t h e r f r e q u e n t l y . Examples c a t i o n s are documentation systems or r e a c t i o n graphs (trees or networks).

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

6.1.2.1

Inter-Machine

ORGANIC SYNTHESIS

of such a p p l i the a n a l y s i s of

Manipulation

T h e s t o r a g e r e q u i r e m e n t s o f t h e f u l l B E - m a t r i x make i t r a t h e r u n s u i t a b l e f o r d i r e c t s t o r a g e and r e t r i e v a l . C o m p r e s s i o n c a n b e a c h i e v e d i n two s t a g e s , d e p e n d i n g upon t h e a p p l i c a t i o n . F o r d a t a exchange between d i f f e r e n t computer systems (with p o s s i b l y different c o d e s a n d f i l e s t r u c t u r e s ) , c o m p r e s s i o n down t o t h e l e v e l of characters (purely numerical) i s a useful c o m p r o m i s e . The c o m p r e s s i o n m e t h o d i s straightforward i n t h a t i t f i r s t a s s i g n s e a c h s t r u c t u r e ( i . e . an i n t a c t m o l e c u l e or a m o l e c u l a r fragment) a unique number, w h i c h may b e d e r i v e d f r o m some s o r t o f r e g i s t r y number, o r may b e g e n e r a t e d by t h e s y s t e m i t s e l f , i n w h i c h c a s e i t u s u a l l y w i l l c o n t a i n some g r a p h i n f o r m a t i o n , s u c h a s n o d e number o r l i n k p o i n t e r . The i n f o r m a t i o n t h a t i s c o n t a i n e d i n t h e B E - m a t r i x and its a s s o c i a t e d atom v e c t o r i s c o m p r e s s e d t o t h e mininum number o f b y t e s w h i c h i s n e c e s s a r y f o r t h e e x p e c t e d range of values f o r each of these items. Such a d a t a format a l l o w s r a t h e r e c o n o m i c a l machine i n d e p e n d e n t p a c k i n g and u n p a c k i n g o f B E - m a t r i c e s and i s u s e d f o r d a t a t r a n s f e r on s e q u e n t i a l l y o r g a n i z e d , byte oriented media.

6.1.2.2

Intra-Machine

Manipulation

I n t e r n a l h a n d l i n g o f l a r g e amounts o f s t r u c t u r a l d a t a p u t s more s e v e r e r e q u i r e m e n t s o n d a t a f o r m a t a n d o n d a t a o r g a n i s a t i o n ( i n t h i s p a r a g r a p h , use o f random a c c e s s m e d i a , s u c h a s c o r e memory b a n k s - p o s s i b l y w i t h page o r b l o c k o r g a n i s a t i o n - o r m u l t i - t r a c k d i s c s is assumed). S t r u c t u r a l i n f o r m a t i o n c a n be - a n d i n f a c t i s f u r t h e r c o m p r e s s e d by a . ) r e p l a c i n g s e q u e n c e s o f z e r o e n t r i e s by p o i n t e r s , b . ) r e d u c i n g t h e s t o r a g e s p a c e f o r t h e r e m a i n i n g e n t r i e s (atom s y m b o l : 7 b i t , d i a g o n a l elements: 4 b i t , o f f d i a g o n a l upper t r i a n g l e : 2 b i t ) .

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

2.

BRANDT E T A L .

Computer

Programs

for Chemical

51

Problems

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

F u r t h e r , the f i l e s c o n t a i n r e f e r e n c e s of d i f f e r e n t o r i g i n t o o t h e r e n t r i e s i n t h a t f i l e . These appear i n the form of n u m e r i c a l p o i n t e r s and, i f improperly o r g a n i z e d , c a n u s e up l a r g e s t o r a g e s p a c e . We d e v e l o p e d a f i l e o r g a n i z a t i o n to minimize storage requirement w h i c h makes u s e o f r e l a t i v e p o i n t e r s a n d p o i n t e r length c l a s s e s . Furthermore, short access paths from p o i n t i n g element (son) to p o i n t e d - a t element ( f a t h e r ) , are p o s t u l a t e d , i . e . minimization of core page-switching, o r d i s k h e a d movement respectively. I t h a s b e e n f o u n d t h a t t h e a b o v e two p o s t u l a t e s l e a d t o a d a t a o r g a n i z a t i o n , whereby sons a r e a s s i g n e d s t o r a g e i n t h e v i c i n i t y o f t h e i r f a t h e r . The a d d r e s s t o be a s s i g n e d i s d e r i v e d f r o m t h e d a t a t h e m s e l v e s by a s u i t a b l e h a s h - f u n c t i o n , which a s s i g n s linearly i n c r e a s i n g d i s t a n c e s f o r cases without c o n f l i c t , (addressed storage n o n - o c c u p i e d ) , but switches to f u n c t i o n s o f s e c o n d d e g r e e when c o n f l i c t s ( o v e r f l o w s ) a r e t o be h a n d l e d . S i n c e t h e r e a r e o n l y m o d e r a t e c o m p u t i n g r e q u i r e m e n t s , and s i n c e a c c e s s t i m e s on t h e storage media determines p r o c e s s i n g speed, a r a t h e r moderate computer i s needed f o r h a n d l i n g f a i r l y large a m o u n t s o f d a t a . We a r e a t p r e s e n t i n v e s t i g a t i n g t h e behaviour of a system with s e v e r a l thousand s t r u c t u r e s and t h e i r s u b s t r u c t u r e s ( s e e 5·1-3) on a " m i n i " c o m p u t e r o f t h e w e l l - k n o w n PDP11 f a m i l y t h a t i s equipped w i t h moving head d i s k packs t h a t have a s t o r a g e c a p a c i t y o f a b o u t two mega b y t e e a c h .

6.1.3

Hierarchically

Organized

Substructure

Files

I n a g r e a t many a p p l i c a t i o n s , a g i v e n s t r u c t u r e i s t r e a t e d as a combine o f i t s s u b s t r u c t u r e s . D e p e n d i n g on t h e c o n t e x t , s u b s t r u c t u r e s a p p e a r u n d e r t h e name of " f u n c t i o n a l group", "fragment", " r e a c t i o n i n v a r i a n t " "building block" etc. T y p i c a l a p p l i c a t i o n s are optimization strategies in synthesis planning, avoidance o f p r o h i b i t e d c o m b i n a t i o n s o f f u n c t i o n a l groups in generating chemical r e a c t i o n s , structure-activity correlations etc. Computer g e n e r a t i o n , s t o r a g e and r e t r i e v a l o f s u b s t r u c t u r e s i s g r e a t l y f a c i l i t a t e d , when t h e m a t h e m a t i c a l model of s t r u c t u r a l c h e m i s t r y i s employed to g e n e r a t e a h i e r a r c h i c a l l y organized substructure f i l e in a s y s t e m a t i c f a s h i o n . Such h i e r a r c h i c a l o r d e r i n g i s not only a p r e r e q u i s i t e i n avoiding d u p l i c a t i o n of

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

52

COMPUTER-ASSISTED

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

i n f o r m a t i o n and m i s s i n g l i n k s , i t f o r s e t t i n g up d a t a o r g a n i s a t i o n s previous chapter (5.1.2).

is as

ORGANIC SYNTHESIS

also indispensable d e s c r i b e d i n the

The a p p r o a c h t a k e n h e r e t r e a t s t h e o f f d i a g o n a l e n t r i e s o f a B E - m a t r i x as an o r d e r e d s e t o f numbers. E a c h e l e m e n t i n t u r n i s l o w e r e d by o n e , t h e r e s u l t i n g m a t r i x s e p a r a t e d i n t o b l o c k s and t h e d a t a f i l e i s t e s t e d f o r the presence of each r e s u l t i n g b l o c k . If the t e s t i s p o s i t i v e , then only a p o i n t e r to the g e n e r a t i n g s t r u c t u r e ( " f a t h e r " ) i s a d d e d t o t h e f i l e , a n d no f u r t h e r f r a g m e n t a t i o n i s n e e d e d , s i n c e a l l s u c c e s s o r s must b e i n the f i l e . If the t e s t i s n e g a t i v e , the son i s e n t e r e d i n t o t h e f i l e and i t s f a t h e r (and b r o t h e r if a n y ) a r e p u s h e d on a s t a c k f o r f u r t h e r t r e a t m e n t . E a c h e l e m e n t o f t h e s t a c k i s t r e a t e d i n t h e same way (thus becoming the f a t h e r of the next g e n e r a t i o n ) a f t e r the f i r s t son i s p r o c e s s e d . T h i s d e p t h - f i r s t fragmentation y i e l d s d a t a - f i l e s that f u l f i l the requirement of the p r e v i o u s c h a p t e r , i n t h a t sons a p p e a r as c l o s e as p o s s i b l e to t h e i r f a t h e r i n the g e n e r a t i o n p r o c e s s , thereby l e a d i n g to minimal pointer lengths. With such a f i l e s t r u c t u r e q u e r i e s take the form p a t h s w i t h i n a d i r e c t e d g r a p h and t h e r e f o r e a r e p r o c e s s e d w i t h a minimum o f c o m p u t i n g e f f o r t a n d extremely short response time.

6.2

Operating

6.2.1

The

with

Synthetic

of

R-Matrices Design

Program

EROS

Our p r e v i o u s s t u d i e s i n d e v e l o p i n g C I C L O P S , t h e p r o t o t y p e o f a s y n t h e t i c d e s i g n p r o g r a m , h a v e now l e d t o a new s t a g e . We a r e now p r e s e n t i n g EROS ( E l a b o r a t i o n o f R e a c t i o n s f o r O r g a n i c S y n t h e s i s ) which has matured to the p o i n t of becoming a r o u t i n e t o o l f o r the s y n t h e t i c chemist. A number o f o p t i o n s a l l o w s t h e u s e r t o d i r e c t t h e p r o g r a m t o meet h i s s p e c i f i c p r o b l e m s a n d n e e d s . Rather than t a k i n g the complete set of a l l mathematically possible r e a c t i o n matrices a s e l e c t i o n of only three R - m a t r i c e s i s c o n t a i n e d i n the program. These t h r e e R - c a t e g o r i e s i n c l u d e t h e m a j o r i t y o f a l l known s y n t h e t i c r e a c t i o n s and i t i s b e l i e v e d t h a t most o f t h e r e a c t i o n s w h i c h w i l l be d i s c o v e r e d i n t h e f u t u r e w i l l a l s o f a l l i n t h e s e t h r e e R - c a t e g o r i e s . So t h e a d v a n t a g e s o f t h e

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

2.

BRANDT

E T

Computer

AL.

Programs

for Chemical

53

Problems

m a t h e m a t i c a l model o f p r o v i d i n g a l s o u n p r e c e d e n t e d r e a c t i o n s i s l a r g e l y r e t a i n e d . On t h e o t h e r h a n d s u b s t a n t i a l r e d u c t i o n of the output of u n l i k e l y intermediates i s achieved.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

EROS c a n b e a p p l i e d t o t h e s t u d y o f c h e m i c a l r e a c t i o n s i n two b a s i c a l l y d i f f e r e n t w a y s : One c a n i n p u t several m o l e c u l e s and g e n e r a t e t h e p r o d u c t s t o be e x p e c t e d i n t h e r e a c t i o n s o f t h e s e m o l e c u l e s . Or o n e c a n l o o k f o r all s y n t h e t i c r e a c t i o n s l e a d t o a t a r g e t compound w h i c h is input. R e a c t i o n s mechanism c a n be s t u d i e d by a l l o w i n g e l e c t r o n i c c o n f i g u r a t i o n s f o r d i s t i n c t atoms.

certain

The r e a c t i o n s i t e i s d e t e r m i n e d by s t a t i n g w h i c h b o n d s a r e b r e a k a b l e . T h e s e c a n be f o u n d by a s t a n d a r d r o u t i n e o r p r e s e l e c t e d b y t h e u s e r . F o r e a c h r e a c t i o n t h e energy i s c a l c u l a t e d u s i n g parameters obtained from thermoc h e m i c a l d a t a - ' ) . By d e f i n i n g e n e r g y l i m i t s , r e a c t i o n s can be r e j e c t e d on thermodynamic grounds. 1

6.2.2

0

Predictor

1 1

Systems

F r o m t h e p r e v i o u s c h a p t e r i t b e c o m e s o b v i o u s t h a t Rm a t r i c e s , when c o m b i n e d w i t h s u i t a b l e s e l e c t i o n r u l e s , are a general t o o l f o r p r e d i c t i n g chemical r e a c t i o n products. I n some a p p l i c a t i o n s , c h e m i c a l s e l e c t i o n r u l e s may b e l e s s s t r i n g e n t b e c a u s e some o t h e r e x t r a n e o u s selection a l l o w s f o r f u r t h e r r e d u c t i o n o f output to manageable size. One g r o u p o f a p p l i c a t i o n s a r e p r e d i c t o r s y s t e m s . S u c h systems have, i n g e n e r a l , a c c e s s t o s t r u c t u r e f i l e s , i . e . f i l e s of i n t a c t molecules or molecular fragments o f t h e n a t u r e d e s c r i b e d i n c h a p t e r s 5.1.2) and 5.1.3) whose e n t r i e s a r e s e l e c t e d u n d e r a g i v e n a s p e c t . A s p e c t s may b e t o x i c i t y , p h a r m a c e u t i c a l a c t i v i t y , environm e n t a l i m p a c t , a v a i l a b i l i t y as an unwanted by-product i n an i n d u s t r i a l environment e t c . In such cases query e n t r i e s a r e p r o c e s s e d through r e a c t i o n g e n e r a t o r s . T h e s e may b e t h e c o m p l e t e s e t o f R - c a t e g o r i e s , b u t o f c o u r s e a s u b s e t o f t h o s e c a n be s e l e c t e d i f t h e a p p l i c a t i o n j u s t i f i e s i t . Output o f t h e r e a c t i o n g e n e r a t o r s i s t h e n c h e c k e d a g a i n s t structure f i l e s and matches a r e o u t p u t t o t h e u s e r .

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

54

COMPUTER-ASSISTED

ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

One s u c h s y s t e m i s a t p r e s e n t b e i n g i m p l e m e n t e d u n d e r a r e s e a r c h c o n t r a c t w i t h a p u b l i c agency w i t h the purpose of d e t e c t i n g sources of environmentally active c h e m i c a l s among c h e m i c a l s r e g a r d e d a s h a r m l e s s , themselves. A l t h o u g h an e s s e n t i a l p a r t o f s u c h s y s t e m s , the r e a c t i o n g e n e r a t o r s , due t o t h e i r a l g e b r a i c n a t u r e , t a k e up o n l y a n e g l i g i b l e amount o f t h e t o t a l c o m p u t i n g e f f o r t . It has t h e r e f o r e been r e m a r k e d , not w i t h o u t j u s t i f i c a t i o n , t h a t i n many a p p l i c a t i o n s p r e d i c t o r s y s t e m s a p p e a r t o t h e u s e r a s a mere e n h a n c e m e n t o f the c a p a b i l i t i e s of a documentation system. It should be s t r e s s e d , h o w e v e r , t h a t d o c u m e n t a t i o n s y s t e m s w i l l be a b l e t o s u p p o r t p r e d i c t o r m o d u l e s o n l y i f t h e y m a i n t a i n the complete s t r u c t u r a l i n f o r m a t i o n i n a form t h a t i s c o m p a t i b l e w i t h , o r c o n v e r t i b l e t o an a l g e b r a i c a l l y d e f i n e d B E - m a t r i x . D e s i g n , or r e d e s i g n of systems should take t h i s f a c t i n t o a c c o u n t .

6·3

Generation

and C l a s s i f i c a t i o n o f

6.3.1

Determination

of

the

R-Matrices

Minimal Chemical

Distance

L e t EM(B) and EM(E) be t h e i n i t i a l and f i n a l e n s e m b l e of molecules of a c h e m i c a l r e a c t i o n or sequence of r e a c t i o n . This chemical conversion determines i n a c h a r a c t e r i s t i c manner a c o r r e l a t i o n o f t h e atoms i n EM(E) a n d E M ( B ) . We c o n j e c t u r e t h a t c h e m i c a l r e a c t i o n s p r o c e e d i n s u c h a m a n n e r t h a t a minimum o f v a l e n c e e l e c t r o n s i s r e l o c a t e d . This i s c r i t e r i o n f o r the c o r r e l a t i o n o f t h e atoms o f EM(E) t o t h e atoms o f E M ( B ) . We c a l l t h e number o f v a l e n c e e l e c t r o n s w h i c h must b e r e l o c a t e d d u r i n g t h e t r a n s f o r m a t i o n f r o m E M ( B ) t o EM(E) w i t h a g i v e n c o r r e l a t i o n o f atoms t h e c h e m i c a l d i s t a n c e D ( B , E) between t h e s e EM. M a t h e m a t i c a l l y the chemical d i s t a n c e D(B, f o r m u l a t e d i n t h e f o l l o w i n g way Let

Then

(b..)

be

the

BE-matrix

of

A

(e..)

be

the

BE-matrix

of

Z.

E)

c a n be

and

i> j = l

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

2.

BRANDT

ET

Computer

AL.

Programs

for Chemical

Problems

55

F o r t h e c h e m i c a l l y m e a n i n g f u l r e p r e s e n t a t i o n E M ( B ) •> EM(E) t h e a t o m s o f E M ( B ) must b e a s s i g n e d t o a t o m s o f EM(E) i n s u c h a manner t h a t t h e c h e m i c a l d i s t a n c e D has i t s minimal value. This 1)

problem

several

ways

E x h a u s t i v e e n u m e r a t i o n : a l l a t o m s w i t h t h e same a t o m i c number a r e p e r m u t e d a n d t h e p e r m u t a t i o n corresponding to the minimal chemical d i s t a n c e i s taken. Let

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

c a n be a t t a c k e d i n

n

EM(E) atoms

2

atomic (n !)

number

0^.

k

method

of

atomic

,..(n !)

2

this

consist of

is

n^ a t o m s

number Then

of

0^,

atomic

. . .,

one h a s

to

carry

permutations.

Since

suitable

for

only

number

n^. a t o m s

10!

very

out

0^,

of (n^!)«

= 3 628

800

small

molecules. 2)

The c r i t e r i o n f o r m i n i m i z a t i o n o f t h e c h e m i c a l d i s t a n c e c a n be p a r t i t i o n e d i n t o t h e t h r e e c o m p l e mentary c r i t e r i a ( i ) , ( i i ) and ( i i i ) . (i) Atoms w i t h t h e same a t o m i c number n u m b e r s f r o m t h e same s e t o f n u m b e r s .

Example

In

I

(i)

I:

H^-C^N:.

^2

suffices

1

to

+

are

given

the

H-N^C:

3 12

minimize

the

chemical distance.

(ii) The atoms a r e a s s i g n e d i n s u c h a manner t h a t , f o r a t o m s w h i c h a r e a s s i g n e d t o e a c h o t h e r , a s many spheres o f n e i g h b o r s as p o s s i b l e c o r r e s p o n d . T h i s c r i t e r i o n f o l l o w s from the assumption, t h a t r e a c t i o n e a c h a t o m w i l l t r y t o m a i n t a i n a s many s p h e r e s o f n e i g h b o r s as p o s s i b l e .

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

in

a

56

COMPUTER-ASSISTED

Example

ORGANIC SYNTHESIS

II:

H—

H—CjpH H.

- C l . + H - 0 H -* H - ^ C — C ^ — 0 - H H-^C 1 2 H / 5 |3 2 |3 H• ^ 5 H àr H H - Ç - H o

C

o

+ H-Cl.

1

6

H

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

H

In II ( i ) s u f f i c e s t o a s s i g n 0 and C l ; but f o r t h e assignment of the carbon-atoms ( i i ) i s necessary. (iii)

If there are c o n s t i t u t i o n a l l y symmetric molecules i n EM(B) a n d ( o r E M ( Ε ) ) t h e n a n a t o m A i n E M ( B ) c a n be a s s i g n e d t o more t h a n o n e a t o m Z ^ , · · · * Zk i n E M ( E ) e v e n i f one t a k e s ( i i ) i n t o a c c o u n t . T h e n t h e r e e x i s t two p o s s i b i l i t i e s : l 2 )

1) to to

A i s t h e f i r s t atom i n EM(B) w h i c h s h a l l be a s s i g n e d an atom i n E M ( E ) . Then A can be a s s i g n e d a r b i t r a r i l y one o f t h e a t o m s Ζ ^ , . . . Ζ ^ . .

2) One h a s a l r e a d y a s s i g n e d a t o m s Aj_ -> Z±9...9 A -> T h e n A c a n b e a s s i g n e d t o Z^ i n s u c h a way t h a t t h e

Z .

e

atoms to

Α ^ , - . , , Α

A as

the

are

atoms

Z,

in

the ,Z

same s p h e r e s e

to

In by

neighbors

Z..

T h i s m e t h o d w i l l be i t e r a t e d u n t i l neighbors correspond to each other Example

of

e

a l l spheres of as good as p o s s i b l e ,

III:

example III one c a n f i n d criterion (iii).

the

optimal

assignment

only

I n t h e c o m p u t e r p r o g r a m w h i c h we h a v e d e v e l o p e d f o r t h e m i n i m i z a t i o n o f t h e c h e m i c a l d i s t a n c e we u s e a l g o r i t h m s f r o m t h e f i e l d o f o p e r a t i o n s r e s e a r c h known as " o p t i m a l a s s i g n m e n t " - m e t h o d s . ^ 13

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

2.

BRANDT

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

6.3.2

ET

AL.

Computer

Documentation

Programs

Systems

for Chemical

for

Problems

Chemical

57

Reactions

E n c o d i n g and r e t r i e v a l p r o c e d u r e s f o r c h e m i c a l s t r u c t u r e s h a v e b e e n d e v i s e d a n d u s e d s i n c e t h e t i m e s when s t r u c t u r a l o r g a n i c c h e m i s t r y became known. Although t h e y had t o be t a i l o r e d t o t h e t e c h n i c a l n a t u r e o f t h e d a t a c a r r y i n g m e d i a - be i t s p o k e n o r p r i n t e d w o r d s , manually processed card f i l e s , s e q u e n t i a l l y processed punch h o l e cards or magnetic t a p e s , - they a l l f u l f i l l e d t h e i r p u r p o s e a t t h e i r t i m e . The s i t u a t i o n i s r a t h e r d i f f e r e n t f o r c h e m i c a l r e a c t i o n s . We s t i l l r e l y o n s u c h a l c h e m i s t i c means a s n a m i n g - b y - a u t h o r or citing-by m a j o r - p r o d u c t e t c . T h i s i s symptomatic f o r our s t a t e of k n o w l e d g e . A p p a r e n t l y we h a v e no s y s t e m a t i c way o f d e s c r i b i n g r e a c t i o n s u n t i l now. (The n e e d , o f c o u r s e , was l e s s p r e s s i n g , s i n c e t h e number o f r e a c t i o n s i s much s m a l l e r t h a n t h e number o f s t r u c t u r e s . ) A p p r o a c h e s t h a t use s t a r t i n g and/or f i n a l p r o d u c t s as a v e h i c l e f o r d e s c r i b i n g a n d c l a s s i f y i n g r e a c t i o n s must b e regarded u n s u i t a b l e i n the l i g h t of our mathematical m o d e l : A r e a c t i o n i s a p a t h b e t w e e n p o i n t s on t h e s u b s p a c e , t h a t i s d e f i n e d by a n F I E M . A l t h o u g h a n i n d i v i d u a l p a t h may be d e s c r i b e d by i t s s t a r t i n g , f i n a l ( a n d p o s s i b l y some i n t e r m e d i a t e p o i n t s ) e q u i v a l e n c e c l a s s e s o f s u c h p a t h s c a n n o t be a d e q u a t e l y d e s c r i b e d and c l a s s i f i e d t h a t way. What we n e e d i s a d e s c r i p t i o n o f the path i t s e l f . An R - m a t r i x i s s u c h a d e s c r i p t i o n . I n i t s most g e n e r a l f o r m i t c o n t a i n s no r e f e r e n c e t o a p a r t i c u l a r F I E M . Therefore ( a s e x p l a i n e d i n s e c t i o n 4.3) e a c h i r r e d u c i b l e R - m a t r i x d e f i n e s a n R - c a t e g o r y . The m i n i m a l r e f e r e n c e to a s p e c i f i c FIEM, t h e n , i s through those elements of t h e atom v e c t o r , t h a t c o r r e s p o n d t o t h e rows and columns i n t h e i r r e d u c i b l e R - m a t r i x , i n o t h e r words to the nonzero e n t r i e s i n the f u l l R - m a t r i x . Increasing d e g r e e s o f s p e c i f i c i t y a r e o b t a i n e d by a r e f e r e n c e f r o m t h i s f i r s t s e t o f atoms ("the r e a c t i o n c o r e " ) t o t h e i r corresponding elements i n the BE-matrix, that i n t u r n e n l a r g e t h e atom v e c t o r ( " f i r s t s p h e r e " ) e t c . A h i e r a r c h y i s thereby a c h i e v e d , that w i l l lead to a r e a c t i o n f i l e s t r u c t u r e , where e a c h e n t r y i s r e f e r e n c e d t h r o u g h t h e v a r i o u s l e v e l s o f s p e c i f i c i t y up t o t h e R - c a t e g o r i e s . Q u e r i e s c a n be e n t e r e d a t any d e g r e e o f generality.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

58

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

In a p r a c t i c a l system, a c a n o n i c a l o r d e r i n g has t o be imposed onto R - m a t r i c e s , which i s most e a s i l y a c h i e v e d by s e l e c t i n g t h e l e x i c o g r a p h i c a l l y s m a l l e s t i r r e d u c i b l e R-matrix among t h e n! e q u i v a l e n t ones. L e x i c o g r a p h i c a l p r i o r i t y i s given to the d i a g o n a l elements. T h i s , a c t u a l l y , i s an a r b i t r a r y measure, but i t i s j u s t i f i e d on t h e ground, t h a t non-zero d i a g o n a l elements cause changes i n v a l e n c e s t a t e o f the a f f e c t e d atom, thereby earmarking i t as a key atom i n t h e r e a c t i o n c o r e . Remaining symmetries i n R - m a t r i c e s , t h e n , a r e r e s o l v e d on t h e f i r s t occurence o f d i f f e r e n t i a t i o n i n the course of i n c r e a s i n g s p e c i f i c i t y . I t i s worth remarking t h a t the d i f f e r e n t R - c a t e g o r i e s govern w i d e l y d i f f e r e n t p o p u l a t i o n s o f known r e a c t i o n s . One obvious example i s t h e simple f o u r c e n t e r r e a c t i o n of t h e k i n d A-B + C-D = A-C + B-D, which i s i n c i d e n t a l l y r e p r e s e n t e d by t h e s i m p l e s t R-matrix i n c l o s e d s h e l l c h e m i s t r y t h a t has a zero d i a g o n a l . We have s t a r t e d t h e d e s i g n o f a model program t o e x t r a c t i r r e d u c i b l e R-matrices from known r e a c t i o n s , t o put them i n t o c a n o n i c a l o r d e r , and t o s e t t h e sequence o f references i n a h i e r a r c h i c a l f i l e , that i s organized i n a s i m i l a r way as s u b s t r u c t u r e f i l e s d e s c r i b e d i n (5.1.3).

References 1.

J . D u g u n d j i , I . U g i , T o p i c s C u r r . Chem. 39, 19 (1973) see a l s o : I . U g i , P. Gillespie, Angew. Chem. 83, 980, 982 (1971), I b i d . I n t e r n a t . Edit. 10, 914, 915 (1971).

2.

I . U g i , P.D. Gillespie, C. Acad. Sci. 34, 4 l 6 (1972).

3.

J . Blair, J . Gasteiger, C. Gillespie, P. Gillespie, I . U g i in "Computer R e p r e s e n t a t i o n and M a n i p u l a t i o n o f Chemical I n f o r m a t i o n " , Ed. W.T. Wipke, S. H e l l e r R. Feldmann, E. Hyde, W i l e y , N.Y. (1974).

4.

J . B l a i r , J . Gasteiger, C. Gillespie, P.D. G i l l e s p i e I . U g i , T e t r a h e d r o n 30, 1845 (1974).

Gillespie,

Trans.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

N.Y.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch002

2. BRANT ET AL. Computer

Programs

for Chemical

Problems

59

5.

I . U g i , J. G a s t e i g e r , J. B r a n d t , J . F . B r u n n e r t , W. S c h u b e r t , IBM-Nachr. 24, 185 (1974) I. U g i , IBM-Nachr. 24, 180 (1974).

6.

W.T. Wipke, T.M. D y o t t , J. Amer. Chem. Soc. 96, 4825 (1974), and earlier r e f e r e n c e s .

7.

E . J . Corey, W.J. Howe, H.W. O r f , D.A. Pensak, G. P e t e r s s o n , J. Amer. Chem. Soc. 97, 6116 (1975), and earlier references.

8.

H. G e l e r n t e r , N.S. S r i d h a r a n , H.J. H a r t , S.C. Yen, F.W. F o w l e r , H.J. Shue, T o p i c s C u r r . Chem. 41, 113 (1973).

9.

M. R a n d i ć , J o u r n . o f Chem. Inform. and Comp. Sci., V o l . 15, No. 2, 1975 (105).

10. S.W. Benson, Thermochemical Kinetics, W i l e y , N.Y. (1968), S.W. Benson e t al. Chem. Rev. 69, 279 (1969). 11. T.L. Allen, J . Chem. Phys. 31, 1039 (1959), A . J . K a l b , A.L.M. Chung, T.L. Allen, J . Amer. Chem. Soc. 88, 2938 (1966). 12. J . G a s t e i g e r , P. Gillespie, D. Marquarding, T o p i c s C u r r . Chem. 48, 1 (1974).

I. Ugi,

13. R.E. B u r k a r d , Methoden d e r g a n z z a h l i g e n Optimierung, S p r i n g e r - V e r l a g , Wien-New York (1972).

Acknowledgements

We acknowledge g r a t e f u l l y or work by Deutsche Fonds d e r Chemischen

the f i n a n c i a l

support o f

F o r s c h u n g s g e m e i n s c h a f t and Industrie.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

3 An Organic Chemist's View of Formal Languages H. W. WHITLOCK, JR.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

Dept. of Chemistry, University of Wisconsin, Madison, Wisc. 53706

I t seems g e n e r a l l y recognized t h a t , except f o r the most difficult of cases, the i n t r o d u c t i o n of mathema­ tics tends to obscure problems r a t h e r than make t h e i r s o l u t i o n e a s i e r . Be that asitmay we are q u i t e intri­ gued by the r e l a t i o n between formal languages and chemical s t r u c t u r e s , particularly when c o n s i d e r i n g the area of computerization of organic s y n t h e s i s . The pur­ pose of this paper is to o u t l i n e some of our thoughts on the above s u b j e c t , showing that one can apply common f a c t s derived from language theory to questions of chemical i n t e r e s t . For the chemist readers we will de­ fine languages and grammars. We will then discuss a number of "theorems" of organic chemistry (formal proof will not be attempted). F i n a l l y , we will point out that a c e r t a i n subset of organic s y n t h e s i s , the Func­ tional Group Switching Problem, i s amenable t o attack by viewing it as a problem i n the context of formal languages. Molecules as S t r i n g s of Symbols As we will see, the tenets of language theory assume that one is d e a l i n g with s t r i n g s of symbols: one dimensional l i n e a r assays. I f we wish t o analyze molecules according to t h i s theory it behooves us t o i n q u i r e as to what extent we may consider molecules t o be l i n e a r entities. We immediately note that mole­ cules, in particular the set of all s t r u c t u r e s con­ t a i n i n g r i n g s , are i n h e r e n t l y nonlinear in nature. On the other hand we recognize that we can represent any­ t h i n g i n a l i n e a r manner. I f we make the r a t h e r i n t e r e s t i n g equation of r e p r e s e n t a t i o n w i t h s t r u c t u r e then it f o l l o w s that s t r u c t u r e s , as represented, may be l i n e a r . Now one can c a r r y this as far as one likes but it seems t o the author that f o r most purposes one 60 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

3.

WHITLOCK

Organic

Chemist's

View

of Formal

61

Language

should r e s t r i c t ones a t t e n t i o n to l i n e a r representa­ t i o n s that make a reasonable amount of i n t u i t i v e chemi­ c a l sense. For t h i s reason we w i l l not consider the case of c y c l i c s t r u c t u r e s . What types of molecules can be n a t u r a l l y represen­ ted i n a l i n e a r manner? C l e a r l y s t r a i g h t - c h a i n e d s t r u ­ ctures can. The representation of n-hexane, CH^CHgCHgCI^CHgCHg,

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

i s i n f a c t what one t h i n k s of when considering t h i s molecule (representation equals s t r u c t u r e ) . In p a r t i ­ c u l a r t h i s " s t r u c t u r e " i s i n t u i t i v e l y a s t r i n g of s i x symbols, namely

S i m i l a r l y e t h y l crotonate i s CH CH=CHCOOCH CH . This example p o i n t s up two d i f f e r e n c e s between a l i n e a r notation of a s t r u c t u r e and the a c t u a l s t r u c ­ ture i t s e l f . The above i s a s t r i n g of symbols so we have to s p e c i f y what symbols are involved. P o s s i b i l i ­ t i e s f o r e t h y l crotonate are: CH ,CH,=,CH,CO,OCH ,CH ; CH=CH,COO,CH ,CH ; and CH ,CH=CHCO,0,CH CH . Since i n general one w i l l attach meanings to the symbols i n ­ volved, these d i f f e r e n t representations may have d i f ­ ferent meanings. For example, the l a s t above might be described as s e q u e n t i a l l y a methyl, an αβ-unsaturated carbonyl, a dicoordinate oxygen, and an e t h y l . This i s a p e r f e c t l y good d e f i n i t i o n of t h i s molecule and, as a s t r i n g , i s somewhat more informative than a mere t a b u l a t i o n of the i n d i v i d u a l p a r t s . The second major d i f f e r e n c e between l i n e a r n o t a t i o n and a c t u a l s t r u c ­ tures l i e s i n the observation that s t r i n g s have an i n ­ herent ordering from l e f t to r i g h t while s t r u c t u r e s do not. This leads to a many i n t o one mapping of ordinary l i n e notations i n t o s t r u c t u r e s . Having seen that the ordinary l i n e n o t a t i o n of unbranched s t r u c t u r e s may be more or l e s s equated with the s t r u c t u r e i t s e l f we next turn to the case of branched s t r u c t u r e s . Just as an unbranched s t r u c t u r e corresponds to a s t r i n g of symbols, a branched s t r u c ­ ture has as i t s counterpart a t r e e . The nodes of the t r e e are part s t r u c t u r e symbols and the edges are bonds.* Thus 2,2,4-trimethylhexane may be represented by a number of t r e e s , one of which i s 3

2

3

2

3

3

• M u l t i p l e bonds may be represented example as a n o n s t r u c t u r a l node.

3

2

2

3

s e v e r a l ways, f o r

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

3

62

COMPUTER-ASSISTED

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

CH

3

CH

3

CH

ORGANIC

SYNTHESIS

3

One can define " g l o b a l " part s t r u c t u r e s as was done above f o r e t h y l crotonate and one notes that again there w i l l be many tree representations per s t r u c t u r e . The point of t h i s , of course, i s t h a t , as i s w e l l knownQ), trees ( i n p a r t i c u l a r binary t r e e s ) may be r e ­ presented as l i s t s . The above tree has a l i s t represen­ t a t i o n , CH (C(CH3)(CH3)CH3)CHCCH2CH )CH3. Now t h i s doesn't look too appealing t o the chemists t r a i n e d eye but CH C(CH3)CCH3)CH2CHCCH3)CH CH does. This i s a l i s t representation of the tree 2

3

3

2

3

and i s s u s p i c i o u s l y close t o our usual l i n e notation of branched s t r u c t u r e s . We w i l l show below that the set of a l l a c y c l i c s t r u c t u r e s , wherein by " s t r u c t u r e " we mean that i n t u i t i v e l y w e l l defined s t r u c t u r a l n o t a t i o n used by organic chemists, comprises a context f r e e language. But f i r s t we must define the concept of languages and grammars. Grammars and Languages The f o l l o w i n g i s a very b r i e f i n t r o d u c t i o n t o the subject. We r e s t r i c t ourselves t o those aspects that are d i r e c t l y r e l a t e d t o the problem of applying the theory of languages to organic chemistry. For a more complete i n t r o d u c t i o n the reader i s r e f e r r e d to a num­ ber of e x c e l l e n t texts.(2-5) As conceived by Chomsky(6) the f o l l o w i n g are the

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

3.

Organic

WHiTLOCK

Chemist's

View

of Formal

Language

63

c e n t r a l aspects of t h i s subject. Symbols. There i s some (normally f i n i t e ) set of symbols from which s t r i n g s (sentences) are made of. This set (Vx, the t e r m i n a l vocabulary) would be {CH , CH } f o r the unbranched alkanes above and would be {CH , CH ,CH,C,(,)} f o r the l i n e a r representation of branched alkanes. As a grammar embodies the concept of d e r i v a t i o n of some sentence i n a language there i s a l s o defined a set of symbols ( V N , nonterminal vocabulary). These are used i n d e r i v a t i o n s but do not appear i n the f i n a l sen­ tences of the language defined by the grammar of i n t e r ­ est. The b a s i c act of d e r i v a t i o n i n v o l v e s replacement of a nonterminal symbol i n a s t r i n g by a s t r i n g . For example the s t r i n g CH CH(R)R may be turned i n t o the s t r i n g CH CH(CH R)R by r e p l a c i n g the nonterminal R by the s t r i n g CH R. On the other hand i t might be changed i n t o CH CH(CH )R by r e p l a c i n g R by CH , de­ pending on what our r u l e s are f o r e f f e c t i n g these changes. F i n a l l y , there i s some unique member of V J J , the " s t a r t " symbol, from which a l l sentences may be derived. 3

2

3

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

2

3

3

2

2

3

3

3

Productions. A production i s j u s t a r u l e f o r making the above changes. Replacement of R by CH R i s symbolized by R •> CH R; replacement of R by CH , by R •> CH . As c o n v e n t i o n a l l y t r e a t e d a p p l i c a t i o n of pro­ ductions i s permissive i n the sense that there are no r u l e s s t a t i n g what production of some set must be a p p l i e d to a given s t r i n g . The problem of determining what s e r i e s of productions w i l l turn the unique s t a r t symbol S i n t o some s p e c i f i e d sentence then becomes an o c c a s i o n a l l y i n t r i c a t e puzzle. The general form of a production i s αΧβ + αω$, where X i s some nonterminal symbol and a, 3, and ω are a r b i t r a r y s t r i n g s . Produc­ t i o n s of t h i s form w i t h no r e s t r i c t i o n s on a, 3, and ω are of type 0. Those with the r e s t r i c t i o n that ω not be the empty symbol are of type 1. An equivalent statement i s that a type 1 production may be of the form αγ3 αω3, length (ω) >_ length γ, γ being an a r b i ­ t r a r y s t r i n g c o n t a i n i n g at l e a s t one nonterminal sym­ b o l . The presence of the context α and 3 leads to the term context s e n s i t i v e i n d e s c r i b i n g these productions. Productions of the Form X •> ω are of type 2. The absence of a context, α and 3 leads to the term context f r e e f o r t h i s type of production. F i n a l l y , the sim­ p l e s t type of production, type 3 or r e g u l a r , i s r e ­ s t r i c t e d to be of e i t h e r the form X + a (X i n V N , a i n V ) or X ·> aY (X and Y i n V N , a i n V ) . 2

2

3

3

T

T

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

64

Note that a l l context s e n s i t i v e productions are of type 0, a l l context f r e e productions are context s e n s i ­ t i v e , and a l l r e g u l a r productions are context f r e e . Note a l s o the d i r e c t analogy between productions as de­ f i n e d above and chemical r e a c t i o n s . The context s e n s i ­ t i v e production CH=CH CHOH -> CH=CH CO i s e q u a l l y viewed as a r e a c t i o n . A p p l i c a t i o n of t h i s production t o the s t r i n g GH CHOHCH CH2CH=CHCHOHCH3 may produce the new s t r i n g CH CHOHCH CH CH=CHOOCH3 but not CH COCH CH CH= CHCH0HCH . C l e a r l y i f s t r u c t u r e s are equated w i t h s t r i n g s , chemical r e a c t i o n s have as t h e i r counterpart productions. The term "context s e n s i t i v e " has very s i m i l a r meanings i n both cases. Along these l i n e s the production CH(OCH ) -> CHO i s of type 0 as i t leads t o a decrease i n the length of the s t r i n g . * The context free production CHO •> CH(R)0H, corresponds t o Grignard a d d i t i o n t o an aldehyde, while examples of r e g u l a r pro­ ductions are CH 0H •> CH Br, CHOH -> CO, e t c . Just as type 0 productions lead t o a r i c h e r language the analogous chemical r e a c t i o n s lead t o a r i c h e r more com­ plex chemistry as we go from the simple r e g u l a r "func­ t i o n a l group s w i t c h i n g " r e a c t i o n s t o those i n v o l v i n g b l o c k i n g and deblocking r e a c t i o n s . 3

2

3

2

2

3

2

2

3

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

3

2

2

3

Grammar. A grammar i s j u s t a defined set of non­ t e r m i n a l and t e r m i n a l symbols, a s p e c i f i e d member of V N (the s t a r t symbol) and a set of productions. Viewing r e a c t i o n s as productions we may define a chemistry as a set of molecular p a r t s , a s p e c i f i e d s t a r t i n g m a t e r i a l , and a set of r e a c t i o n s . This assumes that the chemis­ t r y can be thrown i n t o the proper grammatical form as discussed above. Languages. For a defined grammar, i t s attendant language i s the set of a l l s t r i n g s (over V T ) that can be generated by repeated a p p l i c a t i o n of the grammar's pro­ ductions, s t a r t i n g w i t h the s t a r t symbol. Pursuing our analogy between grammars and s y n t h e s i s , the langu­ age defined by some chemistry (chemical grammar) i s the set of a l l molecules that can be synthesized from the s p e c i f i e d s t a r t i n g m a t e r i a l by repeated a p p l i c a t i o n of the r e a c t i o n s — a language of s y n t h e s i z a b l e s t r u c t u r e s f o r that chemistry. Just as grammars may be of type 0, 1, 2 or 3 according t o the most complex type of pro­ duction present, a language i s of type 0 i f s p e c i f i e d by a type 0 grammar, e t c . Note that while a grammar i s •Assuming our symbols are CH, 0CH , (,), 2, CHO. 3

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

3.

Organic

WHITLOCK

Chemist's

View

of Formal

65

Language

an exact (although sometimes opaque) d e f i n i t i o n of a language, a language does not i n general s p e c i f y a un­ ique grammar. Chemical questions which are a d i r e c t t r a n s l i t e r a t i o n of t h e i r corresponding language theory counterparts are: Given two chemical grammars: do they define the same language of s y n t h e s i z a b l e s t r u c ­ tures. Given a p a r t i c u l a r chemical grammar (chemistry) of say type 0, i n v o l v i n g various blocking-deblocking sequences, i s there a simpler chemistry that defines the same language of s y n t h e s i z a b l e s t r u c t u r e s . Given a chemistry and a p a r t i c u l a r molecule i s the molecule a member of the chemistry's language (can i t be synthe­ s i z e d ? ) . Since t h i s membership question becomes more and more complicated as we go from type 3 to type 0 languages, can we place some sort of upper and lower l i m i t s on the complexity of t h i s problem w i t h i n the context of language types? These questions w i l l be dealt w i t h below. Examples 1) Grammar 1 V = {CH , CH , CEU} V = {ALKANE, R} S t a r t symbol = ALKANE Productions: ALKANE •> CH* ALKANE + CH R R + CH R -> CH R T

3

2

N

3

3

2

Pl.l PI.2 PI.3 PI.4

This i s an example of a s t r u c t u r a l grammar. The productions correspond to r u l e s f o r generating n-alkane s t r u c t u r e s rather than to chemical r e a c t i o n s . The language s p e c i f i e d by t h i s grammar i s the set of a l l nalkanes. D e r i v a t i o n of η-butane i s achieved t h u s l y : P1

2

P1

ALKANE ' >

4

P 1

CH R ' >

4

CH CH R ' )

3

3

2

P1

3

'>

CI^CHgCHgR CH CH CH CH 3

2

2

3

The sentence CH CH(CH )CH i s not i n t h i s language^ nor are CH CH OH or CH R ( t h i s l a t t e r i s a s e n t e n t i a l form). Since a l l productions are of type 3 (X + a or X •> aY) the set of a l l alkanes comprises a r e g u l a r language. The membership question f o r r e g u l a r langu­ ages i s exceedingly simple. From the r e g u l a r grammar above we may construct the algorithm or "machine" shown i n Figure 1. Rules f o r c o n s t r u c t i n g machines such as t h i s from r e g u l a r grammars are described elsewhere.(7) One s t a r t s i n the s t a r t s t a t e and makes s t a t e 3

3

2

3

3

3

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

66

Figure 1. Finite language defined ALKANE.

ORGANIC SYNTHESIS

state machine for recognizing members of the by grammar 1. The start state is that labelled The accept state is that one labelled F.

t r a n s i t i o n s as one reads the s t r i n g of i n t e r e s t from l e f t t o r i g h t . I f one i s i n the accept s t a t e when no more symbols are l e f t the s t r i n g i s a member of the language. I f not, not. Note that the machine requires only a f i n i t e amount of memory; hence the term f i n i t e s t a t e machine. 2) GRAMMAR 2 V = {CH , CH , OH, MgBr, Br} V = {S, OH, Br, MgBr} S t a r t State = S Productions: S CH OH OH ·* Br Br + MgBr MgBr CH OH T

3

2

N

3

2

P2.1 P2.2 P2.3 P2.4

This i s a chemical grammar, the productions correspond­ ing t o r e a c t i o n s . * The language i s the set of a l l 1a l k a n o l s , 1-alkylmagnesium bromides and 1-bromoalkanes. *The reader w i l l note that V and V»r are not d i s j o i n t as they are supposed t o be. This was done f o r c l a r i t y ' s sake as t h i s grammar can be e a s i l y r e w r i t t e n t o con­ form t o the standard d e f i n i t i o n . N

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

3.

Organic

WHiTLOCK

Chemist's

View

of Formal

Language

67

Although the productions are not of a l l of the r e g u l a r form X a or X •> aY i t i s e a s i l y r e w r i t t e n to conform to t h i s format. The language i s thus a regular langu­ age and a f i n i t e s t a t e machine f o r recognizing members of the language i s e a s i l y constructed. Note that a d e r i v a t i o n of a sentence corresponds d i r e c t l y to i t s synthesis. D e r i v a t i o n of e t h y l bromide proceeds as: 2

1

S Ρ · ; CH OH

P

2

3

'

2

A

P 2

3

P 2

4

CH^Br ' > CH^MgBr ' > CH^CHgOH

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

P

2

'

2

j

CH CH Br 3

2

Theorem. The set of a l l wellformed a c y c l i c s t r u c ­ tures comprises a d e t e r m i n i s t i c context free language. We have defined above what we mean by "structures," the ordinary l i n e n o t a t i o n used by organic chemists wherein we r e l y on a s t r i n g representation with branch­ ing i n d i c a t e d by p a r e n t h e s i z a t i o n . The reader w i l l r e ­ c a l l that context free grammars allow more complicated nested productions, e.g. S •> aSb than are allowed f o r by r e g u l a r productions. Consider grammar 3 below: GRAMMAR 3 V = {CH , CH , CH, V = {S, R} S t a r t Symbol = S Productions: S S •> R ·> R + R + T

3

2

CH*,

(,)}

N

CH^ CH R CH CH R CH(R) 3

3

2

P3.1 P3.2 P3.3 P3.4 P3.5

This i s j u s t Grammar 1 with the a d d i t i o n of production P3.5. This production i s not regular but i s context f r e e . I t i s t h i s production that allows us to generate a r b i t r a r i l y branched s t r u c t u r e s such as CH CH(CH(CH CH )CH )CH . That the language defined by t h i s grammar i s not regular ( i . e . cannot be generated by a regular grammar) f o l l o w s from i t s being homomorphic a l l y equivalent with the set of balanced parentheses. The r e c o g n i t i o n problem f o r context f r e e languages i s i n h e r e n t l y more d i f f i c u l t than that f o r regular l a n ­ guages i n that one needs u n l i m i t e d memory ( e s s e n t i a l l y for the reason i n t h i s case of remembering what branch one i s c u r r e n t l y examining). A complete grammar that handles a large subset of a c y c l i c s t r u c t u r e s i s pre­ sented i n Figure 2. The language i s not regular but i s context f r e e , as can be v e r i f i e d by observing the form of the productions. This grammar i s the b a s i s of a d e t e r m i n i s t i c algorithm f o r parsing ( i . e . recogniz3

2

3

3

3

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC

68

V

T

SYNTHESIS

= {CH CH 0 HC0 H HCN CC> CO CH OH COOH 4

2

2

2

3

CHO CN C l Br HO HOgC OHC NC CH Ο 2

0 C H C CH HC C = 2

V

N

2

Ξ

( ) N}

= {S LMV RMV CHN DB RDB TB RTB SM SRP SLP DV LDV RDV TV LTV QV}

S t a r t Symbol:

S

PRODUCTIONS:

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

Structure S -> SM / SLP RMV / (LMV)2 TV RMV / (LMV)3 QV RMV / LDV(RMV)2 / LDV RBD / TLV RTB / LTV(RMV3 / QV(RMV)4 / (LMV3 TV / (LMV)4 QV / (LMV)2 QV RDB L e f t Monovalent LMV -> SLP / SLP CHN / (LMV)2 TV CHN / (LMV)2 TV / (LMV)3 QV / (LMV)3 QV CHN / (LMV)2 QV DB/ LDV DB / LTV TB Right Monovalent RMV + SRP / DV RMV / TV(RMV)RMV) / TV(RMV)2 / TV RDB / QV(RMV)2 RMV / QV(RMV)3 / QV RTB / QV(RMV)RDB / (CHN)N RMV / QV(RMV)(RMV)RMV / QV(RDB)RMV Continued Figure

2a.

Context Free structural grammar for computer input of linear notation via teletype. Meanings of nonterminal symbols are as above.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

structure

3.

Organic

WHITLOCK

Chemist's

View

of Formal

Language

69

Chaining CHN -> DV / DV CHN / TV(RMV) / TV(RMV)CHN / QV TB / QV(RMV)(RMV) / QV(RMV)(RMV)CHN / QV(RMV)2 CHN / TV DB / QV(RMV)DB Double Bond DB -> TV / = TV CHN / = QV(RMV) / = QB(RMV)CHN / = QV DB Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

Right Double Bond RDB •> RDV / « TV RMV / - QV(RMV)RMV) / = QV(RMV)2 / = QV RDB T r i p l e Bond TB -> RTB +

Ξ

QV /

Ξ

AV CHN

Ξ

TV /

Ξ

QV RMV

SM + C H 4 / CH 0 / HC0 2 H / HCN / C 0 2

2

/ CO

SRP + C H 3 / OH / C0 2 H / CHO / CN / CI / Br SLP -> C H 3 / HO / H0 2 C / OHC / NC / CI / Br DV + C H

2

LDV + C H

2

RDV TV

/ Ο / CO / C 0 2 / 0 2 C /

H2C

C H 2 / Ο / CO CH

LTV •> CH / HC QV -> C Figure

2a. Continued

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

70

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Figure 2b.

Parse tree of (CHs),C=CHCH2OH, sentative terpenoid

a repre­

ing) s t r u c t u r e s input i n t o a LISP organic synthesis program w r i t t e n by P. Blower (8). This grammar gener­ ates most common f u n c t i o n a l groups (those not included such as -N0 were l e f t out f o r nongrammatical reasons) and includes chaining, the representation of repeated subunits by the enclosing of them w i t h i n brackets. The representations of η-butane, C H 3 C H 2 C H 2 C H 3 , C H ( C H ) 2 C H , C 2 H 5 C 2 H 5 , C H 3 C H 2 C 2 H 5 , and others are a l l accepted by t h i s program, parsed, and converted i n t o i n t e r n a l r e ­ presentations, p o i n t i n g up the f a c t that there i s a many i n t o o&e mapping of s t r u c t u r a l representations into structure. We have seen above a simple s y n t h e t i c grammar wherein the productions are d i r e c t l y derived from chemi­ c a l r e a c t i o n s and the r e s u l t i n g language of s y n t h e s i z able s t r u c t u r e s i s the set of a l l molecules that can be synthesized from some s p e c i f i e d precursor by a p p l i c a ­ t i o n of the r e a c t i o n s . In the simple case of 1a l k a n o l s and 1-bromoalkanes the r e c o g n i t i o n problem i s t r i v i a l — w e can answer i t i n a time p r o p o r t i o n a l to the length of the molecule. We are n a t u r a l l y curious as to what i s the minimally complex grammar needed to 2

3

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

2

3

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

3.

WHiTLOCK

Organic

Chemist's

View

of

Formal

Language

71

mimic organic s y n t h e s i s of the more c o n v e n t i o n a l l y complex type. This i s not a s i l l y question f o r the f o l l o w i n g two reasons. Although one can quibble as to the extent to which a c y c l i c molecules may be equated with s t r i n g s there i s no question that the s i m i l a r i t y i s marked. Moreover, at l e a s t from a complexity sense i t i s c l e a r that algorithms f o r r e c o g n i z i n g r e g u l a r s t r u c t u r e s are i n h e r e n t l y simpler than those f o r r e ­ cognizing context f r e e ones. S i m i l a r l y the r e c o g n i t i o n of members of context s e n s i t i v e languages i s more d i f ­ f i c u l t yet. This i s true regardless of whether one i s t a l k i n g about the r e c o g n i t i o n of w e l l formed or synthes i z a b l e s t r u c t u r e s , or whether one i s doing t h i s i n a formal mathematical sense or v i a computer programs. Secondly i t i s not obvious that the question of synthes i z a b i l i t y i s i n f a c t answerable at a l l f o r a l l w e l l defined organic molecules. By answerable we mean having a r e c o g n i t i o n procedure that w i l l terminate i n some (not n e c e s s a r i l y short) p e r i o d of time w i t h the answer yes or no. The set of context s e n s i t i v e l a n ­ guages are recognized by the s o c a l l e d l i n e a r bounded automata and the question of membership of some s t r i n g i n a defined context s e n s i t i v e language i s known to be answerable. Type 0 languages on the other hand are r e ­ cognized by Turing machines and the question of member­ ship i s not answerable f o r type 0 languages as a s e t , although i t may be f o r some subset. Thus i f we cannot develop cogent arguments f o r the s u f f i c i e n c y of con­ t e x t s e n s i t i v e languages as a model f o r oganic syn­ t h e s i s we are l e f t w i t h the p o s s i b i l i t y that organic synthesis cannot be "solved" by computer. No guaran­ tees though, since organic chemistry i s not a c l o s e d science and what may be an a c c e p t i b l e s y n t h e s i s under some circumstances w i l l be unacceptible under others. We f i r s t show by a counterexample that context f r e e languages are i n s u f f i c i e n t model f o r organic s y n t h e s i s . We then argue ( a l a s we cannot prove, f o r the above reasons) that context s e n s i t i v e languages are a suf­ f i c i e n t model. Theorem. There e x i s t s a language of s y n t h e s i z a b l e s t r u c t u r e s that i s not context f r e e . I t i s w e l l known that languages such as ww, where w i s some s t r i n g over V T , are not context f r e e . For example the set of a l l alkanes CH CH(Ri)R where Ri=R2 i s not a context f r e e language. The set of a l l t e r ­ t i a r y a l c o h o l s derived by Grignard a d d i t i o n to methyl acetate, represented as CH COH(Ri)R , Ri=R i s thus not context f r e e . That i t i s context s e n s i t i v e f o l l o w s from c o n s t r u c t i o n of a context s e n s i t i v e grammar f o r 3

3

2

2

2

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

72

COMPUTER-ASSISTED

ORGANIC SYNTHESIS

generating t h i s set (grammar 4, Figure 3). GRAMMAR 4 = {CH , CH, OH, C H , (,)} V N = {S, R, Δ, V, #, Χ, Y, Ζ, { W i | i c { C H , C H , CH (,)}}} S t a r t symbol = S S -* CH CHOH Δ V R # R -> CH X|CH R|CH(R)R i X + X i , i e { C H , C H , CH, (,)} RX + R XX -> X VX -> VY Yi iY YR -*• R YX + X Y# + Ζ iZ WiZi JW + WiJ, j e { C H , C H , CH, (,)} VW •> WiV AWi + Δ ί Δ - ( V -> ) Ζ Language = C H C H O H ( R i ) R , R =R

VT

3

2

3

2

3

3

2

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

3

±

2

3

2

±

3

2

x

2

Figure 3

Now i f we consider f u r t h e r the r e l a t i o n s h i p be­ tween d e r i v a t i o n of a sentence and i t s s y n t h e s i s , e s p e c i a l l y the r e l a t i o n s h i p between the intermediate s e n t e n t i a l forms and the precursor s t r u c t u r e s i n the syntheses we are l e d t o the f o l l o w i n g theorem. Theorem ( ? ) . For a chemistry derived from func­ t i o n a l group switching r e a c t i o n s , condensation reac­ t i o n s , demasking of masked f u n c t i o n a l groups, and b l o c k i n g and deblocking r e a c t i o n s : the language so de­ f i n e d i s a context s e n s i t i v e one ( i . e . there e x i s t s an equivalent context s e n s i t i v e grammar that generates the same language). The question of s y n t h e s i z a b i l i t y i s thus answerable. This f o l l o w s from the s o c a l l e d workspace theorem of context s e n s i t i v e languages.(9) The "proof" of t h i s simply e n t a i l s s t a t i n g i n f o r m a l l y the proof of the work­ space theorem, l e t t i n g s y n t h e t i c intermediates correspond t o i n d i v i d u a l steps i n the d e r i v a t i o n of the target molecule (sentence). 1) Consider a synthesis D of the form: S + Xο + X, 1 + '*' X„ η = target ° where S i s the s t a r t i n g m a t e r i a l (symbol), X

n

is a

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

3.

WHiTLOCK

Organic

Chemist's

View

of

Formal

73

Language

sentence i n the language of synthesizable s t r u c t u r e s and X i X±+i i s some conversion. Assume moreover that there i s some estimate SIZE ( X i ) of the s i z e of each X i . We define the complexity C(X ,D) of the target X f o r t h i s p a r t i c u l a r synthesis to the max{SIZE(Xi), 0



CH3CH2COOH

Assume that the l e a s t complex route i s ^COOCHa CH3OH > CH Br > CH CH —» 3

3

CH3CH2COOH

\00CH3

SIZE

2

3

8

3

The r a t i o WS(X )/SIZE(X ) =2.7. However as the mole­ cules get l a r g e r , assuming that the condensing agents (e.g. malonic e s t e r ) stay constant i n s i z e , t h i s r a t i o n

n

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

74 decreases. CH3CH2OH

Thus f o r CH CH Br 3

2

—>

^COOCKU CH3CH2CH > CH (CH )2COOH \:000Η 9 4 3

2

3

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

SIZE

3

3

the r a t i o decreases to 2.25. For syntheses i n v o l v i n g condensation r e a c t i o n s i t would seem that there w i l l e x i s t some p. c) The same argument may be a p p l i e d to b l o c k i n g deblocking sequences and to operation i n v o l v i n g masked f u n c t i o n a l groups. I t seems i n e v i t i b l e that i f we con­ s i d e r b l o c k i n g groups to incrementally increase the complexity of some precursor the r a t i o WS(Xn)/SIZE(X ) must approach some number as s i z e ( X ) gets l a r g e r . In f a c t the only s i u t a t i o n wherein t h i s would not be the case would seem to be one wherein the complexity of some necessary b l o c k i n g group depended e x p o n e n t i a l l y on the complexity of the intermediate being blocked. This type of exponential b l o c k i n g group i s unknown to organic chemistry and indeed seems f o r e i g n to the very concept of i s o l a t e d and i n t e r a c t i n g f u n c t i o n a l groups. F u n c t i o n a l groups, even i n a relaxed d e f i n i t i o n are i n ­ herently l o c a l a f f a i r s . d) Consideration of a number of published syntheses of complex n a t u r a l products suggests a ρ of approximately 1.8, a remarkably low number. The author f i n d s i t s t r i k i n g indeed that those syntheses i n v o l v i n g the s e l e c t i v e reagents c h a r a c t e r i s t i c of " s y n t h e t i c methods" chemistry so much i n the vogue l a t e l y seem to have a smaller workspace than those c a r r i e d out i n the grand t r a d i t i o n a l manner. We conclude that the workspace theorem i s probably v a l i d f o r organic s y n t h e s i s , although i t i s c e r t a i n l y true that the above a n a l y s i s ignores questions d e a l i n g w i t h the formal r e l a t i o n s h i p between s y n t h e t i c schemes and d e r i v a t i o n s . It would be nice to be able to w r i t e a program that would take as input two sets of chemical reactions^ the second being the f i r s t augmented with some new r e a ­ gent, and give as output the answer to the question: "Does t h i s new s y n t h e t i c method allow us to do anything we couldn't do i n i t s absence?". This chemical c r i t i ­ que program would be at l e a s t u s e f u l to e d i t o r s of chemical j o u r n a l s . We note a l a s that f o r the set of context s e n s i t i v e languages the problem of equivalence of two grammars i s not answerable. Thus the above undertaking would seem to be a dubious one. Now of n

n

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

3.

WHiTLOCK

Organic

Chemist's

View

of Formal

Language

75

course the f a c t that the equivalence problem i s un­ answerable f o r the set of a l l context s e n s i t i v e l a n ­ guages does not mean that i t i s so f o r some subset, f o r example augmenting a chemistry c o n t a i n i n g sodium hydroxide by a d d i t i o n of the reagent potassium hydr­ oxide. On the other hand we note that the equivalency problem i s not answerable f o r even the simpler set of context f r e e languages so i t seems u n l i k e l y that one could w r i t e a general program that would compare two chemistrys based on context s e n s i t i v e r e a c t i o n s and that would h a l t i n some f i n i t e time w i t h the e q u i ­ valence answer.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

The F u n c t i o n a l Group Switching Problem. (10) One frequently has occasion when d e v i s i n g a s y n t h e t i c scheme t o adjust f u n c t i o n a l i t y i n a molecule i n a manner that i s only i n d i r e c t l y r e l a t e d t o the s y n t h e t i c problem at hand. For example i f one d e s i r e s the transformation

i t i s necessary t o block the ketone toward a c t i o n of the organometallic reagent employed. Bearing i n mind the assumptions b u i l t i n t o t h i s problem we may define a "molecule" as being simply an ordered set of func­ t i o n a l groups. Chemically t h i s i s equivalent t o the idea of a molecule s being a set of f u n c t i o n a l groups imbedded i n a s t a t i c molecular framework. D e f i n i t i o n of a r e a c t i o n as an ordered t r i p l e t , (reagent, precursor f u n c t i o n a l group, product func­ t i o n a l group) then corresponds t o the assumption that r e a c t i o n s only i n t e r c o n v e r t f u n c t i o n a l groups; they do not lead t o increments of the carbon skeleton, or i f they do, i t i s only t o f i n i t e and l i m i t e d degree. Now t h i s sounds l i k e a rather r e s t r i c t e d p i c t u r e of organic chemistry, and i t i s , but i t i s s u r p r i s i n g l y c l o s e to the way one t h i n k s about b l o c k i n g group i n t e r conversions. We can pose the f u n c t i o n a l group s w i t c h ­ ing problem w i t h i n the context of t h i s s t r u c t u r a l no­ t a t i o n i n the f o l l o w i n g manner. What i s the shortest sequence of reagents that w i l l transform a defined s t a r t i n g m a t e r i a l S = ( S i , S 2 , * ' ' S ) , where S i i s the i t h f u n c t i o n a l group of compound S, i n t o the target Τ = (Ti,T2,···Τ ). The reagent sequence turns S i i n t o T i , S2 i n t o T2. etc. With t h i s d e f i n i t i o n of the f

n

η

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

76

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

f u n c t i o n a l group switching problem we are lead to the following. Theorem. The set of a l l sequences of reagents that a f f e c t s a s t a t e d f u n c t i o n a l group switching pro­ blem comprises a r e g u l a r language. The proof of t h i s theorem i s f a i r l y obvious and not too i n t e r e s t i n g . What i s i n t e r e s t i n g i s the con­ sequence of the theorem. The proof f o l l o w s from the r e c o g n i t i o n that our f u n c t i o n a l group r e a c t i o n d i c ­ t i o n a r y i s a f i n i t e d i r e c t e d graph wherein the nodes are l a b e l l e d with f u n c t i o n a l groups and the edges with reagents. A small r e a c t i o n graph i s shown i n Figure 4. This i s of the same form as the f i n i t e s t a t e ma­ chine i n Figure 1 and i f we define a s t a r t s t a t e (e.g. CO) and an accept s t a t e (e.g. CHOAc) i t i s a f i n i t e s t a t e machine f o r r e c o g n i z i n g a l l sequences of reagents that w i l l turn a ketone i n t o a secondary ace­ t a t e . The reagent sequence (NaBH^ DHP H 0 NaOH A c 0 ) i s a member of the language so defined while (NaBHi* DHP 3

H0 3

C r 0 / p y Α ο 0 ) i s not. 3

2

2

The r e l a t i o n s h i p between r e ­

gular languages, r e g u l a r grammars, and f i n i t e s t a t e machines i s such that most i n t e r e s t i n g questions d e a l ­ ing w i t h them are answerable. Whether a p a r t i c u l a r s t r i n g i s i n the language i s answerable ( i . e . does t h i s reagent sequence do the t r i c k ) . More i n t e r e s t i n g how­ ever i s the f o l l o w i n g which represents a general s o l u ­ t i o n to the f u n c t i o n a l group switching problem. Our r e a c t i o n d i c t i o n a r y defines, f o r the f u n c t i o n a l group switching problem Sx •> Τχ (S^ and Τχ s i n g l e f u n c t i o n a l groups) a language of s u f f i c i e n t s y n t h e t i c sequencesL For a problem S2 T2 a language L i s defined. I f we want to convert the binary compound ( S i S 2 ) i n t o the compound (Τι T ) , the language f o r t h i s i s the i n t e r ­ s e c t i o n of L i and L , i . e . those members common to L i and L are sequences that convert S i i n t o T i and S into T . I t i s known that the i n t e r s e c t i o n of two r e ­ gular languages i s i t s e l f regular so the problem of f i n d i n g the shortest sequence of reagents f o r e f f e c t i n g ( S i S ) —^> ( T i T ) i s that of f i n d i n g the shortest s t r i n g i n a r e g u l a r language. The d e t a i l s of construc­ t i n g the algorithm are presented elsewhere ( 1 0 ) but t h i s p o i n t s up what the author f e e l s to be one of the r e a l l y p r e t t y aspects of language theory. The conven­ t i o n a l proof of the statement that the membership ques­ t i o n i s answerable f o r regular languages i s construc­ t i v e i n that a proveably c o r r e c t algorithm f o r doing so i s developed. One may then s t a r t from t h i s f a c t and develop more e f f i c i e n t ways of achieving t h i s end. We o f f e r as evidence that the l i n g u i s t i c approach to 2

2

2

2

2

2

2

2

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

WHITLOCK

Organic

Chemist's

View

of Formal

Language

a e{DHP, NABH^, Cr0 Py, Ac 0, NaOH} 3

2

b e{DHP, NaBH , Cr0 Py, Ac 0, H 0} 4

3

2

3

c είϋΗΡ, Cr0 py, Ac 0, HgO, NaOH} 3

2

Figure 4. A small reaction graph for the four functional groups CO (ketone), CHOH (secondary alcohol), CHOTHP (tetrahydropyranyl ether), and CHOAc (secondary acetate)

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

78

ORGANIC SYNTHESIS

organic synthesis i s of more than j u s t i d l e i n t e r e s t by presenting i n Figure 5 some representative problems w i t h t h e i r s o l u t i o n s . These examples, although pre­ sented i n the rather disembodied order n-tuplet s t r u c ­ t u r a l n o t a t i o n , c l e a r l y show that the r e s u l t s of t h i s approach represent nonobvious answers t o n o n t r i v i a l problems. I t would seem that the b a s i c feature of t h i s approach t o organic synthesis i s a p p l i c a b l e t o more complicated s t r u c t u r a l models as long as s e v e r a l condi­ t i o n s are met. F i r s t l y the s y n t h e t i c problem of i n t e r e s t must be capable of d i s s e c t i o n i n t o some inde­ pendent subproblems. This i s so because the s o l u t i o n procedure i n v o l v e s at l e a s t i m p l i c i t c o n s t r u c t i o n of the part s o l u t i o n s and working with t h e i r i n t e r s e c t i o n . Secondly the f i n i t e n e s s of the various sub-reaction d i c t i o n a r i e s i s important since one i s guided i n s o l v ­ ing a problem by the exhaustive s o l u t i o n of subproblems For r e a l molecules r e a c t i o n d i c t i o n a r i e s are i n f i n i t e . A p p l i c a t i o n of ones chemists i n t i u t i o n suggests that only a f i n i t e part of a r e a c t i o n d i c t i o n a r y i s chemi­ c a l l y i n t e r e s t i n g however. While there i s an i n f i n i t e number of s t r u c t u r e s that can be involved as i n t e r ­ mediates i n the conversion 1

only a small number are r e a l i s t i c i n nature when one has t h i s p a r t i c u l a r end i n mind. One problem of im­ mediate concern i s that of automating the making f i n i t e of the r e a c t i o n d i c t i o n a r y f o r f u n c t i o n a l i t i e s such as ketones that are destined f o r annulation. I n t e r a c t i o n of the computer w i t h the chemist i s c l e a r ­ l y necessary i n t h i s respect. A r e l a t e d c o n d i t i o n deals w i t h the very s t a t e na­ ture of t h i s approach. The problem RCH=CH-COR—£-> RCH=CH-CHOHR i s not properly viewed as (CH=CH, CO) ^ (CH=CH, CHOH) since t h i s e n t a i l s the i n c o r r e c t assump­ t i o n that one i s d e a l i n g w i t h an i s o l a t e d double bond and ketone. The i n t e r s e c t i o n of the two r e a c t i o n d i c t i o n a r i e s would i n c o r r e c t l y have "no r e a c t i o n " f o r the reagent (CH )2CuLi. One thus must t r e a t i n t e r a c t i n g f u n c t i o n a l groups as l a r g e r e n t i t i e s . This i s an acceptable p r i c e t o pay except f o r two consequences. Reaction d i c t i o n a r i e s of aggregate f u n c t i o n a l groups can be very large and t h e i r generation by hand i s a tedious and time consuming a f f a i r . I t appears worth9

3

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

3.

WHITLOCK

Organic

Chemist's

View

of Formal

START:

[RCHOEE RCO COOH]

TARGET:

[RCHOEE RCO RCHOH]

SEQUENCE:

Language

79

( G l y c o l , EVE, R L i , NaBH^, Ac 0, H 0, EVE, 2

3

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

OH) START:

[CH OH RCO COOH]

TARGET:

[CH^OH RCO R COH]

2

2

SEQUENCE: (GLYCOL, R L i RMgX H 0) 3

START:

[CH OH CHgOAc COOMe CHgOEE]

TARGET:

[CHgOAc CHgOH COOMe CHgBr]

2

SEQUENCE: (Cr0 /py H 0 TsCl NaBr OH EVE NaBH^ ACgO 3

CH N 2

3

2

H 0) 3

START:

[CH OH CH OAc COOMe CHgOEE]

TARGET:

[CH OH CH OAc COOMe CHgBr]

2

2

2

2

SEQUENCE: (Cr0 /py HgO TsCl NaBr NaBH^) 3

Figure 5. Functional is the ethoxyethyl

Group Switching problems solved as in Ref. 10. ether of a secondary alcohol; EVE is ethylvinyl

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

RCHOEE ether.

80

COMPUTER-ASSISTED ORGANIC SYNTHESIS

while t o solve t h i s reaction dictionary generation pro­ blem by a c o m b i n a t i o n o f c h e m i s t and computer w i t h t h e c h e m i s t e x e r c i s i n g h i s judgement i n p a r i n g t h e growth of t h e r e a c t i o n d i c t i o n a r y and making r a t h e r d e l i c a t e v a l u e judgements on e x a c t l y what r e a c t i o n i s e x p e c t e d of some a g g r e g a t e f u n c t i o n a l group under some s e t o f reaction conditions.(11)

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch003

Literature Cited

Parsing, Prentice

(1) McCarthy, J. Abrahams, P. W., Edwards, D. J., H a r t , T. P., Levin, M. I., "LISP 1.5 Programmer's Mannual", MIT P r e s s (1965). (2) H o p c r o f t , J. Ε., and U l l m a n , J. D., "Formal Languages and Their Relation t o Automata", A d d i s o n ­ -Wesley, R e a d i n g , MA, 1969. (3) G i n s b u r g , S., "The M a t h e m a t i c a l Theory o f C o n t e x t F r e e Languages", M c G r a w - H i l l , New Y o r k , 1966. (4) K a i n , R. Υ., "Automata Theory: Machines and Languages", M c G r a w - H i l l , New Y o r k , 1972. (5) Aho, Α. V., and U l l m a n , J. D., "The Theory o f Translation and Compiling", two volumes, Hall, Englewood Cliffs, N.J., 1973. (6) Chomsky, Ν., Handbook o f Math. P s y c h . (19 ), 2 W i l e y , New Y o r k , pp. 323-418. (7) Hennie, F. C., "Finite-State Models for Logical M a c h i n e s " W i l e y , New Y o r k , 1968. (8) B l o w e r , P., Ph.D. Thesis, University o f W i s c o n s i n ­ -Madison, 1975. (9) Salomaa, Α., "Formal Languages", Academic P r e s s , New York, 1973. (10) T h i s s u b j e c t is d i s c u s s e d a t l e n g h t in, W h i t l o c k , H. W., Jr., J. Am. Chem. Soc. ( 1 9 7 6 ) , 98, 0000. (11) Support o f this work by t h e National S c i e n c e F o u n d a t i o n is gratefully acknowledged.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

4 A Chemical Engineering View of Reaction Path Synthesis RAKESH GOVIND and GARY J. POWERS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004

Dept. of Chemical Engineering, Carnegie-Mellon Univ., Pittsburgh, Pa. 15213

The synthesis, analysis, and evaluation of reaction paths i s of fundamental importance i n the chemical process industries. Research and development chemists and chemical engineers are often confronted with problems which are best solved by u t i ­ l i z i n g chemical reactions. Figure 1 shows a matrix of the common problems and possible solutions encountered i n the chemical process industries. Problems associated with products may be classified into those which deal with compounds or mixtures of compounds which are a new market for the company and those which are existing products. Separation problems which might be solved by chemical reactions (e.g. use chemical reactions to convert one or more of the species i n a mixture

m

fH Q Ο

α, ο W OS

«

m

1

55

55

M pH

M C-l

m

Χ

Μ

Μ 55

55

EXISTING _NEW EXISTING

REACTIONS-

NEW REACTION NEW CONDITIONS

Figure 1. Problems commonly encountered in the chemical process industries which could be solved by use of reactions or reaction paths

81 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

82

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004

to species which are more easily separated or need not be separated at a l l ) may also be c l a s s i f i e d by whether they are new problems or existing ones. New problems arise with new products or new legislation related to pollution or product purity. Problems may also arise due to changing a v a i l a b i l i t y of raw materials. For example, a new reaction path may make an existing process and associated raw materials highly available. The problem i s to find new uses for the raw materials and the process. In addition, i t might be desirable to develop reaction paths which u t i l i z e (or avoid) certain technology, patents, etc. Each of these problems may be attacked by inventing new reaction paths. The paths might be composed of known reaction steps i n a new combination. Or, a new reaction step could be developed and u t i l i z e d i n a sequence of reactions. The new reaction could be a reaction of functional groups which was previously unknown or a previously known reaction which i s made to go under new conditions. Evaluation of Industrial Reactions The evaluation of industrial reaction paths depends on a detailed understanding of the costs associated with each reaction step. A simple cost function i s Cost of Reaction = f(Overall Raw Material Cost, U t i l i t y Path Costs, Equipment Costs, Safety, Reliability, Flexibility) (1) Raw Material Costs

= g(Raw Material Cost/mole, Stoichiometry, Yield)

U t i l i t y Costs

= h(Thermochemistry, Reaction Conditions, Separation Difficulty)

(2) (3)

Separation D i f f i c u l t y = i(Property Differences, Concentrations, Recovery, Separation Conditions) (4) Equipment Costs

= j(Reaction Kinetics, Separation D i f f i c u l t y , Corrosion, Conditions) (5)

Safety

= k(Species Properties, Conditions, Other Possible Reactions)

Reliability

Flexibility

(6)

= I (Knowledge of Reaction, Separation, etc.)

(7)

» m(Range of Operating Conditions)

(8)

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

4.

GOviND

AND

POWERS

Reaction

Path

Synthesis

Obviously a great d e a l of i n f o r m a t i o n i s r e q u i r e d to a c c u r a t e l y compute the value o f t h i s f u n c t i o n . One of the major f u n c t i o n s of an i n d u s t r i a l r e s e a r c h and development e f f o r t i s to generate the knowledge and data necessary f o r decision-making r e l a t i v e to each r e a c t i o n step and p a t h . ( l )

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004

I n d u s t r i a l R e a c t i o n Path Synthesis The challenge i n systematic r e a c t i o n path s y n t h e s i s i s t o generate r e a c t i o n paths and steps which could be v i a b l e a l t e r n a t i v e s i n an i n d u s t r i a l environment. The r e a c t i o n paths used i n i n d u s t r i a l environments have the f o l l o w i n g c h a r a c t e r i s t i c s : 1.

Small Target Molecules. The b a s i c petrochemical and f i n e chemical i n d u s t r i e s commonly d e a l w i t h molecules w i t h fewer than twenty h e t e r o - and carbon-atoms.

2.

M u l t i p l e Target Molecules (Reaction Networks). The chemical i n d u s t r y i s a network o f r e a c t i o n s fed by 3 to 5 b a s i c m a t e r i a l s and producing hundreds of t a r g e t molecules. The paths to each t a r g e t molecule i n t e r a c t w i t h each other by sharing raw m a t e r i a l s and byproducts. See F i g u r e 2 f o r p a r t of a r e a c t i o n network.

3.

Stoichiometry Must Be Known. The s t o i c h i o m e t r y f o r the path must be known i n d e t a i l . T h i s means t h a t a l l main r e a c t i o n products must be considered a t each r e a c t i o n s t e p . The f a c t that 2 moles o f NaCl or other simple r e a c t i o n products are produced duri n g a r e a c t i o n can have a major impact on the r e a c t i o n s economics. 1

4.

Y i e l d . The f r a c t i o n of the l i m i t i n g reagent which i s transformed i n t o the d e s i r e d t a r g e t molecule must be known.

5.

Byproducts. The amount and type of byproduct produced are necessary p i e c e s o f i n f o r m a t i o n . The main r e a c t i o n byproducts and side r e a c t i o n byproducts are needed f o r both the y i e l d c a l c u l a t i o n and the d e t e r mination of s e p a r a t i o n d i f f i c u l t y , c o r r o s i v e n e s s , safety, e t c .

6.

Impurity R e a c t i o n s . The l a r g e - s c a l e p r o d u c t i o n o f molecules demands a knowledge o f the f a t e o f impur i t i e s which enter w i t h the raw m a t e r i a l s or are produced i n the r e a c t i o n s t e p s . These i m p u r i t i e s can cause c o n s i d e r a b l e p o l l u t i o n and product q u a l i t y problems. I n many processes more equipment and energy i s devoted to c o n t r o l l i n g the i m p u r i t i e s than the main reagents and products. F i g u r e 3 i l l u s t r a t e s the types of r e a c t i o n s which could occur between i m p u r i t i e s and other species i n the r e a c t i o n mixture.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

83

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

METHACR YLIC ACIO

METHANOL

POLYESTER FIBER

— POLYSTYRENE PLASTIC

• STYRENE BUTADIENE RUBBER-

• NYLON FIBER

"OIMETHYL TEREPHTHALATE

ETHYLENE GLYCOL -

•ADiPlC ACID

URE A PHENOL FORM ALDEHYDE (ADHESIVE)

POLYETHYLENE • ACETYL SALICYLIC ACID "

IONOMERIC RESINS -

•ACETIC ANHYDRIDE

- CARBON BLACK ( • PETROLEUM SOLVENTS)-

•PARAXYLENE-

* ^ A M Î N E " ^ ^

•SALICYLIC ACID

CYCLOHEXANE

PHENOL

• ^

•FORMALDEHYDE

·» AMMONIA

» HYDROGEN CYANIDE

ETHYL ALCOHOL - ACETALDEHYDE

• ACETONE -

^•BUTADIENE—•ADIPONITRILE

•ETHYLENE -

- PROPYLENE-

CRUDE OIL 8 NAT. GAS «

XYLENE-

BENZENE

BUTANE-

METHANE•

ETHANE -

PROPANE-

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004

COATING

PRINTING

INK

CABINETS

DRESSES

TV

TIRES

PLYWOOD

FERTILIZER

ASPIRIN

WIRE

GOLF BALL COVERS

4. GOviND

AND

Reaction

POWERS

Path

Synthesis

85

MAIN REACTIONS AT OTHER SITES PARALLEL

R^ + R^ = Ρ

SERIAL

R

+ Ρ

x

= BP

2

:

^

+ R

=

ΒΡ

χ

:

^

+ ΒΡ =

BP

3

2

χ

OTHER REACTIONS THAT OCCUR AT THE SAME CONDITIONS R

R

l

+

R

2

+

R

Ρ

2 2

R

l

R

+

R

+

R

l

P

R

2

+ Ρ

Ρ

+ ΒΡ

l

+

P

+

B

i

χ

P

i etc.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004

IMPURITIES IN REAGENTS +

T

l

R

I

l

+

Figure species

7.

2

+ ΒΡ

R

+

h h I

+P

h h h h +

I 2

+

BP

1

etc.

χ

3. Reactions which could occur between the (R = reagents, Ρ = products, I = impurities, BP == byproducts) in a reaction mixture

R e a c t i o n Conditions. A l l of the features discussed above depend on the r e a c t i o n c o n d i t i o n s . I t i s nec­ essary to have i n f o r m a t i o n on a. b. c. d. e. f. g.

Phase(s) i n which the r e a c t i o n takes p l a c e . Solvents r e q u i r e d (the fewer the b e t t e r ) . Catalysts Temperature Pressure Concentrations Time (mixing)

Of course, t h i s i s asking f o r a great d e a l of i n f o r m a t i o n . Can modern t h e o r i e s and a p p l i c a t i o n s of r e a c t i o n path synthesis make any c o n t r i b u t i o n to t h i s problem? Are the i n d u s t r i a l l y impor­ tant molecules too "simple f o r current r e a c t i o n path synthesis programs? Is too much s p e c i f i c data required? 11

I n the f o l l o w i n g s e c t i o n s we present a b r i e f review of work i n computer-as s i s ted r e a c t i o n path s y n t h e s i s . The work i s com­ pared w i t h the needs of i n d u s t r i a l r e a c t i o n - p a t h problems. F i ­ n a l l y , a program c a l l e d REACT which we have developed f o r r e a c ­ t i o n path synthesis i s d e s c r i b e d and i l l u s t r a t e d .

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

86

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Representation o f Molecules and Reactions One key element i n developing a r e a c t i o n path program i s the s e l e c t i o n o f r e p r e s e n t a t i o n s o f molecules and t h e i r r e a c ­ t i o n s which a r e appropriate f o r a g i v e n problem. F i g u r e 4 conceptually compares the p r e d i c t i v e power and g e n e r a l i t y o f three r e p r e s e n t a t i o n s . P r e d i c t i v e power i s d e f i n e d as the a b i l i t y to p r e d i c t the c o n d i t i o n s , s t o i c h i o m e t r y , byproducts, y i e l d , e t c . , o f a given r e a c t i o n step i n each r e p r e s e n t a t i o n . G e n e r a l i t y i s the a b i l i t y t o generate " a l l possible reactions.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004

1 1

The three r e p r e s e n t a t i o n s are shown s c h e m a t i c a l l y i n F i g ­ ure 5. The most general types o f r e p r e s e n t a t i o n use a mathe­ m a t i c a l a b s t r a c t i o n o f r e a c t i o n which consider a l l the p o s s i b l e means f o r making and breaking bonds between atoms. Hendrickson (2) has i n v e s t i g a t e d a general r e p r e s e n t a t i o n i n which molecules are represented by the " c h a r a c t e r " o f the carbon s i t e s which occur w i t h i n the molecule. The character o f a s i t e i s a two d i g i t number which i n d i c a t e s whether a carbon atom i s a t ­ tached t o σ other carbon atoms by sigma bonds or t o " f u n c t i o n a l " groups. The f u n c t i o n a l i t y o f a carbon s i t e i s defined as the number o f ττ bonds and heteroatoms attached to the carbon. The character o f a s i t e i s g i v e n by c = 10σ + f

(9)

where f - π + ζ

(10)

Valence c o n s t r a i n t s give σ + π + η + ζ £ 4

(11)

where h i s the number o f bonds to H or l e s s e l e c t r o n e g a t i v e atoms. With t h i s d e f i n i t i o n o f carbon s i t e , Hendrickson was able to c l a s s i f y a l l p o s s i b l e carbon s i t e s and the " r e a c t i o n s " which might i n t e r c o n n e c t them. F i g u r e 6 gives the r e a c t i o n t r i a n g l e for this representation. Camp and Powers extended t h i s r e p r e s e n t a t i o n by a l l o w i n g the f u n c t i o n a l i t y ( f ) to be separated i n t o π and ζ dimensions. The character o f a s i t e then becomes c = ΙΟΟσ + 10 f + ζ

(12)

I n a d d i t i o n , Camp (3) developed the s i t e c h a r a c t e r i s t i c s f o r heteroatoms. These extensions transform Hendrickson s t r i a n g l e 1

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

GOviND

AND

POWERS

Reaction

Path

Synthesis

PREDICTIVE POWER (ABILITY TO PREDICT THE YIELD, STOICHIOMETRY, CONDITIONS, BYPRODUCTS, ETC.)

ROTE

1

SUBHENDRICKSON S STRUCTURE METHOD TRANSFORMS (COREY e t a l . )

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004

GENERALITY (ABILITY TO GENERATE "ALL" REACTION PATHS)

Figure

Figure

5.

4.

Ability to predict performance vs. generality three representations of reaction

A schematic

drawing of three representations their reactions

for

of molecules

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

and

ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004

COMPUTER-ASSISTED

F = functionolity = 7T and /or ζ

f = 7Γ + Ζ

c = Ι Ο σ • f = character

Figure 6. Character triangle for carbon sites and interconversions. The char­ acter C, is shown beneath each carbon site. The shaded areas indicate possibte double bond sites (C = Ο.Υ///λ ; possible triple bond sites (C == C), GS3; and possible aromatic sites I Mill.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

4.

GOViND

AND

POWERS

i n t o a pyramid. F i g u r e 5.

Reaction

Path

Synthesis

89

I t i s t h i s r e p r e s e n t a t i o n t h a t i s shown i n

The advantage of t h i s r e p r e s e n t a t i o n i s that i t i s v e r y compact. Hendrickson s t r i a n g l e has only 15 s i t e s and 70 " r e ­ actions." Camp's pyramid has 24 s i t e s and 132 r e a c t i o n s . The development of computer codes which u t i l i z e these r e p r e s e n t a ­ t i o n s i s extremely simple. The r e p r e s e n t a t i o n i s a l s o able to generate " a l l " p o s s i b l e r e a c t i o n s which lead to a p a r t i c u l a r t a r g e t molecule. The procedure could s t a r t a t the t a r g e t mole­ cule and generate " a l l " p o s s i b l e p r e c u r s o r s by using the r e ­ a c t i o n s on the pyramid. Each s i t e i n the t a r g e t i s developed independently and c e r t a i n dependencies must be checked (e.g. π bonds between atoms). I n essence, t h i s r e p r e s e n t a t i o n i n v o l v e s making (or breaking) every p o s s i b l e bond i n the molecule sub­ j e c t to the v a l e n c y and s i t e c o n s t r a i n t s .

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004

1

T h i s g e n e r a l i t y i s obtained w i t h the l o s s o f power to p r e ­ d i c t the consequences of each r e a c t i o n step. The r e a c t i o n s are not c l a s s i f i e d by the exact nature of the f u n c t i o n a l i t y . Hence r e a c t i o n s w i t h a f u n c t i o n a l i t y of f = 1 ( a l c o h o l , e t h e r , o l e f i n ) are a l l t r e a t e d s i m i l a r l y . I n a d d i t i o n , some of the r e a c t i o n s are not known to have analogs i n nature. These r e a c t i o n s j u s t don't occur (or a t l e a s t we haven't observed them y e t ) . 1

Hendrickson s r e p r e s e n t a t i o n i s an example o f a generateand-test problem-solving approach i n which the generator i s uninformed. I t simply generates p o s s i b l e r e a c t i o n s and i t i s up to the t e s t e r ( e v a l u a t o r ) to determine the f e a s i b i l i t y and value of each step or path. 1

REPAS: A Case Study Using Hendrickson s Representation Govind (4) developed a r e a c t i o n path synthesis program based on Camp's g e n e r a l i z a t i o n o f Hendrickson's r e p r e s e n t a t i o n . The program was used to generate paths f o r a wide range o f i n ­ d u s t r i a l molecules. For even a simple molecule such as a c r y l i c 0 II

(C=C-C-0H), w e l l over 10,000 r e a c t i o n paths were generated. The e v a l u a t i o n of these paths was a d i f f i c u l t task. Many of the paths i n c l u d e d "strange" r e a c t i o n steps f o r which no meaningful mechanism could be e n v i s i o n e d . While a few o f these r e a c t i o n steps provided i n t e r e s t i n g a l t e r n a t i v e s , there was a general l a c k of correspondence between these paths and those which ex­ perienced chemists and chemical engineers might execute i n the l a b o r a t o r y , p i l o t p l a n t s , or i n d u s t r i a l process. Our i n a b i l i t y to be s p e c i f i c about each r e a c t i o n step g r e a t l y l i m i t s the use­ f u l n e s s of t h i s r e p r e s e n t a t i o n . I n an a b s t r a c t sense t h i s ap­ proach could produce " a l l " r e a c t i o n paths. However, the over­ whelming number o f combinations of f u n c t i o n a l groups and r e ­ a c t i o n s s e v e r e l y l i m i t s the p r o b a b i l i t y o f f i n d i n g a new and

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

90

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004

r e a l i z a b l e r e a c t i o n path. A r e p r e s e n t a t i o n o f r e a c t i o n i s needed which i s perhaps l e s s general but has more s p e c i f i c chemical data and theory i n i t . Substructure Representation E.J. Corey and h i s co-workers (5) have developed a r e p r e s e n t a t i o n o f molecules and r e a c t i o n s based on the combination of atoms i n the molecule which change during known r e a c t i o n s . Molecules are represented by l i n k e d data l i s t s . The l i s t s cont a i n the complete t o p o l o g i c a l and atom type i n f o r m a t i o n f o r the molecule. Reactions are represented by transforms which i n d i cate how substructures w i t h i n a molecule change during a spec i f i c type o f r e a c t i o n . C l a s s i f i c a t i o n o f the r e a c t i o n s by only the atoms which change allows the transforms to be a p p l i e d t o a wide range o f molecules which might c o n t a i n these s u b s t r u c t u r e s . F i g u r e 7 i l l u s t r a t e s the general steps i n applying a transform. FIND SUBSTRUCTURES WITHIN THE TARGET MOLECULE

CAN THE SUBSTRUCTURES BE •MADE BY REACTIONS IN THE REACTION LIBRARY?

NO

NEED MORE REACTIONS

NO SET UP SUBGOALS TO MAKE REACTION FEASIBLE

J

YES

IS EACH REACTION FEASIBLE GIVEN -THE ATOMS, FUNCTIONAL GROUPS, RINGS, ETC. LOCAL TO THE REACTION SITES?

I

YES

WHAT IS THE VALUE OF THE REACTION? CONSIDER GLOBAL REACTIONS RAW MATERIAL COSTS MECHANISM THERMOCHEMISTRY BYPRODUCTS, ETC.

IF THE VALUE IS ACCEPTABLE APPLY THE REACTION

CONTINUE WITH PRECURSORS AND SUBGOALS UNTIL AVAILABLE RAW MATERIALS ARE ENCOUNTERED Figure

7.

Steps

in applying a transform change during

based on the substructures reaction

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

which

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004

4. GoviND

A N D POWERS

Reaction

Path

Synthesis

91

F i r s t , s t a r t w i t h the t a r g e t molecule and f i n d a l l the subs t r u c t u r e s w i t h i n the molecule f o r which transforms ( r e a c t i o n s ) are a v a i l a b l e . Then check to see i f the transform i s a p p l i cable f o r the s p e c i f i c molecule i n question. I n the transform there are three l e v e l s o f e v a l u a t i o n . The f i r s t l e v e l i s simp l y a check to ensure t h a t the c o r r e c t substructures are present. The second l e v e l o f e v a l u a t i o n checks the atoms " l o c a l " to the r e a c t i o n s i t e s . I f the l o c a l atoms p r o h i b i t the part i c u l a r transform the r e a c t i o n i s e i t h e r k i l l e d ( i . e . no longer considered) or a subgoal i s s e t up to change the atoms which are b l o c k i n g the r e a c t i o n . A t the t h i r d l e v e l o f e v a l u a t i o n the l o c a l atoms are checked to see how much they w i l l improve or d e t r a c t from the r e a c t i o n , (e.g. i f alpha i s e l e c t r o n withdrawing add 10 to score.) I n a d d i t i o n , i t i s important t o check the g l o b a l features o f the molecule f o r competing r e a c t i o n s i t e s f o r the same transform, or f o r other r e a c t i o n s which might take place a t the c o n d i t i o n s r e q u i r e d f o r the d e s i r e d r e a c t i o n . Corey and co-workers (5) have developed a l i b r a r y o f over 300 transforms o f t h i s type. Most o f the secondary checks on the r e a c t i o n s ( i . e . the a f f e c t o f l o c a l attachments) are knowledgeable g e n e r a l i z a t i o n s o f the reviews published by Marsh ( 6 ) , House ( 7 ) , and Buehler and Pearson ( 8 ) . This approach has considerably more p r e d i c t i v e power than the method o f Hendrickson. Of course, i t r e q u i r e s a l a r g e r data base. The g e n e r a l i t y o f the approach i s l i m i t e d by the transforms i n the data base and the secondary and t e r t i a r y checks made on each r e a c t i o n . I f the substructure transform i s not i n the data base i t w i l l not be i n c l u d e d i n any r e a c t i o n path generated using t h i s approach. I f the secondary and t e r t i a r y checks are too conservative the r e a c t i o n may not be used when i t a c t u a l l y might work. I t might be p o s s i b l e to systema t i c a l l y r e l a x the l o c a l c o n s t r a i n t s (and use a method l i k e Hendrickson s) i n some f u t u r e program. These r e a c t i o n steps would then be the subject o f a research and development program to see i f they can be made t o work. 1

Rote Memory The most powerful p r e d i c t o r and l e a s t general approach i s that o f simply s t o r i n g s p e c i f i c instances o f known r e a c t i o n s along with the conditions under which the r e a c t i o n s were performed. The p a r t i c u l a r t a r g e t molecule i s then simply patterned matched with r e a c t i o n s i n the data base. Search S t r a t e g i e s The main i s s u e i n search i s whether the synthesis i s planned by working backwards from the t a r g e t or forwards from the s t a r t i n g m a t e r i a l s . F o r many syntheses the t a r g e t i s a s i n g l e molecule and the s t a r t i n g m a t e r i a l s are many (commonly

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

92

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004

10,000 or more). Hence working backwards i s an e f f i c i e n t s t r a t egy. However, when the molecules are s m a l l e r , the s e t o f s t a r t i n g m a t e r i a l s may not be n e a r l y so l a r g e . I n petrochemical syntheses n e a r l y a l l paths s t a r t w i t h molecules such as e t h y l ene, propylene, butane, e t c . , which o f t e n number l e s s than 100. For cases l i k e these i n which the s t a r t i n g m a t e r i a l s may be def i n e d , a forward s t r a t e g y may be more e f f i c i e n t . Powers and Jones (9) and Powers e t a l . (10) found that a forward-branching search s t r a t e g y ( d i s c r e t e dynamic programming) was undoubtedly the b e s t one f o r planning the chemical s y n t h e s i s o f b i h e l i c a l DNA. The advantage o f working forward from the raw m a t e r i a l s i s t h a t the e v a l u a t i o n may be more a c c u r a t e l y computed. When working backwards i t i s not p o s s i b l e to know the c o s t s o f p r e cursors s i n c e t h e i r paths have not y e t been developed. I n a d d i t i o n to search d i r e c t i o n i t i s necessary to d e t e r mine the order of enumeration o f the s y n t h e s i s t r e e . Depth, breadth and h y b r i d search procedures are a l l p o s s i b l e . For most current programs the e v a l u a t i o n i s so u n c e r t a i n that human i n t e r a c t i o n w i t h the program during e x e c u t i o n i s necessary and undoubtedly more f r u i t f u l . A p p l i c a t i o n s to I n d u s t r i a l R e a c t i o n Paths Most c u r r e n t programs have been aimed a t l a b o r a t o r y syntheses. The d e t a i l and data r e q u i r e d f o r i n d u s t r i a l paths i s not commonly i n these programs. Many o f these programs s t a r t w i t h raw m a t e r i a l molecules which are the t a r g e t s of i n d u s t r i a l chemists. We are i n t r i g u e d by the p o s s i b i l i t y o f a p p l y i n g these ideas to smaller molecules. To our knowledge, only two a p p l i c a t i o n s o f these techniques have been made to i n d u s t r i a l r e a c t i o n problems. The Chioda Chemical Engineering and C o n s t r u c t i o n Company i n Tokyo, Japan has used a program c a l l e d CHEMONICS i n which known s p e c i f i c r e a c t i o n s numbering approximately 100 are s t o r e d . The thermochemical p r o p e r t i e s , and i n some cases k i n e t i c parameters, are known f o r these r e a c t i o n s . The r e a c t i o n s are mixed and matched by the user w i t h the program performing the a n a l y s i s o f the r e a c t i o n network. I n the f o l l o w i n g s e c t i o n s we d e s c r i b e a prototype program, REACT, which i s c u r r e n t l y being, t e s t e d on a number of i n d u s t r i a l problems. REACT



A r e a c t i o n path s y n t h e s i s program f o r the p e t r o chemical i n d u s t r y .

We are developing a program f o r use i n the petrochemical i n d u s t r y . The f e a t u r e s o f the program are

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

4. GoviND A N D P O W E R S

1.

Reaction

Path

Molecule Representation:

Synthesis

93

Connection M a t r i x

For example,

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004

0 II

READ TARGET MOLECULE

c=c=c=&

COMPUTE CONNECTION TABLE

6 0 0 2 0

FIND FUNCTIONAL GROUPS AND POSITIONS

0 4 2 0 0

0 2 4 0 0

42 24

2 0 1 4 1

0 0 0 1 8 C=C

The current s i z e l i m i t i s twenty carbon

sites.

2.

R e a c t i o n Representation Substructure Transforms Primary E v a l u a t i o n on Substructure Presence Secondary E v a l u a t i o n on L o c a l Atoms, Rings, F u n c t i o n a l Groups and C o n d i t i o n s . T e r t i a r y E v a l u a t i o n on G l o b a l Atoms, Rings, F u n c t i o n a l Groups, Competing S i t e s f o r the Desired Transform, Impurity Reactions Conditions Included Solvent, C a t a l y s t , Phase, C o n c e n t r a t i o n s ) , Temperature, Pressure, Time C u r r e n t l y 300 i n d u s t r i a l l y important substructure transforms are i n the data base.

3.

Search D i r e c t i o n :

Backwards from S i n g l e Target Molecule.

4.

Search S t r a t e g i e s

a. b. c.

5.

Evaulation

a. b. c. d. e.

D e p t h - f i r s t guided by transform scoring function B r e a d t h - f i r s t (exhaustive or w i t h breadth c o n t r o l h e u r i s t i c s ) User d r i v e n by i n t e r a c t i o n through a t e l e t y p e w r i t e r F e a s i b i l i t y by transform constraints H e u r i s t i c s c o r i n g a t transform level Stoichiometry w i t h raw m a t e r i a l costs Side r e a c t i o n s User

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch004

94

COMPUTER-ASSISTED ORGANIC SYNTHESIS

6.

Input/Output Input: Free format input o f the molecule i n a 15 row χ 15 column drawing box. Input by cards or t e l e t y p e w r i t e r Output: Synthesis tree or r e a c t i o n paths on l i n e p r i n t e r or t e l e t y p e w r i t e r

7.

Program Features: FORTRAN, some l i n k e d l i s t data s t r u c t u r e s , r e a c t i o n s are subroutines, i n t e r a c t i v e or batch, 2500 l i n e s of code.

Example: e-caprolactam The development o f new r e a c t i o n paths to nylon monomer i n ­ termediates i s an a c t i v e i n d u s t r i a l r e s e a r c h area. M i l l i o n s of kilograms of these monomers are produced each year and even small f r a c t i o n a l improvements can have large economic impact. The u n c e r t a i n f u t u r e of feed stocks d e r i v e d from crude o i l has renewed i n t e r e s t i n a l t e r n a t e r e a c t i o n paths. I n a d d i t i o n , the search f o r r e a c t i o n paths w i t h lower energy costs i s i n t e n s i f y ­ i n g . The f o l l o w i n g example i l l u s t r a t e s the use of the REACT program to generate r e a c t i o n paths which lead to e-caprolactam. Over 500 paths were generated i n l e s s than 10 CPU minutes on an IBM-360/67 computer. Several o f the paths are i l l u s t r a t e d on F i g u r e 8. T h i s study was performed as p a r t of an o v e r a l l r e ­ view o f a chemical company's p o s i t i o n i n t h i s monomer area. The r e s u l t s of the study i n d i c a t e d s e v e r a l new r e s e a r c h d i r e c ­ t i o n s which the r e s e a r c h and development teams are now pursuing.

c-c-c=o

Ç-C-C-C-&

+ cc-

1

H SO 2

4

1

+

NOHSO,

C-C-C

c-c 0 Ç-C-C-C-&

C-C-C-C-& 170__ 10_atm_. Pd. C a t . , , +

3 H

c-c-c C-C-C C-C-C-C-& . * .

H0

C-C-C-C

2

C-C-C 160 *

Figure

8.

Reaction

path

10 a t n . C

-

« * » C

output from the REACT (sigma bond), * = aromatic

-

+

C

program: ring

à- =

1

'

5

0

2

—OH,

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

$ ==

Reaction Path

A N D POWERS

GOviND

Synthesis

c-c-c=o •

«

C

Ν

c-c-c=o I

+

$ '

A=C H-C-W ALDOL-COND ;REF: H.O. HOUSE, MOD. SYN. RXNS.', (1972) ρ 629. DGROUP WGROUP PATH 3 PRIORITY 100 CHARACTER BREAKS CHAIN BREAKS RING IF GROUP 1 IS ESTERX THEN KILL IF GROUP 2 IS A NITRILE THEN SUBT 20 IF AN RGROUP IS ALPHA TO ATOM 2 THEN SUBT 10 FOR EACH CONDITIONS BASIC AND NUCLEOPHILIC OR CONDITIONS ACIDIC AND NUCLEOPHILIC THEN SUBT 20 BREAK BOND 1 I F ATOM 1 IN GROUP 1 IS NITROGEN (1) THEN BEGIN IF (1) IS QUATERNARY THEN KILL ELSE I F (1) IS TERTIARY THEN ADD + TO (1) DONE MAKE BOND FROM ATOM 1 IN GROUP 1 TO ATOM 2 IN GROUP 1 IF ATOMS ALPHA TO ATOM 1 OFFPATH ARE ALL ATTACHED & TO 0 HYDROGENS THEN ADD 50 IF ATOM 1 IS A STEROCENTER (1) THEN BEGIN I F ATOM IS ALPHA TO ATOM 1 OFFPATH (2) PUT (1) OR (2) INTO (1) IF ATOM 2 IS ON MOST HINDERED SIDE OF (1) THEN SUBT 50 DONE END T

Figure

5.

Excerpts

from the Aldol

Transform

a p p l i c a t i o n w i l l l e a d to a s i m p l i f i c a t i o n of the s y n t h e t i c problem. S t r a t e g i e s and transforms are separated completely, with the r e s u l t that new transforms can be e a s i l y added without a l t e r i n g s t r a t e g i e s , y e t the s t r a t e g i e s w i l l use the new t r a n s ­ forms when a p p r o p r i a t e . This s e p a r a t i o n lends c l a r i t y and m a i n t a i n a b i l i t y to the SECS knowledge base.l? ALCHEM i s a powerful d e s c r i p t i v e language. To date we have found no r e a c t i o n that couldn't be represented. P a r t of the power of ALCHEM stems from the r i c h p e r c e p t u a l i n f o r m a t i o n base i n SECS which i s accessed by terms such as STERIC HINDRANCE, and the c a p a b i l i t y of SECS to evaluate these expressions i n terms of models. I n f o l l o w i n g s e c t i o n s we show how these expressions and models are used to achieve greater chemical accuracy and thus e f f e c t i v e l y reduce the s i z e of the s y n t h e s i s t r e e generated. 4.

SYNTHESIS TREE REDUCTION THROUGH GREATER CHEMICAL ACCURACY

4.1 STERIC EFFECTS. A chemist p r e d i c t s s t e r i c e f f e c t s by e v a l u a t i n g the p o s s i b l e approach of a reagent to a t h r e e dimensional model of the r e a c t i n g molecule. SECS does the same

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

5.

wiPKE ET AL.

SECS:

Strategy

and

Planning

109

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005

t h i n g . The SYMIN module constructs a 3-D s p a c e - f i l l i n g model according to the r e q u i r e d stereochemistry using molecular mechanics.12 Any transform can invoke a s t e r i c c a l c u l a t i o n as i l l u s t r a t e d i n l i n e 18 of f i g u r e 5. Of p a r t i c u l a r importance are r e a c t i o n s which create new stereocenters by attack on a double bond. For example, i n the U c l a f synthesis of andrenosterone, i m p l i c a t i o n of the ketone precursor from the β - a l c o h o l i s v a l i d because i t a l s o implies hydride attack from the l e s s con­ gested s i d e of the carbonyl which i s p r e f e r r e d .

We developed a c a l c u l a t i o n a l model f o r t h i s type of r e a c t i o n to permit e v a l u a t i o n of s t e r i c a c c e s s i b i l i t y and i t s i n v e r s e . congestion, on any type of molecular s t r u c t u r e (see f i g . 6)A$

b Figure 6. Cone of preferred approach of R to χ allowed by atom i. The accessibility of χ on side 3L with respect to i is defined by this solid angle and numerically equals the area on a unit sphere cut by this cone (shaded area).

Using t h i s model to p r e d i c t the major product from ketone r e ­ ductions, we found p r e d i c t i o n s c o r r e l a t e w e l l with experiment (n = 34, r = 0.924). We a l s o found t h i s model a p p l i c a b l e to epoxidation and hydroboration of o l e f i n s . ^ The s t e r i c ALCHEM statement ( l i n e 18, f i g . 5) i s evaluated v i a the s t e r i c c a l c u l a ­ t i o n to determine i f the r e a c t i o n would create the r e q u i r e d 2

2

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

110

stereoisomer, and depending on the t r u t h v a l u e , the appropriate c o n d i t i o n a l phrase i s i n t e r p r e t e d . Note t h i s i s comparison of congestion between the two s i d e s of a double bond.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005

Because the c a l c u l a t e d congestion i s absolute, one can a l s o make comparisons between f u n c t i o n a l groups w i t h i n a s t r u c t u r e . For example, i n Corey's l o n g i f o l e n e s y n t h e s i s i t was necessary to remove one ketone i n the presence of another as shown below. ^ Normally t h i s would r e q u i r e p r o t e c t i n g the ketone which i s to remain. Both ketones are s a t u r a t e d , but they d i f f e r i n s t e r i c environment. The t o t a l congestions c a l c u l a t e d by SECS i n d i c a t e that the ketone which i s to remain i s already p r o t e c t e d by the

Σ CONG. = 35

(4a)

Σ CONG. = 114 greater s t e r i c congestion present, hence r e a c t i o n s should occur on the a,α-dimethyl ketone s e l e c t i v e l y and t h i s was observed ex­ p e r i m e n t a l l y . This i s expressed i n ALCHEM by "IF STERIC AT GROUP 1 BETTER THAN KETONE ANYWHERE THEN..." This approach of e v a l u a t i n g s t e r i c e f f e c t s has proven e f ­ f e c t i v e - ^ on a wide v a r i e t y of molecular skeletons and i s not r e s t r i c t e d to any l i b r a r y of s p e c i a l r i n g systems. We b e l i e v e SECS s p r e d i c t i v e c a p a b i l i t y i n an unknown system compares f a v o r a b l y with the p r e d i c t i v e c a p a b i l i t y of most chemists. 1

4.2 ELECTRONIC EFFECTS. The e l e c t r o n i c p r o p e r t i e s of con­ jugated systems could be s t o r e d i n a l i b r a r y according t o r i n g system and heteroatom s u b s t i t u t i o n . Such an approach however f a i l s when faced with a s k e l e t o n not contained i n the l i b r a r y . For d i r e c t i n g e f f e c t s on benzene- one could use the Hammett equation*** or t o p o l o g i c a l r u l e s , but these cannot e a s i l y be e x t r a p o l a t e d t o p o l y c y c l i c or h e t e r o c y c l i c aromatic systems. As a more general and f l e x i b l e approach, we decided SECS should construct i t s own molecular o r b i t a l (MO) model of the conjugated

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

5.

WIPKE ET

AL.

SECS:

Strategy

and

Planning

111

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005

system from which i t could derive whatever p r o p e r t i e s i t needed. The HUckel MO method was s e l e c t e d to minimize computational e f f o r t . Wtihin t h i s framework, l o c a l i z a t i o n energies20 have proved more s u c c e s s f u l than e l e c t r o n d e n s i t i e s ^ l i n p r e d i c t i n g aromatic substitution reactions. The HAREHM module recognizes s u b s t i t u e n t s on aromatic r i n g s and heteroatoms i n the r i n g s , and assigns to each of them jvparameters c h a r a c t e r i z i n g t h e i r e l e c t r o n withdrawing or donating properties. For each aromatic r i n g system HAREHM analyzes the p o s s i b l e s u b s t i t u t i o n r e a c t i o n s : f o r each s u b s t i t u e n t on the r i n g system, the s u b s t i t u e n t i s removed and the e l e c t r o n i c energy i s c a l c u l a t e d . Then f o r each f r e e p o s i t i o n on t h i s modified s t r u c t u r e the f r e e atom i s temporarily removed, breaking up the conjugated system, and the energy again c a l c u l a t e d . The d i f ference between these energies i s the l o c a l i z a t i o n f o r attack at that f r e e p o s i t i o n . This i s computed f o r anion, r a d i c a l , and c a t i o n i c intermediates when appropriate, and f o r a l l f r e e p o s i t i o n s . I f the p o s i t i o n having the lowest l o c a l i z a t i o n energy i s the same p o s i t i o n as the o r i g i n a l s u b s t i t u e n t X then i t i s c o r r e c t to i n f e r that k i n e t i c attack of X w i l l occur i n the c o r r e c t p o s i t i o n . See examples i n equations 9 and 10. HAREHM can t r e a t s t r u c t u r e s with m u l t i p l e conjugated systems. Within an aromatic s u b s t i t u t i o n transform the terms RLENERGY, NLENERGY, and ELENERGY r e f e r to the r a d i c a l , nucleop h i l i c , and e l e c t r o p h i l i c l o c a l i z a t i o n energies r e s p e c t i v e l y . Several types of statements use these terms, e.g.: IF ELENERGY ON ATOM 1 BETTER THAN ATOM 2 ELSE KILL

(5)

IF ELENERGY ON ATOM 4 BETTER THAN (1) THEN ADD

(6)

20

IF NLENERGY ON ATOM 4 BETTER THAN (1) THEN BEGIN ADÇ 40 CONDITIONS NUCLEOPHILIC AND BASIC DONE ELSE BEGIN IF ELENERGY ON ATOM 4 BETTER THAN (1) BEGIN ADD 20 CONDITIONS ACIDIC AND ELECTROPHILIC DONE DONE

(7)

THEN

IF ELENERGY ON ATOM 4 BETTER THAN (1) THEN CORRECT

(8)

Statement 5 shows comparison of e l e c t r o p h i l i c l o c a l i z a t i o n energy (LE) on 2 atoms, whereas i n statement 6 the LE of atom 4 i s compared to the LE's of each atom i n the set of atoms ( 1 ) . The compound statement 7 s e l e c t s n u c l e o p h i l i c conditions i f p o s s i b l e , otherwise e l e c t r o p h i l i c c o n d i t i o n s . In statement 8, "CORRECT"

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

112

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005

adjusts the p r i o r i t y upward or downward according to whether the LE i s lower or higher than the LE of benzene. The magnitude of change i s p r o p o r t i o n a l to the d i f f e r e n c e between the two LE's. Factors such as s t e r i c e f f e c t s on thermodynamics are described by other statements i n the transforms. The examples below i l l u s t r a t e c o n t r o l by e l e c t r o n i c e f f e c t s . The numbers i n equation 10 show c a l c u l a t e d LE v a l u e s .

The accuracy of the HMO method decreases as the number of heteroatoms i n the r i n g system i n c r e a s e s , but the same can be s a i d f o r p r e d i c t i o n s by chemists i n complex cases. The HMO method has s a t i s f i e d our need f o r a r a p i d , f l e x i b l e , general approach. 4.3 RECOGNITION OF INTERFERING FUNCTIONALITY AND PROTECTING GROUPS. An important p a r t of e v a l u a t i n g the a p p l i c a b i l i t y of a given transform i s determining whether the c o n d i t i o n s r e q u i r e d f o r the transform w i l l a l s o cause unwanted r e a c t i o n s elsewhere i n the molecule. T h i s i s i n f a c t an important s e l e c t i o n r u l e which helps f u r t h e r reduce the p o t e n t i a l s o l u t i o n space and prevent some p o s s i b l e orderings of transforms. However, i t i s too harsh simply to k i l l any transform i n which there i s an i n t e r f e r e n c e when that i n t e r f e r e n c e could be avoided by modifying the group. We have also found that i t i s d e s i r e able f o r the user to be n o t i f i e d of such i n t e r f e r e n c e s , because he may not know the exact c o n d i t i o n s and can not evaluate i n t e r ferences e a s i l y simply l o o k i n g at the p r e c u r s o r g i v e n the 9

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

5.

wiPKE ET A L .

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005

name of a

SECS:

Strategy

and

113

Planning

transform.

Thus, i n 1973 we incorporated p r o t e c t i n g groups i n t o SECS 2.0, increased the number of f u n c t i o n a l groups recognized, and d i f f e r e n t i a t e d between f u n c t i o n a l group environments. (See Table I f o r a l i s t i n g of the s e n s i t i v i t i e s of d i f f e r e n t i a t e d f u n c t i o n a l groups to d i f f e r e n t r e a c t i o n conditions.) Each f u n c t i o n a l group i s c l a s s i f i e d i n t o e x a c t l y one category which best describes i t . The d e f i n i t i o n of r e a c t i o n conditions was a l s o changed i n SECS to cover more types of c o n d i t i o n s . There are many ways to desc r i b e r e a c t i o n c o n d i t i o n s , one can describe p h y s i c a l parameters such as temperature, time, pressure, a c i d i t y , redox p o t e n t i a l , p o l a r i t y , e t c . , and l e t a given r e a c t i o n be represented by a sum of such parameters; or one can d e f i n e reagent prototypes; or one can d e f i n e equivalences between reagent prototypes and p h y s i c a l parameters. The current categories i n SECS 2.4 are shown i n Table I I . Each category i s subdivided i n t o s l i g h t , normal, and strong, corresponding to the continuous p h y s i c a l p r o p e r t i e s (represented i n Table I by -, = and * r e s p e c t i v e l y ) , but f o r some chemical categories the s u b d i v i s i o n s represent d i f f e r e n t r e a c t i o n types. A group s e n s i t i v e to s l i g h t l y a c i d i c conditions i s u s u a l l y more s e n s i t i v e to s t r o n g l y a c i d i c condit i o n s , but a group s e n s i t i v e to s l i g h t l y halogenating c o n d i t i o n s may be s t a b l e to s t r o n g l y halogenating conditions because the two halogenations are m e c h a n i s t i c a l l y completely d i f f e r e n t . A l l r e a c t i o n conditions used i n a r e a c t i o n up to the next workup (one pot) are s p e c i f i e d i n one CONDITION statement i n ALCHEM. Thus i f there i s more than one step (workup) i n a transform, there would be separate c o n d i t i o n statements f o r each s e t of simultaneous c o n d i t i o n s . The reason i s that a f u n c t i o n a l group might be s t a b l e to every s i n g l e r e a c t i o n step, but not to an attack of a l l conditions at the same time. The current v e r s i o n takes the c o n d i t i o n categories as independent and a d d i t i v e , so one s p e c i f i e s a l l the pure p h y s i c a l p r o p e r t i e s (pH, temp., solvent) and only the r e l e v a n t chemical p r o p e r t i e s . The b a s i c chemical information used f o r f u n c t i o n a l groups, p r o t e c t i v e groups, s e n s i t i v i t i e s and r e a c t i o n c o n d i t i o n s i s represented by binary s e t s , and manipulated with l o g i c a l s e t operat i o n s , AND, OR, XOR, and SUBSET. A binary s e t containing the current f u n c t i o n a l groups i s e s t a b l i s h e d during p e r c e p t i o n of the t a r g e t : 3

GROUPS

1001000000

(group types)

Every b i t p o s i t i o n corresponds to one of the categories of funct i o n a l group, 1 means present, 0 means absent. Every transform contains a c o n d i t i o n statement. In the transform corresponding to the s y n t h e t i c r e d u c t i o n of a ketone shown below, the condit i o n s are a l t e r n a t i v e s from which only one s e t of conditions i s

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

CARBOXYLIC ACID DECARBOXYLASE ACID ACID ANHYDRIDE ACID HAL I DE PHENOL ENOL BENZYL,ALLYL ALCOHO PRIM ALCOHOL SEC ALCOHOL TERT ALCOHOL AROM ALDEHYDE A, Β UN S AT ALDEHYDE CH 3,R CH2 ALDEHYDE R2 CH2 ALDEHYDE R3C ALDEHYDE PRIM AMIDE SEC AMIDE TERT AMIDE AROM AMINE ENAMINE BENZYL,ALLYL AMINE PRIM AMINE SEC AMINE TERT AMINE C=C AROMATIC C=C CONJ TO C=C C=C CONJ TO W-GROUP C=C ISOLATED CH3,RCH2 ESTER

Table I

* * * =

*

*

= *

*

* * *

=* * =

*

*

* =*

= * *

=* * * =*

-=* -=*

ACI

=* -=* -=*

BAS

=

* * * *

*

*

= *

* *

=* =*

=* =*

* =*

* * =* =*

RED

*

-=* * * * =* =* *

*

*

* * = * *

-=* -=* -=*

-=*

0X1

_

=

=

=

*

*

=*

HOT

*

=* * -=* -=* * -=* -=*

-=* -=* -=* -=* -=* -=* -=* -=* -=* -=* -=* -=* -=* -=* =*

OMT

* =* =*

* * * *

PHO

* *

SOL

=* =*

=* *

HAL

*

* *

*

* * =* =*

NUC

=* =* = * = * _ = =* =*

=* =*

=* =* =* =* =*

=*

-=*

HYD

S e n s i t i v i t i e s of F u n c t i o n a l Groups to Reaction Conditions

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005

* *

*

-=* -=*

=* =*

=*

=* =* =* =* =* =* * * * * *

=* =*

ELE

g g g "

w

g > | H α § £ 2

α

g £

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

R2 CH ESTER OTHER ESTER Α-HALOGEN ETHER PHENOL ETHER ENOL ETHER BENZYL, ALLYL ETHER ALKYL ETHER ARYL HAL I DE VINYL HALIDE BENZYL, ALLYL HALIDE PRIM HALIDE SEC HALIDE TERI HALIDE ARYL KETONE A,Β UNSAT KETONE CH3,RCH2 KETONE R2 CH KETONE R3C KETONE QUINONE NITRO NI TRILLE ALKYNE-H ALKYNE-R EPOXIDE CYCLOPROPANE PHENOL,ENOL ESTER-X BENZYL,ALLYL ESTER-X TERT ESTER-X SEC,PRIM ESTER-X IMINE

IMINE-Z

30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59

60

=* * -=* -=* * =* =*

=*

*

* *

=* * * =* * -=* -=* * =* =*

*

-=* * * -=*

*

*

-=* * * -=*

*

*

* * *

ACI

BAS

=

*

* *

*

=*

*

ΟΧΙ

*

*

*

*

* *

*

*

=* =* =* =* * -=* -=* -=*

=

RED

*

*

* =*

* *

*

*

=*

*

=* * =* -=* =* =* =* =* * =* -=* * = * * * =* =* * =* -=* * * =*

=*

=*

=* -=* * -=* -=* =* *

* =* * * *

=*

=*

=*

=*

=*

HYD

*

SOL

=*

=*

PHO

=* *

*

*

OMT

*

=

=

HOT

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005

=*

=* =* * *

* *

HAL

* =*

=* =*

=*

*

=*

= =

-=* -=* -=* -=* =

* *

*

NUC

* *

* *

* * * * *

*

*

ELE

Κ

ι—

^ !" S. c§

^ t*i 3 ^ 3" | §§ &

w w H > Γ"

g

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005

116

COMPUTER-ASSISTED

STRONGLY BASIC SLIGHTLY

pH(pKa)>18 12 - 18 8-12

SLIGHTLY ACIDIC STRONGLY

2 - 6 -2 - +2 < -2

STRONGLY OXIDIZING SLIGHTLY

KMNOi*, CR03 JONES REAGENT MN02, MILD SELECTIVE

SLIGHTLY REDUCING STRONGLY

ZN/H+, MILD SELECTIVE NABHi+, ALKYL HYDRIDES LIALHI+

STRONGLY COLD SLIGHTLY

-100 -19

SLIGHTLY HOT STRONGLY

30 101

ORGANIC SYNTHESIS

Table I I D e f i n i t i o n s of Reaction Conditions

< -100 - -20 - +10 >

100 200 200

SLIGHTLY ORGANOMETALLIC STRONGLY

ORGANOCADMIUM AND ZINC ORGANOMAGNE SIUM ORGANOLITHIUM

SLIGHTLY PHOTOCHEMICAL STRONGLY

UNSENSITIZED 350 - 700 NM UNSENSITIZED 250 - 700 NM SENSITIZED

PROTIC APROTIC SLIGHTLY HYDROGENATING STRONGLY

DEACTIVATED CATALYSTS MILD CATALYSTS STRONG CATALYSTS OR/AND HIGH PRESSURE

SLIGHTLY HALOGENATING STRONGLY

N-BROMOSUCCINIMID IN CCLi+ HALOGEN WITHOUT CATALYST OR R-X/SNCLi+ HALOGEN WITH ACID OR BASE CATALYSIS

SLIGHTLY NUCLEOPHILIC STRONGLY

SUBSTITUTION AT SATURATED CARBON AND MICHAEL ADDITION AT CARBONYLS SUBSTITUTION AT CARBOXYLS (ADDITION-ELIMINATION)

SLIGHTLY ELECTROPHILIC STRONGLY

ALKYLATION WITH R-X ACYLATION WITH RCOX, (RCO) 0, TSCL LEWIS ACID CATALYZED ALKYLATION OR ACYLATION 2

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

5.

WIPKE E T A L .

SECS:

Strategy

and

117

Planning

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005

selected. CH2 => C=0 WOLFF-KISHNER, THIOKETAL DESULFURIZATION, OR CLEMMENSON REDUCTION OF A KETONE KETHYDROGENOLYSIS KETONE PATH 1 PRIORITY 50 CHARACTER INTRODUCES GROUP IF ATOM 1 IS NOTE ATTACHED TO 2 HYDROGENS THEN KILL IF HETEROATOM IS ALPHA TO ATOM 1 THEN KILL IF ATOM 1 IS ALLYLIC THEN SUBT 70 CONDITIONS STRONGLY BASIC AND REDUCING OR CONDITIONS ACIDIC AND REDUCING OR CONDITIONS SLIGHTLY HYDROGENATING AND NUCLEOPHILIC ADD 0 OF ORDER 2 TO ATOM 1 END CONDIT compares the s e t of conditions f o r the r e a c t i o n (or r e a c t i o n step) with the s e t of s e n s i t i v i t i e s of each group to those c o n d i t i o n s , leading to a s e t of i n t e r f e r i n g groups (IGRPS) (acid i n t e r f e r e s ) : SENS (acid) SENS (ketone)

1000100010 0010001000

(condition

types)

COND

0001100000

(condition

types)

GROUPS IGRPS

1001000001 0001000000

(group types)

The program now looks f o r p r o t e c t i v e groups based on the f o l lowing information: NAME PROGR STABIL INTRO REMOV DEDUC

2-0XAZ0LINE 0001100000 1011010001 0000000001 0100000000 35

(group types) ( c o n d i t i o n types) ( c o n d i t i o n types) ( c o n d i t i o n types) ( p r i o r i t y change)

The group-set PROGR i n d i c a t e s the group types that can be protected by t h i s p a r t i c u l a r p r o t e c t i v e group, STABIL contains the r e a c t i o n conditions to which the p r o t e c t i v e group i s s t a b l e . INTRO and REMOV contain the r e a c t i o n conditions needed f o r i n t r o d u c t i o n and removal of the p r o t e c t i v e group. DEDUC i s a measure i n p r i o r i t y value u n i t s of the expenditure needed f o r an a p p l i c a t i o n of that c e r t a i n p r o t e c t i v e group, based on the number of steps, losses i n y i e l d , and d i f f i c u l t y i n experimental procedure. The algorithm f o r choosing the proper p r o t e c t i v e group i s based on the f o l l o w i n g c r i t e r i a :

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

118 a) b) c) d)

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005

e)

the PG must f i t the FG the PG must be s t a b l e to r e a c t i o n conditions ( a l l steps) the PG should simultaneously p r o t e c t as many as p o s s i b l e of the i n t e r f e r i n g groups the PG should p r o t e c t none or as few as p o s s i b l e of the n o n - i n t e r f e r i n g groups the p r i o r i t y l o s s should be minimized

I f there remain any unprotected i n t e r f e r i n g groups, an a d d i t i o n a l penalty i s imposed which may k i l l the transform i f the p r i o r i t y c u t o f f i s high enough. I f the transform was s u c c e s s f u l i n a l l p a r t s , the names of the used p r o t e c t i v e groups are s t o r e d with the manipulation i n s t r u c t i o n s f o r the generation of the precursor from the current t a r g e t . The atoms bearing a p r o t e c t i v e group are marked i n the connection t a b l e . When the precursor i s d i s played a (P) appears near the s t a r t i n g atom of the f u n c t i o n a l group along with the name of the s p e c i f i c p r o t e c t i n g group (the name can be suppressed by the u s e r ) . (See equation 4a.) The p r o t e c t i n g groups are a u t o m a t i c a l l y removed from the precursor when i t i s processed as the current t a r g e t . Thus, the user i s n o t i f i e d g r a p h i c a l l y of the f a c t that i n t h i s given transform c e r t a i n groups must be p r o t e c t e d , but the d e c i s i o n of when to put them on or take them o f f must be done a f t e r the e n t i r e sequence has been generated. At that time one would p o s s i b l y s e l e c t a d i f f e r e n t group which s a t i s f i e s the needs of s e v e r a l transforms and minimizes operations. Work i s under way to optimize p r o t e c t i o n f o r sequences of r e a c t i o n s and to generate subgoals f o r f u n c t i o n a l group i n t e r c o n v e r s i o n f o r p a r t i c u l a r l y troublesome groups, using p r i n c i p l e s of l a t e n t and equivalent groups. Mutually r e a c t i v e f u n c t i o n a l group combinations l i k e a c i d h a l i d e and a l c o h o l i n the same precursor are recognized i n the precursor e v a l u a t i o n r o u t i n e . The user s p e c i f i e s to EVAL whether such precursors should be d e l e t e d . 5.

TREE REDUCTION THROUGH SYMMETRY

In the absence of symmetry information, a p p l i c a t i o n of a transform to a s t r u c t u r e possessing s e v e r a l elements of symmetry produces redundant p r e c u r s o r s . The precursors must be created, the redundancies recognized by graph-matching or c a n o n i c a l naming, and f i n a l l y the d u p l i c a t e precursors must be d e l e t e d . With symmetry information, i t i s p o s s i b l e to avoid generating such redundant p r e c u r s o r s . SECS perceives the complete s t e r e o chemical graph isomorphism groups of the target s t r u c t u r e and uses t h i s symmetry group i n applying a transform so that the transform i s a p p l i e d i n a l l p o s s i b l e unique ways without r e dundancy. The synthesis of a s t e r a n e ( f i g u r e 7) i l l u s t r a t e s how r e c o g n i t i o n of molecular symmetry reduces the number of 11

2 5

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

5.

wiPKE ET AL.

pathways

SECS:

and

Planning

119

generated.

Figure

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005

Strategy

7.

Redundancy

from molecular

symmetry

In a d d i t i o n to u t i l i z i n g molecular symmetry, i t i s a l s o necessary to u t i l i z e transform symmetry where i t e x i s t s . A transform possesses a symmetry element i f the SUBSTRUCTURE possesses the element and i t i s preserved by the MANIPULATION and SCOPE/LIMITATIONS statements, e.g., the D i e l s A l d e r transform:

3

This symmetry i s represented i n the s u b s t r u c t u r e , "C=C-C-C-C-C@1/", by the operator enclosed w i t h i n angle brackets and prevents SECS from f l i p p i n g the SUBSTRUCTURE over on the target and applying the transform again, which would produce an i d e n t i c a l precursor. SECS a l s o uses the stereochemical graph isomorphism group w i t h i n a transform to c o r r e c t l y determine i f atoms or groups are chemically e q u i v a l e n t : "IF ATOM 1 AND ATOM 2 ARE EQUIVALENT THEN... Complete d e t a i l s of the symmetry r e c o g n i t i o n algorithm and i t s a p p l i c a t i o n are given e l s e w h e r e . ^ 11

6.

TREE REDUCTION THROUGH PRUNING

Precursors are screened independently by the EVAL module to e l i m i n a t e i l l e g a l s t r u c t u r e s which might have been created by a f a u l t y transform, and s t r u c t u r e s which are unstable or otherwise undesireable. The e v a l u a t i o n c r i t e r i a l i s t e d below can be enabled or d i s a b l e d by the user through the EVAL command. 1. 2. 3. 4. 5.

Bredt r u l e v i o l a t i o n Antiaromatic r i n g Cumulene T r i p l e bond i n s m a l l r i n g Trans o l e f i n i n s m a l l r i n g

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED

120 6. 7.

8. 9. 10. 11.

ORGANIC SYNTHESIS

Trans bridged r i n g system Trans fused 3-membered r i n g Valency v i o l a t i o n D i - i o n of same charge Four-membered r i n g Unstable group combinations

The previous s e c t i o n s d e a l t with reducing the s i z e of synt h e s i s tree generated by i n c r e a s i n g the accuracy of p r e d i c t i o n s , i . e . , by making transforms more s e l e c t i v e i n t h e i r output. We now turn our d i s c u s s i o n to how the program s e l e c t s which t r a n s forms to apply.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005

7. CONTROLLING TREE GROWTH THROUGH STRATEGIES AND PLANS FOR TRANSFORM SELECTION The o b j e c t i v e of the SECS program i s to f i n d sequences of transforms which not only are chemically accurate, but a l s o lead to "simpler" s t r u c t u r e s and form an " e f f i c i e n t " s y n t h e s i s . We have f o r many reasons s p e c i f i c a l l y avoided b i a s i n g SECS with preprogrammed sequences or t h e i r equivalent binary d e c i s i o n t r e e s . Such b i n a r y d e c i s i o n trees produce r i g i d orderings of transforms, must be updated as new transforms are discovered, and are n e c e s s a r i l y incomplete s i n c e they are manually programmed. Instead, our approach i s to have SECS dynamically generate i t s own s y n t h e t i c sequences guided by i t s current s t r a t e g i e s using i t s current l i b r a r y of transforms. Strategies are high l e v e l p r i n c i p l e s of molecular c o n s t r u c t i o n and m o d i f i c a t i o n , and are independent of the transform l i b r a r y . When a strategy i s a p p l i e d to a t a r g e t s t r u c t u r e , goals are generated which describe the s t r u c t u r a l changes necessary to implement the s t r a t e g y . Goals are a l s o independent of the transform l i b r a r y . The n u l l s t r a t e g y generates no goals, hence s e l e c t s no t r a n s forms . 7.1 THE OPPORTUNISTIC STRATEGY i s the opposite of the n u l l s t r a t e g y , f o r i t s e l e c t s every transform which " f i t s " the t a r g e t , i . e . , s e l e c t i o n by a p p l i c a b i l i t y , regardless of whether the precursor i s simpler than the t a r g e t . This generates a l l " l e g a l moves" i n chess terminology. A major problem with t h i s s t r a t e g y i s that the l a r g e c l a s s of f u n c t i o n a l group interchange (FGI) and f u n c t i o n group i n t r o d u c t i o n (INTRO) transforms, are n e a r l y always a p p l i c a b l e and are thus f r e q u e n t l y a p p l i e d , but do not normally lead to simpler p r e c u r s o r s . F G I s are the equivalent of the s u b s t i t u t i o n operators i n GPS. 1

27

7.2 GOAL-DIRECTED SELECTION of transforms operates q u i t e d i f f e r e n t l y — t h e f i r s t c o n s i d e r a t i o n becomes "does t h i s t r a n s form achieve one of my goals?", i . e . , i f i t were a p p l i e d would i t be a move i n the d e s i r e d d i r e c t i o n ? Goals define the " d e s i r e d

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

5.

wiPKE E T A L .

SECS:

Strategy

and

"Planning

121

11

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch005

direction. A goal i s the d i f f e r e n c e between the current t a r g e t s t r u c t u r e and the d e s i r e d s t r u c t u r e or d e s i r e d s u b s t r u c t u r e . Transforms are then f i r s t s e l e c t e d as to whether they are r e l e v a n t to the g o a l ( s ) , and l a t e r are examined f o r a p p l i c a bility. Goals provide a sense of d i r e c t i o n and a j u s t i f i c a t i o n f o r expending computational resources. This enables a s e l e c t i v e e x p l o r a t i o n of the space of syntheses, and promises a higher p r o p o r t i o n of e f f i c i e n t syntheses among the s o l u t i o n s generated, the t r a d e o f f being l o s s of completeness and p o s s i b l e l o s s of s o l u t i o n s . However, with good goals t h i s l o s s i s n e g l i g i b l e . As the number of chemical transforms a v a i l a b l e to the program i n c r e a s e s , the b e n e f i t s of g o a l - d i r e c t e d s e l e c t i o n of transforms become even more obvious. 7.3 GOALS ARE CREATED BY STRATEGIES. An important o b j e c t i v e of the SECS p r o j e c t i s to determine what c o n s t i t u t e good s t r a t e g i e s f o r s y n t h e s i s . Many s t r a t e g i e s are s e l f evident but others are yet to be discovered. Our fundamental aim i s to obtain simple p r e c u r s o r s . Network o r i e n t e d s t r a t e g i e s focus on s i m p l i f i c a t i o n of the molecular s k e l e t o n , i n p a r t i c u l a r the r i n g systems. Fused r i n g systems are considered simpler than bridged or s p i r o systems owing to t h e i r greater ease of p r e p a r a t i o n . A good s y n t h e t i c s t r a t e g y i s t h e r e f o r e to create the complex r i n g system from a simple fused system. T r a n s l a t e d i n t o the backward d i r e c t i o n , t h i s means t r y to break those bonds which l e a d to fused or a c y c l i c r i n g systems with a minimum of appendages. The f i r s t statement of t h i s s t r a t e g y appeared i n Corey's l o n g i f o l e n e synthesis. Thus bonds 1-7, and 3-4 are s t r a t e g i c , but so i s 2 5

the p a i r of bonds {3-4,1-2}. In place of a set of s t r a t e g i c bonds used by other programs, SECS represents t h i s s t r a t e g y as a l o g i c a l l y s t r u c t u r e d goal l i s t shown above. Other s t r a t e g i e s focus on s i m p l i f y i n g the skeleton by r e connecting appendages to form a new r i n g or by j o i n i n g opposite s i d e s of l a r g e r i n g s to form common r i n g s . These s t r a t e g i e s

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

122

COMPUTER-ASSISTED ORGANIC SYNTHESIS

produce goals such as "MAKE 1-6". Note t h i s goal can not be r e presented as a s e t o f bonds s i n c e the bond 1-6 does not y e t exist. Symmetry based s t r a t e g i e s are i n d i c a t e d i n the syntheses of 3-carotene which constructed bonds and ID,28 ^ and Action pairs. A chemist can interactively m o d i f y this rule set. F u r t h e r , an advanced s e a r c h model is p r e s e n t e d w h i c h , by i n t r o d u c i n g t h e powerful concept o f a P l a n n i n g Space, a l l o w s t h e s e a r c h f o r s y n t h e s e s t o go f o r w a r d , backward and t o leap i n t o t h e m i d d l e under controlled conditions.

148 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical

Synthesis

Planning

by

Computer

149

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

INTRODUCTION Planning chemical synthesis routes for known molecular s t r u c t u r e s i s a r i c h problem area o f f e r i n g a challenge that i s being met in inventive and imaginative ways not only by chemists, but a l s o by computer s c i e n t i s t s and mathematicians. I b r i n g to t h i s task the perspective of a s p e c i a l i s t i n the methods of developing A r t i f i c i a l I n t e l l i g e n c e w i t h i n a computing system, and a p e r s i s t e n t concern for using the mechanisable aspects of human knowledge and human problem s o l v i n g techniques i n the medium of a machine. I b r i n g to t h i s task no e x p e r t i s e i n organic chemistry or i n s y n t h e s i s . But I do have the b e n e f i t of several years of intimate contact with the problem of mechanizing the search for syntheses and with expert chemists, e s p e c i a l l y Prof. W.F. Fowler, who foresaw such p o s s i b i l i t i e s and who worked with us. My a p p l i c a t i o n to t h i s task under the guidance of Prof. Gelernter culminated during 1971 i n the running v e r s i o n of SYNCHEM I , the f i r s t computer program to perform s u c c e s s f u l m u l t i - s t e p synthesis e x p l o r a t i o n s automatically without on-line guidance or intervention. This f i r s t v e r s i o n of SYNCHEM employed several key ideas and techniques that were discovered by others e a r l y i n the research on mechanical problem s o l v i n g . These key ideas w i l l be reviewed below. Since my main i n t e r e s t l i e s i n the d i r e c t i o n of artificial i n t e l l i g e n c e , I have spent the f i v e years f o l l o w i n g my work i n SYNCHEM i n working on the a p p l i c a t i o n of a r t i f i c i a l i n t e l l i g e n c e methods to problems i n Mass Spectrometry with the H e u r i s t i c Dendral p r o j e c t at Stanford U n i v e r s i t y and a l s o with Professor Ivar Ugi at Munich i n a s s i m i l a t i n g h i s a l g e b r a i c approach to the representation of r e a c t i o n s . Upon Todd Wipke's i n v i t a t i o n to p a r t i c i p a t e i n t h i s Symposium I have e l e c t e d to set f o r t h i n a very s p e c i f i c manner my proposals on how I would t a c k l e the task of synthesis planning i n the l i g h t of my current understanding of the advances that have been made i n the methods of a r t i f i c i a l i n t e l l i g e n c e . Thus, I have three aims i n w r i t i n g t h i s paper:

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

150

COMPUTER-ASSISTED

ORGANIC

SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

a) Propose a system to augment e x i s t i n g synthesis search systems by focusing on the issues of search management. To t h i s end the concept o f search modelling i s introduced and two search models are presented. b) C l a r i f y the f i n e d i s t i n c t i o n between s e l e c t i o n of transform by Relevance criteria and by Applicability criteria. S e l e c t i o n by A p p l i c a b i l i t y defines what i s u s u a l l y c a l l e d the State Sgace and search i n the s t a t e space u s u a l l y grows the synthesis path uniformly from one end only. S e l e c t i o n by Relevance i s not used by any e x i s t i n g system which when used y i e l d s the powerful Planning Space. The search i n a planning space can take "leaps" along the synthesis sequence. c) I n d i c a t e that a combination of search i n both the s t a t e space and the planning space i s p o s s i b l e and that t h i s a f u n c t i o n of the search model employed. The second search model described i n the paper allows the search for synthesis to go forward, backward and to leap into the middle under c o n t r o l l e d c o n d i t i o n s . I t i s hoped that the advantages of t h i s way o f developing a system w i l l include: upgrading the r o l e of the chemist from one of r a t i n g , pruning and s e l e c t i n g subgoals or precursors to that of g i v i n g i n j u n c t i o n s to the system i n the form of r u l e s added, removed or modified as the search proceeds a few steps at a time; the i n t r o d u c t i o n o f the s t r a t e g i c and judgmental knowledge i n the program i n a manner that decoupled from the knowledge o f r e a c t i o n chemistry; and the a b i l i t y t o experiment e a s i l y with v a r i o u s models of search management that are q u a l i t a t i v e l y d i f f e r e n t from each other. The payoffs foreseen are of genuine importance to those of us concerned with the techniques o f a r t i f i c i a l i n t e l l i g e n c e and when proven p r a c t i c a b l e should be e x c i t i n g and c h a l l e n g i n g to chemists as w e l l . I t must be stated at the outset that these new ideas on search modelling are an outgrowth of my attempts to develop a system for common sense reasoning about human a c t i o n s using a p s y c h o l o g i c a l theory of a c t interpretation [Schmidt, 1976; Sridharan, 1975; 1976]. This system has c a p a b i l i t i e s

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRIDHARAN

Chemical

Synthesis

Planning

by

Computer

151

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

of s e l e c t i n g and applying transforms associated with act names, s t r u c t u r i n g them i n t o a plan and reasoning both forward and backward along the plan s t r u c t u r e . The strategy of act i n t e r p r e t a t i o n i s coded i n the form of r u l e s e t s . The framework adapted for t h i s work is called Meta D e s c r i p t i o n System (MDS) [ S r i n i v a s a n , 1973; 1976], designed and developed by S r i n i v a s a n . I s h a l l explore here i n d e t a i l the use of t h i s framework i n the management of search for chemical s y n t h e s i s . The connections between the p s y c h o l o g i c a l theory and synthesis planning s t r a t e g i e s w i l l be l e f t for l a t e r e x p o s i t i o n . There are three major conceptual approaches [See Sridharan, 1974 f o r a review] that have been taken on the issue of a computer mediated design of chemical synthesis plans and they share a c e n t r a l idea among them - that of searching a space of p o s s i b i l i t i e s i n a systematic manner using e m p i r i c a l knowledge as appropriate. The very power of these approaches stems from the s y s t e m a t i z a t i o n of search of the space of possibilities. For those approaching synthesis as fertile grounds for designing and b u i l d i n g computer programs that solve d i f f i c u l t i n t e l l e c t u a l problems [Sridharan, 1971, 1973, 1974; G e l e r n t e r , 1973, 1976] the main problems are twofold. F i r s t , the a c q u i r i n g and packaging of knowledge of the r e a c t i o n s i n a form s u i t a b l e for use w i t h i n a program has to be done c a r e f u l l y and the success of the program depends upon the correctness and extent of the knowledge base. Second, the techniques of conducting search with an incomplete, uncertain and possibly inconsistent knowledge base have to be customized to the task of chemical s y n t h e s i s . The interactive problem solving concepts developed [Corey, 1969; Wipke, 1973, 1976] are a t t r a c t i v e because of t h e i r thoroughness and the chemist user tends to approach the system with hopes that the d i s c i p l i n e d e x p l o r a t i o n of pathways w i l l ensure him that he has not overlooked any reasonable alternative.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

152

COMPUTER-ASSISTED

ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

The t h i r d approach taken t o computer methods i n chemical synthesis i s one of f o r m a l i z i n g the set of p o s s i b l e r e a c t i o n s [Ugi, 1976], l i m i t i n g the use of e m p i r i c a l knowledge to the s e l e c t i o n r u l e s for these reactions. This method seeks out novel reaction schemes and can suggest to the chemist routes that could not be found by the other methods. However, a reasonable synthesis program i s yet to emerge from t h i s approach. The a l g e b r a i c c h a r a c t e r i z a t i o n of the search space however o f f e r s the p o t e n t i a l of h i g h l y i n t e r e s t i n g s t r u c t u r e s to be defined on the search space and might be very successful i n the long run. A l l of these methods c u r r e n t l y u t i l i z e a structure generally c a l l e d the H e u r i s t i c Method. THE

search Search

HEURISTIC SEARCH METHOD

The H e u r i s t i c Search Method [ N i l s s o n , 1971; Newell & Simon, 1972] i s a usual f i r s t approach t o problem s o l v i n g i f the s p e c i f i c a t i o n of the problem i t s e l f i s given p r e c i s e l y as a Goal S i t u a t i o n , and the s o l u t i o n required i s some sequence of Transforms i . e . Operators that can e f f e c t i v e l y transform the current s i t u a t i o n to the goal s i t u a t i o n . I t i s important that the allowable operators are f i n i t e and are w e l l - d e f i n e d . The problem s o l v i n g procedure involves the use of a Search Model that i s used i n guiding the search for a s o l u t i o n . The h e u r i s t i c nature of the procedure a r i s e s from the use of approximate methods of evaluating progress towards a s o l u t i o n and of assessing the merit and p o t e n t i a l of any p a r t i a l s o l u t i o n sequence. The user gives up in p r i n c i p l e the completeness and o p t i m a l i t y o f the search process gaining i n f a c t more frequent demonstrations o f s u c c e s s f u l problem s o l v i n g [Newell & Simon, 1972]. In chemical synthesis the goal s t a t e i s the target molecular s t r u c t u r e and the operators t o consider for c o n s t r u c t i n g a s o l u t i o n sequence are molecular r e a c t i o n s . The search for a synthesis plan proceeds backwards from the target molecular s t r u c t u r e by considering and applying r e a c t i o n s i n the r e t r o - s y n t h e t i c d i r e c t i o n . The process i s a s e r i a l one and has an "inner loop" that repeatedly asks

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical Synthesis Planning by Computer

153

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

i t s e l f the questions "Should the process stop now?" and "What next?". The process can be s u c c e s s f u l , i n t e r e s t i n g and powerful depending on the use of a proper Search Model to answer these questions. The answer to the question "What next?" comes i n two p a r t s : the choice of a molecular s t r u c t u r e that can be s e t up as a subgoal to complete the s y n t h e s i s , and the choice of the operator to t r y on that subgoal. The information a v a i l a b l e to the process i n a r r i v i n g at the answers i s of two kinds, both of which must be included i n the search model. The first kind comprises the catalog of s t a r t i n g m a t e r i a l s , the l i b r a r y of r e a c t i o n s , t e s t s of a p p l i c a b i l i t y of the r e a c t i o n s , the a p r i o r i merit r a t i n g f o r the goodness ( y i e l d , s p e c i f i c T t y etc.) o f the r e a c t i o n and includes i n general information that i s made a v a i l a b l e to the system p r i o r to the a c t u a l statement of the problem to solve. The nature ancf extent oT t h i s pr i o r information determines the s t r u c t u r e of the search possibilities. The other kind includes information that becomes a v a i l a b l e as the search proceeds and constitutes a r i c h body o f information that i s s p e c i f i c to the problem at hand. Only limited examples of the use of the second kind of information can be shown i n current programs, perhaps the c l e a r e s t one i s the placement of p r o t e c t i o n r e a c t i o n s f o r s e n s i t i v e f u n c t i o n a l groups. The manner i n which such problem s p e c i f i c Information i s c o l l e c t e d , organized and used t o guide search determines the character of the a c t u a l search performed for a given problem. THE PROBLEM SOLVING GRAPH MODEL.

AS

A

MINIMAL SEARCH

The Problem S o l v i n g Graph (PSG) [Gelernter, 1962] i s a g r a p h i c a l model of the dynamics of the search that i s conducted on a given problem, and c o n s i s t s of a root node representing the target molecule l i n k e d to a cascade of descendants which are one-level precursors to those at a l e v e l higher i n the PSG. The H e u r i s t i c Evaluation function assigns numerical weights (or i n some cases elements of an ad hoc s c a l e of d i s c r e t e values) t o each node i n the PSG. T y p i c a l l y , the node evaluation i s based not only on p r o p e r t i e s of the s i t u a t i o n designated by the node and

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

154

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Q

NODE

A N D - 0 R P R O B L E M SOLVING GRAPH MODEL Figure

1.

And-or

problem

solving

graph

model

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical

Synthesis

Planning

by

155

Computer

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

the operator a p p l i e d to generate the node, but a l s o on the e n t i r e path from the goal node to the node i n question. The PSG model i n combination with the h e u r i s t i c value assignments provides the search mechanism ready answers to the "What next?" question. The s t r u c t u r e of the PSG f o r the chemical synthesis problem, shown i n Figure 1, c o n s i s t s of AND-nodes when operators generate m u l t i p l e precursors a l l of which need to be made a v a i l a b l e f o r the r e a c t i o n to be s u c c e s s f u l l y executed, and OR-nodes designating the choice a v a i l a b l e among s e v e r a l operators a p p l i c a b l e to a node. The selection c r i t e r i a that are w e l l entrenched i n the Theory of H e u r i s t i c Search [ S l a g l e , 1971] f o r handling such AND-OR problem s o l v i n g graphs are: At an OR-node s e l e c t the most promising subgoal; At an AND-node s e l e c t subgoals s t a r t i n g from the most l i k e l y to f a i l to the l e a s t l i k e l y to f a i l . INFORMATION-GATHERING HEURISTIC SEARCH.

AS

COMPLEMENTARY

ACTIVITY

TO

The planning a c t i v i t y involves a v a r i e t y of d e c i s i o n s that r e q u i r e information not customarily included i n the i n i t i a l s p e c i f i c a t i o n s of à transform. Such d e c i s i o n s i n v o l v i n g reasoning about the search process beyond the information provided by the PSG model c a l l f o r a more elaborate search model and i s considered i n t h i s s e c t i o n . The PSG model may be viewed as a c o l l e c t i o n ready answers to the f o l l o w i n g set of questions: a) Does a node have subgoals? they?

How

many?

What

of are

b) What i s the s t a t u s of a node? Was a s u c c e s s f u l path found? Was i t t r i e d and f a i l e d on a l l paths? Are there descendant subgoals s t i l l open? c) Does the s i t u a t i o n ( i . e . molecule)

represented

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

in

COMPUTER-ASSISTED ORGANIC SYNTHESIS

156

this node occur elsewhere i n the PSG? Is the s i t u a t i o n c i r c u l a r i . e . c a l l i n g for s y n t h e s i z i n g X to synthesize X higher up? d) At an OR-node what i s the best subgoal to take up? At an AND-node what i s the precursor that should be tackled f i r s t ?

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

Now consider the f o l l o w i n g set of questions that go beyond the information maintained by the PSG model: a) When there i s an operator whose relevance to synthesizing a molecule is clear, but the a p p l i c a b i l i t y c o n d i t i o n s are not s a t i s f i e d , under what set of circumstances should the unsatisfied preconditions be made into subgoals? Under what c o n d i t i o n s should the operator be r e j e c t e d altogether? b) For a given molecule what i s the most s t r a t e g i c sequence for the i n t r o d u c t i o n of f u n c t i o n a l groups? c) When i s the sequence i n t r o d u c t i o n immaterial?

of

functional

group

d) Given a subgoal, can the paths explored for another s t r u c t u r a l l y homologous s t r u c t u r e be considered v a l i d here? e) Given a synthesis route (or partial route) involving a protection/unprotection reaction p a i r , should an attempt be made to d e r i v e a r e v i s e d route not i n v o l v i n g p r o t e c t i o n by resequencing some of the reactions? Answering these questions c a l l s for maintaining a r i c h e r and b e t t e r organized base of information about the search paths than that provided by the PSG and i t s h e u r i s t i c value assignments to the nodes. As an a l t e r n a t i v e to the Heuristic Search process, consider the f o l l o w i n g two-stage process: a) Perform some e x p l o r a t i o n i n the search space (be i t the State Space of the H e u r i s t i c Search described above, or the Planning Space to be discussed below). b) Gather the search information, analyze, amplify and

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical

Synthesis

Planning

by

Computer

157

use i t to guide f u r t h e r e x p l o r a t i o n .

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

A great deal of f l e x i b i l i t y and i n v e s t i g a t i v e power comes to us i f we separate the t o t a l system i n t o two components g i v i n g explicit charge of the e x p l o r a t i o n by search to one, and the a n a l y s i s and a s s i m i l a t i o n o f the search information to the other. We gain conceptual c l a r i t y i n t h i n k i n g about the r u l e s for search guidance and set about designing novel Search Models with a new ease and v i g o r . I w i l l describe b r i e f l y the information gathering system which has been developed a t Rutgers and show by example a novel form of search model i n the remaining sections of the paper. ΜΕΤΑ-DESCRIPTION SYSTEM: A PARADIGM FOR INFORMATION-GATHERING The s t r u c t u r e of a system described i n the f a c i l i t y of MDS [ S r i n i v a s a n , 1973 & 1976; Sridharan, 1975] concepts i s radically d i f f e r e n t from the procedure based systems to which we are accustomed. I t i s more f a v o r a b l e , t h e r e f o r e , to introduce the system d i r e c t l y by an example. Table I presents the STRUCTURAL DESCRIPTIONS of the c l a s s e s of e n t i t i e s that are involved i n b u i l d i n g the PSG model. The c l a s s PSGRAPH designates the Problem S o l v i n g Graph whose ELEMENTS are NODES. There are two important c l a s s e s that help structure c o l l e c t i o n s of NODES i n t o OR-NODES and AND-NODES. The l a t t e r three c l a s s e s have MERIT and STATUS r e l a t i o n s associated with them, shown i n the Table r e l a t i n g these nodes to INTEGER values for MERIT, and a c l a s s c a l l e d STATUS f o r the STATUS r e l a t i o n . There are three values of STATUS defined as constants v i z . , OPEN, FAILED and SUCCEEDED. The PSGRAPH has a GOAL which i s a NODE and a c o l l e c t i o n of open nodes and f a i l e d nodes. The TRIALNODE designates the node to take up as subgoal when the search process i s set to explore the space again. The PSGRAPH has a r e l a t i o n STATUS which i s intended to i n d i c a t e the c o n d i t i o n s under which the process should terminate. NODES are r e l a t e d t o the OR-NODEs v i a the SUBGOAL r e l a t i o n , the OR-NODEs i n d i c a t e CHOICES of AND-NODEs and the AND-NODES i n turn INCLUDE any number of NODES

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC

158 (*

TABLE

(* (CONSTANTS (CONSTANTS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

(*

I

*)

CONSTANTS

OF T H E DOMAIN

(YESNO (YES NO))) (STATUS (SUCCEEDED

STRUCTURAL

PSG

FAILED

DESCRIPTIONS

*)

OPEN)))

*)

(TON:

[PSGRAPH

(element N O D E elementof) (goal N O D E goalof) (opennodes NODE opennodeof) (failednodes NODE failednodeof) (status STATUS statusof) (trynode NODE|AND-NODEIOR-NODE trynodeof)])

(TDN:

[NODE (subgoal OR-NODE subgoalof) (subnode NODE subnodeof) (descendant NODE descendantof) (status STATUS) ( s i t u a t i o n MOLECULE) (merit INTEGER meritof) (repeated NODE repeatedby) ( c i r c u l a r NODE)])

(TDN:

[OR-NODE (subgoal AND-NODE) (status STATUS) (merit INTEGER)])

(TDN:

[AND-NODE (subgoal NODE) (status STATUS) (merit INTEGER)])

(TDN:

[MOLECULE ( s t r u c t u r e CHEMICAL-GRAPH) ( a v a i l a b l e YESNO)]) (* SENSE DEFINITIONS *)

(QSCC: [((NODE Ν) | (X elem Ν) (N status OPEN)) PSGRAPH opennodes]) (* Flags s p e c i f i a b l e on the r e l a t i o n s have been l e f t out f o r s i m p l i c i t y *)

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

SYNTHESIS

7.

Chemical

SRIDHARAN

Synthesis

Planning

by

Computer

Table I. Continued I

(QSCC:

[ ( ( N O D E N) PSGRAPH

(P elem Ν )

(N s t a t u s

FAILED))

(QSCC:

[ ( ( S T A T U S S) I (X ( g o a l s t a t u s ) S ) )

failednodes] ) PSGRAPH

status]) (QSCC:

[((STATUS

S) I

((X

(subgoal s t a t u s ) SUCCEEDED) => (X s t a t u s SUCCEEDED)) ( ( ( N O T [X (subgoal s t a t u s ) F A I L E D ] ) AND

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

(NOT

[X S t a t u s S U C C E E D E D ] ) ) (X S t a t u s OPEN))

=>

( ( ( A L L NODE N)(X subgroup Ν) (N s t a t u s

FAILED))

(X S t a t u s F A I L E D ) ) ) OR-NODE

status]) (QSCC:

[((STATUS

S) I

((X

(subgoal s t a t u s ) F A I L E D ) => (X s t a t u s F A I L E D ) ) (((X (subgoal s t a t u s ) OPEN) (NOT [X S t a t u s F A I L E D ] ) ) => (X s t a t u s OPEN)) (((NOT

(NOT

[X S t a t u s

FAILED])

[X S t a t u s O P E N ] ) ) => (X S t a t u s S U C C E E D E D ) ) )

AND-NODE

status]) (QSCC:

[ ( ( N O D E A) I

(A (subgoal

subgoal

subgoal)

X

NODE

subgoalof]) (QSCC:

[ ( ( N O D E A) |

(X s u b g o a l o f A) OR

(X (descendantof s u b g o a l o f ) A ) ) NODE

descendantof]) (QSCC:

[((STATUS

((X

S) I

( s i t u a t i o n a v a i l a b l e ) YES)

=>

(X s t a t u s SUCCEEDED)) ((X (subgoal s t a t u s ) SUCCEEDED) => (X S t a t u s SUCCEEDED)) ((X c i r c u l a r Y) => (X s t a t u s F A I L E D ) ) ) NODE

status]) (QSCC:

[ ( ( N O D E R) I

(X ( s i t u a t i o n

s i t u a t i o n o f ) R))

NODE

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED

160 Table I.

ORGANIC SYNTHESIS

Continued

repeated]) C) I (X repeated C) (X descendantof C))

(QSCC:

[((NODE NODE

(QSCC:

[ ( ( I N T E G E R I)

circular]) |

(X (subgoal merit) I ) (NOT [X (subgoal merit >=) I ] ) )

OR-NODE

merit]) (QSCC: [ ( ( I N T E G E R I ) | (X (subgoal merit) I ) (NOT [ I (>= meritof subgoalof) X])) Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

AND-NODE

merit]) (* PRODUCTION RULES GUIDING SEARCH *) [INITIALIZE (IT (PSGRAPH P)) (INPUT (P goal G)) (IR (P goal G)) (IR (P opennodes G))] (G s t a t u s SUCCEEDED) => (OUTPUT G)(HALT) (G Status FAILED) => (OUTPUT G)(HALT) (G element Χ)(X c i r c u l a r Y) => (ASSERT (X Status FAILED)) (NOT [G trynode X]) => (ASSERT (G trynode (G g o a l ) ) ) (G trynode X)(NOT [X subgoal Y]) => (ASSERT (G trynode NIL))(SPROUT X)(EVALUATE X) (G trynode Χ) (X subgoal Y) (X (merit meritof) Y) => (ASSERT (G trynode Y)) (* End of Table I *)

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

thus c l e a r l y graph.

Synthesis

exhibiting

Planning

by

Computer

the AND/OR

161

nature

of the

Let us turn our a t t e n t i o n b r i e f l y to the SENSE DEFINITIONS which are s p e c i f i c a t i o n s of the L o g i c a l c o n d i t i o n s that are to be met f o r a s s e r t i n g the various relations and are at the same time s p e c i f i c a t i o n s of the computations to be performed i f the system i s given the r e s p o n s i b i l i t y to f i l l i n values for c e r t a i n r e l a t i o n s . Simple d e f i n i t i o n s are given f o r the OPENNODES and FAILEDNODES r e l a t i o n s of the PSGRAPH c l a s s . The OPENNODES are the set of a l l NODES Ν which are ELEMENTS of X (denoting the PSGRAPH) whose STATUS i s OPEN. The STATUS of the PSGRAPH i s defined to be the status of the goal node of the PSGRAPH w r i t t e n [(STATUS S) I (X (goal status) S ) ] . The nature of the AND-NODES and the OR-NODES i s c l e a r l y s p e l l e d out i n the d e f i n i t i o n s for the STATUS r e l a t i o n s on these c l a s s e s . The d e f i n i t i o n f o r the status of the AND-NODE may be paraphrased i n t o E n g l i s h as f o l l o w s : a) I f X includes a node whose status then the status of X i s also FAILED; b) I f X status i s not FAILED and X OPEN node then the status of X i s OPEN;

i s FAILED

includes

f i n a l l y , c) I f X i s neither OPEN nor FAILED i t must be SUCCEEDED.

an

then

The information displayed i n Table I i s the description the user provides to the Information-gathering system as the s p e c i f i c a t i o n of the Search Model. The system accepts the S t r u c t u r a l D e s c r i p t i o n s and sets up Data Structures and access functions f o r each of the r e l a t i o n s . The Sense D e f i n i t i o n s are analyzed t o compile a Network of Information Flow [Sridharan, 1976] that p r e s c r i b e s the data flow paths when a new piece of information i s made a v a i l a b l e to t h i s system. This much could be termed the "compile-time" a c t i v i t y of the system.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

162

(* TABLE I I *) (* STRUCTURAL DESCRIPTIONS *) (TDN: [SEARCHGRAPH (* This i s the search graph of FLEXI) (elements NODE) (goal NODE) (trynode RNODEISNODEIDNODEIFNODE)])

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

(TDN: [SNODE (* State Space Structure (Backward Search)) (node NODE snode) (subgoal OR-NODE) (merit MERIT) (status STATUS) (subgoal NODE) ( c i r c u l a r SNODE) (features FEATURE) (descendant SNODE)]) (TDN: [RNODE (* Planning Space Structure) (node NODE mode) (redgoal RNODE) ( d i f g o a l DNODE) (merit INTEGER meritof) (status STATUS) (relevantfeatures FEATURE)]) (TDN: [DNODE (* State Space Structure (Forward Search)) (node NODE dnode) (tonode NODE) (fromnode NODE) ( d i f f e r e n c e s FEATURE) (merit MERIT) (status STATUS)])

(TDN: [FNODE (* Non-Goal-Directed Forward Search) (node NODE fnode)

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRIDHARAN

Table II.

Chemical

Synthesis

Planning

by

Computer

163

Continued

(transform TRANSFORM appliedto) (reactivegroup FEATURE) (canproduce FNODE canbeproducedfrom) (merit MERIT meritof) (status STATUS)]) (TDN: [TRANSFORM

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

(* One i s needed f o r every growth) (appliedto FNODE transform) (product FNODE) (reagents MOLECULE)]) (* SAMPLE PRODUCTION RULES *) (ADDED (SNODE S)) => (ASSERT ((S node) fnode NIL)) (* No forward e x p l o r a t i o n i f S was given as a r e t r o s y n t h e t i c goal) (ADD (FNODE D))=> (FILLIN (INSTANTIATE (DNODE (fromnode &(N node)) (tonode &(N (canbeproducedfrom node dnode tonode)))))) (* I f Ν was generated by forward e x p l o r a t i o n t r y a s s e r t i n g a DNODE subgoal i f the information needed i s there) (* SENSE DEFINITIONS *) (QSCC: [((DNODE D) | (D tonode (X goalnode))) RNODE difgoal]) (QSCC: [((DNODE D) | ((X (node status) SUCCEEDED) => (X status SUCCEEDED)) (((X ( d i f g o a l status) SUCCEEDED) (X (redgoal status) SUCCEEDED)) => (X status SUCCEEDED)) (((X ( d i f g o a l status) FAILED)

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

164

COMPUTER-ASSISTED ORGANIC SYNTHESIS

OR (X (redgoal status) FAILED)) => (X status FAILED))) RNODE status])

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

(QSCC: [((NODE Y) | (X (difnodeof goalnode) Y)) DNODE tonode]) (QSCC: [((STATUS S) | ((X (fromnode canproduce* node) (X tonode)) => (X Status SUCCEEDED)) ( (X (goalnode goalnode descendant node) (X fromnode)) => (X Status SUCCEEDED))) DNODE status] ) (QSCC: [((STATUS S) | ( (X (node status) SUCCEEDED) => (X status SUCCEEDED)) ((X (subgoal status) SUCCEEDED) => (X Status SUCCEEDED))) SNODE status] ) (QSCC: [((STATUS S) | (((X (mode status) SUCCEEDED) OR (X (snode status) SUCCEEDED) OR (X ( s i t u a t i o n a v a i l a b l e ) YES)) => (X s t a t u s SUCCEEDED) ) ) NODE status])

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical

Synthesis

Planning

by

Computer

165

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

The planning and problem s o l v i n g i s i n i t i a t e d and c o n t r o l l e d by a s e t of r u l e s that we s h a l l examine p r e s e n t l y . The i n i t i a l i z a t i o n i s s t r a i g h t f o r w a r d and involves c r e a t i n g an instance of PSGRAPH and f i l l i n g i t s goal node. This i n d i c a t e s that i t i s the s p e c i f i c a t i o n of the goal that t r i g g e r s the process. The s p e c i f i c a t i o n of the goal node involves submitting the s t r u c t u r e of the molecule to be synthesized. Turning our a t t e n t i o n away from the Rule s e t that c o n t r o l s the problem s o l v i n g process, l e t us consider the information gathering a c t i v i t y caused by the a d d i t i o n of a subgoal node f o r some operator explored by the search component of the system. This may be s p e c i f i e d as a conjunction of precursor molecules which are INCLUDED i n an AND-NODE. Consider the a c t i o n taken when one of the precursors i s asserted as a SUBGOAL of the GOAL node. The check of the c o n d i t i o n f o r the (NODE subgoalof NODE) r e l a t i o n i n d i c a t e s that t h i s AND-NODE needs to be introduced as a choice i n the OR-NODE pointed to by the GOAL node [ t h i s i s i n d i c a t e d by the expression (A (subgoal choice includes) X ) ] . The consequent a s s e r t i o n of the CHOICE r e l a t i o n flows along i t s data flow path to the STATUS relation of the OR-NODE i n question* It is appropriate to point out here that t h i s data flow l i n k was "compiled" when the d e f i n i t i o n of STATUS was scanned and i t was e s t a b l i s h e d then that any additions/changes to the CHOICE of an OR-NODE was to take e f f e c t i n turn on the STATUS r e l a t i o n . A v e r b a l d e s c r i p t i o n such as t h i s one cannot describe a l l the data flow that takes place but h o p e f u l l y the above explanation conveys the concept of the data flow and consequent " informations-gather ing" proceeding as per the s t r u c t u r a l and sense d e f i n i t i o n s of the Search Model given by the user f o r t h i s problem domain. At t h i s p o i n t , i f the reader w i l l grant that as the r e s u l t s o f the e x p l o r a t i o n conducted by the search component gets fed t o the Information-Gathering component the r e q u i s i t e search model w i l l be created or updated as appropriate, we can turn our a t t e n t i o n back again to the Search Guidance Rules. The Rules are w r i t t e n i n what i s known i n the computer science l i n g o as the "Production Rule Form" [Davis & King, 1975].

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

166

COMPUTER-ASSISTED ORGANIC SYNTHESIS

The production system takes a sequence c o n d i t i o n a l a c t i o n s p e c i f i c a t i o n s of the form:

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

RULE :

of

(Condition to check)=>(Action to take)

In applying one of these r u l e s the l e f t - h a n d side of the Rule i s f i r s t tested against the current s t a t e of the model and i f the t e s t i s s a t i s f i e d the a c t i o n s of the r i g h t side of the r u l e are performed. There are a v a r i e t y of r u l e sequencing methods conceivable, but we s h a l l use only the simplest of them here. The c o n t r o l s t a r t s from the beginning of the r u l e sequence and t r i e s each r u l e and c y c l e s back to the f i r s t r u l e a f t e r the l a s t r u l e i s t r i e d . The execution of a (HALT) i n some a c t i o n component of a r u l e terminates the e n t i r e process. The r u l e set for the PSG model i s very simple. I n i t i a l l y , the status of the PSGRAPH i s examined to see i f the process should terminate. The r u l e s w r i t t e n here s p e c i f y that i f the status of the PSGRAPH i s FAILED or SUCCEEDED then the graph i s output and the process h a l t s . Otherwise, I f there i s a node marked as a p o s s i b l e node to sprout, the goal node i s picked f i r s t . On the other hand, the presence of a trynode which has no subgoals i n d i c a t e s that the s e l e c t i o n i s completed and the a c t i o n to take i s to SPROUT the trynode and EVALUATE it. The responsibility of SPROUT is to choose a transformation, t e s t i t s a p p l i c a t b i l i t y and give the set of precursors to Modelling System. The Modelling System then updates the model and posts the tree h i e r a r c h y , the c i r c u l a r i t y and s t a t u s r e l a t i o n s . The evaluation a l s o submits i t s merit r a t i n g of the nodes to the Modelling System which i n t u r n , reassigns the merits of the a f f e c t e d AND-NODES and OR-NODES. The i m p l i c a t i o n s of the new information so entered are followed by the Modelling System and the information needed by the r u l e set i s provided i n a ready form. The control repeatedly flows in a SELECT-SPROUT-EVALUATE loop u n t i l h a l t e d . The r u l e set i s f l e x i b l e and can be changed i f a s p e c i f i c synthesis problem suggests a d i f f e r e n t form of c o n t r o l . I t i s not d i f f i c u l t to enter s y n t a c t i c guidance based on the model graph using the distance of a node from the g o a l , the number of conjuncts at an

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical Synthesis Phnning by Computer

167

AND-NODE e t c . . I t i s also p o s s i b l e to guide the search based on chemical information, f o r example, to disregard subgoals involving a seven-membered heteratomic r i n g . I t i s conceivable that when the system runs i n t e r a c t i v e l y the guidance w i l l be changed and experimented with as the search proceeds.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

SEARCH IN A PLANNING SPACE The s t r u c t u r e of the search space i s determined not only by the c o l l e c t i o n of transforms a v a i l a b l e to the system but also by the r u l e s f o r s e l e c t i n g the transforms to be t r i e d f o r any subgoal. There are two basic ways f o r s e l e c t i n g transforms. a) S e l e c t i o n by A p p l i c a b i l i t y . I f the transforms are selected because they guarantee that the target molecular s t r u c t u r e w i l l be produced upon their a p p l i c a t i o n , the required precursors become subgoals. This i s the method of s e l e c t i n g transforms by t h e i r a p p l i c a b i l i t y i n the r e t r o s y n t h e t i c d i r e c t i o n . The synthesis sequence grows a step at a time ensuring that the target molecular s t r u c t u r e w i l l be a product of the r e a c t i o n sequence developed so f a r , once the precursors are made available. The space of p o s s i b i l i t i e s determined by this criterion of a p p l i c a b i l i t y i s termed the STATE SPACE. b

) S e l e c t i o n by Relevance. In some cases a transform i s selected because i t produces a molecule only s i m i l a r to the target molecular s t r u c t u r e but not e x a c t l y the same. When such a transform i s applied to a target s t r u c t u r e T, i t may synthesize some s t r u c t u r e S that i s s i m i l a r to T, from a set of precursors P. This breaks the o r i g i n a l problem of synthesis i n t o two subproblems, i) Synthesize P. S(P) i i ) Transform S==>T. TR(S,T) The transform selected s p e c i f i e s a r e a c t i o n that converts Ρ to S, and t h i s transform w i l l , i n general, c o n s t i t u t e an intermediate step i n the synthesis sequence. The space of p o s s i b i l i t i e s determined by the c r i t e r i o n of relevance i s termed the PLANNING SPACE and i n several s i t u a t i o n s the search toward a

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

168

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

Scheme I.

Reaction used in

transform

a.

TRANSFORM

SYNTHESIZE

b.

TRANSFORM

SYNTHESIZE Scheme II.

Pfonning

space

maneuvers

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical

Synthesis

Planning

by

Computer

169

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

s o l u t i o n can be b r i e f e r i n t h i s space than i n the State Space. This d e f i n i t i o n of a Planning Space i s a v a r i a t i o n of the concept introduced i n the General Problem Solver system (GPS) [Ernst, 1969]. The d i s t i n c t i o n between the S e l e c t i o n of transforms by A p p l i c a b i l i t y and by Relevance i s an important one when considering the s t r a t e g i e s one might employ to search f o r synthesis sequences. The search i n a Planning Space has the c h a r a c t e r i s t i c that the search "leaps" i n t o some intermediate point i n the synthesis sequence and e s t a b l i s h e s an " i s l a n d " and the s o l u t i o n search could then proceed from the i s l a n d to the target molecule i n the forward d i r e c t i o n or from the i s l a n d backward i n the r e t r o s y n t h e t i c d i r e c t i o n toward a v a i l a b l e molecules. The s i g n i f i c a n c e of t h i s a b i l i t y to leap has been explored i n other task areas than synthesis search and has been found to be a powerful t o o l i n converging on s o l u t i o n s r a p i d l y . I t s u t i l i t y f o r synthesis search remains to be shown and for now can be i l l u s t r a t e d only i n terms of examples. The f o l l o w i n g sketch of an example i s o f f e r e d to i l l u s t r a t e the idea of planning space. The example has not been checked by any chemist and thus i t s chemical correctness cannot be assured. Wipke [Wipke, 1976] uses the example of a r e a c t i o n that synthesizes an a l c o h o l group i n 1,4 r e l a t i o n to an e l e c t r o n withdrawing group, say C=0, by the opening of epoxide by a s t a b i l i z e d i o n . Consider the target s t r u c t u r e shown i h Scheme I . The c r i t e r i o n of a p p l i c a b i l i t y would require that the target s t r u c t u r e contain the -C(OH)-C-COsubstructure and would not be a p p l i c a b l e i n the State Space search. In search conducted i n the Planning Space, i f the transform i n d i c a t e d i s considered relevant to the target s t r u c t u r e (e.g., i f the presence of the a l c o h o l and an unsubstituted 1,4 carbon i s s u f f i c i e n t ) then t h i s transform may be used. The o r i g i n a l synthesis problem i s replaced with two subproblems shown i n Scheme I I .

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

170

The above example could be s u c c e s s f u l l y completed by working forward reducing the d i f f e r e n c e between S and T, and working backward i n s y n t h e s i z i n g P. COMBINING SEARCH IN PLANNING AND STATE SPACE The search i n a planning space should be conducted by t a k i n g repeatedly the synthesize S type problem for each Ρ s p l i t t i n g i t each time i n t o a 'Synthesize and a 'Transform' type problem, d e f e r r i n g a l l the Transform TR type problems t i l l some a v a i l a b l e molecular s t r u c t u r e i s reached along a path. This w i l l generate a S k e l e t a l Plan where many of the intermediate steps are l a c k i n g i n d e t a i l but each one i s given as a TR type problem. I f an e v a l u a t i o n function can designed for these s k e l e t a l plans much useless search can be avoided by using the planning space.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

1

The search i n the s t a t e space i s conservative and takes small steps attempting to make steady progress towards completing a set of s o l u t i o n s . This can cause e i t h e r aimless wandering because small changes i n the merit values assigned to subgoals cause no s i g n i f i c a n t s h i f t s of a t t e n t i o n or because the changes i n the merit values cause l a r g e abrupt changes i n behaviour. This has been given the graphic name of the Mesa Phenomenon by Minsky [Minsky, 1963]. The planning space s t r u c t u r e s the solution sequendes q u i t e d i f f e r e n t l y and causes a more g o a l - d i r e c t e d search to proceed. The method of transform s e l e c t i o n allows the program to "leap" i n t o the s o l u t i o n sequence and decide upon one of the intermediate r e a c t i o n s and permits the s o l u t i o n to grow i n both d i r e c t i o n s . It appears that a j u d i c i o u s combination of both the State Space and Planning Space search methods might be able to overcome some of the d i f f i c u l t i e s found i n each of the methods [Amarel, 1969] With these c o n s i d e r a t i o n s i n view the next s e c t i o n introduces a framework i n which to combine the two spaces, l e t t i n g the chemist user supply the search guidance r u l e s customized to the p a r t i c u l a r problem at hand. FLEXI:

A FLEXIBLE ADVANCED SEARCH MODEL

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

7.

SRiDHARAN

Chemical

Synthesis

Planning

by

Computer

171

Given a c o l l e c t i o n of nodes designating molecules there are the f o l l o w i n g types of tasks one can generate (Table I I ) : a) For a given node whose molecule i s not yet synthesized, develop a synthesis by working backwards in the s t a t e space using a p p l i c a b l e transforms. b) For a given node whose molecule i s not yet synthesized, develop a synthesis by problem reduction using relevant transforms. c) For a given ordered p a i r of nodes develop a synthesis route that transforms one molecule to the other. d) For a given node execute a b r i e f non-goal-directed exploration forwards using reactions i n their conventional d i r e c t i o n s . A search model i s introduced here, c a l l e d FLEXI, in which four types of s t r u c t u r e s are used to symbolize the above four categories of tasks and these four nodes form the BUILDING BLOCKS of the search management model. Figure 2 shows that a NODE designates a molecule and i n d i v i d u a l l y can be set up as an SNODE for searching r e t r o s y n t h e t i c sequence, or as an RNODE f o r search for a planning route. The FNODE i s used t o s e t up the node f o r forward e x p l o r a t i o n without a goal guidance. The sprouting of an SNODE generates a piece of the f a m i l i a r AND/OR problem s o l v i n g graph and the status r e l a t i o n s on SNODE, OR-NODE and AND-NODE are posted s i m i l a r to that given e a r l i e r . The sprouting of an RNODE R l , see Figure 3, c o n s t i t u t e s a step i n the planning space and generates two tasks by problem reduction - an RNODE R2 and a DNODE D l . R2 c a l l s for the synthesis of a molecule by further problem reduction and the DNODE sets up a problem of transforming one molecule i n t o another. The transform used i n the sprouting of R l i s used to e s t a b l i s h that i t canproduce the fromnode N3 of the DNODE from the node N2 of R2. This i s e x h i b i t e d i n Figure 3. The new task set up i n the RNODE could of course be r e s t r u c t u r e d as an SNODE task by some r u l e i n the production system. The RNODE w i l l be considered successful as soon as the NODE connected t o i t has

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

172

COMPUTER-ASSISTED ORGANIC

SYNTHESIS

FNODE

Ο

fnode situation

NODE

MOLECULE

RNODE

SNODE

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

Figure

2.

NODE

building

block

RNODE Rl node

redgoah

\difgoal

(RNODE R2 (

)DNODE D1

-K

) NODE N1 fnode

\snode

Ç)FNODE F3

(

)SNODE S2

SNODE SI

Figure

3.

The RNODE

building,

block

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical Synthesis Planning by Computer

173

SUCCEEDED, whether by i t s RNODE or the SNODE tasks or even by i t s designating a molecule which i s a v a i l a b l e i n the Catalog of S t a r t i n g Compounds.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

Under c e r t a i n circumstances a synthesis f o r the fromnode N3 of the DNODE could be attempted independently and i t s success w i l l d i s a l l o w the RNODE R2 from further c o n s i d e r a t i o n during search. In that case, the success of the DNODE i s s u f f i c i e n t to guarantee the success of RNODE R l and thereby of NODE Nl. The DNODE can succeed e i t h e r when a path i s found by working forwards from FNODE F2 t o F3 or by working backwards from SNODE S2 t o SI. Figure 4 i l l u s t r a t e s the canproduce relation among p a i r s of FNODES. Working forwards from F l i f a molecule M2 r e s u l t s from a transformation i n v o l v i n g Ml then the corresponding FNODE F l can produce F2. The s t r u c t u r e s e x h i b i t e d here are only the b u i l d i n g blocks of the search model. By s u i t a b l y c o n t r o l l i n g and guiding the e x p l o r a t i o n , the search can take on great v a r i e t y t r a v e r s i n g the s t a t e space forwards or backwards or t r a v e r s i n g the planning space. The r u l e s e t can be made to contain broad i n j u n c t i o n s such as "Do not explore a node i n the forward d i r e c t i o n i f i t was created by i n s t a n t i a t i n g a SNODE, i . e . a node to proceed i n the r e t r o s y n t h e t i c d i r e c t i o n " ' or "When you add an FNODE immediately i n s t a n t i a t e a revised DNODE type task" as shown i n Figure 5. Of course, these two r u l e s are used here only as examples and i n given s i t u a t i o n s one might wish to advise the system otherwise. The e s s e n t i a l point i s that a f l e x i b l e form of search guidance s p e c i f i c a t i o n i s a v a i l a b l e and can be used t o bring t o bear on a given problem a wide v a r i e t y of h i n t s , suggestions and advice that would be d i f f i c u l t i n a standard H e u r i s t i c Search program. Overcoming some d i f f i c u l t i e s of H e u r i s t i c Search. One cause of the Mesa Phenomenon i n the case of chemical synthesis i s the use of f u n c t i o n a l group substitution reactions while working i n the

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED

Ο

node

ORGANIC SYNTHESIS

FNODE F2 product

\

canproduce transform

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

FNODE Fl Figure

4.

a p p l i e d

FNODE

t 0

building

T

R

A

N

S

F

0

R

M

block

( A D D E D ( S N O D E S)) => ( A S S E R T ((S node) fnode N I L ) ) (* No forward exploration if S was given as a retrosynthetic goal) (ADDED (FNODE D)) => (FILLIN (INSTANTIATE ( D N O D E (fromnode &(N node)) (tonode & ( N (canbeproducedfrom node dnode tonode ) ) ) ) ) ) (* If Ν was generated by forward exploration try asserting a D N O D E subgoal if the information needed is there)

Figure

5.

Sample guidance

rules

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

7.

SRiDHARAN

Chemical

Synthesis

Vlanning

by

Computer

175

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

retrosynthetic direction. I f a molecule containing several f u n c t i o n a l groups i s selected f o r sprouting then performing f u n c t i o n a l group s u b s t i t u t i o n s often yields precursors that have nearly the same merit as the target molecule. The presence of several f u n c t i o n a l groups only aggravates the s i t u a t i o n . Within FLEXI t h i s c l a s s of r e a c t i o n s could be used mainly f o r transforming a molecule to another when they are s t r u c t u r a l l y s i m i l a r , i . e . a DNODE type task which might be c a r r i e d out most favorably using f u n c t i o n a l group s u b s t i t u t i o n s . Thus, avoiding the use of these reactions i n the r e t r o s y n t h e t i c d i r e c t i o n should prevent the problem. As further s p e c i f i c problems are i s o l a t e d and solved, the framework of FLEXI may help us to submit the proper r u l e s of search guidance to the system. CONCLUSION The H e u r i s t i c Search method i s b a s i c a l l y very simple. I t involves a s e r i a l processor working backwards that s e l e c t s subgoals and transforms by asking "What next?". A good implementation of the h e u r i s t i c method includes a SEARCH MODEL which i s a symbolic representation of the progress of search. The Problem Solving Graph that i s commonly used to guide search i s a u s e f u l but l i m i t e d search model. Symbolization i s the key to Reasoning and the computer can reason only about things i t can handle s y m b o l i c a l l y . Furthermore, the richness of the search model c o n t r i b u t e s to the conduct of an i n t e l l i g e n t search. In t h i s paper a search modelling system i s described by examples. This system allows the user to describe rather than to program the search model and to associate c o n s t r a i n t s that govern the growth of the model. The system provides a Rule Language based on the user described search model and the user may then p r e s c r i b e r u l e s i n the form of Production Rules. The r u l e s the user w r i t e s can be general, being v a l i d over a wide v a r i e t y of task s p e c i f i c a t i o n s , or can be b i t s of advice and h i n t s about a given problem. The paper concludes by showing how some of the standard d i f f i c u l t i e s with h e u r i s t i c search such as g e t t i n g locked into a plateau can be overcome by s u i t a b l e

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

176

techniques o f s e a r c h m o d e l l i n g . The m o d e l l i n g i d e a s shown h e r e c a n be used t o c o n t r o l and specify p r o t e c t i o n r e a c t i o n s and f o r sequencing f u n c t i o n a l group i n t r o d u c t i o n . I t i s also possible to carry out different s t y l e s o f e x p l o r a t i o n v a r y i n g t h e emphasis on t h e d i r e c t i o n a l i t y and space i n w h i c h t h e s e a r c h i s conducted. The applicability o f t h e model t o complement a h e u r i s t i c search process i s now made c l e a r , however i t s a c t u a l use i n c h e m i c a l s y n t h e s i s planning awaits the p a r t i c i p a t i o n of a chemist collaborator !

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

ACKNOWLEDGMENT I wish t o thank Prof. Srinivasan for h i s stimulating discussions of h i s modelling ideas r e p r e s e n t e d i n MDS w i t h o u t which my u n d e r s t a n d i n g o f MDS would be meager. A s u b s e t o f MDS adequate t o d e a l w i t h PSG and FLEXI i s b e i n g programmed and made a v a i l a b l e b u t so f a r no s y n t h e s i s problems have been t r i e d o u t i n t h e s e frameworks. The p r e s e n t system i s implemented i n t h e language FUZZY [ L e F a i v r e , 1974]. I look forward t o the continued cooperation of Prof. L e F a i v r e i n t h e f u t u r e and h i s h e l p i n t h e p a s t i s hereby g r a t e f u l l y acknowledged. I w i s h t o thank P r o f . S a u l Amarel f o r h i s e x p e r t a d v i c e upon r e a d i n g t h i s paper.

REFERENCES 1.

C h a r l e s Schmidt [1976] U n d e r s t a n d i n g Human Action: R e c o g n i z i n g t h e P l a n s and M o t i v e s o f Other P e r s o n s , i n Cognition and Social B e h a v i o r , J. Carroll & J. Payne (editors), Lawrence Earlbaum P r e s s , 1976.

2.

N.S. S r i d h a r a n [1975] The Architecture o f BELIEVER: A System for Intepreting Human Actions. T e c h n i c a l Report RUCBM-TR-46, Department o f Computer S c i e n c e , University, New Brunswick N J .

3.

Rutgers

N.S. S r i d h a r a n [1976] The Architecture of BELIEVER-Part II. The Frame and Focus Problems in AI. T e c h n i c a l Report RUCBM-TR-47, Department o f Computer S c i e n c e , R u t g e r s University, New Brunswick N J .

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

7. SRIDHARAN

Chemical Synthesis Planning by Computer

177

4.

Chitoor Srinivasan [1973] The Architecture o f a Coherent I n f o r m a t i o n System. Advanced P a p e r s o f t h e T h i r d International Joint C o n f e r e n c e on Artificial Intelligence, S t a n f o r d , 1973.

5.

Chitoor Srinivasan [1976] An I n t r o d u c t i o n t o t h e M e t a - D e s c r i p t i o n System. T e c h n i c a l Report SOSAP-TR-18, Department o f Computer Science, Rutgers University, New Brunswick NJ.

6.

N.S. S r i d h a r a n [1974] A Heuristic Program t o D i s c o v e r S y n t h e s e s f o r Complex O r g a n i c M o l e c u l e s . P r o c e e d i n g s o f the IFIP74 (International Federation for Information Processing) C o n g r e s s , S t o c k h o l m , August 1974.

7.

N.S. S r i d h a r a n [1971] An Application of Artificial Intelligence t o the D i s c o v e r y o f Complex O r g a n i c S y n t h e s i s . D o c t o r a l Dissertation, Department o f Computer S c i e n c e , SUNY a t Stony B r o o k , 1971.

8.

N.S. S r i d h a r a n [1973] Search Strategies f o r the t a s k o f O r g a n i c C h e m i c a l S y n t h e s i s . Advanced Papers o f the T h i r d International Joint Conference on Artificial Intelligence, Stanford, 1973.

9.

H. G e l e r n t e r [1973] The D i s c o v e r y o f O r g a n i c S y n t h e t i c Routes by Computer. T o p i c s i n C u r r e n t C h e m i s t r y , Volume 41, S p r i n g e r - V e r l a g , 1973.

10.

H e r b e r t G e l e r n t e r [1976] Empirical E x p l o r a t i o n s o f SYNCHEM, An Application o f Artificial Intelligence t o the Problem o f Computer-Directed Organic S y n t h e s i s D i s c o v e r y . This volume. 11.

E . J . Corey & W.T. Wipke [1969] C o m p u t e r - a s s i s t e d D e s i g n o f Complex O r g a n i c S c i e n c e , Volume 166 (178), 1969.

Synthesis,

12.

Todd Wipke [1973] C o m p u t e r - A s s i s t e d Three D i m e n s i o n a l S y n t h e t i c Analysis. I n Computer R e p r e s e n t a t i o n and M a n i p u l a t i o n o f C h e m i c a l I n f o r m a t i o n . W.T. Wipke et. al. (editors), John W i l e y (New York) 1974.

13.

Todd Wipke [1976]

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

178

COMPUTER-ASSISTED ORGANIC SYNTHESIS S E C S — S i m u l a t i o n and evaluation o f C h e m i c a l S y n t h e s i s : S t r a t e g y and P l a n n i n g . T h i s volume.

14.

I v a r U g i [1976] The S y n t h e t i c D e s i g n Program MATSYN as a p a r t o f MATCHEM, A System o f Computer Programs f o r t h e D e d u c t i v e Solution o f C h e m i c a l P r o b l e m s . T h i s volume.

15.

Nils Nilsson [1971] Problem S o l v i n g Methods in McGraw Hill (New York) 1971.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch007

16.

Artificial

A l l e n N e w e l l & H e r b e r t Simon [1972] Human Problem Solving. Prentice-Hall 1972.

Intelligence.

(New J e r s e y )

17.

H e r b e r t G e l e r n t e r [1962] Machine-Generated Problem S o l v i n g Graphs. P r o c e e d i n g s o f a Symposium on t h e M a t h e m a t i c a l Theory o f Automata Polytechnic Institute o f B r o o k l y n P r e s s (New York) 1962. 18. James Slagle [1971] Artificial Intelligence: The Heuristic Programming Approach. M c G r a w - H i l l , New Y o r k , 1971. 19.

R a n d a l l D a v i s and J o n a t h a n K i n g [1975] An Overview o f P r o d u c t i o n Systems, S t a n f o r d Artificial Intelligence L a b o r a t o r y Memo AIM-271, October 1971.

20.

George E r n s t & Allen N e w e l l [1969] GPS: A Case Study in Generality and Problem S o l v i n g . Academic P r e s s (New York), 1969.

21.

M a r v i n M i n s k y [1963] S t e p s Toward Artificial Intelligence. I n Computers and Thought, by Edward Feigenbaum and Julian Feldman (editors), M c G r a w - H i l l (New York) 1963.

22.

S a u l Amarel [1969] P r o b l e m - S o l v i n g and D e c i s i o n - M a k i n g by Computer: An Overview. In Cognition: A Multiple V i e w , G a r v i n (editor), S p a r t a n Books (Washington) 1970.

23.

R i c h a r d L e F a i v r e [1974] F u z z y P r o b l e m - S o l v i n g , Ph.D. Dissertation, Computer S c i e n c e Department, University o f W i s c o n s i n , Madison, 1974. The R e p r e s e n t a t i o n o f F u z z y Knowledge, J o u r n a l o f C y b e r n e t i c s , volume 4, p57-66, 1974.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

8 Computer-Assisted Synthetic Analysis in Drug Research P. GUND, J. D. ANDOSE, and J. B. RHODES

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch008

Merck Sharp & Dohme Research Laboratories, Dept. of Scientific Information, and Corporate Management Information Systems, MSDRL Systems and Programming Dept., Merck & Co., Inc., Rahway, N.J. 07065

It was recognized at Merck some time ago (1) that the analytical and data-handling capabilities of a computer could f a c i l i t a t e organic synthesis design, just as spectroscopic methods have revolutionized structure determination. Indeed, the chemist's approach to synthetic design resembles a computer program (2). As outlined in Figure 1, the chemist begins by defining his synthetic problem; he collects relevant knowledge about chemical reactions (note that he never sequentially searches through all available literature); and he designs a synthesis. If the synthesis f a i l s or if he wants additional p o s s i b i l i t i e s , he iterates (repeats the process) to generate additional syntheses. If initial attempts are unsatisfactory, the chemist may enlarge his store of relevant knowledge, e.g. by reading the literature or talking to an expert; or he may redefine the problem - i . e . , find new keys so that more of his knowledge of reactions becomes relevant. In fact, chemists have been known to search exhaustively for a way to implement a preferred route, only to f i n a l l y re-analyze their problem and find an entirely different - and ultimately successful - one. If sufficiently desperate, the chemist may browse (i.e., perform a random search) through the literature for ideas. This method has occasionally succeeded when more rational approaches failed, and might be taken as an indication that our reaction classification and retrieval methods are imperfect. We may envision two levels of computer support of the chemist's analytical process. One approach - which we may c a l l a reaction retriever - organizes and retrieves relevant reaction information. The other, which we here c a l l a synthesizer, simulates a large part of the synthetic process, as shown by the frame in Figure 1. Both computer approaches require a data base of chemical reactions, and it is conceivable that they could share the same data base. We w i l l return to this point; but first, we should consider whether either method would find use by the practicing chemist. 179 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

LEME PDROEFBIM PAKOTETYRERNS CONCEPTS ν

>

>

Figure

1.

Organic

e n

synthesis: Problem

^LRTIEEARCATTOIUNRE f

"retriever

synthesizer"

CHEMSIT)

analysis

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch008

8.

Synthetic

GUND ET AL.

Analysis

in

Drug

Research

181

Categories of Pharmaceutical Synthetic A n a l y s i s We i d e n t i f y four types of pharmaceutical s y n t h e s i s : (1) s y n t h e s i s of analogs of a l e a d " compound, (2) s y n t h e s i s of n a t u r a l products, (3) process development» and (4) new r e a c t i o n discovery (Figure 2). A lead analog program normally attempts 11

Synthesis

Type

Synthesis

Analogs of "Lead"

Class

Retriever

SM ?

Synthesizer f o r D i f f i c u l t Compounds; R e t r i e v e r Retriever

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch008

SM Ρ Ρ

Synthesizer; Retriever

Retriever

Process Development

? ·* Ρ SM + Ρ

Synthesizer; Retriever

Retriever

New

SM ·> ?

R e t r i e v e r ; Forward Operating Synthesizer

Natural

? SM

Computer A i d

Product

Reactions

Figure

2.

Computer-assisted

synthetic

analysis types and

classes

to f i n d short syntheses of a s e r i e s of r e l a t e d compounds, o f t e n from a s i n g l e intermediate which can be obtained i n l a r g e q u a n t i t i e s - f o r example, c o n s t r u c t i o n of various s i d e chains s t a r t i n g from p e n i c i l l a n i c a c i d . O c c a s i o n a l l y , however, a d e s i r e d analog must be made by q u i t e d i f f e r e n t chemistry - f o r example, p r e p a r a t i o n of dethiacephalosporins (J3). Also o c c a s i o n a l l y , analogs w i l l be made by applying s t r a i g h t f o r w a r d chemistry to an a v a i l a b l e s t a r t i n g m a t e r i a l . Thus, i f we d i f f e r e n t i a t e three c l a s s e s of s y n t h e t i c analyses - (a) s t a r t i n g m a t e r i a l and product s p e c i f i e d (SM •> P); (b) product s p e c i f i e d (? •+ P) ; and (c) s t a r t i n g m a t e r i a l s p e c i f i e d (SM ?) , then lead analog syntheses may belong to any of the c l a s s e s . When a n a t u r a l product with i n t e r e s t i n g b i o l o g i c a l a c t i v i t y i s i s o l a t e d , s y n t h e s i s i s used to confirm the s t r u c t u r e and to o b t a i n s u f f i c i e n t m a t e r i a l f o r biochemical and s t r u c t u r e m o d i f i c a t i o n s t u d i e s . Synthetic a n a l y s i s i s g e n e r a l l y of the "product s p e c i f i e d (? P) type, except when a r e l a t e d m a t e r i a l i s known and a v a i l a b l e (SM Ρ synthesis) · For process development, a n a l y s i s tends to be exhaustive, s i n c e the optimal commercial s y n t h e s i s o f t e n i s d i f f e r e n t from the "best" laboratory method. When a cheap r e l a t e d compound i s a v a i l a b l e , the a n a l y s i s may be of the SM ·* Ρ type. F i n a l l y , chemists apply new r e a c t i o n s to known compounds i n order to generate new drug leads; t h i s r e q u i r e s s y n t h e t i c f f

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

182

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch008

analyses of the SM ? o r , o c c a s i o n a l l y , SM Ρ types. A r e a c t i o n r e t r i e v e r program i s a p p l i c a b l e to a l l f o u r types of syntheses. A s y n t h e s i z e r program a p p l i e s p r i m a r i l y to the ? Ρ c l a s s o f problem, although a forward working s y n t h e s i z e r would apply to SM ? problems. The a p p l i c a b i l i t y of these programs by s y n t h e s i s type i s summarized i n F i g u r e 2. As a g e n e r a l i z a t i o n , a s y n t h e s i z e r i s most u s e f u l f o r " c r e a t i v e syntheses, while a r e a c t i o n r e t r i e v e r may i d e n t i f y optimal c o n d i t i o n s a f t e r a r e a c t i o n pathway has been chosen. Therefore both methods should be v a l u a b l e .

11

Program D e s c r i p t i o n s A r e a c t i o n r e t r i e v e r program permits o r g a n i z i n g and e n l a r g i n g the s t o r e of r e a c t i o n knowledge a v a i l a b l e to each individual. Chemists have long used manual systems f o r o r g a n i z i n g r e a c t i o n knowledge, such as i n d i v i d u a l card f i l e s , T h e i l h e i m e r s famous s e r i e s of volumes, Reactiones Organicae, and r e c e n t l y the Derwent Chemical Reactions Documentation S e r v i c e . Computer o r g a n i z a t i o n of such c o l l e c t i o n s enables r e t r i e v a l by r e a c t a n t s , products, r e a c t i o n type, r e a c t i o n c o n d i t i o n s , and/or mechanism (4)· C r e a t i o n of such a computer program i s g e n e r a l l y considered an information r e t r i e v a l a p p l i c a t i o n (4)· A s y n t h e s i z e r program t r a d i t i o n a l l y begins with a t a r g e t s t r u c t u r e and a p p l i e s chemical r u l e s to generate and evaluate p o t e n t i a l precursors. I t o f f e r s the c a p a b i l i t y of f a s t , exhaustive, unbiased s y n t h e t i c analyses. As i n d i c a t e d i n many of the other c o n t r i b u t i o n s to t h i s symposium, t h i s approach i s u s u a l l y considered an a r t i f i c i a l i n t e l l i g e n c e a p p l i c a t i o n . Since we had concluded that both program types were d e s i r a b l e , we wondered i f the same r e a c t i o n data base could serve for both computer approaches, as the flow diagram of Figure 1 suggests. We t h e r e f o r e embarked upon a f e a s i b i l i t y study to t e s t t h i s dual use concept. 1

Proposed Computerized Chemical Reaction C o l l e c t i o n We conceived a system where r e a c t i o n s coded by Merck chemists could serve three purposes - current awareness, r e a c t i o n r e t r i e v a l , and s y n t h e s i z e r input (Figure 3). We i d e n t i f i e d the f o l l o w i n g system development stages: (I) development of r e a c t i o n coding sheet; (II) d e s c r i p t i o n o f r e a c t i o n s on sheets by chemists ( c o n t i n u i n g ) ; ( I I I ) t r a n s l a t i o n to computer readable r e a c t i o n information by i n f o r m a t i o n s c i e n t i s t s and t y p i s t s ( c o n t i n u i n g ) ; (IV) development of software f o r computer storage and r e t r i e v a l of r e a c t i o n i n f o r m a t i o n . In the f e a s i b i l i t y study, we a c t u a l l y c a r r i e d out phases I and I I , and performed a l i m i t e d systems a n a l y s i s o f phases I I I and IV. We designed a r e a c t i o n coding sheet t o c o n t a i n most of the information needed by both r e a c t i o n r e t r i e v e r and s y n t h e s i z e r programs (Figure 4 ) . Data on r e a c t a n t s , products, intermediates, reagents, c o n d i t i o n s , y i e l d s , work-up procedures, mechanism, and

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

8.

GUND E T A L .

Synthetic

Analysis

in Drug

CHEMISTS WRITE REACTION DESCRIPTIONS

DEVELOP REACTION CODING SHEET

»

CODERS INPUT REACTIONS

' REACTION I ' RETRIEVAL/ SERVICE/

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch008

Figure

3.

Reaction

1.CODER: P. G-u* A

file

6 . REFERENCES:

Rfc+cl.^e

T^WKcdrov 7. AFFILIATION:

I INPUT FOR / SYNTHESIZER/ / PROGRAM /

development

2 .LOCATION(ΜΟΝΕ) : RIO (Wli)

4.REACTION NAME: C c p K e J # e p * r i i * S y n ^ e c i S

}

CAr^nC**,

*

3.DATE:

Zlz7/rtf

5.SOURCE: ^*MERCg>

V * * ^ V 6 * 9 , * · Γ3 Λ * 7 . ϊ )

LeH 3_) , a s s u m e d t o be c a r r i e d o u t i n b a s i c conditions with the r e l a t i v e l y b u l k y t.-butoxide ion* For illustrative purposes, assume t h a t the s k e l e t o n J_0 i s known, and o n l y the placement of a c h l o r i n e atom a t one o f the methylene groups i s in question (candidate structures t h e n d i f f e r by t h i s p l a c e m e n t ) .

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

198

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

/ED ITREACT NAME: DEHYDROCHLORI NAT I ON (NEW REACTION) "SITE >CHAIN 3 >ATNAME I C L >HRANGE 3 I 3 >SH0W NAME * DEHYDROCHLOR INATION ATOM TYPE ARTYPE. NEIGHBORS HRANGE 1 CL NON-AR 2 2 C NON-AR I 3 3 C NON-AR 2 1-3 >ADRAW DEHYDROCHLOR INAT ION:

(HRANGES NOT INDICATED)

CL-C-C >DONE

"CONSTRAINTS >BADLIST BADLIST CONSTRAINTS CONSTRAINT NAME:CCTBU CONSTRAINT NAME: >DONE_

"SHOW NAME:

DEHYDROCHLOR I NAT I ON

SITE: ATOM TYPE ARTYPE NEIGHBORS HRANGE 1 CL NON-AR 2 2 C NON-AR I3 3 C NON-AR 2 1-3 DEHYDROCHLOR INAT ION: (HRANGES NOT INDICATED) NON-C ATOMS: I- CL 1-2-3

'TRANSFORM >NDRAW DEHYDROCHLOR INAT ION: NON-C ATOMS: l->CL

(HRANGES NOT

INDICATED)

TRANSFORM: UNJOIN I 2 JOIN 2 3 DELATS I CONSTRAINTS:

1-2-3 BADLIST NAME CCTBU

>UNJOIN I 2 >JOIN 2 3 >ADRAW DEHYDROCHLOR INAT ION: CL C=C

(HRANGES NOT INDICATED)

CONSTRAINTS

"•DONE (DEHYDROCHLOR INAT ION

DEFINED)

(DEHYDROCHLOR INAT ION ADDED TO THE REACTION

LIST)

>DELATS I >DONE

Figure

3. An interactive session with CONGEN including definition of the reaction site the reaction transform (TRANSFORM) and constraints on the reaction site (CONSTRAINTS) for the example reaction, dehydrochlorination. A summary of the complete reaction is provided by the SHOW command. User responses to CONGEN are underlined (carriage-returns terminate each command).

(SITE),

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

9.

VARKONY ET

AL.

Computer-Assisted

Structure

Elucidation

199

Reaction S i t e • The r e a c t i o n s i t e represents the segment o f a m o l e c u l e w h i c h w i l l be transformed* The segment i n c l u d e s the atoms a c t u a l l y involved i n the reaction transformation together with any other structural features necessary for the reaction to occur* The site is defined using the appropriate EDITSTRUC commands* The r e a c t i o n (8. -> 3_) i n v o l v e s t h e r e m o v a l of the e l e m e n t s o f HC1 from adjacent carbons. The CHAIN and ATNAME commands ( F i g . 3 ) d e f i n e t h e site, w h i c h i s drawn for i l l u s t r a t i o n (Fig. 3)* The HRANGE command requires that there be from one to t h r e e h y d r o g e n atoms on atom 3 , w h i c h i s obviously necessary f o r the e l i m i n a t i o n o f HC1. (The p r o g r a m actually is c a p a b l e of determining this i t s e l f by examination of the t r a n s f o r m , so t h e HRANGE i n f o r m a t i o n i s r e d u n d a n t . ) The atom numbers are critical parameters for the reaction. T h e s e numbers a r e " s t i c k y " i n t h e s e n s e t h a t they w i l l always be a s s o c i a t e d w i t h the same a t o m s . Subsequent d e f i n i t i o n of the r e a c t i o n transform itself w i l l make e x p l i c i t r e f e r e n c e t o t h e s e atom n u m b e r s . Reaction Transform. The reaction transform i s the actual series of s t r u c t u r a l m o d i f i c a t i o n s which, when a p p l i e d t o t h e atoms i n the r e a c t i o n site, yield the products. The user defines the transformation explicitly by m o d i f i c a t i o n s to the p r e v i o u s l y named site. The TRANSFORM command (Fig. 3) r e s t o r e s the actual connection table representing that site. Then, again u s i n g EDITSTRUC commands, t h e m o d i f i c a t i o n s to t h a t s i t e which e x p r e s s the r e a c t i o n are defined. In the example ( F i g . 3 ) , the r e a c t i o n i n v o l v e s l o s s of HC1 yielding a d o u b l e bond (8. -> 5 . ) , e x p r e s s e d as UNJOIN (break the C-Cl b o n d ) and JOIN t o form the new bond. The DELATS command deletes the c h l o r i n e atom as an inconsequential product. Reaction Site Constraints * These constraints r e f e r to f e a t u r e s of the m o l e c u l e ( o t h e r than those i n the r e a c t i o n s i t e ) which a f f e c t the reaction, either positively by a l l o w i n g i t to o c c u r or n e g a t i v e l y by p r e v e n t i n g i t from o c c u r r i n g * T h e s e f e a t u r e s may be i n the l o c a l environment of the r e a c t i o n s i t e or may be r e m o t e as i n the case o f an i n t e r f e r i n g or competing functionality elsewhere in the molecule. In the example ( F i g * 3), we know that this reaction w i l l be hindered by t h e e x i s t i n g t . - b u t y l g r o u p i n the skeleton (J_0 ) . We previously defined a s u b s t r u c t u r e named, a r b i t r a r i l y , CCTBU as t h e s t r u c t u r e JJ_* Placing this s u b s t r u c t u r e on Β A D L I S T ^ · i s i n t e r p r e t e d by CONGEN as " c a r r y out the r e a c t i o n t r a n s f o r m at the s i t e g i v e n by

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED

200 the reaction encountered".

site,

except

when

ORGANIC

CCTBU

SYNTHESIS

(JJL) i s

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

12

The SHOW command ( F i g , 3 ) p r e s e n t s t h e u s e r w i t h a complete summary of the reaction i n i t s current definition• Carrying

Out t h e R e a c t i o n *

Product Constraints* I n many c a s e s , c e r t a i n ways of c a r r y i n g o u t a r e a c t i o n which a r e l e g a l a c c o r d i n g t o d e f i n i t i o n s of the reaction s i t e , constraints and t h e transform y i e l d p r o d u c t s which a r e u n d e s i r e d * In the e x a m p l e ( F i g * 3 ) we wish to avoid formation of double bonds a t t h e b r i d g e h e a d s ( B r e d t ' s r u l e ) * We s u p p l y , a s a BADLIST c o n s t r a i n t , t h e name of a superatom c a l l e d BREDT w h i c h i s p r e v i o u s l y d e f i n e d a s s u b s t r u c t u r e 1 2* The s t a r r e d a t o m s r e p r e s e n t "linknodes" and a r e used t o represent a p a t h o f atoms o f a g i v e n l e n g t h o r range o f lengths* The u n s t a r r e d atoms i n substructure J_2 a r e the bridgeheads, the linknodes the three associated paths* The d o u b l e bond i n J _ 2 i s t o one o f t h e bridgehead atoms, completing an e x p r e s s i o n of the constraint• Applying t h e T r a n s form• Actual use of the reaction transform i s straightforward with the exception o f some f e a t u r e s t o a l l e v i a t e t h e p r o b l e m s o f duplication ( s e e below)* The p r o g r a m examines each structure i n the l i s t of structures t o which the r e a c t i o n i s applied f o r the presence o f the r e a c t i o n site. If a site(s) i s f o u n d , and t h e s t r u c t u r e obeys a l l r e a c t i o n c o n s t r a i n t s then t h e r e a c t i o n t r a n s f o r m i s a p p l i e d t o t h e s t r u c t u r e , once f o r each unique s i t e , and a p r o d u c t i s c r e a t e d f o r each application* Then, i f t h e user has s p e c i f i e d a multi-step reaction, the p r o d u c t m o l e c u l e may be t e s t e d a g a i n f o r the presence of a d d i t i o n a l r e a c t i o n s i t e s and t h e r e a c t i o n c a r r i e d out again* This e f f e c t i v e l y allows us t o e m u l a t e a reaction w h i c h has been c a r r i e d out with a specific

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

9.

VARKONY ET AL.

Computer-Assisted

Structure

Elucidation

201

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

molar r a t i o of r e a g e n t to starting material; i f only one mole o f r e a g e n t was used, the procedure can be stopped after a single application of a transform; alternatively it can be applied to completion (exhaustively)• C o n s i d e r the r e a c t i o n summarized i n F i g u r e 3 and i t s e f f e c t s on s t r u c t u r e s J_3. - J_5* I n s t r u c t u r e J_3. t h e r e a c t i o n s i t e m a t c h e s t w i c e , o n c e a t C-6,7, o n c e a t C7,8* The r e a c t i o n i s n o t c a r r i e d o u t , h o w e v e r , b e c a u s e both r e a c t i o n s i t e s v i o l a t e the undesired environment r e p r e s e n t e d by JJ_. F o r J_4, t h e r e a c t i o n s i t e matches once at C-6,12 and t h e reaction is carried out* But the product c o n s t r a i n t BREDT on B A D L I S T (J_2) r e j e c t s t h e B r e d t ' s r u l e v i o l a t o r J_6, r e s u l t i n g i n no p r o d u c t s for s t r u c t u r e _1_4* The r e a c t i o n s i t e f i t s t w i c e i n 15. at and C-4,5, and b o t h f i t t i n g s yield products, 17 and J_8, r e s p e c t i v e l y *

Duplication

Among

Products

of

a. R e a c t i o n *

When a reaction is a p p l i e d to a given list of structures, i t is frequently true that some p r o d u c t s t r u c t u r e s o c c u r many t i m e s i n t h e "raw" products l i s t * In mechanistic studies, this is the desired result because each occurrence of a product represents a unique r e a c t i o n pathway (see R e s u l t s and Discussion)* In s t r u c t u r e e l u c i d a t i o n s t u d i e s , though, the i m p o r t a n t information is the chemical identity of, not the p a t h w a y s t o , e a c h p r o d u c t , and i n s u c h applications i t is necessary to e l i m i n a t e d u p l i c a t e structures* This i s not a s i m p l e m a t t e r b e c a u s e a l t h o u g h structures are chemically equivalent their representations within the

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED

202

ORGANIC

SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

program may be different (.§_•&• , t h e atoms may be numbered differently from one representation of a structure to the next)* One method of duplicate e l i m i n a t i o n which a v o i d s c o s t l y atom-by-atom structure comparisons between a l l p a i r s of structures involves casting each product into a standard representation ("canonical f o r m " ) * Duplicates t h e n c a n be d e t e c t e d e a s i l y by d i r e c t comparison of these representations* The canonicalization process is relatively timec o n s u m i n g , t h o u g h , and i t i s d e s i r a b l e t o e x p l o r e more efficient methods of duplicate e l i m i n a t i o n wherever possible• One type o f d u p l i c a t i o n which c a n be d e t e c t e d without recourse to canonicalization is symmetry d u p l i c a t i o n , w h i c h c a n a r i s e e i t h e r when the r e a c t i o n itself p o s s e s s e s some symmetry o r when the s t a r t i n g structure i s symmetrical* Our g r a p h - m a t c h i n g algorithm which i s r e s p o n s i b l e for locating possible f i t t i n g s of t h e r e a c t i o n s i t e w i t h i n a m o l e c u l e t a k e s no a c c o u n t o f symmetry* F o r example, suppose the r e a c t i o n i s the a d d i t i o n of one mole of hydrogen t o an alkene* The r e a c t i o n s i t e h e r e i s J_2. a n d t h e t r a n s f o r m a t i o n i s J_2. -> 20»

c=c 1 2 IS

22

c-c 1 2 20

21

23

C-C 1 2 19

C—(C N)

\/ Ν

28

2 1 24

C=C 1 2 25

C-C—OH 1 2 26

C—C

V Ν

29

C—Ν

\/ Ν

30

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

9.

VARKONY E T AL.

Computer-Assisted

Structure

Elucidation

203

B e c a u s e a t o m s 1 and 2 a r e e q u i v a l e n t i n both the reaction s i t e J_2. and t h e transformed site ( . 2 0 . ) , the reaction has 2-fold symmetry* If the reacting structure i s 2J_, w h i c h i t s e l f has a t w o - f o l d symmetry p l a n e , then the f o u r m a t c h i n g s 22. - 23. a l l y i e l d t h e same product, cyclohexene* These f o u r matchings are members of an equivalence class determined by t h e symmetries of the r e a c t i o n and t h e reacting molecule* If the reaction were unsymmetrical, say with a transform of J_2_ - > 2 J > (this would be a hydration r e a c t i o n ) , t h e n t h e r e w o u l d be two equivalence classes among t h e m a t c h i n g s 2 _ 2 - 23.9 one c o n t a i n i n g 22 and 23 (each of t h e s e w o u l d l e a d t o e y e l o h e x e n - 3 - o l ) and one c o n t a i n i n g 23. and 2Ά (each y i e l d i n g eyelohexen-4-ol)* If the reacting molecule also had an u n s y m m e t r i c a l s t r u c t u r e , s a y £2, t h e n e a c h m a t c h i n g would constitute a separate one-element equivalence class, and f o u r d i s t i n c t s t r u c t u r e s would result* The g e n e r a l problem, then, i s to eliminate a l l b u t one member o f e a c h e q u i v a l e n c e c l a s s p r e s e n t i n t h e complete set of matchings* T h i s i s a form of the socalled double coset problem of combinatorial m a t h e m a t i c s w h i c h has b e e n d i s c u s s e d p r e v i o u s l y i n the context of c o n s t r u c t i v e graph l a b e l i n g . Our s o l u t i o n c o n s i s t s o f two p a r t s * F i r s t , b e f o r e the matchings are c a l c u l a t e d , a c r i t e r i o n i s d e f i n e d f o r o r d e r i n g any s e t of matchings* This criterion provides for the comparison of two matchings and, based upon the c o r r e s p o n d e n c e o f r e f e r e n c e numbers o f t h e a t o m s i n t h e r e a c t i n g s t r u c t u r e t o r e f e r e n c e numbers i n t h e r e a c t i o n site, d e f i n e s one matching to be " s m a l l e r " than the other* Second, as each matching is o b t a i n e d , the symmetry g r o u p s of b o t h t h e r e a c t i o n and the r e a c t i n g m o l e c u l e a r e used t o form a l l p o s s i b l e symmetry i m a g e s of the matching* I f t h e m a t c h i n g i s " s m a l l e r " t h a n any of these symmetry images, i t is kept as the r e p r e s e n t a t i v e of i t s e q u i v a l e n c e c l a s s * Otherwise i t is discarded as being a duplicate of some representative elsewhere in the complete set of matchings * The symmetry of the reacting molecule is a p r o p e r t y o f i t s s t r u c t u r e and c a n be c o m p u t e d p r i o r to the matching* The symmetry o f the r e a c t i o n depends u p o n t h e p r o p e r t i e s (.e.g.* , atom names, a l l o w a b l e r a n g e s o f h y d r o g e n s ) and i n t e r c o n n e c t i o n s of the " k e y " atoms in the reaction site, and upon the transform modifications* H e r e , "key" atoms a r e t h o s e atoms w h i c h are actually altered by the application of the

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

STRUCTURES

Figure 4. Development and pruning of a reaction sequence tree. Candidate structures are 31-39. Interconnecting Unes and the size of a structure convey information on the fate of each candidate. Broken lines pointing to small structures mean that the product(s) and its predecessor(s) are invalid and would be removed by constraints. Regular lines mean the product(s) and its associated candidate structure remain after reaction 1; medium lines connect products and associated structures which are viable after reaction 2; and heavy lines indicate the products from the one structure, 38, which survives after all constraints are applied.

CANDIDATE

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

9.

VARKONY E T A L .

Computer-Assisted

Structure

Elucidation

205

t r a n s f o r m to the s i t e as o p p o s e d t o atoms which are n e c e s s a r y t o , but do n o t d i r e c t l y p a r t i c i p a t e i n , the reaction* I n some c a s e s the p r o p e r t i e s of t h e s e key a t o m s , o r t h e o r d e r s o f t h e b o n d s b e t w e e n them, c o v e r a r a n g e o f p o s s i b i l i t i e s a s , f o r e x a m p l e , i n t h e use o f a special atom name which w i l l match any non-hydrogen atom, or of a bond o r d e r o f "any" which w i l l match a bond o f any multiplicity* In such cases it i s not a l w a y s p o s s i b l e t o c o m p u t e an o v e r a l l symmetry f o r the r e a c t i o n as a s e p a r a t e e n t i t y , b u t o n l y t h e symmetry i n the context of a particular matching* For example, c o n s i d e r the hypothetical reaction s i t e 2_8 where the "polyname" (C N) represents an atom which can be either C or N* This site really represents two p o s s i b i l i t i e s , 2_9. and 3.0., e a c h of which has two-fold symmetry. O n l y a f t e r a m a t c h i n g has b e e n obtained can it be determined which of the two possibilities p e r t a i n s and thus which symmetry i s appropriate. In these cases i t i s p o s s i b l e at l e a s t to define, before matching, a set of p o s s i b l e r e a c t i o n symmetries which may be applicable* Then for each matching i t is necessary only to make t h e a p p r o p r i a t e s e l e c t i o n from this set, not to recompute completely the reaction symmetry * Development. Sequence Tree•

Indexing

and

Pruning

of

the

Reaction

A reaction sequence may be of arbitrary complexity* A convenient representation for describing a r e a c t i o n sequence i s a t r e e s t r u c t u r e * We illustrate the d e v e l o p m e n t and i n d e x i n g of a r e a c t i o n sequence tree in Figure 4 * We assume f o r this example t h a t there are nine candidate s t r u c t u r e s ( _ 3 J _ - 33.) f o r an unknown w h i c h i s a 1 , 1 ,-eyeloheptane diol, possessing no g e m - d i o l f u n c t i o n a l i t y . In the example (Fig* 4 ) we present the results (and their ultimate c o n s e q u e n c e s ) o f t h e a p p l i c a t i o n o f two r e a c t i o n s in a s t e p w i s e manner, a single-step oxidation (reaction 1 ) followed by a dehydration (reaction 2 ) * A third reaction, exhaustive dehydration, is also a p p l i e d to t h e s e t o f c a n d i d a t e s t r u c t u r e s (3A. - 3 9 ) * A r e a c t i o n sequence t r e e has several important features* B r a n c h i n g of the t r e e o c c u r s w h e n e v e r more than one reaction is applied to a single set of s t r u c t u r e s (or products), e..jg.* , r e a c t i o n s 1 and 3 , F i g * 4* There is not necessarily a one-to-one correspondence of s t r u c t u r e s to products* First, a g i v e n r e a c t i o n may p r o d u c e more t h a n one p r o d u c t f r o m a

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

206

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

given structure, e i t h e r : 1) b e c a u s e t h e r e a c t i o n s i t e applies more t h a n once to the s t r u c t u r e (§.*&* > two products a r e produced from ϋ , 3Λ a n d 3_6 39 by reaction 1 ) ; o r 2) because i t i s a fragmentation reaction* S e c o n d , i t may be p o s s i b l e to obtain the same p r o d u c t f r o m two d i f f e r e n t s t r u c t u r e s , 61 . 62 a n d .64 a r e e a c h produced from t h e same r e a c t i o n a p p l i e d t o two d i f f e r e n t s t r u c t u r e s , J*i a n d j>J_, Hfi a n d 54. and 4_0 and respectively)* It i s possible to develop the complete r e a c t i o n s e q u e n c e t r e e by a p p l y i n g a p l a n n e d s e r i e s o f r e a c t i o n s t o t h e c a n d i d a t e s t r u c t u r e s b e f o r e any l a b o r a t o r y work i s a c t u a l l y done* In r e a l a p p l i c a t i o n s , however, t h e tree would be developed in a stepwise manner by c a r r y i n g out a r e a c t i o n i n the laboratory, acquiring data on t h e p r o d u c t s a n d then t u r n i n g t o CONGEN t o explore the implications of this information* We a t t e m p t t o i l l u s t r a t e what i s a v e r y dynamic p r o c e s s w i t h t h e s t a t i c f o r m o f F i g u r e 4* Each r e a c t i o n y i e l d s a s e t o f p r o d u c t s which a r e indexed by pointers to t h e i r precursors* These pointers are maintained a t each step i n the e x p a n s i o n of the t r e e so t h a t i n f o r m a t i o n (constraints) applied to products a t any level automatically results i n appropriate a c t i o n ( p r u n i n g away undesired structures) a t a l l l e v e l s below and above t h e g i v e n level* There a r e s e v e r a l types of c o n s t r a i n t s which can be a p p l i e d t o s t r u c t u r e s i n t h e r e a c t i o n s e q u e n c e t r e e * One constraint is a minimum t o maximum number o f products* In the l a b o r a t o r y , o x i d a t i o n ( r e a c t i o n 1) of t h e unknown structure yielded two structures* Applying the oxidation to the s e t of candidate structures (3_1 - 3.2.) y i e l d s two products from each s t r u c t u r e e x c e p t ^ 2 , 23. a n d 23.9 w h i c h are, therefore, r e j e c t e d a s c a n d i d a t e s by CONGEN* ( T h e s t r u c t u r e s w h i c h a r e s i n g l e p r o d u c t s o f ^2., 23 a n d 23 p r o d u c e d by CONGEN p r i o r t o a p p l i c a t i o n o f t h e c o n s t r a i n t a r e .42., 4jJ a n d 46. respectively*) W i t h no further c o n s t r a i n t s , the remaining candidate structures are s t i l l viable* 1 2

Any of the e x i s t i n g structural constraints i n CONGEN c a n be a p p l i e d t o p r o d u c t s a t a n y s t e p * In the laboratory, t h e two products of reaction 1 were s e p a r a t e d and e a c h s u b j e c t e d to a dehydration (reaction 2)* A major component o b t a i n e d from e a c h p r o d u c t was an α , g-unsaturated ketone* Applying a GOODLIST" constraint expressing this observation, structures -

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

9.

VARKONY E T AL.

Computer-Assisted

Structure

Elucidation

207

60 a r e p r u n e d away by CONGEN, l e a v i n g o n l y the α , 3 u n s a t u r a t e d k e t o n e s 6J_ - _6_4• This pruning also r e s u l t s i n the r e j e c t i o n of products 4 J _ , Jj_4, A £ > Hi and JJ3 a t the p r e v i o u s l e v e l because they d i d not y i e l d α , 3 unsaturated ketones* Rejecting these leads, i n turn, t o r e j e c t i o n o f 2A> 2Λ a n d 3J5 candidate structures because t h e i r products of r e a c t i o n 1 d i d not both y i e l d an ot , 3 - u n s a t u r a t e d k e t o n e * T h i s l e a v e s o n l y 21 - 23 as c a n d i d a t e s t r u c t u r e s *

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

a

s

R e a c t i o n s c a n a l s o be c a r r i e d o u t e x h a u s t i v e l y by r e p e t i t i v e a p p l i c a t i o n o f the r e a c t i o n u n t i l there are no more r e a c t i o n s i t e s r e m a i n i n g i n t h e m o l e c u l e . This is illustrated i n t h e example by reaction 3, an exhaustive dehydration* In the laboratory, this reaction yielded three different dienes* Without further elaboration of the s t r u c t u r e s of these products, the correct s t r u c t u r e c a n be a s s i g n e d a s 2Â b e c a u s e 21 and 23 y i e l d o n l y one p r o d u c t ( t h e same o n e , 66) * I n c a r r y i n g o u t t h e r e a c t i o n w i t h CONGEN, 6_5. and 67 a r e r e j e c t e d by a BADLIST constraint forbidding aliènes* P r o d u c t s J56>, 6_8 and 6_2. a r e p r o d u c e d from 3 8 . the f i n a l s t r u c t u r e . When reactions are r e l a t i v e l y simple and w e l l u n d e r s t o o d , t h e k i n d o f p r u n i n g d e s c r i b e d above c a n be used* As l o n g a s one has c o n f i d e n c e i n t h e r e a c t i o n p r o c e e d i n g as d e f i n e d , t h e n one c a n u s e t h e p r e d i c t e d r e s u l t s a s p o w e r f u l c o n s t r a i n t s on t h e p r o d u c t s and a l l i n t e r r e l a t e d s t r u c t u r e s i n the tree* Otherwise, there would be no grounds f o r r e j e c t i n g a structure* A c h a r a c t e r i s t i c o f m e c h a n i s t i c s t u d i e s , however, i s t h a t t h e g e n e r a l d i r e c t i o n o f t h e r e a c t i o n i s known, but i n insufficient detail to rule out a m u l t i p l i c i t y of products* O t h e r w i s e one could p r e d i c t the products a p r i o r i and t h e r e w o u l d be no p r o b l e m * In a d d i t i o n , of c o u r s e , one b e g i n s w i t h a s i n g l e , known s t r u c t u r e and i t i s nonsense t o use the p r u n i n g mechanism described above. When we p r u n e t h e r e a c t i o n s e q u e n c e t r e e for a m e c h a n i s t i c problem we p r u n e o u t i n d i v i d u a l p a t h w a y s * Rejection of a p a r t i c u l a r product i n the tree prunes away a l l s t r u c t u r e s w h i c h p o i n t o n l y t o i t o r t o w h i c h only i t points. I f a s t r u c t u r e has a n o t h e r source or an a l t e r n a t i v e f a t e , i t i s r e t a i n e d . In the process o f f o c u s i n g i n on t h e s t r u c t u r e s o f unknown products of c y c l i z a t i o n s o r r e a r r a n g e m e n t s we a l s o f o c u s i n on t h e p o s s i b l e pathways o f f o r m a t i o n .

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

208 RESULTS AND

DISCUSSION

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

A) Ln Example of Application of Reaction Sequences to M e c h a n i s t i c Problems• There are at l e a s t two ways t o a p p l y t h e r e a c t i o n s e q u e n c e c a p a b i l i t i e s o f CONGEN t o m e c h a n i s t i c problems* Some o f t h e p r o c e d u r e s d i s c u s s e d s u b s e q u e n t l y c a n be c a r r i e d o u t with current computer-assisted s y n t h e s i s programs whenever a single compound r e p r e s e n t s t h e s t a r t i n g p o i n t * One a p p l i c a t i o n of reaction sequences i n v o l v e s detailed mechanistic s t u d i e s of p o s s i b l e r e a r r a n g e m e n t s of a particular compound* If a mechanism is t o be e l u c i d a t e d i n d e t a i l , i t i s i n s u f f i c i e n t t o know m e r e l y that one compound can c o n v e r t to another* One must a l s o know the i d e n t i t y of e a c h atom i n v o l v e d and i t s fate i n the reaction. This i s normally followed by tagging v a r i o u s atoms i n the starting m a t e r i a l with isotopic or s u b s t i t u e n t labels* This requirement i s t r a n s l a t e d i n the computer program to the f a c i l i t y for retaining structures ( a t t h e same level in the tree) which are f o r m a l l y d u p l i c a t e s but which are in fact d i f f e r e n t i n terms of the numbering of the atoms ( s e e Methods s e c t i o n ) * U s i n g t h i s a p p r o a c h the f a t e of each atom at each s t e p of a sequence c a n be traced* We think that this c a p a b i l i t y w i l l help a user to d e s i g n the best places for l a b e l l i n g the starting material, b a s e d on t h e c o m p u t e r ' s s i m u l a t i o n of the c o u r s e of a reaction* C o n s i d e r , as a b r i e f e x a m p l e , possible 1,2alkyl shifts in structure J_0, under constraints forbidding formation of 3 and 4 membered r i n g s and methyl groups* Although there are only three new structures (in the absence of l a b e l l i n g ) produced i n the first st^p, there are eight different ways o f p e r f o r m i n g the s h i f t to y i e l d four pairs of f o r m a l l y e q u i v a l e n t s t r u c t u r e s , two o f w h i c h , 70a and 70b. are formally t h e same as t h e starting material (10.) but have different numberings* Each of the three new skeletons appears as a pair of s t r u c t u r e s (71 a.b. 72a , b . and 73a b ) * The members o f e a c h p a i r could be d i s t i n g u i s h e d by a p p r o p r i a t e l a b e l l i n g o f £0* Because there i s a one-to-one c o r r e s p o n d e n c e between t h e atom numbers o f t h e p r o d u c t s and t h e s t a r t i n g material, 70 « it is simple to visualize the course of each rearrangement * t

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

9.

VARKONY ET

AL.

Computer-Assisted

Structure

Elucidation

209

Figure 5. The eight structures, unique in terms of the numbering of their atoms, are produced by a single 1,2-alkyl shift of 70. Transitions involving bridgehead carbonium ions were allowed, but three- and four-membered rings were forbidden.

A second a p p l i c a t i o n of reaction sequences to mechanistic s t u d i e s i n v o l v e s r e a c t i o n s where one needs to explore possible products and interconversion pathways, but without regard to preserving identities o f atoms. An e x a m p l e of t h i s type of r e a c t i o n i s the c l a s s i c problem of the i n t e r c o n v e r s i o n of isomers of 10 16 adamantane. This problem has been the s u b j e c t of s e v e r a l recent a r t i c l e s using computer-based approaches to help e l u c i d a t e the course of v a r i o u s interconversions 3 . A characteristic of such problems i s that they i n v o l v e r e a c t i o n s which are run to completion because the reaction and associated c o n d i t i o n s do n o t a l l o w s t o p p i n g the r e a c t i o n after a p r e c i s e number o f s t e p s have t a k e n place. Generally, c y c l i z a t i o n s take place u n t i l further plausible sites for cyclization are exhausted; rearrangements take place until there is no change in the r a t i o s of products. Reaction c a p a b i l i t i e s o f CONGEN can model such reactions. Cyclizations usually involve only a s m a l l number o f s t e p s , so this represents no s p e c i a l problem. Rearrangements, however, can proceed i n d e f i n i t e l y i f no s t o p p i n g c o n d i t i o n i s s p e c i f i e d . In the program we carry reactions to completion by a p p l y i n g t h e r e a c t i o n one s t e p a t a t i m e , s t o p p i n g when no new structures, compared to a l l those produced p r e v i o u s l y , are encountered. Any f u r t h e r s t e p s would be c i r c u l a r and y i e l d no new p r o d u c t s or pathways. C

H

t

o

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Figure

6.

Complete interconversion bridgehead carbonium

map for C10H16 ring systems ions (solid lines); pathways

devoid of three- and four-memhered rings. Pathways involving bridgehead carbonium ions (dashed lines).

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

involving

no

to

GO

V3

ΚΛ

Ω

hH

il

C/î

>

ι

ο S

Ο

h-» Ο

9.

VARKONY E T A L .

Computer-Assisted

Structure

Elucidation

211

Because f i v e structures ( i l , U , U> - i 8 ) were missing from t h e o r i g i n a l s e t o f adamantane i s o m e r s considered by W h i t l o c k and Siefkin ^, completion of t h e i r i n t e r c o n v e r s i o n map r e p r e s e n t s a good e x a m p l e f o r a mechanistic application. Although i t i s possible to do t h i s problem as o u t l i n e d above, b e g i n n i n g with a specific precursor and running the reaction to completion, i n fact t h e r e i s a much simpler way t o develop the complete i n t e r c o n v e r s i o n map* Under t h e structural constraints p r e s e n t e d ^, there a r e 21 possible isomers • Whenever the complete set of possibilities is available, the complete i n t e r c o n v e r s i o n map c a n be g e n e r a t e d by subjecting a l l p o s s i b i l i t i e s (21 i s o m e r s ) t o one s t e p o f t h e r e a c t i o n , i n t h i s case a 1,2-alkyl s h i f t * The r e v e r s e reaction i s i m p l i c i t i n t h i s s t e p and a l l p o s s i b l e pathways from one s t r u c t u r e t o o t h e r s a r e e s t a b l i s h e d * The c o m p l e t e i n t e r c o n v e r s i o n map i s shown i n F i g u r e 6* 5

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

1

1

Earlier work ^ demonstrated that conversion of tetrahydrodicyclopentadiene ( i O ) t o a d a m a n t a n e (_8Z_) was not p o s s i b l e w i t h o u t i n v o k i n g f o r m a t i o n o f a b r i d g e h e a d carbonium i o n . The e x i s t e n c e o f a d d i t i o n a l s t r u c t u r e s ( 7 1 , 74. 7 6 - 7 8 ) w h i c h l i e i n t h e p a t h o f c o n v e r s i o n o f 70 t o a d a m a n t a n e (£2_) meant t h a t t h i s q u e s t i o n must be reinvestigated* We c a r r i e d o u t t h e a b o v e r e a r r a n g e m e n t reaction under the c o n s t r a i n t s o f no formation of bridgehead carbonium ions, using as d e f i n i t i o n s o f bridgeheads those s e l e c t e d p r e v i o u s l y ^* The r e s u l t s are depicted i n Figure 6* We o b t a i n a f o u r component map i f t h e d a s h e d l i n e s ( p a t h w a y s i n v o l v i n g bridgehead c a r b o n i u m i o n s ) a r e removed* One c o m p o n e n t i s i8., t h e second i s U , t h e t h i r d i s 10 - Β a n d J3_ - ϋ , and t h e fourth is ϋ - 5_0* Although our complete r e s u l t s i n d i c a t e t h a t t h e r e a r e a l t e r n a t i v e pathways from i O t o a d a m a n t a n e (R7) n o t c o n s i d e r e d p r e v i o u s l y , the conclusion o f W h i t l o c k and S i e f k i n ^ t h a t a t l e a s t one b r i d g e h e a d carbonium i o n i s required f o r the conversion, under the given s t r u c t u r a l constraints, i s v e r i f i e d * 1

B* An Example of A p p l i c a t i o n of Reaction Sequences to a Structure Elucidation P r o b l e m * The structure elucidation of c o r i o l i n (whose p r o p o s e d structure is 2J_) , a sesquiterpene antibiotic, represents a problem where structural information i n f e r r e d from c h e m i c a l r e a c t i o n s p l a y e d a crucial role in the tentative s o l u t i o n * Although i t i s possible i n this case to translate a l l structural inferences d e r i v e d from o b s e r v a t i o n s on t h e r e a c t i o n p r o d u c t s b a c k to constraints on the complete set of s t r u c t u r a l

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

212

possibilities, i t is difficult to do using the c o n s t r a i n t s mechanism i n CONGEN* In fact i t i s much s i m p l e r and much more i n t u i t i v e , c h e m i c a l l y , t o use the reaction sequence f e a t u r e s o f CONGEN to e x p r e s s the reaction, obtain products, test the products with constraints and have the program automatically d e t e r m i n e w h i c h c a n d i d a t e s t r u c t u r e s a r e p l a u s i b l e as a result. Extensive spectroscopic data revealed that coriolin has an empirical formula ^15^20^5 * is c o m p o s e d o f f i v e s t r u c t u r a l f r a g m e n t s , jj_2 These fragments comprise a l l o f the atoms i n the empirical formula, so that the f r e e valences (bonds w i t h an u n s p e c i f i e d t e r m i n u s ^ ) o f 5_2 - 3_6 must a l l be c o n n e c t e d t o o t h e r , n o n - h y d r o g e n atoms* a n c

The s t r u c t u r a l a n a l y s i s o f c o r i o l i n u s i n g CONGEN and reaction sequence information provides some interesting examples of the different ways both s u b s t r u c t u r a l and c h e m i c a l i n f e r e n c e s can be utilized to help solve the problem* If chemical experiments have a l r e a d y been c a r r i e d out i n the laboratory, then f r e q u e n t l y some o f t h e i n f e r e n c e s c a n be u s e d i n CONGEN wjiile c o n s t r u c t i n g structures* For example, b a s e d on superatoms 2_6 with the constraint of no additional multiple bonds, t h e r e are more than 800 possible structures* But the chemical evidence'^ r e v e a l s t h a t t h e s t r u c t u r e p o s s e s s e s a t l e a s t one four, one f i v e and one s i x membered r i n g * Even though the precise environment of these rings cannot e a s i l y be s p e c i f i e d u n t i l the r e a c t i o n sequence i s c a r r i e d out i n CONGEN, t h e number o f each r i n g of each s i z e can be used as a constraint, resulting in 56 structural candidates p r i o r to chemical r e a c t i o n s * The NMR data do n o t r e v e a l the presence of c y c l o p r o p y l hydrogens* This constraint further reduces the number of p o s s i b i l i t i e s to 52* 9

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

9.

VARKONY E T A L .

In reactions H CORIOLIN

Computer-Assisted

Structure

the laboratory the was c a r r i e d o u t :

Elucidation

following

213 sequence

of

L i A l H j|

2

>DIHYDROCORIOLIN

>HEXAHYDR OCORIOL IN

i

CrO

ι

V triketone ο

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

/

3

OH \

CH—C 1 °-° 2 97

A

CH—CH 1

ζ>

°2

98

The first reaction reduced the ketone f u n c t i o n a l i t y i n c o r i o l i n ( 5 J . ) t o an a l c o h o l . Carrying out this reaction i n CONGEN y i e l d s t h e e x p e c t e d 52 products. The s e c o n d r e a c t i o n o p e n e d t h e two e p o x i d e f u n c t i o n a l i t i e s , y i e l d i n g two new h y d r o x y l g r o u p s , b o t h of which a r e t e r t i a r y . T h i s a l l o w s a r e a c t i o n s i t e and t r a n s f o r m t o be d e f i n e d w h i c h e x p r e s s t h i s observation, 2J. -> 23.) a more e f f i c i e n t p r o c e d u r e t h a n o p e n i n g b o t h e p o x i d e s b o t h ways f o l l o w e d by p r u n i n g t h e p r o d u c t list. The HRANGE r e s t r i c t i o n , ( i * e _ . , no hydrogens, or H Q _ ) , on atom 1 o f 3J_ r e s u l t s i n f o r c i n g the epoxide t o open to a t e r t i a r y a l c o h o l (£8.)» The e x p e c t e d 52 p r o d u c t s a r e o b t a i n e d i n CONGEN. The final reaction r e s u l t e d i n oxidation of the three secondary a l c o h o l functionalities to keto groups. Spectroscopic data suggested that one o f t h e k e t o g r o u p s was i n a f o u r membered r i n g , one i n a f i v e and one i n a s i x membered ring. This constraint c a n be i n v o k e d on the o r i g i n a l c a n d i d a t e s t r u c t u r e s f o r c o r i o l i n o n l y by a c o m p l i c a t e d case a n a l y s i s on the possible environment(s) of the o r i g i n a l ketone f u n c t i o n a l i t y . As a c o n s t r a i n t on t h e products of the f i n a l oxidation, however, i t isa straightforward t e s t whose r a m i f i c a t i o n s i n terms o f s t r u c t u r a l candidates are determined a u t o m a t i c a l l y by CONGEN. This r e a c t i o n , plus constraints, l e a v e s 20 s t r u c t u r a l candidates, the proposed s t r u c t u r e (JLL) A N D 19 others. We have examined these structures (automatically, using CONGEN) f o r the presence of > 0

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

214

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

naturally-occurring t r i c y c l i c sesquiterpane skeletons ' b e c a u s e i t was t h i s r e a s o n i n g by a n a l o g y w h i c h l e d to t h e p r o p o s a l o f 5_1 for coriolin. S t r u c t u r e 5_L is t h e only one o f the c a n d i d a t e s which possesses a known skeleton• In the group o f 20 s t r u c t u r e s there are f i v e , i n c l u d i n g 5J_, w h i c h obey a h e a d - t o - t a i l isoprene rule* The other four structures are - 102 * We h a v e recently investigated the scope of isomerism of terpenoid systems and find many examples i n the literature where additional structural possibilities exist b u t where structural assignment i s b a s e d on a n a l o g y w i t h known s y s t e m s * A l t h o u g h t h e r e may be good reasons f o r using a n a l o g y , no new terpenoid skeletons w i l l be d i s c o v e r e d t h i s way*

CONCLUSIONS We h a v e p r e s e n t e d an a p p r o a c h w h i c h i s c a p a b l e o f emulating many of the laboratory applications of sequences of chemical reactions* Integrated with the c a p a b i l i t i e s o f t h e CONGEN p r o g r a m t o s u g g e s t sets of candidate structures f o r an unknown compound, this a p p r o a c h has s i g n i f i c a n t l y i n c r e a s e d t h e power of the program to assist chemists in solving structure elucidation problems* We have used some b r i e f but i l l u s t r a t i v e e x a m p l e s t o show d i f f e r e n t a p p l i c a t i o n s o f r e a c t i o n sequences to both m e c h a n i s t i c and s t r u c t u r a l problems. As c o m p u t e r programs to aid i n structure elucidation develop further capabilities and become more widely available, we feel that they w i l l be utilized exactly as o t h e r analytical tools a r e used* CONGEN p r o v i d e s a means f o r v e r i f y i n g hypotheses about unknown s t r u c t u r e s and suggesting a l t e r n a t i v e s which m i g h t o t h e r w i s e be o v e r l o o k e d *

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

9. VARKONY ET A L .

Computer-Assisted Structure Elucidation

215

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

Our inability to utilize stereochemical i n f o r m a t i o n i s a shortcoming, but i t i s not a severe problem f o r many applications of reaction sequences. Most c h e m i c a l r e a c t i o n s u t i l i z e d t o p r o v i d e a d d i t i o n a l s t r u c t u r a l , as opposed t o s t e r e o c h e m i c a l , i n f o r m a t i o n a r e d e s i g n e d t o have broad a p p l i c a t i o n . The r e a c t i o n s must a p p l y i n a v a r i e t y o f p o s s i b l e s i t u a t i o n s because the environments of the f u n c t i o n a l i t i e s involved are u s u a l l y not p r e c i s e l y d e f i n e d . In addition, detailed relative or absolute stereochemical information i s seldom available until considerable detail of the structure i s known; r e a c t i o n s cannot under these c i r c u m s t a n c e s be s e n s i t i v e t o s t e r e o c h e m i s t r y .

LITERATURE CITED Part XXIII o f t h e series "Applications of Artificial Intelligence for Chemical Inference". F o r Part XXII s e e B.G. B u c h a n a n , D.H. S m i t h W.C. W h i t e , R . J . Gritter, E.A. Feigenbaum, J . Lederberg, and C. Djerassi, J. Amer. Chem. S o c . , in press. 2) We w i s h t o t h a n k t h e National Institutes of Health ( R R 0 0 6 1 2 - 0 5 ) for their g e n e r o u s financial support of t h e SUMEX c o m p u t i n g resource ( R R 0 0 7 8 5 - 0 3 ) on w h i c h CONGEN was d e v e l o p e d a n d is made available to outside users. 3) R.E. Carhart, D.H. S m i t h , H. Brown and C. Djerassi, J. Amer. Chem. Soc., 97, 5755 ( 1 9 7 5 ) . 4) E . J . C o r e y and W.T. W i p k e , Science, 166, 178 ( 1 9 6 9 ) . 5) " C a r b o n i u m I o n s " , Vol. I-IV, G.A. O l a h a n d P.v.R. Schleyer, eds., Wiley Interscience, New Y o r k , N.Y. (1968-1973). 6) E.E. v a n T a m e l e n , Accts. Chem. R e s . , 8, 152 ( 1 9 7 5 ) . 7) A d o c u m e n t describing t h e CONGEN program, including EDITSTRUC and all its associated structure manipulation capabilities, is available from t h e authors. 8) R.E. Carhart a n d D.H. S m i t h , C o m p u t e r s in Chemistry, in press. 9) W.T. Wipke and T.M. D y o t t , J. Amer. Chem. Soc., 96, 4825 ( 1 9 7 4 ) . 10) H.L. M o r g a n , J . Chem. Doc., 107 ( 1 9 6 5 ) . 11) L.M. Masinter, N.S. Sridharan, R.E. Carhart and D.H. S m i t h , J . Amer. Chem. Soc., 96, 7714 ( 1 9 7 4 ) . 12) This example is fictional and is used for illustrative purposes only. There are probably easier ways to distinguish the candidate structures in practice. Only structural isomers, as opposed to geometric isomers are presented. 1)

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

216 13)

COMPUTER-ASSISTED ORGANIC

H.W.

Whitlock

Soc., 90, 4929

and

M.W.

Siefken,

J.

Amer.

SYNTHESIS

Chem.

(1968).

14) E.M. Engler, M. Farcasiu, A. P.v.R. Schleyer, J . Amer.

Sevin, Chem.

J.M. Cense, and Soc., 95, 5769

(1973). 15)

R.E. Carhart, Sridharan, J.

D.H. Chem.

Smith, H. I n f . Comp.

Brown, Sci.,

and N.S. 15, 124

(1975).

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch009

16) S. T a k a h a s h i , H. Iunuma, T. Takita, Κ. Maeda, a n d H. Umezawa, T e t . Lett., 4663 (1969). 17) T.K. D e v o n and A . I . Scott, "Handbook of Naturally Occurring Compounds. V o l . II. T e r p e n e s , " A c a d e m i c Press, I n c . , New Y o r k , N.Y., 1972. 18) D.H. S m i t h and R.E. Carhart, Tetrahedron, in press.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

10 Computerized Aids to Organic Synthesis in a Pharmaceutical Research Company D. R. EAKIN and W. A. WARR

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010

Data Services Section, Imperial Chemical Industries Ltd., Pharmaceuticals Div., P.O. Box 25, Alderley Park, Macclesfield, Cheshire, England SK10 4TG

ICI Pharmaceuticals Division has been investigating the area of computerised aids to Organic Synthesis f o r many years. Our approach has been largely pragmatic and d i f f e r s from other systems i n two main areas. F i r s t l y , we have assumed the research chemist is the best innovator i n h i s specialised area we therefore want to provide facilities which will support h i s i n t e l l e c t u a l c a p a b i l i t i e s , hopefully by removing some of the hit and miss aspects of reaction synthesis planning by l e t t i n g computers do some of the more tedious operations. The second difference is our starting point. We felt many of the techniqu­ es already developed f o r the computerised manipulation of compounds should be capable of further exploitation in this area. In the process of designing a potential drug, a pharmaceut­ ical research chemist will have to make use of chemical information systems at various stages. He will have to f i n d answers to such questions as: Is my idea novel - is anyone else workinginthe same or similar areas? How can I make compounds of this type? What starting materials are available to prepare compounds of this type? This compound shows a c t i v i t y , what compounds o f s i m i l a r types are available in quantities large enough to test? Most in-house company information systems w i l l provide some computerised aids to help chemists with these types o f problem, mainly f a l l i n g into one of two classes. F i r s t l y , text searching f a c i l i t i e s on the general l i t e r a t u r e , patent 217 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

218

COMPUTER-ASSISTED ORGANIC SYNTHESIS

information, company reports, etc. enabling the chemist to eval­ uate the significance of an idea. In most systems, the chemist obtains a l i s t of, hopefully, pertinent references which he must then pursue and evaluate. Secondly, most compan­ ies provide some substructure search f a c i l i t i e s on their own compound files enabling the chemist to look for suitable starting materials or for compounds for biological evaluation.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010

We, at ICI, have tried to extend these f a c i l i t i e s so that the chemists can readily obtain physical samples of compounds whether for synthetic purposes or not. A chemists chosen synthetic route often depends on the availability of intermed­ iates and we have tried to expose our chemists to a wide range of available coimpounds which are also readily accessible. In pharmaceutical research, chemists generally know what they want to make and often how they want to make i t . and so a good supply of compounds, whether from compound stocks or commercial suppliers, becomes a valuable support f a c i l i t y . However, the chemist i s s t i l l faced with problems i n reaction synthesis. In many cases, he cannot readily find answers to questions such as: How was A made? How can I make compounds similar to A? What examples have we of the type of reactions that make A? The chemist knows the product he wishes to make and must work backwards from there. He has various approaches open to him. Firstly, he can use conventional chemical information f a c i l i t i e s such as Chemical Abstracts. Beginning with the product he wishes to make, he can examine the name and molecular formula indexes to find papers relevant to his prod­ uct. By examining the abstract, he can find information on whether the compound was made by the author, and i f so, how. He must, however, always look for a specific compound and may involve himself i n a very time-consuming task. His second approach i s to recall a possible method of preparation and to conceive required molecular structures i n terms of pathways through known or even unknown precursors. Having decided on a chosen pathway, he f i r s t examines standard textbooks to find how a certain transformation may be achieved given the constraints imposed by his particular molecule. His next step i s to examine the availability of the starting material; i f necessary extending his pathway back until a suitable starting material i s found.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

10.

EAKiN

AND

WARR

Computerized

Aids to Organic

Synthesis

219

We at ICI Pharmaceuticals Division have been looking at ways i n which computerised chemical mforaiation systems can be used to help chemists plan reaction syntheses more easily. We feel the chemist needs more automatic ways of finding inform­ ation on: (1) Reaction pathways. (2) Reactant and product, particularly at the compound class level.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010

We felt extensions to the existing CROSSBOW f a c i l i t i e s (1-5) could easily cope with the second factor, providing the correct data bases were available. We therefore set about examining ways of coding reaction pathways such that they could be analysed within a computer system. THE TOTAL DESCRIPTION APPROACH METHODOLOGY. Our f i r s t approach was to take as wide a variety of reactions as possible and code as much data about them as available, and then to draw conclusions as to how much of this data was in fact needed to satisfy user demands. We aimed to avoid applying chemical mechanistic knowledge when analysing a reaction, such that our analyses would be independ­ ent of the chemical knowledge of the encoder. To test our ideas, we chose about 700 literature-based reactions from a standard reference work (6) and recorded a wide variety of information for each reaction: (1) Compound details for reactants and products. WLN was used so that standard CROSSBOW f a c i l i t i e s could be used. (2) Bibliographic details, so that the chemists could locate the originating paper for f u l l details. (3) Reaction conditions, including information on reagents, catalysts, solvents, time, temperature, yield and so on. A dictionary of agreed abbreviat­ ions was set-up to code this type of data i n an essentially free text form. (4) Details of the reaction site, using a variety of coding systems. REACTION ANALYSIS. The reaction site was defined as the bonds formed or broken i n a reaction and the atoms which these bonds connect. Thus, in: 0

0

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

220

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010

i t i s assumed that the O^CH^ bond i s broken, and our supposition would be unchanged even i f Ο*** labelling experiments showed that the COO bond were broken. We manually coded the following information for each reaction: (a) A summary reaction classification. The classific­ ation depends upon bonds and rings formed and was based on that used at Roussel-Uclaf (7). The twelve classes were not mutually exclusive and a reaction may f a l l into more than one class. For example, the Fischer Indole Synthesis:

This falls into two classes - one indicating the formation of the ring and the second the formation of the carbon-carbon atom. (b) Details of bonds broken and formed, excluding atomto-hydrogen bonds. No account i s taken of bonds which are merely modified, for example, carbon-carbon triple bonds result­ ing from carbon-carbon double bonds. Consider the reaction: 0

H

0

The carbon-carbon double bond across the ring fusion i s broken and two carbon-oxygen double bonds are formed. The bond inform­ ation would be coded as: -1C=C:+2C=0 (c) Details of individual rings broken and formed. Thus, the rings broken i n the above reaction would be expressed as: -(-55 BM and the ring formed as: + C-8VM EV The coding system i s based on modifications of WLN. The hyphen indicates that the ring i n question i s part of a larger ring system.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

10. E A K i N A N D WARR

Computerized

Aids to Organic

Synthesis

221

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010

(d) Details of the reaction centre. Again, a system of coding was necessary to define the reaction centre, and again we used WLN with two basic modifications. Firstly, we needed to represent individual carbon atoms and we used a variety of symbols to represent the various alternatives. Secondly, we needed to indicate whether each atom was terminal, linking or branched. Using the same reaction, the reaction centre would be represented as: DD = &0Α/&0Α where "D" represents an unsaturated carbon atom at a ring fusion junction; "A" represents the ring carbon atom with an exocyclic double bond; "&" indicates the terminality of the oxygen atoms; "/" indicates the fragments are i n the same molecule (8). ASSESSMENT OF THE APPROACH. The obvious advantages of this approach l i e in the large amount of data held for each reaction, and the precise statement of the reaction site. We tried to avoid the need for subjective chemical judgements either i n coding or searching the f i l e . This i s not to say that chemical problems did not arise. We found that problems arose with tautomerism and mesomeric systems and with defining the reaction sites in certain re-arrangements. However, the reaction coding involved a substantial amount of labour and we had now to ascertain the value of this, and hence, the cost-effectiveness of the approach. The f i r s t stage was to compare the effectiveness of the system with the less labour-intensive system developed at Sheffield University (9, 10). In the latter case the recognit­ ion and generation of the reaction site is carried out by computer. A l l reactants and products involved in the reaction are input, and an algorithm analyses automatically the differen­ ce between reaction and product at the level of the small bondcentred fragments. The fragments which are found to be different are then re-assembled to give a skeletal reaction scheme. Six hundred of the reactions analysed manually were subjected to the automatic analysis of reactants and products. The automatic analysis dealt with 84$, but also suffered from some faulty analyses, particularly with acylation reactions. This was partly alleviated by using an extra routine in the analysis, although this produced further defective analyses. Also sufficient infoimation to allow precise searching was not always included. In contrast, the manual system dealt with a l l reactions, and did not suffer from undiscovered faulty analyses, but i t was found again that the coding does not always include

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

222

COMPUTER-ASSISTED ORGANIC SYNTHESIS

sufficient information to characterise the reaction. Neither of the methods was thought ideal, consistently including a l l iinportant details about the reaction. The most significant omission from a purely reaction site analysis was the position and effect of remote groups. The second stage of the evaluation was to look at the needs of the user population. Discussions with chemists led us to believe that chemists would prefer a chemical classification and a mechanistic approach.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010

We therefore looked at a system based on reactant and product information coupled with a generalised reaction classification. A SYSTEM BASED ON REACTION CLASSIFICATION METHODOLOGY. Our second approach to reaction indexing was as down-to-earth and practical as the f i r s t was academic. We s t i l l stored the compound information on reactants and products, but reaction sites, broken bonds and so on were a l l discarded and a chemical/mechanistic basis was chosen for the reaction classific­ ation. The reactions studied were those which had been used by our Process Development Department for evaluation as manufactur­ ing processes. REACTION CLASSIFICATION. The reaction classification chosen was based on that used by the standard reference work, Organic Syntheses (11). The classes were simplified to suit the specific application. Twenty-one reaction classes were finally defined, many divided into further sub-classes. The object of the classification was to assign each stage of each reaction to only one category, unless i t was a complex reaction, in which case i t was assigned to as few categories as possible. Thus:

would be classed only as N-alkylation and not as a ring cleavage reaction, even though a ring i s cleaved. EVALUATING THE APPROACH. This simple approach worked reason­ ably well in the area to which i t was applied - a small number of reactions (about 900) for which there was no available standard

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

10. Ε A K I N A N D W A R R

Computerized

Aids to Organic

Synthesis

223

reference work. The system was slightly better than an equival­ ent published book since: (a) The CROSSBOW facilities could be used to interrogate the reactant and product information for classes of compound. (b) Desk-top tools could be generated to suit individual user needs, such as a molecular formula index of products, solvent indexes, etc.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010

We did, however, feel the computer could do more to help the chemist and set about looking for a suitable method of supplementing the reaction classification approach, bearing in mind the p i t f a l l s of the f i r s t system. SEMI-AUTOMATIC REACTION ANALYSIS METHODOLOGY. We went back to think on how chemists them­ selves evaluate a possible synthetic route. We came to the conclusion that we needed a way to mark the reaction site information on the complete structure such that any part of the molecule can be accessed when necessary. We started off with the basic assumption that we would code the reactant and product information in WLN, hence whatever system we devised we could make use of the substructure search f a c i l i t i e s . We wanted to si^erimpose the reaction details on the reactant and product information in much the same way as a chemist would mark the reaction details on the structure diagr­ ams indicating the reaction pathway. For example: 0 0 H0^C-CF »NH -C-CF 3

2

3

We needed some way to indicate the reaction site and have devised a system based on the CROSSBOW connection table, which allows us to give a unique number for each node in the structure. For example, the above example gives us the following numbering: 35 0F « ι 6

HO-C-CTF

1 2

F.

i t ->

II

I

7

NL-C-C-F7

1

1 4

2p 2

s

and by quoting node numbers 1 and 2 i t i s possible to indicate simply the bonds fonned or broken and equally any rings formed or broken.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

224

COMPUTER-ASSISTED

ORGANIC SYNTHESIS

The above reaction could therefore be characterised as: -1,2 > +1,2

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010

Adaptions to conventional atom-by-atom search programs enable us to algorithmically derive the reaction centre, and we have the f l e x i b i l i t y to make the reaction site as small or large as required by the question. Hence, i n the above example, the influence of the -CF^ group can be included or excluded. Similarly, on presenting the reactions as output from a search, we can represent them as the chemists would themselves. A simple modification to our existing structure display system enables us to indicate bonds formed and broken. EVALUATING THE SYSTEM. We are at present evaluating a system which combines the reaction classification approach with this semi-automatic site analysis. We are using the novel reactions as indicated in Index Chemicus and at the same time are evaluating the ICRS tapes as an automatic way of collecting the reaction and product information. ICRS particularly alerts novel reactions and syntheses, and the relevant abstracts can be found by searching the tapes or by scanning Index Chemicus. The f i r s t stage of the process i s to isolate a l l the WLNs given for the relevant abstract and to display the structures with relevant CROSSBOW connection table information. The total reaction i s examined in Index Chemicus and the reaction information added to the existing information and the appropriate class allocated. We have again used the Organic Synthesis (11) classification, but this time have l e f t i t essentially unmodified. Any relevant compounds not included in the ICRS tapes (because they did not represent novel compounds) are added to the reaction data base. At present we are building up an index of novel reactions reported i n 1975 and are pleased with the level of effort required to code any one reaction. The necessary modifications to the atom-by-atom search program and to the structure display program have been carried out. In the case of atom-by-atom search we have included the f a c i l i t y to indicate the bonds in the substructure which have been formed/broken. This means that a l l types of substructure can be searched for and any part of that sitostructure may be the reaction site. Both reactant and product molecules can be searched so that the following questions can be answered: (a) How can X be made?

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

10.

ΕAKIN A N D WARR

Computerized

Aids

to

Organic

Synthesis

225

(b) What happens when this bond i s broken i n Y? (c) What reactions have been used to convert group X into group Ζ i n the presence of Y? Modifications have been made to the structure display program. Here the bonds broken or formed are marked on the structure of the total molecule. Bonds formed are marked with a circle and bonds broken with a slash mark, so imitating a chemists normal presentation.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010

CONCLUSIONS Research into the methods to be used i n reaction indexing and their adaptation to meet user needs has led to the develop­ ment of a potentially useful reaction indexing system. It uses techniques commonly available i n compound-oriented chemical information systems, with slight modifications, and yet presents data i n a form readily understandable by the chemists themselves. The resulting system should enable us to provide desk-top indexes, more up-to-date than the standard reference works and to combine these with more sophisticated computer methods. Hopefully, i n this way we w i l l improve the ways i n which chemists can find chemical information relevant to chemical reaction pathways.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

226

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ch010

LITERATURE CITED 1.

Hyde, E., Matthews, F.W., Thomson, L.H. and Wiswesser, W.J. - "Conversion of Wiswesser Notation to a Connectivity Matrix for Organic Compounds". J.Chem.Doc., V.7(4), p.200-204 (1967).

2.

Thomson, L.H., Hyde, E. and Matthews, F.W. - "Organic Search and Display Using a Connectivity Matrix Derived from the Wiswesser Notation". J.Chem.Doc., V.7(4), p.204-207 (1967).

3.

Hyde, E. and Thomson, L.H. - "Structure Display". J.Chem.Doc., V.8, p.138-146, 1968.

4.

Eakin, D.R. - "The ICI CROSSBOW System" in Ash, J.E. and Hyde, E. - "Chemical Information Systems". Horwood, 1975.

5.

Eakin, D.R., Hyde, E. and Palmer, G. - "The Use of Computers with Chemical Structural Information: ICI CROSSBOW System". Pesticide S c i . p.319-326, 1973.

6.

Harrison, I.J. and Harrison, S. - "Compendium of Organic Synthetic Methods". Wiley, Interscience, 1971.

7.

Valls, J . and Schier, O. - "Chemical Reaction Indexing" i n Ash, J.E. and Hyde, E. - "Chemical Information Systems". Horwood, 1975.

8.

Eakin, D.R. and Hyde, E. - "Evaluation of On-Line Techniques in a Substructure Search System" in "Computer Representation and Manipulation of Chemical Information", ed. Wipke, W.T., Heller, S.R., Feldman, R.J. and Hyde, E. Wiley.

9.

Harrison, J.M. and Lynch, M.F. - "Computer Analysis o f Chemical Reactions for Storage and Retrieval". Journal of the Chemical Society, p.2082-2087, 1970.

10.

Seddon, J.M. - "An Evaluation of Chemical Reaction Analysis and Retrieval Systems". University o f Sheffield M.Sc. Thesis, 1973.

11.

Organic Synthesis. Wiley.

Collective Volumes, Annual Volumes.

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001

A Absolute space 98 Acessibility, steric 109 Acrylic molecule 89 Act interpretation theory 150 Action rules, situation106 Activity, compile-time 161 Activity, information-gathering 165 Acyclic structures, complete grammar for large subset of 67 Adamantane 211 Adding to the database 27 Addition, functional group 19 Advanced search model 148 Ajmaline, non-indole portion 8 ALCHEM language 105,197 Alcohols, tertiary 71 Aldol condensation 18 reaction 104 transform, excerpts for the 108 Algorithm(s) graph-matching 202 Morgan 132 repeated renumbering 135 sum 131,132 tree 131,132 Alkanols 66,70 1-Alkylmagnesium bromides 66 Alphanumeric CRT terminal 100 Analysis antithetic 2 (LHASA), Logic and Heuristics Applied to Synthetic 1 means-ends 124 reaction 219 retrosynthetic 2 semi-automatic reaction 223 AND-nodes 155 Andrenosterone, Uclaf synthesis of .... 109 Antithetic analysis 2 Antithetic reaction 194 Appendage-based strategies 15,16 Applicability criteria 120,150,167 Applicable transforms 171 Application of reaction sequences to mechanistic problems 208 Application of reaction sequences to a structure elucidation problem 211 Applying a transform 90, 200

Approach, total description Artificial intelligence system for chemical synthesis planning by computer Assignment methods, optimal Asterane, synthesis of Atom(s) chemistry of afixedset of equivalence between global ordering of key of a molecule, computerized procedures for numbering origin sequencing of Atom-by-atom search program Atomic coordinates, canonicalization based on Atomic descriptor Attention foci Β

BADLIST BE-matrices manipulation of tree of i-th diagonal entry of a Binary search techniques Binary search tree 20 Block data statements, FORTRAN Block structuring Blocking reactions 64 Bond(s) covalent cyclic, strategic disconnections for polycyclic targets fusion multiple 6 Boolean set Branch appendages Branched structures BREDT superatom Bredt's rule Bridgeheads 1-Bromoalkanes 66 Bubble sort Building blocks of the search management model Butadiene

229 In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

230

COMPUTER-ASSISTED ORGANIC

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001

η-Butane, derivation of η-Butane, representations of 2-Butanone-3-ol teri-Butoxide ion Byproducts

SYNTHESIS

65 Chemistry (Continued) of afixedset of atoms 35 70 geometric and group-theoretical 136,137,143 aspects of constitutional 45 197 mathematical model of 83 constitutional 47 packages of LHASA 17 C CHEMONICS program 92 141 Cahn-Ingold-Prelog procedure 130 Chirality files 107 Canonical indexing 48 CHM Canonical ordering 48, 58 CHMTRN—Chemical Data Base Langauge 22 Canonical value, initial 131,137,140 CICLOPS 47,52 Canonicalization based on atomic 35, 203 coordinates 135 Classes, chemical equivalence Classical syntheses 105 Canonicalization of the molecular structure representation 128 Classification of chemical reactions 44,220 Camp's pyramid 89 e-Caprolactam 94 Classification of R-matrices, generation and 54 Carbon site, functionality of a 86 Code, linear symbolic 106 Carbonium ion rearrangements 195 Coded reactions 182,184 ^-Carotene, syntheses of 122 Compile-time activity 161 Catalog of starting compounds 173 Command(s) Categories of pharmaceutical CHAIN and ATNAME 199 synthetic analysis 181 EDITSTRUC 199 Cathode ray tube 199 HRANGE 199 CCTBU 199 iteration 23 CHAIN and ATNAME commands .... 199 TRANSFORM 199 Chaining 70 Common reaction sequence level 104 Character of a site 86 Complete grammar for large subset Chemical Abstracts 218 of acyclic structures 67 Chemical(s) Compounds, catalog of starting 173 constitution 36 Compression method 50 data base in LHASA 17 Computer(s) detecting environmentally active .... 54 -assisted structure elucidation 188 distance 46, 54 -assisted synthesis 191 -assisted synthetic analysis in engineering view of reaction path drug research 179 synthesis 81 and chemistry 33 equivalence classes 35 program(s) 149 fitting 39 for deductive solution of chemical flow chart 20 problems 33 grammar 64, 66 modular system of 47 history 189 Computerized aids to organic synthe­ problems, computer programs for sis in pharmaceutical research .... 217 deductive solution of 33 problems, solution of 34, 45 Computerized chemical reaction collection 182 reaction(s) 38 classification of 44 Computerized procedures for num­ bering atoms of a molecule 131 documentation systems for 57 49 predicting the products of 53 Concatenation 18 proposed computerized 182 Condensation, aldol 72 synthetically important organic .. 41 Condensation reactions 106 synthesis planning by computer ... 148 Conditions reaction 85 synthesis: SECS—simulation and evaluation of 97 CONGEN, inability to use stereo­ chemical information 215 transform(s) 104,189 188 translator 22 CONGEN program CONGEN reaction tree, expansion Chemistry of the 193 computers and 33

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

231

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001

INDEX

Congeners to lanosterol, cyclization of squalene epoxide and 195 Connection table, CROSSBOW 223 Connection tables 136 Constraint(s) GOODLIST 206 product 200 reaction site 196,199 structural 206 Constructive graph labeling 203 Contextdependent grammar 11 free language 67 free productions 64 sensitive languages, workspace theorem of 72 sensitive productions 63 Control by electronic effects 112 Controlling tree growth through strategies and plans for transform selection 120 Corey's longifolene synthesis 110,121 Coriolin 211 Cost function, simple 82 Covalent bond 36 CROSSBOW 219 connection table 223 CRT terminal, alphanumeric 100 Cycle finding 191 Cycles, set of 8 Cyclic strategic bonds 10 Cyclization of squalene epoxide and congeners to lanosterol 195 Cyclobutane, numbering atoms of 141 1,1-Cycloheptanediol 205 Cyclohexane, numbering atoms of .... 141 Cyclohexanone 99 Cyclohexene 203 Cyclohexen-3-ol 203 Cyclohexen-4-ol 203

D Database adding to the 27 Language, CHMTRN—CHEMICAL 22 LHASA 4 organization of 25 qualifiers 29 requirements 22 Data lists, linked 90 table of recognition process 13 tables in LHASA 11 Deblocking reactions 64, 72 DEC GT40 graphics terminal 100 Deductive solution of chemical prob­ lems, computer programs for 33

Defining the synthetic problem 179 Definitions of reaction conditions 116 Dehydrochlorination reaction 197 Demasking of masked functional groups 72 Derwent Chemical Reactions Documentation Service 182,186 Derivation 63 of ethyl bromide 67 of n-butane 65 Description of a chemical transform .. 104 Description of SECS system 99 Description system ( MDS ), meta 151 Descriptor, atomic 48 Design, pilot program for synthetic ... 47 Designing a potential drug 217 Dethiacephalosporins 181 Development indexing and pruning of the reaction sequence tree 205 Dictionary, reaction 78 Diels-Alder addition 19 Diels-Alder transform 20, 21,119 Digitizing data tablet 5 1, l-Dimethyl-3-trichloromethylcyclohexane 129,130 Direct access to structures and substructures 49 Directory set 26 Disconnection, Robinson 28 Discrete dynamic programming 92 Distance, chemical 46, 54 Documentation systems for chemical reactions 57 Double coset problem 203 Dreiding model 103 Drug research, computer-assisted synthetic analysis in 179 Dummy neighbors 49 Duplication among products of a reaction 201 Duplication, symmetry 202 Ε EDITSTRUC 196,197,199 Elaboration of Reactions for Organic Synthesis (EROS) 47 Electron relocation by R-matrices, representation of 38 Electronic effects 110,112 Electrophilic localization energy Ill Elementary functional group 138,142 ELENERGY Ill ^-Elimination 44 Energy, electrophilic localization Ill Energy hypersurface 34 Ensemble formula, partitioned empirical 35 Ensembles of molecules, isomeric 35

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001

232

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Functional group Enumeration of synthesis tree, addition 19 order of 92 -based strategies, opportunistic 3 Environmentally active chemicals, -based transforms 17 detecting 54 elementary 138,142 Equivalence interchange 18,106 between atoms 129 list of 2-butanone-3-ol 143 class 203 manipulations 184 classes, chemical 35 modifications 123 EROS ( Elaboration of Reactions for perception 11 Organic Synthesis ) 47, 52 to reaction conditions, sensitivity of 114 EROS, synthetic design program 52 reactivity 14 Ethyl bromide, derivation of 67 switching reactions 64, 72, 75 Ethyl crotonate 61 86 "Eureka syndrome" 97 Functionality of a carbon site 9 EVAL module 119 Fusion bond Evaluation of chemical synthesis : SECS— G simulation and 97 function, heuristic 153 Geminal dimethyl substitution 28 of industrial reactions 82 General problem solver system 169 node 153 Generation and classification EVLTRN (Evaluate Transform) 23 of R-matrices 54 Exhaustive enumeration 55 Generators, reaction 53 Expansion of the CONGEN Geometric and group-theoretical reaction tree 193 aspects of constitutional Explicit goal representation 122 chemistry 45 External addressing structure 26 Global ordering of atoms 130 part structures 62 F symmetry operations 129 18,120 Facilities, substructure search 218 Goal(s) -directed selection of transforms ... 120 Failednodes 161 list, logically structured 121 Family of isomeric ensembles of representation, explicit 122 molecules 35 situation 152 File(s) 206 CHM 107 GOODLIST constraint 62,64-67 RESTART 100 Grammar(s) chemical 66 structure, reaction 57 types of 64 unordered sequential source 107 context-dependent 11 Finite state machine 66 simple synthetic 70 First sphere 57 structural 65 Fischer indole synthesis 220 Graph Fixed set of atoms, chemistry of a 35 labeling, constructive 203 FLEXI, aflexibleadvanced search matching 191 model 170 algorithm 202 Flexible advanced search model, problem solving 175 FLEXI, a 170 as a minimal search model 153 Flow chart, chemical 20 small reaction 76 FNODE 171 Graphics 5 Formal languages, organic chemist's terminal, GT42 186 view of 60 Grignard addition 64 Formula, ensemble 35 Group FORTRAN 4,22,100 addition, functional 19 Forward-branching search strategy .... 92 interchange, functional 18,106 Fragmentations, mass spectral 195 origin 14 Free valences 212 perception, functional 11 Function, heuristic evaluation 153 reactivity, functional 14

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

INDEX

233

Group (Continued) -theoretical aspects of constitu­ tional chemistry, geometric and

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001

H

Κ

45

Key atoms Keywords, CHMTRN

203 22

L

HAREHM module I l l Language(s) 62,64,76 Hendrickson's triangle 89 ALCHEM 105,197 Heuristic(s) CHMTRN—chemical data base .... 22 Applied to Synthetic Analysis context-free 67 (LHASA), Logic and 1 EDITSTRUC .? 197 evaluation function 153 organic chemists' view of formal .... 60 search method 152,173,175 workspace theorem of context search, theory of 155 sensitive 72 n-Hexane 61 Lanosterol, cyclization of squalene Hierarchically organized substructure epoxide and congeners to 195 files 51 Lead analog program 181 HONEYWELL-BULL system 100 LHASA (Logic and Heuristics HRANGE command 199 Applied to Synthetic Hyperstructures 144 Analysis) 1,4,17,47 Hypersurface, energy 34 data tables in 11 Hypersurface of syntheses 97 global view of 4, 5 interpreter 23 perception routines 7 I Line notation, Wiswesser 131 60 IBM-360/67 computer 94 Linear assays, one-dimensional 106 IBM 370 computer 100,136,186 linear symbolic code 90 Impurity reactions 83 Linked data lists 200 Indexing, canonical 48 Linknodes LISP organic synthesis program 70 Indexing and pruning of the reaction sequence tree, development 205 Local symmetry operations 129 Industrial reactions 82,83,92 Localization energy, electrophilic Ill Informationflow,network of 161 Logic and Heuristics Applied to Syn­ Information gathering 155,165 thetic Analysis (LHASA) .. 1, 4,17,47 Information retrieval application 182 Logical connectives 123 Initial canonical value 131,137 Logically structured goal list 121 Inner loop 152 Longifolene synthesis, Corey's 110,121 Input, structural 6 Interchange, functional group 18 Interconversion map 211 M Interconversions 209 Machine,finitestate 66 Interfering functionality and protect­ 71 ing groups, recognition of 112 Machines, Turing Inter-machine manipulation 50 Manipulation(s) of BE-matrices 48 Interpreter, LHASA 23 functional group 184 Intra-machine manipulation 50 inter-machine 50 Iodo-lactonization 19, 21 intra-machine 50 Irreducible R-matrix 40, 58 Isomeric ensembles of molecules 35 Map, interconversion 211 Isomerism 35 Mapping 46 Isonootkatone, Marshall's synthesis of 28 Marshall's synthesis of isonootkatone 28 Iteration 179 Masked functional groups, demasking command 23 of 72 Mass spectral fragmentations 195 MATCHEM 47 J Mathematical model of constitutional chemistry 47 JOIN command 199

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001

234

Mathematics, solution of chemical problems on the basis of theoretical physics and 34 Matrices, BE36 Matrix reaction 38 MATSYN 47 Means-ends analysis 124 Mechanistic problems, application of reaction sequences to 208 Mechanistic studies 189 Memory, rote 91 Merck chemical structure information system 184 Mesa phenomenon 170 Meta description system (MDS) ... 151,157 Method, heuristic search 175 Methods, optimal assignment 56 6«-Methyl-17a-acetoxyprogesterone .. 136 Michael addition 25 Minimal chemical distance, determination of 54 Minimal search model, problem-solving graph 153 Model advanced search 148 builder module 103 building blocks of the search management 171 FLEXI, a flexible advanced search . 170 molecular orbital (MO) 110 problem-solving graph as a minimal search 153 search 175 management 148 Modelling, chemical reaction sequences used in molecular 188 Modelling, search 148 Module(s) EVAL 119 HAREM Ill model builder 103 of the SECS system 102 strategy 123 SYMIN 109 Modular system of computer programs 47 Molecular modelling, chemical reaction sequences used in 188 orbital (MO) model 110 structure representation, canonicalization of the 128 structure, target 152 symmetry, redundancy from 119 systems 34 Molecule(s) computerized procedures for numbering atoms of a 131

Molecule(s) (Continued) editor options listed by HELP command 102 isomeric ensembles of 35 multiple target 83 and reactions, representation of 86 small target 83 as strings of symbols 60 target 2,72,98 Morgan algorithm 132,134 Multiple bonds 6, 61 Multiple target molecules 83 Ν

Name reaction level Network of information flow Network-oriented strategies Networks, reaction New problems NLENERGY Node evaluation Node, nonstructural Non-radical reactions, R-matrices of some typical Nonstructural node Notation, n-tuplet structural Novel total synthesis Null strategy Numbering the atoms of a molecule, computerized procedures for Nylon monomer intermediates

104 161 121 83 82 Ill 153 61 42 61 78 105 120 131 94

Ο

OCSS-LHASA One-dimensional linear assays One-group transforms Opennodes Operating with R-matrices Operators, substitution Opportunistic functional group-based strategies Opportunistic strategy Optimal assignment methods Order of enumeration of synthesis tree Ordering, canonical Organic chemical reactions, synthetically important Organic chemist's view of formal languages Organic synthesis EROS in a pharmaceutical research com­ pany, computerized aides to .... program, LISP programs, rapid generation of reactants in

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

99 60 17 161 52 120 3 120 56 92 58 41 60 1 47 217 70 128

235

INDEX

Organization of the database Origin atom Origin, group OR-nodes

25 11 14 155

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001

Ρ

Packages, ring transform 28 Parsing 67 Partitioned empirical ensemble formula 35 Path finding 191 Path, reaction 57 PDP-10, 11, 20 51,100 Penicillanic acid 181 Perception 7 functional group 11 routines, LHASA 7 second-order 103 Petrochemical syntheses 92 Pharmaceutical research company, computerized aids to organic synthesis in a 217 Pharmaceutical synthetic analysis, categories of 181 Phenomenon, Mesa 170 Plan, skeletal 170 Planning space 148,167 and state space, combining search in 170 Polyfunctional polycyclic synthesis .... 99 Polyname 205 Potential drug, designing a 217 Power, predictive 86 Precursors 98 Predicting chemical reaction products 53 Predictive power 86 Predictor systems 53 Problem(s) application of reaction sequences to a structure elucidation 211 double coset 203 functional group switching 75 separation 81 solving graph 175 as a minimal search model 153 with products 81 Product(s) constraints 200 predicting chemical reaction 53 problems with 81 of a reaction, duplication among .... 201 Production(s) context-free 64 context-sensitive 63 regular 64 rule form 165

Program ( s ) atom-by-atom search CHEMONICS descriptions lead analog LISP organic synthesis maintenance noninteractive synthesis rapid generation of reactants in organic synthesis REACT structure display synthesis Programming, discrete dynamic Programming, steps of the sum algorithm for Proposed computerized chemical reaction collection Prostaglandin synthesis Protecting groups, recognition of interfering, functionality and Pruning of the reaction sequence tree, development indexing and .. Pruning, tree reduction through Pseudo rings PSG model, rule set for the PSGRAPH Pyramid, Camp's

224 92 182 181 70 96 130 128 92 224 140 92 145 182 31 112 205 119 9 166 157 89

Q Qualifiers, data base

29

R

R-category 40, 41,57 R-transformation 39 Random search 179 Rapid generation of reactants in organic synthesis programs 128 REACT program evaluation 93 input/output 94 molecule representation 93 program features 94 reaction representation 93 search direction 93 search strategies 93 Reactants in organic synthesis pro­ grams, rapid generation of 128 Reacting molecule, symmetry of the .. 203 Reaction(s) Aldol 104 analysis 219 antithetic 194 blocking 64,72 carrying out the 200 center, details of the 221

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

236

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001

Reaction ( s ) ( Continued )

chemical 38 classification system 220, 222 coding sheet 182 conditions 85 definitions of 116 sensitivities of functional groups to 114 core 57 deblocking 64,72 definition 196 dehydrochlorination 197 dictionary 78 discovering new 105 duplication among products of a .... 201 evaluation of industrial 82 file structure 57 functional group switching 64, 72 generators 53 graph, small 76 impurity 83 indexing system 225 level, name 104 matrix 38 networks 83 for Organic Synthesis), EROS (Elaboration of 47 path(s) 57 applications to industrial 92 synthesis 83 chemical engineering view of 81 representation of molecules and 86 retriever 179,184 sequence(s) level, common 104 to mechanistic problems, application of 208 to a structure elucidation problem, application of 211 to structure elucidation, relationship of 194 tree 205 site constraints 196,199 transform 199 types 44 Reactivity, functional group 14 Real rings 9 Recognition of interfering functionality and protecting groups 112 Recognition questions, sample 13 Reduction through greater chemical accuracy, synthesis tree 108 Reduction through symmetry, tree .... 118 Redundancy from molecular symmetry 119 Reference 106 Regular language 76 Regular productions 64

Relationship of reaction sequences to structure elucidation 194 Relevance criteria 150 Relevant transforms 167,171 REPAS case study using Hendrickson's representation 89 Repeated renumbering algorithm 135 Representation ( s ) of n-butane 70 of electron relocation by R-matrices 38 level, ab initio

104

of molecules and reactions 86 substructure 90 RESTART file 100 Retriever vs. reactions for a synthesizer, reactions for a 184 Retrosynthetic analysis 2 Retrosynthetic tree 102 Ring appendages 16 Ring-forming transforms 19 Ring perception 8, 9 Ring transform packages 28 RLENERGY Ill R-matrices 40,57,58 generation and classification of 54 operating with 52 representation of electron relocation by 38 of some typical non-radical reactions 42 RNODE 171 Robinson annelation 19 transform 24 Robinson disconnection 28 Root node 153 Rote memory 91 Routes, synthetic 1 Routines, LHASA, perception 7 Rule form, production 165 Rule set 148 Rules, search guidance 165 S

Sample recognition questions Search guidance rules management model method, heuristic model FLEXI modelling in a planning space random space, structure of the strategies Second-order perception

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

13 148 165 148,171 152 152,175 170 148 167 179 167 91,92 103

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001

INDEX

237

SECS—simulation and evaluation of chemical synthesis 97, 99,186 commands 100 modules of the 102 Selection by applicability 120 Semi-automatic reaction analysis 223 Sense definitions 161 Sensitivities of functional groups to reaction conditions 114 Sentential form 65 Separation problems 81 Sequencing of atoms 49 Serial numbers of elementary functional groups 138 Set, Boolean 26 Set of cycles 8 SHOW command 200 Similarities to and contrasts with computer-assisted synthesis 191 Simmons-Smith reaction 19 Simple cost function 82 Simulation and evaluation of chemical synthesis ( SECS ) program .... 186 Site, functionality of carbon 86 Site, reaction 196,199 Situation-action rules 106 Skeletal plan 170 Small reaction graph 76 Small target molecules 83 SNODE 171 Software subroutines 24 Solution of chemical problems 34, 45 Sort, bubble 146 Sourcefiles,unordered sequential 107 Space, absolute 98 Space, planning 167 Space, state 150,167 Squalene epoxide and congeners to lanosterol, cyclization of 195 Start symbol 63 Starting compounds, catalog of 173 State space 150,167,170 Steps of the sum algorithm for programming 145 Stereochemical canonical value 140 graph isomorphism groups 118 information, CONGEN, inability to use 215 relation tables 136 Stereochemistry 6 Steric accessibility 109 Steric effects 108 Stoichiometry 83 Straight-chained structures 61 Strategic bonds, cyclic 10 Strategy (ies) 3 appendage-based 15,16

Strategy (ies)

(Continued)

based on structural features 3 forward-branching search 92 module 123 network-oriented 121 null 120 opportunistic 120 and plans for transform selection, controlling tree growth through 120 search 91 symmetry-based 122 Strings of symbols, molecules as 60 Structural constraints 206 descriptions of problem solving graph model 157 grammar 65 input 6 modifications 123 notation, n-tuplet 78 Structure(s) branched 61 complete grammar for large subset of acyclic 67 display program 224 elucidation, computer-assisted 188 elucidation problem, application of reaction sequences to a 211 elucidation, relationship of reaction sequences to 194 external addressing 26 global part 62 problems 188 reaction file 57 of the search space 167 straight-chained 61 and substructures, direct access to .. 49 target molecular 152 Structuring, block 23 Studies, mechanistic 189 Subgoal 18 generation 124 transforms 30 Subroutines, software 24 Substitution, geminal dimethyl 28 Substitution operators 120 Substructure(s) 106 direct access to structures and 49 files, hierarchically organized 51 numbering of synthetically interesting 141 representation 90 search facilities 218 Sum algorithm ( s ) 131,132,136 for programming, steps of the 145 Summary reaction classification 220 Superatom, BREDT 200

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001

238

COMPUTER-ASSISTED ORGANIC SYNTHESIS

Superatoms 188 Symbol(s) 63 molecules as strings of 60 start 63 terminal 64 Symbolic representation 175 Symbolization 175 SYMIN module 109 Symmetry -based strategies 122 duplication 202 operations 129 of the reacting molecule 203 redundancy from molecular 119 transform 119 tree reduction through 118 SYNCHEM 47,149 SYNCOM 107 Syndrome, Eureka 97 Syntheses of β-çarotene 122 classical 105 hypersurface of 97 petrochemical 92 Synthesis of andrenosterone, Uclaf 109 of asterane 118 chemical engineering view of reaction path 81 Corey's longifolene 110,121 Fischer indole 220 industrial reaction path 83 novel total 105 organic 1 planning by computer, artificial intelligence system for chemical 148 polyfunctional polycyclic 99 program(s) completeness and accuracy of a .. 99 noninteractive 130 prostagladin 31 reverse salami 73 SECS—simulation and evaluation of chemical 97 similarities to and contrasts with computer-assisted 191 tree 2,3,92,193 reduction through greater chemical accuracy 108 types of pharmaceutical 181 Synthesizer 179 reactions for a retriever vs. reactions for a 184 Synthetic analysis in drug research, computer-assisted 179,181

Synthetic

(Continued)

Analysis (LHASA), Logic and Heuristics Applied to design, pilot program for design program EROS grammar, simple pathways, tree of problem, defining routes generated by Diels-Alder transform Synthetically important organic chemical reactions Synthetically interesting substructures, numbering System for chemical synthesis planning by computer, artificial intelligence description of SECS general problem solver less labor-intensive modules of the SECS molecular predictor reaction classification reaction indexing

1 47 52 70 47 179 1 20,21 41 141

148 99 169 221 102 34 53 222 225

Τ Table translator 22 Tables, connection 136 Target 23 molecular structure 152 molecule 2,72, 83, 98,191 TBLTRN 22 Techniques, binary search 28 TELENET 100 Teletype terminal 100 Terminal alphanumeric CRT 100 symbols 64 vocabulary 63 Tertiary alcohols 71 Tetrahydrodicyclopentadiene 211 Text searching facilities 217 Theilheimer's series of volumes 182 Theoretical physics and mathematics, solution of chemical problems on the basis of 34 Theory of heuristic search 155 Transform(s) 2,14,105,196 by applicability, selection of 167,171 applying 90,200 chemical 189 description of a chemical 104 Diels-Alder 119 excerpts from the Aldol 108

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

INDEX

239

Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0061.ix001

Transform ( s ) ( Continued )

functional group-based 17 goal-directed selection 120 level and completeness 105 one-group 17 reaction 199 relevant 167,171 ring-forming 19 ring-oriented 19 Robinson annelation 24 selection 123 selection, controlling tree growth through strategies and plans for 120 subgoal 30 symmetry 119 two-group 17 TRANSFORM command 199 Tree algorithms 131 approach, illustration of 132 development indexing and pruning of the reaction sequence 205 expansion of the CONGEN reaction 193 expansion of the synthesis 193 growth through strategies and plans for transform selection, controlling 120 reaction sequence 205 reduction through pruning 119

Triangle, Hendrickson's 2,2,4-Trimethylhexane n-Tuplet structural notation Turing machines Two-group transforms TYMNET

89 61 78 71 17 100

U

Uclaf synthesis of andrenosterone UNIVAC 1108 system UNJOIN command Unordered sequential source files

109 100 199 107

V

Valences, free Vocabulary, terminal

212 63

W

Wiswesser line notation 131 WLN (Wiswesser Line Notation) .... 220 Workspace theorem of context sensitive languages 72 Y

Yield

In Computer-Assisted Organic Synthesis; Wipke, W., el al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

83