Build Your Own Programming Language [Second Edition] 9781804618028

Embark on a journey through essential components of language design, compiler construction, preprocessors, transpilers,

145 105

English Pages 649 Year 2024

Report DMCA / Copyright

DOWNLOAD FILE

Build Your Own Programming Language [Second Edition]
 9781804618028

Table of contents :
Preface
Who this book is for
What this book covers
Get in touch
Section I: Programming Language Frontends
Why Build Another Programming Language?
Motivations for writing your own programming language
Types of programming language implementations
Organizing a bytecode language implementation
Languages used in the examples
The difference between programming languages and libraries
Applicability to other software engineering tasks
Establishing the requirements for your language
Case study – requirements that inspired the Unicon language
Unicon requirement #1 – preserve what people love about Icon
Unicon requirement #2 – support large-scale programs working on big data
Unicon requirement #3 – high-level input/output for modern applications
Unicon requirement #4 – provide universally implementable system interfaces
Summary
Questions
Programming Language Design
Determining the kinds of words and punctuation to provide in your language
Specifying the control flow
Deciding on what kinds of data to support
Atomic types
Composite types
Domain-specific types
Overall program structure
Completing the Jzero language definition
Case study – designing graphics facilities in Unicon
Language support for 2D graphics
Adding support for 3D graphics
Summary
Questions
Scanning Source Code
Technical requirements
Lexemes, lexical categories, and tokens
Regular expressions
Regular expression rules
Regular expression examples
Using UFlex and JFlex
Header section
Regular expressions section
Writing a simple source code scanner
Running your scanner
Tokens and lexical attributes
Expanding our example to construct tokens
Writing a scanner for Jzero
The Jzero flex specification
Unicon Jzero code
Java Jzero code
Running the Jzero scanner
Regular expressions are not always enough
Summary
Questions
Parsing
Technical requirements
Syntax analysis
Context-free grammars
Writing context-free grammar rules
Writing rules for programming constructs
Using iyacc and BYACC/J
Declaring symbols in the header section
Advanced yacc declarations
Putting together the yacc context-free grammar section
Understanding yacc parsers
Fixing conflicts in yacc parsers
Syntax error recovery
Putting together a toy example
Writing a parser for Jzero
The Jzero lex specification
The Jzero yacc specification
Unicon Jzero code
Java Jzero parser code
Running the Jzero parser
Improving syntax error messages
Adding detail to Unicon syntax error messages
Adding detail to Java syntax error messages
Using Merr to generate better syntax error messages
Summary
Questions
Syntax Trees
Technical requirements
Using GNU Make
Learning about trees
Defining a syntax tree type
Parse trees versus syntax trees
Creating leaves from terminal symbols
Wrapping tokens in leaves
Working with YACC’s value stack
Wrapping leaves for the parser’s value stack
Determining which leaves you need
Building internal nodes from production rules
Accessing tree nodes on the value stack
Using the tree node factory method
Forming syntax trees for the Jzero language
Debugging and testing your syntax tree
Avoiding common syntax tree bugs
Printing your tree in a text format
Printing your tree using dot
Summary
Questions
Section II: Syntax Tree Traversals
Symbol Tables
Technical requirements
Establishing the groundwork for symbol tables
Declarations and scopes
Assigning and dereferencing variables
Choosing the right tree traversal for the job
Creating and populating symbol tables for each scope
Adding semantic attributes to syntax trees
Defining classes for symbol tables and symbol table entries
Creating symbol tables
Populating symbol tables
Synthesizing the isConst attribute
Checking for undeclared variables
Identifying the bodies of methods
Spotting uses of variables within method bodies
Finding redeclared variables
Inserting symbols into the symbol table
Reporting semantic errors
Handling package and class scopes in Unicon
Mangling names
Inserting self for member variable references
Inserting self as the first parameter in method calls
Testing and debugging symbol tables
Summary
Questions
Checking Base Types
Technical requirements
Type representation in the compiler
Defining a base class for representing types
Subclassing the base class for complex types
Assigning type information to declared variables
Synthesizing types from reserved words
Inheriting types into a list of variables
Determining the type at each syntax tree node
Determining the type at the leaves
Calculating and checking the types at internal nodes
Runtime type checks and type inference in Unicon
Summary
Questions
Checking Types on Arrays, Method Calls, and Structure Accesses
Technical requirements
Checking operations on array types
Handling array variable declarations
Checking types during array creation
Checking types during array accesses
Checking method calls
Calculating the parameters and return type information
Checking the types at each method call site
Checking the type at return statements
Checking structured type accesses
Handling instance variable declarations
Checking types at instance creation
Checking types of instance accesses
Summary
Questions
Intermediate Code Generation
Technical requirements
What is intermediate code?
Why generate intermediate code?
Learning about the memory regions in the generated program
Introducing data types for intermediate code
Adding the intermediate code attributes to the tree
Generating labels and temporary variables
An intermediate code instruction set
Instructions
Declarations
Annotating syntax trees with labels for control flow
Generating code for expressions
Generating code for control flow
Generating label targets for condition expressions
Generating code for loops
Generating intermediate code for method calls
Reviewing the generated intermediate code
Summary
Questions
Syntax Coloring in an IDE
Writing your own IDE versus supporting an existing one
Downloading the software used in this chapter
Adding support for your language to Visual Studio Code
Configuring Visual Studio Code to do Syntax Highlighting for Jzero
Visual Studio Code extensions using the JSON format
JSON atomic types
JSON collections
File organization for Visual Studio Code extensions
The extensions file
The extension manifest
Writing IDE tokenization rules using TextMate grammars
Integrating a compiler into a programmer’s editor
Analyzing source code from within the IDE
Sending compiler output to the IDE
Avoiding reparsing the entire file on every change
Using lexical information to colorize tokens
Extending the EditableTextList component to support color
Coloring individual tokens as they are drawn
Highlighting errors using parse results
Summary
Questions
Section III: Code Generation and Runtime Systems
Preprocessors and Transpilers
Understanding preprocessors
A preprocessing example
Identity preprocessors and pretty printers
The preprocessor within the Unicon preprocessor
Code generation in the Unicon preprocessor
Transforming objects into classes
Generating source code from the syntax tree
Closure-based inheritance in Unicon
The difference between preprocessors and transpilers
Transpiling Jzero code to Unicon
Semantic attributes for transpiling to Unicon
A code generation model for Jzero
The Jzero to Unicon transpiler code generation method
Transpiling the base cases: names and literals
Handling the dot operator
Mapping Java expressions to Unicon
Transpiler code for method calls
Assignments
Transpiler code for control structures
Transpiling Jzero declarations
Transpiling Jzero block statements
Transpiling a Jzero class into a Unicon package that contains a class
Summary
Questions
Bytecode Interpreters
Technical requirements
Understanding what bytecode is
Comparing bytecode with intermediate code
Building a bytecode instruction set for Jzero
Defining the Jzero bytecode file format
Understanding the basics of stack machine operation
Implementing a bytecode interpreter
Loading bytecode into memory
Initializing the interpreter state
Fetching instructions and advancing the instruction pointer
Instruction decoding
Executing instructions
Starting up the Jzero interpreter
Writing a runtime system for Jzero
Running a Jzero program
Examining iconx, the Unicon bytecode interpreter
Understanding goal-directed bytecode
Leaving type information in at runtime
Fetching, decoding, and executing instructions
Crafting the rest of the runtime system
Summary
Questions
Generating Bytecode
Technical requirements
Converting intermediate code to Jzero bytecode
Adding a class for bytecode instructions
Mapping intermediate code addresses to bytecode addresses
Implementing the bytecode generator method
Generating bytecode for simple expressions
Generating code for pointer manipulation
Generating bytecode for branches and conditional branches
Generating code for method calls and returns
Handling labels and other pseudo-instructions in intermediate code
Comparing bytecode assembler with binary formats
Printing bytecode in assembler format
Printing bytecode in binary format
Linking, loading, and including the runtime system
Unicon example – bytecode generation in icont
Summary
Questions
Native Code Generation
Technical requirements
Deciding whether to generate native code
Introducing the x64 instruction set
Adding a class for x64 instructions
Mapping memory regions to x64 register-based address modes
Using registers
Starting from a null strategy
Assigning registers to speed up the local region
Converting intermediate code to x64 code
Mapping intermediate code addresses to x64 locations
Implementing the x64 code generator method
Generating x64 code for simple expressions
Generating code for pointer manipulation
Generating native code for branches and conditional branches
Generating code for method calls and returns
Handling labels and pseudo-instructions
Generating x64 output
Writing the x64 code in assembly language format
Going from native assembler to an object file
Linking, loading, and including the runtime system
Summary
Questions
Implementing Operators and Built-In Functions
Implementing operators
Comparing adding operators to adding new hardware
Implementing string concatenation in intermediate code
Adding String concatenation to the bytecode interpreter
Adding String concatenation to the native runtime system
Writing built-in functions
Adding built-in functions to the bytecode interpreter
Writing built-in functions for use with the native code implementation
Integrating built-ins with control structures
Developing operators and functions for Unicon
Writing operators in Unicon
Developing Unicon’s built-in functions
Summary
Questions
Domain Control Structures
Knowing when a new control structure is needed
Scanning strings in Icon and Unicon
Scanning environments and their primitive operations
Eliminating excessive parameters via a control structure
Rendering regions in Unicon
Rendering 3D graphics from a display list
Specifying rendering regions using built-in functions
Varying levels of detail using nested rendering regions
Creating a rendering region control structure
Adding a reserved word for rendering regions
Adding a grammar rule
Checking wsection for semantic errors
Generating code for a wsection control structure
Summary
Questions
Garbage Collection
Grasping the importance of garbage collection
Counting references to objects
Adding reference counting to Jzero
Reducing the number of heap allocations for strings
Modifying the generated code for the assignment operator
Modifying the generated code for method call and return
The drawbacks and limitations of reference counting
Marking live data and sweeping the rest
Organizing heap memory regions
Traversing the basis to mark live data
Marking the block region
Reclaiming live memory and placing it into contiguous chunks
Summary
Questions
Final Thoughts
Reflecting on what was learned from writing this book
Deciding where to go from here
Studying programming language design
Learning about implementing interpreters and bytecode machines
Acquiring expertise in code optimization
Monitoring and debugging program executions
Designing and implementing IDEs and GUI builders
Exploring references for further reading
Studying programming language design
Learning about implementing interpreters and bytecode machines
Acquiring expertise in native code and code optimization
Monitoring and debugging program executions
Designing and implementing IDEs and GUI builders
Summary
Section IV: Appendix
Appendix: Unicon Essentials
Syntactic shorthand
Running Unicon
Using Unicon’s declarations and data types
Declaring program components
Using atomic data types
Numeric
Textual
Aggregating multiple values using structure types
Classes
Lists
Tables
Sets
Files
Other types
Evaluating expressions
Forming expressions using operators
Invoking procedures, functions, and methods
Iterating and selecting what and how to execute
Generators
Debugging and environmental issues
Learning the basics of the UDB debugger
Environment variables
Preprocessor
Preprocessor commands
Built-in macro definitions
Function mini-reference
Selected keywords
Answers
Other Books You May Enjoy
Index

Polecaj historie