Parallel and High Performance Programming with Python: Unlock Parallel and Concurrent Programming in Python using Multithreading, CUDA, Pytorch, and Dask 9789388590730

This book teaches the techniques and applications of parallel computing, starting from the basics.


English · 340 pages · 2023


Table of contents :
Cover Page
Title Page
Copyright Page
Dedication Page
About the Author
Technical Reviewers
Acknowledgements
Preface
Errata
Table of Contents
1. Introduction to Parallel Programming
Structure
Parallel programming
Technological evolution of computers and parallelism
CPU, cores, threads, and processes
Concurrent and parallel programming
Threads and processes in Python for concurrent and parallel models
Python thread problem: the GIL
Elimination of GIL to achieve multithreading
Threads versus processes in Python
Concurrency and parallelism in Python
The light concurrency with greenlets
Parallel programming with Python
Synchronous and asynchronous programming
Map and reduce
CPU-bound and I/O-bound operations
Additional precautions in parallel programming
Threading and multiprocessing modules
Memory organization and communication
Memory organization within a process
Memory organization between multiple processors
Distributed programming
Evaluation of parallel programming
Speedup
Scaling
Benchmarking in Python
Profiling
Conclusion
Points to remember
Questions
References
2. Building Multithreaded Programs
Structure
Threads
join() method
Common thread synchronization pattern
The concurrent.futures module and the ThreadPoolExecutor
Thread competition
Using Thread subclasses
Synchronization mechanisms
Lock
Context management protocol with Lock
Another possible synchronization solution with Locks
RLock
Semaphore
Condition
Event
Queue
Conclusion
Points to remember
Questions
References
3. Working with Multiprocessing and mpi4py Library
Structure
Processes and the multiprocessing module
Using process IDs
Process pool
Defining processes as subclasses
Channels of communication between processes
Queues
Pipes
Pipe versus Queue
Mapping of a function through a process pool
Mapping in parallel with chunksize
The ProcessPoolExecutor
The mpi4py library
Parallelism of the processes
Efficiency of parallelism based on the number of processors/cores
Main applications of mpi4py
Point-to-point communication implementation
Collective communications
Collective communication using data broadcast
Collective communication using data scattering
Collective communication using data gathering
Collective communication using the AlltoAll mode
Reduction operation
Optimizing communications through topologies
Conclusion
References
4. Asynchronous Programming with AsyncIO
Structure
Asynchronous and synchronous programming
Pros and cons of asynchronous and synchronous programming
Concurrent programming and asynchronous model
AsyncIO library
async/await syntax
Coroutines
Task
Gathering of the awaitables for concurrent execution
Future
Event loop
Asynchronous iterations with and without async for
Queue in the asynchronous model
Alternatives to the AsyncIO library
Conclusion
References
5. Realizing Parallelism with Distributed Systems
Structure
Distributed systems and programming
Pros and cons of distributed systems
Celery
Architecture of Celery systems
Tasks
Setting up of a Celery system
Installing Anaconda
Installing Celery
Installing Docker
Installing a message transport (Broker)
Installing the result backend
Setting a Celery system
Defining tasks
Calling tasks
Example task
Signatures and primitives
Dramatiq library as an alternative to Celery
Installing Dramatiq
Getting started with Dramatiq
Management of results
SCOOP library
Installing SCOOP
Conclusion
References
6. Maximizing Performance with GPU Programming using CUDA
Structure
GPU architecture
GPU programming in Python
Numba
Numba for CUDA
Logical hierarchy of the GPU programming model
Installation of CUDA
Installing Numba for CUDA
Declaration and invocation of Kernel
Device functions
Programming example with Numba
Further changes
Extension to matrices (2D array)
Transfer of data through the queue
Sum between two matrices
Multiplication between matrices
PyOpenCL
Installing PyOpenCL
PyOpenCL programming model
Developing a program with PyOpenCL
Multiplication between matrices: an example
Element-wise calculation with PyOpenCL
MapReduce calculation with PyOpenCL
Conclusion
References
7. Embracing the Parallel Computing Revolution
Structure
High-performance computing (HPC)
Parallel computing
Benefits of parallel computing
Projects and examples of parallel computing
Meteorology
Oceanography
Seismology
Astrophysics
Oil and energy industry
Finance
Engineering
Medicine and drug discovery
Drug discovery
Genomics
Entertainment – games and movies
Game engines
Designing a parallel game engine
Movies and 3D animations
Conclusion
References
8. Scaling Your Data Science Applications with Dask
Structure
Data Science, Pandas library, and parallel computing
Dask library
Getting started on a single machine
Dask collections
Methods on collections
Computing and task graphs
Low-level interfacing with the Dask Delayed
Getting started on a cluster of machines
Kaggle community
Saturn.io cluster
Uploading data to the cluster
Begin programming the cluster
Conclusion
References
9. Exploring the Potential of AI with Parallel Computing
Structure
Artificial intelligence (AI)
AI, machine learning, and deep learning
Supervised and unsupervised learning
Artificial intelligence and parallel computing
Parallel and distributed machine learning
Machine learning with scikit-learn
Scaling scikit-learn with Dask-ML
Parallel and distributed deep learning
PyTorch and TensorFlow
Deep learning example with PyTorch
PyTorch installation
Example with the Fashion-MNIST dataset
Deep learning example with PyTorch and GPU
Scaling PyTorch with Dask
Conclusion
References
10. Hands-on Applications of Parallel Computing
Structure
Massively parallel artificial intelligence
Edge computing
Distributed computing infrastructure with cyber-physical systems
Artificial intelligence in cybersecurity
Advent of Web 5.0
Exascale computing
Quantum computing
New professional opportunities
Required advances in parallel computing for Python
Future of PyTorch and TensorFlow
Conclusion
References
Index