Applied Natural Language Processing in the Enterprise: Teaching Machines to Read, Write, and Understand (1st ed.)
ISBN-10: 149206257X · ISBN-13: 9781492062578

NLP has exploded in popularity over the last few years. But while Google, Facebook, OpenAI, and others continue to release…

Language: English · Pages: 336 · Year: 2021

Table of Contents:
Copyright
Table of Contents
Preface
What Is Natural Language Processing?
Why Should I Read This Book?
What Do I Need to Know Already?
What Is This Book All About?
How Is This Book Organized?
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Ajay
Ankur
Part I. Scratching the Surface
Chapter 1. Introduction to NLP
What Is NLP?
Popular Applications
History
Inflection Points
A Final Word
Basic NLP
Defining NLP Tasks
Set Up the Programming Environment
spaCy, fast.ai, and Hugging Face
Perform NLP Tasks Using spaCy
Conclusion
Chapter 2. Transformers and Transfer Learning
Training with fastai
Using the fastai Library
ULMFiT for Transfer Learning
Fine-Tuning a Language Model on IMDb
Training a Text Classifier
Inference with Hugging Face
Loading Models
Generating Predictions
Conclusion
Chapter 3. NLP Tasks and Applications
Pretrained Language Models
Transfer Learning and Fine-Tuning
NLP Tasks
Natural Language Dataset
Explore the AG Dataset
NLP Task #1: Named Entity Recognition
Perform Inference Using the Original spaCy Model
Custom NER
Annotate via Prodigy: NER
Train the Custom NER Model Using spaCy
Custom NER Model Versus Original NER Model
NLP Task #2: Text Classification
Annotate via Prodigy: Text Classification
Train Text Classification Models Using spaCy
Conclusion
Part II. The Cogs in the Machine
Chapter 4. Tokenization
A Minimal Tokenizer
Hugging Face Tokenizers
Subword Tokenization
Building Your Own Tokenizer
Conclusion
Chapter 5. Embeddings: How Machines “Understand” Words
Understanding Versus Reading Text
Word Vectors
Word2Vec
Embeddings in the Age of Transfer Learning
Embeddings in Practice
Preprocessing
Model
Training
Validation
Embedding Things That Aren’t Words
Making Vectorized Music
Some General Tips for Making Custom Embeddings
Conclusion
Chapter 6. Recurrent Neural Networks and Other Sequence Models
Recurrent Neural Networks
RNNs in PyTorch from Scratch
Bidirectional RNN
Sequence to Sequence Using RNNs
Long Short-Term Memory
Gated Recurrent Units
Conclusion
Chapter 7. Transformers
Building a Transformer from Scratch
Attention Mechanisms
Dot Product Attention
Scaled Dot Product Attention
Multi-Head Self-Attention
Adaptive Attention Span
Persistent Memory/All-Attention
Product-Key Memory
Transformers for Computer Vision
Conclusion
Chapter 8. BERTology: Putting It All Together
ImageNet
The Power of Pretrained Models
The Path to NLP’s ImageNet Moment
Pretrained Word Embeddings
The Limitations of One-Hot Encoding
Word2Vec
GloVe
fastText
Context-Aware Pretrained Word Embeddings
Sequential Models
Sequential Data and the Importance of Sequential Models
RNNs
Vanilla RNNs
LSTM Networks
GRUs
Attention Mechanisms
Transformers
Transformer-XL
NLP’s ImageNet Moment
Universal Language Model Fine-Tuning
ELMo
BERT
BERTology
GPT-1, GPT-2, GPT-3
Conclusion
Part III. Outside the Wall
Chapter 9. Tools of the Trade
Deep Learning Frameworks
PyTorch
TensorFlow
Jax
Julia
Visualization and Experiment Tracking
TensorBoard
Weights & Biases
Neptune
Comet
MLflow
AutoML
H2O.ai
Dataiku
DataRobot
ML Infrastructure and Compute
Paperspace
FloydHub
Google Colab
Kaggle Kernels
Lambda GPU Cloud
Edge/On-Device Inference
ONNX
Core ML
Edge Accelerators
Cloud Inference and Machine Learning as a Service
AWS
Microsoft Azure
Google Cloud Platform
Continuous Integration and Delivery
Conclusion
Chapter 10. Visualization
Our First Streamlit App
Build the Streamlit App
Deploy the Streamlit App
Explore the Streamlit Web App
Build and Deploy a Streamlit App for Custom NER
Build and Deploy a Streamlit App for Text Classification on AG News Dataset
Build and Deploy a Streamlit App for Text Classification on Custom Text
Conclusion
Chapter 11. Productionization
Data Scientists, Engineers, and Analysts
Prototyping, Deployment, and Maintenance
Notebooks and Scripts
Databricks: Your Unified Data Analytics Platform
Support for Big Data
Support for Multiple Programming Languages
Support for ML Frameworks
Support for Model Repository, Access Control, Data Lineage, and Versioning
Databricks Setup
Set Up Access to S3 Bucket
Set Up Libraries
Create Cluster
Create Notebook
Enable Init Script and Restart Cluster
Run Speed Test: Inference on NER Using spaCy
Machine Learning Jobs
Production Pipeline Notebook
Scheduled Machine Learning Jobs
Event-Driven Machine Learning Pipeline
MLflow
Log and Register Model
MLflow Model Serving
Alternatives to Databricks
Amazon SageMaker
Saturn Cloud
Conclusion
Chapter 12. Conclusion
Ten Final Lessons
Lesson 1: Start with Simple Approaches First
Lesson 2: Leverage the Community
Lesson 3: Do Not Create from Scratch, When Possible
Lesson 4: Intuition and Experience Trounces Theory
Lesson 5: Fight Decision Fatigue
Lesson 6: Data Is King
Lesson 7: Lean on Humans
Lesson 8: Pair Yourself with Really Great Engineers
Lesson 9: Ensemble
Lesson 10: Have Fun
Final Word
Appendix A. Scaling
Multi-GPU Training
Distributed Training
What Makes Deep Training Fast?
Appendix B. CUDA
Threads and Thread Blocks
Writing CUDA Kernels
CUDA in Practice
Index
About the Authors
Colophon
