Developing Modern Applications with a Converged Database: Succeeding with Diverse Data Formats, Processing Techniques, and Workloads (ISBN 9781098107352)

English, 51 pages, 2021

Author: Alice LaPlante

Table of contents:
Developing Modern Applications with a Converged Database
Data-Driven Applications: The New Gold Standard of Enterprise Software
Barriers to Building Data-Driven Applications
Single-Purpose Specialty Databases
Converged Databases
Know SQL? You’re Set with a Converged Database
Converged Databases Address Common Data-Driven App Challenges
Some Potential Use Cases for Converged Databases
Moving to Modern Development Paradigms
Development Paradigm #1: Microservices
Development Paradigm #2: Event Streaming
Development Paradigm #3: API-Driven
Developer Paradigm #4: SaaS
Developer Paradigm #5: Low Code
Developer Paradigm #6: Distributed Data
Developer Paradigm #7: Continuous Integration/Continuous Delivery
Converged or Single-Purpose Database: Which is Right for Your Use Case?
Conclusion: Avoiding the Innovator’s Dilemma

Oracle

Developing Modern Applications with a Converged Database Succeeding with Diverse Data Formats, Processing Techniques, and Workloads Alice LaPlante

Developing Modern Applications with a Converged Database
by Alice LaPlante
Copyright © 2021 O’Reilly Media. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Andy Kwan
Development Editor: Nicole Tache
Production Editor: Kate Galloway
Copyeditor: Penelope Perkins
Proofreader: Christina Edwards
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

August 2021: First Edition

Revision History for the First Edition
2021-08-19: First Release

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Developing Modern Applications with a Converged Database, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights. This work is part of a collaboration between O’Reilly and Oracle. See our statement of editorial independence. 978-1-098-10735-2 [LSI]

Developing Modern Applications with a Converged Database

The mantra has changed. It is no longer “every business is a software business,” a line attributed to Watts S. Humphrey, the so-called father of quality in software, and echoed by any number of tech leaders. That attitude is so last decade. Today, every business wants to be a data business. It is generally accepted that organizations that realize and pursue this vision will accelerate ahead of their competitors and stay ahead, especially in uncertain markets.

But according to the 2021 “executive survey” on corporate data initiatives from NewVantage Partners:

- Only 48.5% of businesses are currently driving innovation with data.
- Only 39.3% are managing data as a business asset.
- Only 30% have a well-articulated data strategy for their company.
- Only 24.4% have forged a data culture.
- Only 24% have created a data-driven organization.

Deloitte found much the same thing when surveying executives from large enterprises in July 2019. Although 76% said their data analytics skills had matured over the previous 12 months, most executives were still using traditional tools like spreadsheets (62%) and business intelligence (BI) software (58%) to analyze data. In addition, more than 6 in 10 organizations (64%) in the Deloitte study relied solely on structured data from internal systems. As a result, they were missing insights they could have gained from unstructured data like email, social media, images, and audio files. This unstructured data provides much deeper insight into the events, actions, and trends that have the power to affect your business. A case in point: Deloitte found that companies including unstructured data in their analyses were 24% more likely to meet or beat business goals.

For today’s business, simply possessing data, structured or unstructured, isn’t the goal. What matters is having data-driven applications that can make sense of this data. But incorporating unstructured data is an ongoing challenge. It must either be converted into a schema recognized by a traditional database (an extra, time-consuming step) or stored in a specialty database. For example, if you’re building a fraud-protection payment system that stores relational data, you might also need to integrate blockchain and graph analytics with it to fight fraud in real time.

In this report, we explain why developing data-driven apps on a single data platform can make more sense than cobbling together numerous specialty databases. We discuss how you can use the latest techniques in software architecture, such as microservices, event streaming, REST (representational state transfer) APIs, and SaaS (software as a service) models, by taking advantage of the synergistic data technologies within a converged database. We also show that you can analyze nonrelational data such as graph, spatial, or JSON (JavaScript Object Notation) data without having to convert it. By presenting the perspectives and examples of real-world practitioners from a variety of verticals, we demonstrate how a converged database can streamline data management and ultimately help businesses achieve a competitive edge.

Data-Driven Applications: The New Gold Standard of Enterprise Software

Data-driven apps are apps that use diverse sets of data (document data, sensor data, spatial data, transactional data, and more) to help businesses derive immediate value from that data. These diverse data sets are typically drawn from multiple sources in real time. For example, a data-driven app might combine historical customer purchase data, social media data, and machine learning to make real-time recommendations to customers. Or it might combine spatial data with transactional data to detect fraud in card-not-present (CNP) transactions.

Such data-driven apps work across multiple platforms, such as mobile devices, web browsers, and even edge devices, so they can be delivered flexibly and reliably no matter who, or what, consumes them. Because the demands on these apps keep changing, they need to be adjusted continuously as the market shifts or as user requirements evolve. And they must be updated and patched while online, because they are expected to be available 24/7.

These apps are built by taking a data-first approach or, as database programmer Kenneth Downs wrote in his blog, by following a “minimize code, maximize data” strategy. In his words, you “bring the algorithms to the data, not the data to the algorithms.”

Data-driven applications open up numerous opportunities for businesses, including:

- Leveraging large volumes of diverse data sets to automate business processes, solve problems, and seize new opportunities.
- Making data available throughout your business within a single application. Because they are designed to be contextual, data-driven apps can customize each view with accurate, timely, and relevant information.
- Integrating data to create 360-degree views of customers and operations.
- Allowing users to serve themselves by automatically querying master data, metadata, models, schemas, and graphs without needing to involve IT.
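Kenneth Downs’s advice to “bring the algorithms to the data” can be illustrated with a small sketch that uses Python’s standard-library sqlite3 module. SQLite is not a converged database and stands in here for any SQL engine. The `orders` table and its figures are hypothetical. Both functions compute per-customer purchase totals. The first ships every row to the application and aggregates there. The second asks the database to aggregate and returns only one row per customer.

```python
import sqlite3


def build_orders_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few hypothetical purchases."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(1, 20.0), (1, 35.5), (2, 12.25), (3, 99.0), (3, 1.0)],
    )
    return conn


def totals_in_app(conn: sqlite3.Connection) -> dict[int, float]:
    """Data to the algorithm: pull every row into the application, then aggregate."""
    totals: dict[int, float] = {}
    for customer_id, amount in conn.execute("SELECT customer_id, amount FROM orders"):
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals


def totals_in_db(conn: sqlite3.Connection) -> dict[int, float]:
    """Algorithm to the data: the database aggregates and returns only the results."""
    rows = conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = build_orders_db()
    print(totals_in_db(conn))
```

With five rows the two versions are indistinguishable. With millions of rows, the second version avoids moving them out of the database, which is the point of the “minimize code, maximize data” strategy.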

Two factors in particular are driving the rise of these data-driven applications. First, application development has been complicated by the new types of data we discussed, such as document, spatial, graph, temporal, and even weather data, which wouldn’t traditionally have gone into an enterprise application. Second, extracting value from that data is also different. We’re using techniques we didn’t use before, such as machine learning and predictive analytics, to understand what the data can tell us about optimizing our businesses. In summary, a data-driven application works with a diverse set of data and generates business value from it in a very different way than traditional applications did.

These new applications have enormous ramifications for the databases we choose to deploy. It’s important to understand that data-driven apps themselves aren’t forcing developers to use multiple single-purpose databases. Three factors drive this approach:

1. The myth that using new techniques to generate value from data, such as machine learning (ML) and graph analytics, requires a specialized database designed for that specific data type or workload.
2. The latest software architectures, such as microservices, which advocate using a separate service for each part of your workload, built on the best technology for that particular service or workload.
3. Encouragement from single-purpose database vendors and cloud providers.

Barriers to Building Data-Driven Applications

Developing data-driven apps is not easy. Key challenges include:

Complexity and cost in getting a 360-degree customer view
Customer data is usually scattered across numerous systems. It doesn’t come in a unified form, either: some of it is transactional, some of it is unstructured, and some of it accumulates through event streaming. Pulling all this data together is complex and costly. “The most important problem is getting the data from where it’s stored to where it can be used,” says Dr. Benjamin Bengfort, CEO of Rotational Labs, which builds data-driven products and services for enterprises using state-of-the-art distributed systems technology. “Where is the data, where are the processors, and how many processors are required to use that data efficiently? Any time you move data around, you incur costs. So you want to minimize those costs as much as possible.”

Difficulty of analyzing current and historical data together
Some data-driven apps require developers to combine large sets of current and historical data. But keeping large amounts of data in “hot” storage (storage that makes data readily available) can be expensive. Moreover, it can be difficult to ingest real-time event data reliably.

Accessing and storing sufficient volumes, and types, of data to feed ML models
Machine learning models are hungry for data. They need large amounts of historical data to train on, and they need access to new (unseen) data to keep improving their performance. This is a costly proposition, because you need to copy, distribute, and store these data sets to support model development. In addition, real-time predictive analytics requires continual access to new data, which can overwhelm the available compute.

Complexity and cost in managing Internet of Things (IoT) data in real or near-real time
Data-driven apps often require massive volumes of unstructured device data, which clog up your data pipelines and drain your budget. Raw device data is often stored in cloud-based object storage and must be moved to a data lake or data warehouse before it can be analyzed, so incorporating it adds further delay and expense.

Given these challenges, how do you decide which database configuration is best for developing data-driven apps? Should you combine multiple single-purpose databases, such as machine learning databases, spatial databases, and graph databases? Or should you use a single, converged, unified data platform? It’s not an easy question for developers, in particular, to answer. Let’s take a closer look at each option.

Single-Purpose Specialty Databases

Single-purpose specialty databases are just what they sound like: databases dedicated to a particular type of data or workload, such as customers, product catalogs, or orders (see Figure 1). The figure depicts an online retailer. Each aspect of the business is developed as a separate microservice that uses its own single-purpose data store, optimized for that particular use case. For example, the product catalog uses a JSON document store, and product reviews use a key-value database. At first, this might seem the best fit for microservices, because each function has a separate database optimized for its use. The functions are independent, and independent databases can also be more highly available. All this is very attractive to developers who may not be thinking past the initial development and deployment cycle.

Figure 1. Many single-purpose databases for an online retailer. The gold lines indicate the data movement and data transformation necessary between these independent services.

In fact, it’s probably not surprising that sides in this debate are mostly taken based on role. Developers, traditionally, have preferred using single-purpose databases. On the surface, it appears to be the easiest and fastest. And who doesn’t want to build apps more easily and at an accelerated rate? Say you tell your dev team that you need an app for an online retail storefront that allows customers to shop and check out their purchases. They would probably go ahead and commit to a high-end transaction processing system. But once the system is built, management decides it wants to make product recommendations to customers as well. You can’t do that with your transaction processing database, so you add another database—perhaps a data lake—that can manage machine learning algorithms and the immense amount of data they require. Now you have to access all customer historical data, ship it to the data lake to build models, and then run the machine learning algorithms to predict what else the customer might want to buy. Those predictions have to be sent back to your single-purpose transaction processing system, so that when customers place items in their carts, recommendations can be presented. There’s a fair amount of back and forth going on between those two systems, but all of this still has to happen very, very quickly. Now think about adding a third use case: a new distribution model of buy-online-pickup-in-store (commonly called BOPIS). Based on this customer’s location, if they want to pick an order up themselves, what are the closest stores within 20 miles or 5 miles? So now you need a separate geospatial database. This involves sending a zip code to that database, which does a geospatial query and sends the locations of the five closest stores back to the transaction processing system, which shows them to the customer. But then you realize you need to check whether the product being ordered is in stock at those stores. 
This involves another call to the inventory system. In the end, the relatively simple application you started out with has to manage all the communications among all the different systems. Retailers attempting to pivot to BOPIS in the middle of the pandemic often found themselves with rigid, conflicting systems. They might actually have an item in the physical store, for example, but the online ordering system might not recognize that fact and would turn down a sale.

Suddenly, you have a complicated mix of connectors and APIs. Integration challenges abound. You now need to worry about data consistency, cross-product compatibility, performance tuning of individual databases, separate security patches and upgrade cycles, and complex analytics integrations. You also need to find a way to back up all this data consistently. “Data comes in such different forms. Having to transform it into the general schema and trying to understand it is always a challenge,” says Dwight Gunning, a data scientist for a financial regulatory agency. Nothing is inherently wrong with a lot of chatter among integrated applications; there will always be some communication required, after all. But it can get very complicated very quickly the more of these single-purpose databases you use. There’s also a potential performance hit due to network latency.

Custom coding, which is often required when using multiple single-purpose databases, can also be an issue. The more hand-crafted the application code, the bigger the potential challenges, especially down the road. Everything can be kept stable while the developer who built it is still with you. But once she is gone, who maintains the code going forward? Why did we do things this way? Why didn’t we try this? You can end up with performance problems but be reluctant to change code because you’re not 100% sure what that code does. And unfortunately, without a common understanding of what the code does, your team could find itself moving backward, rewriting existing code rather than focusing on future endeavors. This creates potential long-term maintenance and productivity challenges.
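The following minimal sketch makes the BOPIS scenario described above concrete. It implements the logic the retailer needs:

- Measure each store’s great-circle (haversine) distance from the customer.
- Keep only stores within 20 miles that have the item in stock.
- Return the five nearest.

The `Store` type, store coordinates, SKU, and function names are hypothetical. The sketch also assumes the customer’s ZIP code has already been geocoded to a latitude and longitude. In the single-purpose architecture, the distance step would run in the geospatial database and the stock check in the inventory system, with the application shuttling results between them. Here both steps sit in one function only to show the logic itself.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass(frozen=True)
class Store:
    store_id: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two latitude/longitude points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def pickup_options(
    lat: float,
    lon: float,
    sku: str,
    stores: list[Store],
    inventory: dict[tuple[str, str], int],
    radius_miles: float = 20.0,
    limit: int = 5,
) -> list[str]:
    """IDs of up to `limit` stores within `radius_miles` that stock `sku`, nearest first."""
    candidates = []
    for store in stores:
        distance = haversine_miles(lat, lon, store.lat, store.lon)  # geospatial step
        in_stock = inventory.get((store.store_id, sku), 0) > 0  # inventory step
        if distance <= radius_miles and in_stock:
            candidates.append((distance, store.store_id))
    candidates.sort()  # nearest first
    return [store_id for _, store_id in candidates[:limit]]


if __name__ == "__main__":
    stores = [Store("A", 40.0, -75.0), Store("B", 40.1, -75.0),
              Store("C", 40.5, -75.0), Store("D", 40.2, -75.0)]
    inventory = {("A", "SKU-1"): 0, ("B", "SKU-1"): 3,
                 ("C", "SKU-1"): 10, ("D", "SKU-1"): 1}
    print(pickup_options(40.0, -75.0, "SKU-1", stores, inventory))  # ['B', 'D']
```

In the usage example, store A is closest but out of stock, and store C has stock but lies about 35 miles away, so only B and D qualify.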
Perhaps the solution is not to throw even more database technology at the problem, but to consider an alternative: a converged database. This type of unified platform supports all modern data types as well as the latest development paradigms. With converged databases, developers can centralize database management, security, and maintenance without writing custom application code for these basic tasks. A converged database also minimizes the costs associated with data transfers. It lets developers use its built-in machine learning, graph analytics, spatial, and other capabilities. They no longer need to write code that propagates data from one single-purpose database to another, or manually integrate data pulled from multiple sources in application code.

For example, a movie streaming application can recommend what to watch by running a single query in a converged database that spans three different data models:

1. Find a customer’s friends within three hops of friendship (graph data).
2. Identify which of those users have watched similar movies (JSON data).
3. Find which of those users provided feedback with a 5-star rating (key-value data).

Running this type of query across single-purpose databases would require a lot of complex application code to invoke the analytics functions built into three separate databases and then aggregate the results. Let’s explore the converged approach in more detail.
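The three-step movie recommendation described above can be sketched as one SQL statement. This sketch uses Python’s built-in sqlite3 module purely as a stand-in. SQLite has no native graph, JSON document, or key-value engine, so each data model is emulated with an ordinary table. The graph step is a recursive common table expression (CTE) limited to three hops. The table names, columns, and sample IDs are hypothetical. So is the final step of turning the three filters into a recommendation: the query returns movies those users rated 5 stars that the customer hasn’t watched yet. A converged database would use its own graph, JSON, and key-value features instead, and their syntax varies by product.

```python
import sqlite3

# One SQL statement spanning three stand-in "models":
#   friends -> graph edges, traversed with a recursive CTE
#   watched -> viewing history (a plain table standing in for JSON documents)
#   ratings -> (user, movie) -> stars (a table standing in for a key-value store)
RECOMMEND_SQL = """
WITH RECURSIVE network(user_id, depth) AS (
    SELECT :me, 0
    UNION
    SELECT f.b, n.depth + 1
    FROM friends AS f
    JOIN network AS n ON f.a = n.user_id
    WHERE n.depth < 3                      -- stop after three hops
),
similar AS (
    SELECT DISTINCT w.user_id
    FROM watched AS w
    WHERE w.user_id IN (SELECT user_id FROM network WHERE user_id != :me)
      AND w.movie_id IN (SELECT movie_id FROM watched WHERE user_id = :me)
)
SELECT DISTINCT r.movie_id
FROM ratings AS r
JOIN similar AS s ON s.user_id = r.user_id
WHERE r.stars = 5
  AND r.movie_id NOT IN (SELECT movie_id FROM watched WHERE user_id = :me)
ORDER BY r.movie_id
"""


def build_demo_db(friendships, watched, ratings) -> sqlite3.Connection:
    """Load hypothetical sample data into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE friends (a INTEGER, b INTEGER);
        CREATE TABLE watched (user_id INTEGER, movie_id INTEGER);
        CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, stars INTEGER);
        """
    )
    # Store each friendship in both directions so the traversal is undirected.
    conn.executemany(
        "INSERT INTO friends VALUES (?, ?)",
        [edge for a, b in friendships for edge in ((a, b), (b, a))],
    )
    conn.executemany("INSERT INTO watched VALUES (?, ?)", watched)
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", ratings)
    return conn


def recommend(conn: sqlite3.Connection, me: int) -> list[int]:
    """Movies rated 5 stars by similar viewers within three hops that `me` hasn't watched."""
    return [row[0] for row in conn.execute(RECOMMEND_SQL, {"me": me})]


if __name__ == "__main__":
    conn = build_demo_db(
        friendships=[(1, 2), (2, 3), (3, 4), (4, 5)],  # user 5 is four hops from user 1
        watched=[(1, 100), (1, 101), (2, 101), (2, 201), (3, 100),
                 (3, 200), (4, 300), (5, 101)],
        ratings=[(3, 200, 5), (3, 100, 5), (2, 201, 4), (4, 300, 5), (5, 500, 5)],
    )
    print(recommend(conn, 1))  # [200]
```

The depth counter is what bounds the traversal. Friendships are stored in both directions, so the graph contains cycles, and without `n.depth < 3` the recursion would never terminate. User 5 shares a movie with user 1 but sits four hops away, so that user’s 5-star movie is correctly left out.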

Converged Databases

“We saw a huge opportunity to take technology to the next level by using a converged database to provide real-time information to our customers.” —Giovani Cani, head of IT delivery at Wilson Logistics Group

The alternative to using multiple single-purpose databases is a converged database, as shown in Figure 2. A converged database natively supports all modern data types. It also has synergistic data technology embedded in it to support the latest paradigms in software architecture. In other words, it addresses both factors that are putting the spotlight on data-driven applications: organizations need to work with growing and more diverse sets of data, and there are new and exciting ways to work with and learn from this data. When the same database supports both machine learning and spatial data, for example, you can more easily perform predictive analytics on spatial data.

Figure 2. A converged database in action. The light pink outline indicates the reporting data warehouse can use cross–data store queries to do real-time analytics.

You may be thinking that a converged database is a monolithic database, but that’s not the case. Rather, it uses data containers that can be combined into any number of physical databases. Organizations can store their data in logically separate data containers called “pluggable databases.” These pluggable databases can easily be combined physically to simplify deployment, allowing businesses to run queries across data stores for real-time data analysis. Individual data stores communicate through a uniform REST API, so there’s little to no need to move data across data stores.

According to Jesse Anderson, managing director of the Big Data Institute, you ideally want one data source to rule them all. “That’s what a converged database is at its core,” he says. “If you can have a single database that can do both transactions and analytics, you don’t have this issue of getting one response when totaling up sales for the day from your main database, and a different number from technology X. That’s really the key value proposition.”

A good analogy for a converged database is the smartphone. Today, smartphones integrate audio, video, messaging, camera, calendar, music, and many other features into a single product. Previously, consumers had to buy a phone, a camera, a music player, and a personal computer, and own a lot of adapters to make all these things work together. Today, these point products, and the way they integrate, are expected features of smartphones. With smartphones, the synergy across features makes the whole greater than the sum of its parts. That synergy comes precisely from the tight integration and the new workflows it makes possible. The phone integrates with the camera and automates photo storage and backup. You can send photos via email or text, and post them on social media. The calendar is always up to date because it uses the phone’s internet connectivity to sync with the cloud. The music app can stream from a personal, virtually unlimited music library that is also in the cloud. Because of this aggregated synergy, each of these features is more powerful than the single-purpose device it replaces. And with single-purpose devices, you are on the hook for integrating these point solutions manually.
Yes, it’s true that costly specialty cameras and stereo systems can produce better-quality photos and sound than smartphone cameras and music apps. But innovation and economies of scale are continuously closing that gap.

You get the same ease of use and synergy with a converged database. Because standard SQL can be used to run sophisticated machine learning, spatial, and graph models, you can run your queries in one place rather than querying separate databases and deploying different APIs.

Converged databases also help with volume. They scale because their tools are designed to work elastically with massive amounts of data. A converged database can scale in terms of data size on disk and in memory, or in terms of the sheer number of records (breadth).

“A converged database seems promising because first, it means your data engineers will need to master fewer tools. So it’s easier for hiring and for maintaining knowledge inside the organization,” says Adi Polak, a senior manager in the cloud advocacy group at Microsoft. Converged databases can also help with performance tuning, Polak says. When we work with multiple databases, each one often has its own tuning requirements, which demand specialized knowledge and dedicated people. “And a converged database can help provide one single place for in-house clients to consume data,” she adds.

Of course, all this depends on the cost model. “The ROI of insights is very difficult to calculate,” says Polak. “You have this huge amount of data. You’re ingesting all this data into the system and extracting insights. So there are always the issues of, A, how much did it cost us to extract these insights versus how much we benefit from them? And B, are the insights accurate?”

Paul Modderman, CTO of Bowdark Consulting, an SAP and cloud technologies architecture and consulting firm, was working on an extended warehouse management initiative for a client’s SAP system. One of the challenges was that the client also had a homegrown SQL database that served as a time-tracking tool for employees who worked at various warehouses. “The largest part of the project that we did was presenting a single perspective on those two very different things: warehouse management metrics and employee time tracking,” he said. “A converged database would have helped present that perspective faster and better.” Without this synergy, developers may spend all their time gluing together APIs, databases, files, RESTful services, and the like, Modderman added:

“If a converged database is collecting those things for you, you’ve just accelerated your time to value. And that’s very powerful, because managing those various connections and settings and the glue for those things is hard. Converged databases make that easier.”

The IT team at Wilson Truck Lines, in Ontario, Canada, would agree. For more than eight decades, Wilson has provided innovative logistics and truck-transport services. The privately held company developed its fleet-outsourcing model for the packaged goods sector, but that model applies to any industry, including retail and manufacturing. Its biggest customer is grocery chain Metro, a Canadian food retailer with more than 260 stores throughout the province. Over the years Wilson grew to manage several yards with 600-plus trailer trucks and five distribution centers. It needed a better understanding of logistics volumes and patterns to boost efficiency, standardize processes, and optimize costs.

Giovani Cani, head of IT delivery for companies within the Wilson Logistics Group, is using Oracle’s converged database for a number of critical business applications, including optimizing grocery deliveries to Metro. “Drivers go to the distribution center, load up with groceries, and drive around delivering them, and we track them, capturing that data in our converged database,” says Cani. Along with delivering packaged goods, Wilson Truck Lines, a company within the Wilson Logistics Group, also handles all the shunting and yard management tasks associated with each distribution center. “Imagine a massive parking lot of 53-foot trailers. These assets need to be located, moved to the dock, and eventually moved back into the yard, to the supplier, or to the store, depending on the delivery schedule,” says Cani. “When you have hundreds of trailers being moved all the time, 24/7, 365 days a year, you need a system that can keep up with the load. Tracking location, number of days in the yard, preventive maintenance, trailer specifications (size, number of axles, door style), load type, and status is critical to the type of service we provide. We do all the managing and reporting on our converged database.”

Wilson developed an application that enhanced and streamlined the tedious daily yard check, cutting the time needed to complete it by 50%. The app lets workers with handheld devices scan barcodes on all trailers and quickly enter details such as contents, availability status, and set temperature, automating the entire yard-check workflow. Data gathered in the new digitized system is available in real time to all distribution centers. The collected data is time-stamped, geotagged, searchable, and kept for historical analysis, providing greater transparency and accountability across the enterprise.

Trailcon Leasing Inc., another company within the group, specializes in trailer lease, rental, storage, and mobile repair services and provides customers with real-time data about their assets. Different kinds of data, such as geolocation, repair history, ownership documents, and pictures from the last inspection, are all stored within a converged database. The synergy of the data enables the IT team to deliver solutions that wouldn’t be possible if the information were spread across different databases.

Wilson believes these systems have positioned it well ahead of other transportation companies. “Lots of our competitors still use pen and paper, Excel, and lots of different databases, all leading to poor communication, integration issues, and more,” says Cani. “We saw a huge opportunity to take technology to the next level by using a converged database to provide real-time information to our customers.” Using Oracle Cloud Infrastructure (OCI), Oracle APEX, and the converged database, Wilson can now deliver reports in days rather than the weeks or months they took previously. Employees, from managers to field workers, can now see key performance indicators on dashboards available on desktops, tablets, or phones. They can easily filter data across millions of records and drill down into details to allocate resources more efficiently and forecast demand.

Being able to process and store both structured and unstructured data was critical for Cani. “For us, geolocation is a big thing. We do use spatial for that within our converged database,” he says. He also has to store JSON documents, EDI (electronic data interchange) data, and XML (extensible markup language) data. “To me, this is the only way to go because your other option would require integrating all the different systems from specialty databases. And once you start introducing integration, you add cost, you add time, you add complexity, and at the end of the day, you’re never going to have the same results in a timely manner as you’d have if all were under one umbrella,” says Cani. He is now looking into using more of the innovative features within the converged database, such as artificial intelligence and machine learning, to provide him with additional insights about the data. He can ask questions such as: what’s the best time to deliver groceries based on traffic patterns? Who are my best drivers and which ones require further training? Why does a given trailer’s brake system fail more often than others? Why are so many tires getting damaged? “These are issues that you can dive into once you have the data, and ultimately provide much more value to the business,” says Cani.

Know SQL? You’re Set with a Converged Database

“A lot of times that complexity is the biggest cost of getting your data-driven strategy right.” – Scott Haines, senior principal software engineer at Twilio

One of the key benefits of using a converged database is that if you know how to write SQL, you can access all its capabilities without having to learn anything new. So if one of the arguable advantages of a single-purpose database is that it has APIs that are intuitive for a particular data type or data model, then the advantage of a converged database is that if you know SQL or REST APIs, you can do it all. Somebody who is primarily a SQL query writer, and happy in that world, can do graph analyses as well as manipulate geospatial or JSON data because the converged database allows them to use the same language or the same methodologies that they’ve always used. Although increased productivity for developers is generally a difficult benefit to quantify, converged databases can act as a catalyst to increase productivity from data engineers and data analysts alike.

And a converged database gives developers the ability to experiment. If an operation takes eight hours you might try just one thing a day. But if you can get it done much faster, you can try multiple things a day, and that can open up business opportunities that weren’t practical to explore before. Indeed, advanced capabilities are often embedded in converged databases. For example, leading converged databases offer capabilities such as machine learning. You present your data set and articulate the business question you want to ask, and the converged database tools will help you determine which machine learning algorithm to use. Then, once the model is built, you can run a SQL statement that specifies the customer and asks how likely it is that he or she is interested in buying a watch, say, or has a pet cat. And the answer comes back. Because the querying and analyses are all done on a common platform, you don’t have to manage the communications between different systems. There’s none of that overhead. There are no new APIs that you have to learn. You can interact with the machine learning and the geospatial capabilities using standard SQL. This can lead to the democratization of many of these specialist capabilities. Although a converged database won’t make you an expert in advanced methodologies like machine learning, it does make them more accessible via an interface that’s already familiar to you.

Drop Tank is another company that has invested in the converged database vision. Founded in 2012 by David VanWiggeren and Tim Miller, the Chicago-based firm’s mission is to tap into the tens of thousands of point-of-sale systems used by the diverse companies that own and operate gas stations across the United States to offer fuel rewards programs. The firm has recently grown beyond gas stations, partnering with other major third-party loyalty programs.

Since the beginning of 2019, Drop Tank’s portfolio of participating gas stations has grown from 3,500 locations to 4,800 and transactions have more than tripled. After much trial and error, Drop Tank began using a converged database in the cloud to store and analyze data from its growing list of stores and consumers. CIO Tim Miller recalls that when he formed the company almost a decade ago, he didn’t want to manage technology at all. He and his cofounder were working at a large oil and gas company when they saw a gap in the market for rewards programs. “For example,” says Miller, “you’re in the Chicago area and you regularly visit a gas station that’s near a dry cleaner. The dry cleaner could run a promotion that says, this Saturday, bring your dry cleaning and get 10 cents off your next purchase of gas. And so we would sell them plastic cards and the dry cleaner would hand them out. The gas station would get the business, and the dry cleaner would benefit from the promotion.” But when it came to building a system capable of compiling the broad range of both structured and unstructured data that resulted from such a promotion, Miller was adamant that he didn’t want to build a whole technology department to support this: “I’ve been down that road, done that. I don’t want data centers. I don’t want an operations team.” Drop Tank has been a cloud native company from the beginning. “Before, it was all about processing power; now it’s about the data,” says Miller. “It’s about having access to all that data that we’re getting from almost 12,000 locations on our network.” He needed a backbone to support all of it. “And then, what do you do with it? Do you put it in a data mart or data lake? And how do you curate it? How do you govern it? We get all of this with our converged database and its associated tools.”

An added benefit when using a converged database is that the data isn’t moving. You don’t have to send data over the network or wait for data to be sent because it’s all in the same data store. You have easy access to it. Things would be very different if you had a number of single-purpose databases cobbled together. Says Scott Haines, a senior principal software engineer at Twilio, “it’s common for large organizations to load structured data into a number of cloud databases, like Snowflake or Redshift for one purpose, while simultaneously having a wealth of raw data sitting in their data lakes, accessible through different technologies like Amazon Athena or Google’s BigQuery. You’ll have to start thinking about your data a little differently at that point.” You need to figure out what form the data is in, and whether that’s interoperable with the different data stores in production, says Haines. “And if it’s not, then things get a lot harder, because you’re going to have to transform data from one format to another, and to another, and there are more and more places where things can go wrong. A lot of times that complexity is the biggest cost of getting your data-driven strategy right.”

Jesse Anderson agrees, but with a caveat: “Management, for cost reasons, would obviously prefer one database. And depending on who you talk to, this is either possible or a pipe dream.” He says the potential savings of having a single converged database are very attractive: “There’s the savings of not having to retrain DBAs [database administrators] on something new. And then developers are also much more productive. They just need to learn the new way to query blockchain, for example.” And, finally, you don’t have to search through 20 or 30 different database technologies to find the right fit for your use case. “You’ve narrowed it down to maybe five choices in the converged database world.”

But can a converged database really do all that it claims? Joe Reis, a cofounder of Ternary Data in Salt Lake City, thinks that the vision of a converged database is a noble, but challenging, one. “I increasingly think that it could be practical,” he says. “But database use cases are diverging. Realistically, every time I’ve seen people try and do this, including myself (having one database to rule them all), it doesn’t end well.” Resource contention is the most obvious challenge. Let’s say you have a transactional workload running at the same time that you are querying lots of data. “If you’re not aware of nuances, you can run into trouble,” says Reis.

Matt Housley, also a cofounder of Ternary Data, shares Reis’s concerns: “I think a converged database is good in principle. But it might be a bit of marketing-speak to say these databases can do everything. I’ve never seen that actually work.” For Housley, a better understanding of the technical boundaries of converged databases would be helpful. At what transaction rate is he going to start having problems? When is he going to have challenges with his analytics queries? How much data can he realistically put into a system like this? What is the cost structure going to look like? Ultimately, Housley and Reis agree that the concept is a very intriguing one. “We do see clients all the time that have exactly this need,” says Housley. “If there’s a way for a database architecture to take care of everything with less engineering, then great.” Mark Carleton is CEO of MESTEC, a UK-based maker of cloud-based solutions for digitizing the manufacturing life cycle in factories. He’s already using a converged database, and what’s more, says it would have been “unworkable” to attempt any other configuration. MESTEC collects real-time data from factory operations—it might come from people, or it might be generated by machines—and enables analyses and insights that drive continuous improvement to factory operations. “Our converged database provides us with a single place to store all our data,” says Carleton, who says he “can’t imagine a world” in which he would have a different database for each data type. “That would be completely alien to our purpose.” His Oracle Autonomous Transaction Processing database stores relational data along with JSON documents and large amounts of IoT data, and “it would be inconceivable that we would want to separate it all,” says Carleton. 
When you need to store and process unstructured data such as a connected graph or raw JSON, it is tempting to choose a single-purpose specialty database that allows you to immediately query the data without having to think deeply about the data’s schema, says Bengfort. Bengfort cited the case of a US Department of Energy project utilizing synchrophasor sensors that monitor the angle and magnitude of voltage and current across transmission lines in a large-scale electrical system covering each interconnection, which is approximately half the geographic area of the United States. Handling the huge volume of data coming in, and determining how to use it in applications, was a challenge. The goal of the project was event and anomaly detection providing real-time alerts to grid operators. Detrimental events on the grid occur on the timescale of milliseconds, requiring machine learning models that detect the problem, judge the severity, and alert the control center. But for this to work, the processing has to be as close as possible to where the data is being generated. “Every millisecond that goes by, of this data going from its source through the data lake to the ETL [extract, transform, and load], is a millisecond delay that could cause more damage to the grid,” says Bengfort, who says that for this kind of application, you might think of getting into a more constrained (i.e., single-purpose), high-volume, time series data store. The risk, of course, is that if a single-purpose database can only handle a finite set of use cases, then innovation will be inhibited as new use cases come in. If this project continues, they will want to extend the database to perform historical analysis and capital assets planning.

Bengfort has yet to work with converged databases. “I think, to be successful, a converged database needs to provide the flexibility to go up and down that trade-off between constrained and not constrained on a per-application, per-use-case basis,” he says. “And that is a challenging technical problem.” A good converged database system will reduce the amount of replication that occurs as a result of having these multiple data systems that are just doing ETL, says Bengfort. It will also encourage flexibility and evolving use cases over time. “The data problem of today is not going to be the same one 18 months from now. Can the database deal with that?”

Carleton says he can see an argument for very niche use cases to go with single-purpose databases. “I imagine that if you’ve got a very special and narrow purpose, a specialized database specifically designed for that application might be the best solution,” he says, “but frankly, we’ve never found anything the Oracle database can’t do.”

Converged Databases Address Common Data-Driven App Challenges

“I have that one database in which all of the data resides, and all the bells and whistles and alerts and monitoring and maintenance tools are in that one space.” – Tim Miller, CIO at Drop Tank

While skepticism exists around converged databases, they do address some of the challenges organizations face when building data-driven apps. Most importantly, converged databases are capable of centralized database management, which means that IT doesn’t have to bear the immense responsibility of separately managing apps built on an array of single-purpose databases. This is not feasible on a large scale, as it leads to endless reintegration, retesting, retuning, and troubleshooting, and requires your IT people to be knowledgeable about the operational aspects of each single-purpose database. With a converged database, your business uses a common approach to security, upgrades, patching, and maintenance across all deployments of the system.

Drop Tank’s Miller says it’s much easier to manage a converged database. “I’m not searching through 15 different databases, while trying to monitor and maintain them all,” he says. “I have that one database in which all of the data resides, and all the bells and whistles and alerts and monitoring and maintenance tools are in that one space.” While it’s far cheaper (with eight different applications interacting with the one database in Drop Tank’s case), there’s also a risk. If the database has a problem, there’s the potential of taking every application that depends on it offline. “I can see how some may perceive it as less risky if you decide to go with a nonconverged database where every app has its own container and its own database,” says Miller. But then, “sometimes you have the idea that, well, I’ve already got a [single-purpose] database, I’ll just keep adding to it. No, no, no. A converged database is really the only way to go.”

A converged database has a number of advantages:

Consistent security across workloads
With converged databases, the security mechanisms you have in place cover all the data across all workloads. You don’t have to worry about varying security measures across different specialty databases or about deploying security policies for every individual database, which naturally makes a data-driven app more complex as it takes data from various databases. Ironically, combining single-purpose databases (often referred to as “best-of-breed”) frequently results in a worst-of-weaknesses scenario: the weakest link in all the connected systems defines how strong the security is for all of them, because a hacker or cybercriminal who gets into one of the systems may be able to use it to reach the others.

Opportunities for workload synergies
Because you can employ a common set of operations with converged databases, you can apply the same function to different workloads, rather than each workload having its own operations to run.

Easier access to insights from the data
One of the most obvious benefits of a converged database is the ability to pull data from different places to perform queries. This simply wasn’t possible before converged databases, says Paul Modderman. You can now pool your data and enrich it. You can, for example, report on sales by region or do aggregations so that you can get a more holistic picture. “You get a lot more context for your decision-making,” says Modderman. “There’s so many different benefits to swiftly ensuring that the right people have access to the right data without having to go and talk to 10 teams to try and pull a report,” adds Scott Haines.

Deploy anywhere
A converged database allows developers and IT to come together and choose the deployment option that works best for their mutual needs and to move seamlessly between such options as required. This optimizes developer productivity.

Additionally, leading converged databases will also typically support model-specific access methods for graph and spatial queries, as well as offer libraries of common machine learning algorithms. Generally, these capabilities can be accessed via either RESTful APIs or stateful connections. Some converged databases provide even more value by supporting the JSON data used in web and mobile applications. Unlike single-purpose document databases, converged databases can generate schemas and indices from JSON objects. This makes it possible to do parallel SQL analytics, transactions, and JSON data joins with relational, graph, and spatial data. It also means that you can get great data synergy by being able to run ML or graph analytics on the JSON data without having to transform it or move it to another system. These capabilities give developers simple API-driven access and model-specific languages, while still letting them use SQL as they prefer. And IT can be confident that the business is taking a unified approach to security, upgrades, patching, and maintenance across all deployments of the converged database.

The Big Data Institute’s Anderson says choosing the right database is an absolutely critical decision point. Companies must take it very seriously. “If you go down the list of some of the failures that we’ve had to clean up, they were instances where [companies] chose their databases poorly,” he says. “And those cleanups will take sometimes a year or more, so the risk/reward here is quite high.” If you choose the wrong database, and it doesn’t work, your cleanup or your technical debt is not a month’s work; it’s a year, he says. “And we’re talking millions of dollars. It can be quite a fiasco.”
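As a rough illustration of joining JSON documents with relational data in a single SQL statement, the sketch below uses Python’s built-in sqlite3 module as a stand-in for a converged database. It assumes the bundled SQLite has its JSON functions available (built in from SQLite 3.38 onward); json_extract is SQLite’s spelling, and other databases expose equivalent functions under different names. The customers and order_docs tables and their contents are hypothetical.

```python
import json
import sqlite3

# SQLite stands in for a converged database: one engine holding a relational
# table and a table of raw JSON documents. Assumes SQLite's JSON functions
# (json_extract) are available, as they are by default from SQLite 3.38.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.execute("CREATE TABLE order_docs (doc TEXT)")  # schemaless JSON payloads

conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "EU"), (2, "Grace", "US")],
)
orders = [
    {"customer_id": 1, "total": 40.0, "items": ["tea"]},
    {"customer_id": 1, "total": 25.5, "items": ["cups"]},
    {"customer_id": 2, "total": 99.9, "items": ["kettle"]},
]
conn.executemany(
    "INSERT INTO order_docs VALUES (?)", [(json.dumps(o),) for o in orders]
)

# One SQL statement joins relational rows with fields read out of the JSON
# documents and aggregates sales by region, with no export or transform step.
rows = conn.execute(
    """
    SELECT c.region, SUM(json_extract(o.doc, '$.total')) AS sales
    FROM order_docs AS o
    JOIN customers AS c ON c.id = json_extract(o.doc, '$.customer_id')
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
print(rows)  # [('EU', 65.5), ('US', 99.9)]
```

The JSON documents are never reshaped into a separate table or exported to another system; the join and the aggregation run where the data already lives.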

Some Potential Use Cases for Converged Databases

Modderman, who tends to work in full data stack development (meaning not just the database, but application servers and the user interfaces for those applications, both mobile and desktop), says he would use a converged database because it would enable him to spend less time “gluing things together.” In the retail industry, companies typically use multiple architectures to put together anything resembling a coherent data environment. A typical scenario, says Housley, is to have a large number of applications, then feed data from there into a batch system for handling analytics, providing insights perhaps once a day or once an hour. “Alternatively, some retailers are using a more complex data architecture that includes streaming, where they track all the changes in their applications and try to analyze that stream in real time,” he says. “But that can get expensive and complicated.” Reis agrees: “In retail, the pipelines that provide analytics are very fragmented, and a converged database, if it shortens the cycle, would ease the cognitive load on the team.” For smaller retailers, a converged database might be interesting, especially as a managed cloud service, says Housley. “If they don’t have the engineering resources to put something like that together, to be able to have some real-time analytics on their customers and orders, a cloud-based converged database might be a good solution,” he says.

Reis thinks that the use cases are very similar for the FinTech companies his firm serves. “Both types of industries are trying to get data faster to improve the customer experience,” he says. “At the end of the day, things like speed and reducing time to value and automation are the name of the game.” So increasing the pace at which developer teams and data science teams can collaborate and release new features is always good. Housley concurs. He’s seen a lot of cases in both FinTech and e-commerce where people are taking database tools designed for analytics and using them in ways they were never intended for. It doesn’t end well: “So if you have a case where you need to have your analytics and your real-time transactions in the same place, then a converged database could be a good fit.”

Moving to Modern Development Paradigms

Businesses have been using a fairly new set of development paradigms to develop their data-driven applications, such as microservices, events, blockchain, and more. They’re also being forced to think globally, so the data they use needs to be distributed and available 24/7 (see Figure 3). Additionally, when businesses make changes or improvements to these applications, those changes need to happen through continuous integration and continuous delivery (CI/CD). They’re also beginning to use low-code development environments to prototype quickly and get the data out to where it’s needed to create value.

Figure 3. Modern development paradigms.

In this section, we explain how converged databases support these new development paradigms to help create value from data.

Development Paradigm #1: Microservices

One of the problems with the old approach of having one development team, one development platform, and one data store was that if you wanted to make a change in the data model or to the application in general, you had to get buy-in from everybody. They would all have to agree to the change and then agree to the downtime schedule to make the change. This was a decided barrier to rapid innovation. Microservices revolutionized that by making it possible to develop and manage the parts of an application separately. So one team can handle transaction processing and another can handle machine learning or predictive analysis. The services run completely separately, so if one team wants to make a change, it can do so without impacting the others (see Figure 4).

Figure 4. Microservices.

The impact of microservices has been huge. Most specialty, single-purpose databases sprang up around microservices, because when you draw the architecture for these microservices, you can see the business challenge that they’re trying to solve, and underneath it you normally see a data store: a unique, separate data store for that microservice, because users are looking for independence from other teams and business units. The problem is, when you put each of the microservices into a separate physical database, you end up with a massive number of databases to manage, monitor, keep highly available, and back up.

Converged database vendors don’t think you need to have a separate data store for every one of these microservices. Some of the services can share a data store or they can be tenants within the same physical multi-tenant database as shown in Figure 5.

Figure 5. Multiple microservices deployed in independent data stores colocated in the same physical databases.

Imagine this: take a big database and divide it into pieces so each of the microservices gets its own data store (pluggable database). The data stores can then be easily combined to simplify deployment in one physical database (container database) or stored in separate physical databases to improve isolation and scalability. So, you get all the advantages and the agility that a microservice architecture promises without any of the downside, without any of the complexity of trying to integrate these things, because everything coexists, sharing the same security and availability models, and it all speaks SQL. But each microservice still has its own independence.

Containers are a big part of how microservices work. Containerization provides better isolation of applications, which yields numerous advantages. And when deploying an orchestration tool like Kubernetes, you can virtualize your underlying resources, leading to greater agility and, especially, application portability. Leading converged databases enable multi-tenancy and embed containerization and orchestration within the data tier. This makes it easier to deploy microservices. You can have a separate data container for each microservice you create. Each data container is isolated and secure. It uses the appropriate data type for each microservice, yet IT manages many data containers as one by colocating them in one or more physical databases. All of the data containers can be backed up, patched, or upgraded as one. But an individual data container can just as easily be stopped, started, or removed from one physical database independent of the other data containers.
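SQLite has no pluggable or container databases, but its ATTACH DATABASE command offers a loose analogy for colocating several independent data stores under one engine. In this minimal sketch, each hypothetical microservice (orders, billing) owns its own attached database and schema; a single connection can still query across them, and one store can be detached without touching the other. It illustrates the idea only and says nothing about how any particular vendor implements multi-tenancy.

```python
import sqlite3

# Analogy only: one SQLite connection (the "container") hosts several
# separately named databases, one per hypothetical microservice.
container = sqlite3.connect(":memory:")
for service in ("orders", "billing"):
    container.execute(f"ATTACH DATABASE ':memory:' AS {service}")

# Each service defines its own schema inside its own data store.
container.execute("CREATE TABLE orders.items (id INTEGER PRIMARY KEY, sku TEXT)")
container.execute("CREATE TABLE billing.invoices (order_id INTEGER, amount REAL)")
container.execute("INSERT INTO orders.items VALUES (1, 'trailer-part')")
container.execute("INSERT INTO billing.invoices VALUES (1, 120.0)")
container.commit()

# Because the stores share one engine, a cross-service query needs no ETL.
joined = container.execute(
    "SELECT i.sku, v.amount FROM orders.items AS i "
    "JOIN billing.invoices AS v ON v.order_id = i.id"
).fetchall()

# One store can be removed without affecting the other.
container.execute("DETACH DATABASE billing")
remaining = [row[1] for row in container.execute("PRAGMA database_list")]
print(joined, remaining)
```

The trade-off mirrors the one in the text: colocating stores simplifies cross-service access and shared administration, while each store keeps its own schema and can be handled on its own.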

Development Paradigm #2: Event Streaming

When you are working in a microservices architecture, you need to ensure communication between the services. This is frequently done using an event-based architecture, where services send and receive events that need to be processed. Leading converged database vendors enable event streaming to and from Apache Kafka, the most commonly used event-streaming platform, allowing database changes to be streamed into Kafka. Converged databases can also query events directly from Kafka.
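One common way to turn database changes into a stream of events is the transactional outbox pattern: a trigger records each change in an outbox table inside the same transaction, and a separate relay publishes those rows to a broker such as Kafka. The sketch below illustrates that general pattern under stated assumptions, not any vendor’s Kafka integration; it uses sqlite3, a hypothetical trailers table, and a plain Python list standing in for the Kafka producer.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trailers (id INTEGER PRIMARY KEY, status TEXT);
    -- Outbox: each status change is also recorded as an event row.
    CREATE TABLE outbox (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        trailer_id INTEGER,
        old_status TEXT,
        new_status TEXT
    );
    -- The trigger runs inside the same transaction as the UPDATE.
    CREATE TRIGGER trailers_status_changed AFTER UPDATE OF status ON trailers
    BEGIN
        INSERT INTO outbox (trailer_id, old_status, new_status)
        VALUES (NEW.id, OLD.status, NEW.status);
    END;
    """
)


def relay(db, last_seq, publish):
    """Publish outbox rows newer than last_seq; return the new high-water mark."""
    rows = db.execute(
        "SELECT seq, trailer_id, old_status, new_status FROM outbox "
        "WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, trailer_id, old, new in rows:
        publish(json.dumps({"trailer_id": trailer_id, "old": old, "new": new}))
        last_seq = seq
    return last_seq


conn.execute("INSERT INTO trailers VALUES (7, 'empty')")
conn.execute("UPDATE trailers SET status = 'loaded' WHERE id = 7")
conn.commit()

published = []  # stand-in for a Kafka producer's send call
high_water = relay(conn, 0, published.append)
print(published)  # ['{"trailer_id": 7, "old": "empty", "new": "loaded"}']
```

Because the trigger fires inside the same transaction as the update, an event is recorded only if the change itself commits; the returned high-water mark lets the relay resume where it left off.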

Even if you’re not using Kafka, you can typically take advantage of event queuing in a converged database. With this queuing capability, you get full ACID transaction support, as well as the power of SQL queries over events. “In a converged database I can see events becoming more common because the data is from different sources, and events would then inform how to join and enrich it,” says data scientist Gunning. “My assumption is that a converged database would need to have a richer, more powerful event technique than a normal database.” Haines agrees. Streamlining the ingestion of data into your database using event streaming not only tackles the problem of distributed writing across numerous tables (indexes), but can also provide a mechanism to back off and queue up events, he says. “This is preferable to taking down the system in the case of system overload. Kafka has been invaluable for this in my experience, and if the converged database is handling the ingestion directly, it provides a huge value add for data engineers.”
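Haines’s point about backing off and queuing up events rather than taking the system down can be sketched with Python’s standard queue module. The function below is a hypothetical helper: it tries to put an event on a bounded buffer, waits with exponentially growing delays while the buffer is full, and finally reports failure so the caller can hold the event and retry later. The sleep parameter is injectable so the behavior can be checked without real waiting.

```python
import queue
import time


def offer_with_backoff(q, event, retries=4, base_delay=0.01, sleep=time.sleep):
    """Try to enqueue event without blocking forever.

    While the bounded queue is full, wait base_delay, then 2x, 4x, ... before
    each retry. Returns True if queued, False if the caller should hold the
    event and try again later instead of failing the whole ingest path.
    """
    for attempt in range(retries + 1):
        try:
            q.put_nowait(event)
            return True
        except queue.Full:
            if attempt < retries:
                sleep(base_delay * 2 ** attempt)
    return False


buffer = queue.Queue(maxsize=2)  # small bound to make back-pressure visible
waits = []  # record the delays instead of actually sleeping
results = [
    offer_with_backoff(buffer, e, retries=3, sleep=waits.append)
    for e in ("a", "b", "c")
]
print(results, waits)  # [True, True, False] [0.01, 0.02, 0.04]
```

In a real pipeline a consumer would be draining the buffer concurrently, so a later retry would usually succeed; here nothing drains it, so the third event is handed back to the caller rather than crashing the producer.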

Development Paradigm #3: API-Driven

“In a converged database I have this big pool of data that I can pull different types of data from and then just put an API on top of it. I don’t have to worry about having different connections.” – Dwight Gunning, data scientist for a financial regulatory agency

Another key aspect of simplifying the development of data-driven applications is to have an abstraction layer above the database, above all your data sources, and allow the application to access the data further down the stack via APIs. Leading converged databases can provide Data as a Microservice (DaaM) by automatically generating a REST API on top of SQL queries or stored procedures. Applications can then access data with a REST call like any other service, without needing to know anything about your database. Your developers don’t even need to be expert SQL writers or coders. This simplifies and standardizes your data access by taking advantage of REST endpoints that you can create inside the database.

The good news is that it’s not just standard SQL that works this way. You could expose Java, PL/SQL, or JavaScript procedures through these REST endpoints. REST-enabling stored procedures extends microservices with in-database functions (lambdas). Because those procedures run inside the converged database, you can avoid multiple round trips to and from the database. Instead, you can put all your business logic inside the stored procedure and get the result in a single round trip to the database. In the same way that we would use JavaScript in a browser to avoid network round trips, you can use REST capabilities inside the converged database to do exactly the same thing. “In a converged database I have this big pool of data that I can pull different types of data from and then just put an API on top of it,” says Gunning. “I don’t have to worry about having different connections.”
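The idea of putting an API on top of SQL and in-database logic can be sketched in a few lines. This is a hypothetical, framework-free illustration: ENDPOINTS maps a method and path to either a plain SQL query or a transfer function standing in for a stored procedure, and handle returns JSON the way an HTTP layer would. It does not reproduce any vendor’s REST tooling; sqlite3 plays the database, and the accounts table is invented.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
    INSERT INTO accounts VALUES (1, 100.0), (2, 50.0);
    """
)


def transfer(db, src, dst, amount):
    """Stand-in for a stored procedure: all the business logic in one call."""
    with db:  # one transaction: commits on success, rolls back on error
        row = db.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if row is None or row[0] < amount:
            raise ValueError("insufficient funds or unknown account")
        db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
    return {"status": "ok"}


# Hypothetical route table: each endpoint wraps a SQL query or a "procedure".
ENDPOINTS = {
    ("GET", "/accounts"): lambda db, _body: [
        {"id": i, "balance": b}
        for i, b in db.execute("SELECT id, balance FROM accounts ORDER BY id")
    ],
    ("POST", "/transfer"): lambda db, body: transfer(
        db, body["src"], body["dst"], body["amount"]
    ),
}


def handle(method, path, body=None):
    """Dispatch one request and return a JSON string, as an HTTP layer would."""
    handler = ENDPOINTS.get((method, path))
    if handler is None:
        return json.dumps({"error": "not found"})
    try:
        return json.dumps(handler(conn, body or {}))
    except ValueError as exc:
        return json.dumps({"error": str(exc)})


print(handle("POST", "/transfer", {"src": 1, "dst": 2, "amount": 30}))
print(handle("GET", "/accounts"))
```

The client makes one call to /transfer; the read, the balance check, and both updates run together in one transaction on the database side, which is the single-round-trip benefit described for stored procedures.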

Development Paradigm #4: SaaS

When you are building modern applications, chances are good that you’re going to need a SaaS or multi-tenant environment, an approach that is becoming more and more popular. With leading converged databases, each tenant gets its own separate data store, or pluggable database. That way you can avoid having to put any code into the application to separate or secure the tenants from one another. Each tenant, or each version of the app, points to just a single database, but because you can manage many pluggable databases as one database, it’s easy to implement and maintain (see Figure 6).

Figure 6. How SaaS works.

And because each tenant has a pluggable database, you get all of the advantages of a multi-tenant database. For example, you can quickly clone tenant environments. If a tenant needs to be upgraded or requires a dev or test environment, you can provision it quickly. Or, if you want to move tenants around to get better utilization of your hardware, you can unplug them and plug them in somewhere else. Using a multi-tenant architecture to deliver your SaaS applications works because not only does it simplify management once apps go into production, it also simplifies the application, because you don’t have to write security code (which is complex and prone to error) to prevent the tenants from seeing each other’s data. That isolation happens automatically. Tenants can also connect standard analytical tools to their applications because they are isolated in their own pluggable database; they can use any of the standard BI tools to access the data.
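Cloning a tenant environment can be illustrated with sqlite3’s online backup API (Connection.backup, available since Python 3.7). In this sketch each hypothetical tenant has its own in-memory database, so isolation needs no tenant-filtering code in queries, and cloning a tenant for testing is a whole-database copy. This is an analogy for the pluggable-database approach described above, not its implementation.

```python
import sqlite3


def new_tenant():
    """Create a hypothetical tenant data store with its own schema."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE shipments (id INTEGER PRIMARY KEY, dest TEXT)")
    return db


tenants = {"acme": new_tenant(), "globex": new_tenant()}
tenants["acme"].execute("INSERT INTO shipments VALUES (1, 'Toronto')")
tenants["acme"].commit()


def clone_tenant(source, target):
    """Copy one tenant's entire store into a fresh one, e.g., for a test env."""
    copy = sqlite3.connect(":memory:")
    tenants[source].backup(copy)  # sqlite3 online backup API (Python 3.7+)
    tenants[target] = copy
    return copy


def shipment_count(tenant):
    # No "WHERE tenant_id = ?" filter needed: each tenant has its own database.
    return tenants[tenant].execute("SELECT COUNT(*) FROM shipments").fetchone()[0]


clone_tenant("acme", "acme-test")
tenants["acme-test"].execute("INSERT INTO shipments VALUES (2, 'Chicago')")
tenants["acme-test"].commit()
print([shipment_count(t) for t in ("acme", "acme-test", "globex")])  # [1, 2, 0]
```

Writes to the clone leave the original tenant untouched, and the other tenant never sees either, which is the property that lets the application skip tenant-separation code.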

Development Paradigm #5: Low Code

Low-code application development is a very powerful and useful technique for any business. It abstracts the complexity of coding and provides high-level commands (or, more commonly, drag-and-drop interfaces) that allow developers to innovate quickly and get a prototype up and running. Your application users can then see what the application is going to look like and help iterate on its development. Leading converged database vendors have integrated these low-code/no-code tools with the database, providing a simple browser-based development experience. This eliminates much of the complex coding, because you don’t have to worry about mid-tiers, connection or state management, or mapping application data types to database data types. Low-code development is truly data-driven app development because the tools can generate an application starting from a spreadsheet or table. And you can easily iterate and improve apps to align with the changing needs of your business.
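To show what generating an application from a table can mean at its simplest, the hypothetical helper below reads a table’s column metadata with SQLite’s PRAGMA table_info and derives a form definition: one field per column, a widget chosen from the declared type, and a required flag from NOT NULL. Real low-code tools do far more; the type-to-widget mapping here is an arbitrary assumption.

```python
import sqlite3

# Arbitrary, assumed mapping from declared column types to form widgets.
TYPE_TO_WIDGET = {"INTEGER": "number", "REAL": "number", "TEXT": "text", "DATE": "date"}


def form_from_table(conn, table):
    """Derive a simple form definition from a table's column metadata.

    table must be a trusted identifier, because it is formatted into the SQL.
    """
    fields = []
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        f"PRAGMA table_info({table})"
    ):
        if pk:
            continue  # keys are generated, not typed in by users
        fields.append({
            "field": name,
            "widget": TYPE_TO_WIDGET.get(col_type.upper(), "text"),
            "required": bool(notnull),
        })
    return fields


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trailers (id INTEGER PRIMARY KEY, plate TEXT NOT NULL, "
    "set_temp REAL, inspected DATE)"
)
form = form_from_table(conn, "trailers")
for field in form:
    print(field)
```

Because the form is derived from the schema, adding a column to the table changes the generated form on the next run without any hand-written UI code.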

Development Paradigm #6: Distributed Data

Converged databases also support sharding. Sharding is when you take a large, monolithic database and break it up into separate physical pieces. Each shard has its own separate converged database running on its own separate hardware. So if the data for a particular application for a European division needed to reside in a European country, that’s easy to do. You create your European shard, but you still get a universal view of your data. Traditionally, with sharded databases, your application queries need to include the shard key. And that’s still true with most converged databases. But leading converged databases also give you the ability to run queries across all of the shards to get a global view of data as well, which is important because data-driven applications tend to be global and exist on a larger scale than traditional applications. Sharding is key to making that happen by providing you with a mature SQL database plus a highly available, sharded architecture.
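The two query styles described above, single-shard queries that include the shard key and cross-shard queries for a global view, can be sketched with one in-memory database per shard. The shard map keyed by region, the customers table, and the helper names are all hypothetical; production sharding also handles rebalancing, replication, and failures, which this sketch ignores.

```python
import sqlite3

# Hypothetical shard map: each region's data lives in its own database.
SHARDS = {region: sqlite3.connect(":memory:") for region in ("EU", "US")}
for db in SHARDS.values():
    db.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")


def insert_customer(cid, name, region):
    """Single-shard write: the shard key (region) picks the database."""
    db = SHARDS[region]
    db.execute("INSERT INTO customers VALUES (?, ?, ?)", (cid, name, region))
    db.commit()


def find_in_region(region, name):
    """A query that includes the shard key touches exactly one shard."""
    return SHARDS[region].execute(
        "SELECT id FROM customers WHERE name = ?", (name,)
    ).fetchall()


def count_all():
    """Cross-shard query: fan out to every shard and merge the results."""
    return sum(
        db.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
        for db in SHARDS.values()
    )


insert_customer(1, "Ada", "EU")
insert_customer(2, "Grace", "US")
insert_customer(3, "Linus", "EU")
print(find_in_region("EU", "Linus"), count_all())  # [(3,)] 3
```

Keeping the European rows on the EU shard satisfies the residency requirement, while the fan-out in count_all supplies the global view at the cost of contacting every shard.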

Development Paradigm #7: Continuous Integration/Continuous Delivery

Today’s businesses seek to support customers all over the globe. They don’t have the luxury of a window where they can take a system down to do maintenance or to evolve or update the application. All that has to be done while the app stays up and running. Because converged databases support continuous integration and continuous delivery, businesses can have, for example, multiple versions of their application code active on the database, allowing them to slowly move customers over to a new version. And most leading converged databases ensure that all of the maintenance activity that businesses would normally do on a schema can now be done fully online while the application stays open for business. Increasingly, converged databases are making it possible to abstract the schema definition out of the database by taking advantage of JSON documents. Because those documents are stored inside the database, you can manipulate or change your schema by simply updating or inserting new JSON documents.
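As a minimal sketch of evolving a schema through JSON documents, assume a hypothetical profiles table that stores each profile as a JSON string. Version 2 of the app adds an email field simply by writing richer documents, and its reader supplies a default for documents written by version 1, so both versions can run against the same table during a gradual rollout.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, doc TEXT)")

# Version 1 of the app writes documents with a single "name" field.
conn.execute(
    "INSERT INTO profiles VALUES (1, ?)", (json.dumps({"name": "Ada Lovelace"}),)
)
# Version 2 adds a field: no ALTER TABLE, just a richer document.
conn.execute(
    "INSERT INTO profiles VALUES (2, ?)",
    (json.dumps({"name": "Grace Hopper", "email": "grace@example.com"}),),
)
conn.commit()


def read_profile_v2(pid):
    """Version 2 reader that tolerates documents written by version 1."""
    (doc,) = conn.execute("SELECT doc FROM profiles WHERE id = ?", (pid,)).fetchone()
    profile = json.loads(doc)
    profile.setdefault("email", None)  # v1 documents predate this field
    return profile


print(read_profile_v2(1))
print(read_profile_v2(2))
```

The design choice is to push schema tolerance into the reader: old documents stay valid, so the change needs no downtime or table rewrite, though the application now owns the job of handling every document version it may encounter.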

Converged or Single-Purpose Database: Which is Right for Your Use Case?

How does an organization decide whether the time is right for a converged database rather than a specialty single-purpose database? Gunning advises looking at workflows over the past 12 months or so. What kind of data was used? Did the data have different schemas or file types for which custom code had to be written? How much custom work was required?

“Now look to the future,” Gunning says. “How likely is it that you are going to have to do a lot of custom work to get the answers you need from the data?”

Bengfort says that before he could recommend a converged database, a number of admittedly difficult-to-satisfy requirements would need to be met. “If we are presenting a unified front as an organization, if we are committed to training, documentation, and to reducing technical debt, a converged database would provide a lot of benefits,” he says. Most importantly, he says, a business would have to be committed to solving problems as a united organization rather than as independent business units, and be willing to put time and thought into standard practices and methodologies. Unless those requirements are met, says Bengfort, he will always err on the side of the flexibility, speed, and independence of single-purpose databases.

Gunning disagrees. The more differences you see in the data you have to work with, or the more custom coding required just to join tables and pull information from the data, the stronger the case for a converged database over a specialized one. “If you’re working on a project involving image data, for example, will you have to merge it with structured data or enrich or tag it in some other way to get your answers?” he asks. “If so, then a converged database would be for you.”

WHO CHOOSES THE DATABASE?

A natural tension exists between centralized IT and developers, many of whom may work in satellite or department-specific IT groups. Centralized IT is concerned about standards, ongoing maintenance, management, and general operational efficiency and cost. Developers are more focused on the specific task at hand and are also typically excited to use the latest technologies. This can lead to disagreements about which database is most appropriate for a project. Often, this is settled by organizational hierarchy and precedent, but sometimes it leads to problems with “shadow IT” activity.

Modderman points out that business users, especially the subject matter experts (SMEs) in a domain such as shipping or production planning, think of data holistically. They tend to see the big picture in terms of what data needs to be accessible to solve a business problem, regardless of where it happens to reside. Technical personnel tend to be restricted by thinking of data in silos.

“In my previous job, we were working on an initiative to make a central data platform using SAP,” says Modderman. “But because it was such a large company, it had a number of other applications containing data that put the SAP data in context. The data itself is worth so much more when it’s enriched, when it’s together, when it’s converged.” This made him realize that the people running the business don’t necessarily think of data as living in separate systems. “They think in terms of core business fundamentals,” he says. “They don’t see entity models or tables. They see orders and customers and suppliers. They have to answer questions that their competitors aren’t asking.”

This holistic way of working with and thinking about data is what a converged database makes possible, says Modderman. “You don’t want data in silos, or data saved in places so only systems experts really understand what’s going on,” he says. “You have to think up questions that are not contained in structured databases. You have to dream up the weird questions and then work your way to finding answers or at least paths forward. And you’re less able to do that if all of your data is still planted in these little pockets all over the place.”

Modderman once worked with a client who was very knowledgeable about SAP CRM. “He would always be able to say, okay, the data you want is in this table, except that’s not going to tell you the whole story. You’re also going to have to go over here and look at this and pluck this little bit off here. And if you mix that together, that’s going to tell you what these people are asking for.” Modderman says this SME always anticipated both challenges and opportunities, perhaps more so than the pure “data nerds.” “The techie geeks were great at plugging together platforms and making clean pipes and all that stuff,” he says. But the SME was always one step ahead in terms of what questions they were really trying to answer for the business.

A converged database enables SMEs to do their jobs faster, because they spend so much of their time contextualizing multiple systems for vice presidents and directors. “They’re always connecting those systems together in their heads,” says Modderman. “But when the converged database has that glue built into its own fabric, those people can answer those questions faster. When that glue is soaked into the system, that system is all the better for it. SMEs can paint the picture so much faster when the converged database is humming along.”

When thinking about converged databases, there are two steps to take, says Bengfort. The first is the initial application development. The second is optimizing that application for how it’s going to be used. “Say this is a clustering application,” says Bengfort. “The first thing you want to ask is, how stateful is the application? Are there more reads or more writes? Is it more transactional? Those are the sort of first, very high-level, general questions that you want to ask.”

Throughput is another consideration that no one can ever really estimate accurately, but that has to be at the front of your mind when determining which type of database to use. Is your application or database going to be used thousands of times a second, or a couple of times a minute? What is the primary access paradigm of this database? What kind of caching is allowed? What can your application do in the front end if it can’t read from the database? “Those are probably 80% of your questions,” says Bengfort, who adds that the type of data your system will use isn’t a good indicator of what type of database you should use.
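For an existing system, Bengfort’s opening questions about reads versus writes and about throughput can be answered empirically before a database is chosen. The sketch below assumes a hypothetical access log of (timestamp in seconds, operation) tuples, where the operation is "read" or "write"; the function name and output keys are illustrative. It summarizes what has already happened, so it offers a baseline rather than a forecast of future throughput.

```python
from collections import Counter


def profile_workload(events, window_seconds):
    """Summarize a hypothetical access log of (timestamp_seconds, operation)
    tuples, where operation is "read" or "write", observed over window_seconds."""
    ops = Counter(op for _, op in events)
    total = ops["read"] + ops["write"]
    if total == 0:
        return {"ops_per_second": 0.0, "peak_ops_in_one_second": 0,
                "read_fraction": 0.0, "write_fraction": 0.0}
    # Bucket events by whole second to find the busiest second in the log.
    per_second = Counter(int(ts) for ts, _ in events)
    return {
        # Average throughput: thousands of operations a second, or a few a minute?
        "ops_per_second": total / window_seconds,
        "peak_ops_in_one_second": max(per_second.values()),
        # Read/write mix: is the workload read-heavy or write-heavy?
        "read_fraction": ops["read"] / total,
        "write_fraction": ops["write"] / total,
    }


log = [(0.1, "read"), (0.4, "read"), (1.2, "write"), (2.7, "read")]
print(profile_workload(log, window_seconds=4))
# {'ops_per_second': 1.0, 'peak_ops_in_one_second': 2, 'read_fraction': 0.75, 'write_fraction': 0.25}
```

Reporting the peak alongside the average matters because a workload that looks light on average can still arrive in bursts that a cache or the database has to absorb.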
“Data type maybe expresses some of these things implicitly, but you actually do a little bit of a disservice to yourself if you start with a data type,” says Bengfort. He doesn’t think that application developers, or even computer scientists in general, are that good at saying this should be a columnar data type, or a graph type, a key-value type, a row-based data type, or a document data type, “because you can do transformations on any data types. I can’t tell you how many times I’ve had to take a document database and turn it into a normalized or relational database. I can’t tell you how many times I’ve had to take a graph database and turn it into a key-value store.”

So when choosing a database, Bengfort doesn’t start with data type. He begins with some basic questions. How is this database going to be used? What is the application going to require of this database? Who is on my team, and what skills do they have? “For database optimization, data type is actually not a good indicator of the type of database you should use. These other things are more important to consider.”
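The two transformations Bengfort describes can be sketched in a few lines of plain Python. Both functions are illustrative assumptions, not his code: the first flattens one nested order document into normalized rows for separate order, customer, and order-line tables; the second turns graph edges into a key-value store that maps each node to the nodes it points to. All field names are hypothetical.

```python
def normalize_order(doc):
    """Split one nested order document into relational-style rows.

    Returns (order_row, customer_row, line_rows); all field names are hypothetical.
    """
    customer = doc["customer"]
    order_row = {"order_id": doc["order_id"], "customer_id": customer["id"]}
    customer_row = {"customer_id": customer["id"], "name": customer["name"]}
    # Each embedded item becomes its own row, keyed back to the parent order.
    line_rows = [
        {"order_id": doc["order_id"], "line_no": n, "sku": item["sku"], "qty": item["qty"]}
        for n, item in enumerate(doc["items"], start=1)
    ]
    return order_row, customer_row, line_rows


def graph_to_key_value(edges):
    """Turn directed (source, target) edges into a key-value store that maps
    each node to the list of nodes it points to."""
    store = {}
    for source, target in edges:
        store.setdefault(source, []).append(target)
        store.setdefault(target, [])  # every node gets a key, even with no out-edges
    return store


order_doc = {
    "order_id": 7,
    "customer": {"id": 42, "name": "Acme"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}],
}
print(normalize_order(order_doc)[2])
# [{'order_id': 7, 'line_no': 1, 'sku': 'A1', 'qty': 2}, {'order_id': 7, 'line_no': 2, 'sku': 'B2', 'qty': 1}]
print(graph_to_key_value([("a", "b"), ("a", "c"), ("b", "c")]))
# {'a': ['b', 'c'], 'b': ['c'], 'c': []}
```

A real migration would also deduplicate customer rows shared by many orders and load the rows into actual tables; the sketch only shows that the same data can move between models.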

Conclusion: Avoiding the Innovator’s Dilemma

Until now, organizations have been forced to choose between improving developer productivity now or getting easier value from their data later. But with a converged database, this decision becomes a lot less painful.

In the past, to optimize developer productivity, organizations tended to spin up single-purpose databases for specific projects. Each database offered a convenient, easy-to-master data model for that purpose, and a simple set of APIs made it easy to start developing against it. But, as we’ve seen, as a project grows and additional single-purpose databases or cloud services are required, you start experiencing data fragmentation. Each database has its own tooling, security methods, and operational characteristics. So by cobbling together these disparate databases, you risk inconsistent data and security gaps, and it becomes increasingly difficult to use that data to answer business-critical questions.

One approach organizations have tried is to optimize for using the data rather than for easy development. In such cases, they use a corporate standard database, usually a relational database or a relational-based multi-model database. Although this may enforce official policies and simplify the reuse of data, its limited functionality can hold the business back. You may find it difficult, or even impossible, to innovate, which puts your business at competitive risk.

Neither of these choices is acceptable. How do businesses escape this dilemma? The answer: a converged database. A converged database, as we’ve discussed, is a multi-model, multi-tenant, multi-workload database. It supports the data model and access method each development team wants, without unneeded functionality getting in the way. It provides both the consolidation and the isolation these different teams want but don’t want to think about. And it excels in all the workloads these teams require, such as online transaction processing (OLTP), analytics, and IoT.

According to Bengfort, when making database choices, companies need to be thinking about, and investing in, flexibility for their future innovations. Yes, they need to invest in maintaining and improving legacy apps, but they also need to invest in their teams. “That’s what would make a convergent database system work,” he says: a loyal team, the kind of team that will work for your company for 5 to 10 years, “not just the next 36 months or whatever the average work expectancy is now.”

And don’t forget to get support from the business side of the house, Cani reminds us: support that allows you to take appropriate risks. “At the end of the day, IT is there to help the business succeed,” Cani says. But you need the business side to have your back. “I always think about SpaceX. You’re going to have to blow up a few rockets before you land on Mars. Technology is like that as well. If you don’t have the support to blow up some rockets along the way, you will not be able to deliver something that, in the long run, will be your huge differential in the market.”

And, finally, there’s collaboration. Reis and Housley stress that to create a successful data-driven app, you need developers and data engineers working together to architect a solution. “Typically what happens is the data engineers aren’t brought in until the end,” says Housley. “They’re supposed to fix the mess. But to make a data-driven app that will succeed, people have to be working together, and with the data, from the get-go.”

A converged database may not be a magic bullet, says Reis, but it may simplify things enough that, hopefully, developers and data engineers can think about data from the beginning and understand where the possible bottlenecks may be: “That’s how they can create a path to developing successful data-driven applications.”

About the Author

Alice LaPlante is an award-winning writer, editor, and teacher of writing, both fiction and nonfiction. A Wallace Stegner Fellow and Jones Lecturer at Stanford University, Alice taught creative writing at Stanford and in San Francisco State’s MFA program for more than 20 years. A New York Times bestselling author, Alice has published four novels and five nonfiction books, and has edited bestselling books for many other writers of fiction and nonfiction. She regularly consults with Silicon Valley firms such as Google, Salesforce, HP, and Cisco on their content marketing strategies. Alice lives with her family in Palo Alto, California, and Mallorca, Spain.
