Many Businesses Aren’t Protecting This Valuable Asset


A 7-Minute Introduction to DataOps

It was a typical business morning for the Technology Consulting Group in early 2000. The Internet was booming, technology was advancing on every front, and we were hiring. The secretary saw the new recruit’s email come up on screen. Ten minutes later, every document on the network had been overwritten with zeros… the files were still there, but their content was gone.

What first looked like an email containing a résumé was actually a virus. Millions of dollars’ worth of research, invoices and contacts were gone. We had backups, of course, but our Iomega Jaz drives turned out to be unreliable. Within a few months, the once prosperous and growing company closed its doors. All it took was losing its intellectual property – its data.

Fast-Forward to Today

There’s a new trend across many companies today regarding data, databases and reporting. Over the past few years, companies have begun treating data, its structure, and its presentation with the same rigor that software teams apply to code. This is because, over the past two decades, businesses have been seriously hurt by not applying DataOps principles.

  • It can drive small businesses into bankruptcy; e.g. Technology Consulting Group.
  • It can cost large businesses billions of dollars in damages; e.g. Samsung, Uber, Progressive.
  • It can even bring large businesses to the brink; e.g. it nearly ended Pixar.

What is DataOps?

DataOps is a workflow process that ensures the quality, reliability, governance and security of data, schemata, queries and reports. It offers easy delivery, quick recovery, new insights, and change transparency. It captures metadata such as the person who made the change, the associate who requested the change, what exactly was changed, tickets associated with the change, and why. It amplifies feedback loops and encourages experimentation, allowing the team to learn from mistakes to achieve mastery.
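To make the list of captured metadata concrete, here is a minimal Python sketch of a change record. The `ChangeRecord` class, its field names and the sample values are hypothetical; in practice, DataOps tooling usually gathers this information from commits, pull requests and linked tickets rather than from a hand-built class.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical metadata for one change to data, a schema, a query or a report."""
    changed_by: str                 # the person who made the change
    requested_by: str               # the associate who requested it
    description: str                # what exactly was changed
    reason: str                     # why it was changed
    tickets: tuple[str, ...] = ()   # tickets associated with the change (optional here)
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def is_complete(self) -> bool:
        """True when who, requester, what and why are all filled in."""
        required = (self.changed_by, self.requested_by, self.description, self.reason)
        return all(value.strip() for value in required)


record = ChangeRecord(
    changed_by="alice",
    requested_by="bob",
    description="Added discount_rate column to the orders table",
    reason="Finance needs per-order discounts",
    tickets=("DATA-123",),
)
print(record.is_complete())  # True
```

Making the record immutable (`frozen=True`) reflects the idea that an audit trail should not be edited after the fact.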

DataOps borrows processes from three other well-known workflows in the IT industry: Agile, DevOps and Lean. The concept has been loosely applied in software engineering businesses since the late 2000s, but it wasn’t formalized until mid-2021.

The Trinity of DataOps

Agile methodology is a project management framework that focuses on short, broken-down iterations of work called sprints. At the start of each sprint, a set of work items is divided among the members of the team or placed on a kanban board. At the end of each sprint, the team reflects on the work performed to find improvements in its strategy for the next sprint.

DevOps is the practice of sharing files through a central repository so that people within and outside the team can coordinate and collaborate, communicating with each other through a common set of tools. DevOps provides additional tools to consolidate work, build the product, document it, run unit and system tests and, if testing is successful, deploy the product.
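The "deploy only if testing is successful" gate can be sketched in a few lines of Python. The `run_pipeline` helper and the step functions are hypothetical stand-ins; in practice a CI/CD service such as Azure Pipelines, Jenkins or GitHub Actions runs the real build, test and deploy jobs.

```python
from typing import Callable

Step = Callable[[], bool]  # each step returns True on success


def run_pipeline(build: Step, tests: list[Step], deploy: Step) -> str:
    """Build, run every test, and deploy only if the build and all tests pass."""
    if not build():
        return "build failed"
    # Run all tests (not just until the first failure) so the report is complete.
    failures = [getattr(test, "__name__", repr(test)) for test in tests if not test()]
    if failures:
        return "tests failed: " + ", ".join(failures)
    return "deployed" if deploy() else "deploy failed"


# Hypothetical steps for illustration.
def build() -> bool:
    return True


def unit_tests() -> bool:
    return True


def system_tests() -> bool:
    return False  # simulate a failing system test


def deploy() -> bool:
    print("deploying...")
    return True


print(run_pipeline(build, [unit_tests, system_tests], deploy))  # tests failed: system_tests
```

Because the deploy step is only reached after every test has passed, a failed system test stops the release without anyone having to remember to check.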

Lean is used to continually verify the quality of a product and the security measures that protect it. For example, definitions of data can change over time, and without a system in place to check for this, misleading data can enter the system: data that once meant “A” may now mean “B” and should be handled differently. Data that could compromise the company or its clients also needs special handling to ensure privacy.
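One way to guard against a definition that changes over time is to store each definition with the date it took effect and interpret every value by the definition in force when it was recorded; sensitive fields can be masked at the same checkpoint. This Python sketch assumes a hypothetical status code `"A"` whose meaning changed on 2021-07-01 and a hypothetical set of sensitive columns.

```python
from datetime import date

# Hypothetical definition history: code "A" meant "Active" until 2021-07-01,
# after which it means "Archived".
DEFINITIONS = {
    "A": [(date(2000, 1, 1), "Active"), (date(2021, 7, 1), "Archived")],
}

# Hypothetical columns that must never leave the governed environment unmasked.
SENSITIVE_FIELDS = {"email", "ssn"}


def meaning(code: str, as_of: date) -> str:
    """Return what `code` meant on `as_of`; reject codes with no definition."""
    in_effect = [label for start, label in sorted(DEFINITIONS.get(code, [])) if start <= as_of]
    if not in_effect:
        raise ValueError(f"no definition for code {code!r} on {as_of}")
    return in_effect[-1]  # the most recent definition that had taken effect


def mask(row: dict) -> dict:
    """Replace sensitive values so reports never expose them."""
    return {key: "***" if key in SENSITIVE_FIELDS else value for key, value in row.items()}


print(meaning("A", date(2020, 5, 1)))  # Active
print(meaning("A", date(2022, 1, 1)))  # Archived
print(mask({"id": 7, "email": "pat@example.com"}))  # {'id': 7, 'email': '***'}
```

Raising an error for an undefined code, rather than guessing, is what keeps misleading data from slipping into reports unnoticed.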

How Does DataOps Differ From DevOps?

Several factors are unique to DataOps. DataOps incorporates many aspects of DevOps within its process, but it is not a superset of DevOps.

Sharing work on the same file?
  • DataOps: Reports are siloed and SQL scripts are functionally atomic, which makes splitting work on the same entity very difficult. Pair programming is required to share development on the same file.
  • DevOps: Source code is organized in lines that are easily split between developers. Multiple developers can check out and work on the same file at the same time.

What teams are involved?
  • DataOps: Business Operations, Data Science, Business Intelligence, Data Governance, Data Management, Data Operations, IT Operations, Compliance
  • DevOps: Engineering, IT Operations, Software Development, Quality Assurance, Security, User Experience, Design, Operations

What skills are involved?
  • DataOps: data management, data science, data analysis, data integration, data quality, data security, statistics, reporting, business, IT operations, data operations, application engineering, data engineering, data governance
  • DevOps: requirements gathering, application architecture, software engineering, software development, application integrations, coding, testing, quality control, quality assurance, security, IT operations, continuous integration, continuous delivery

What is the pipeline like?
  • DataOps: develop the data product → manage the data resources (ETL) → test to ensure quality → release to users → manage usage → monitor usage and results
  • DevOps: design the application or changes → develop and build the application → test to ensure quality → release to users → monitor usage and error logs

Agile planning
  • DataOps: Usually Kanban-based; some work is planned at the start of each sprint, but most work flows through the board as requests come in. Loosely structured and more organic.
  • DevOps: Usually Scrum-based; all work is planned at the start of each sprint. Highly structured, mechanical and organized.

Lean
  • DataOps: Focuses on source-of-truth and data governance principles while cards are pulled or distributed from the board throughout the sprint.
  • DevOps: Focuses on DRY and SOLID coding principles after the sprint has begun and work has been assigned.

What tools can be used to apply DataOps?

DevOps Source Control

TFS, Subversion, or Git can hold source files such as schemas, type-table seeds, utility scripts, views, functions, procedures, configuration files, and certain types of ETL packages and reports. Basically, anything you can open and read in Notepad is a good candidate for Git. However, the nature of Tableau and Power BI report files requires a little more tooling.
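The "can you read it in Notepad?" rule of thumb can be approximated in code. This Python sketch treats a file as plain text if a sample of its bytes contains no NUL byte and decodes as UTF-8. The helper names are hypothetical, and the heuristic will misclassify UTF-16 files (which contain NUL bytes), so treat its output as a starting point rather than a rule.

```python
import codecs
from pathlib import Path


def is_text_file(path: Path, sample_size: int = 8192) -> bool:
    """Heuristic: no NUL bytes and valid UTF-8 in the first `sample_size` bytes."""
    with path.open("rb") as f:
        sample = f.read(sample_size)
    if b"\x00" in sample:
        return False
    try:
        # final=False tolerates a multi-byte character cut off at the end of the sample.
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


def split_for_source_control(folder: Path) -> tuple[list[Path], list[Path]]:
    """Sort files under `folder` into plain-text Git candidates and binaries needing more tooling."""
    text, binary = [], []
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            (text if is_text_file(path) else binary).append(path)
    return text, binary
```

For example, `split_for_source_control(Path("reports"))` on a hypothetical `reports` folder would put `.sql` and `.twb` files in the first list and packaged `.twbx` workbooks (zip archives) in the second.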

Power BI recently added source control to its service, and it is amazing! It separates the report definition from the other components and pushes the pieces to a Git repository. You can then compare the text-based definitions of the reports (and other components) to see what changes took place. It also provides a text box for developers to comment on version changes and lets them commit only the reports that apply.
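Once report definitions are stored as text, ordinary diff tooling can show exactly what changed between versions. Git does this for you; the Python sketch below uses the standard `difflib` module only to illustrate the idea. The two definitions are hypothetical dictionaries, not the real Power BI file layout, and serializing both with sorted keys keeps mere reordering from showing up as a change.

```python
import difflib
import json


def definition_diff(old: dict, new: dict, name: str = "report") -> list[str]:
    """Return a unified diff between two versions of a report definition."""
    # Serialize both versions identically so only real changes appear in the diff.
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=f"{name} v1", tofile=f"{name} v2", lineterm=""
        )
    )


# Hypothetical definitions: one visual's measure was changed.
v1 = {"title": "Sales", "visuals": [{"type": "bar", "measure": "Revenue"}]}
v2 = {"title": "Sales", "visuals": [{"type": "bar", "measure": "Net Revenue"}]}

for line in definition_diff(v1, v2, name="sales_report"):
    print(line)
```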

Tableau, however, keeps only a rolling backup of the latest 10 versions and discards the rest. So for Tableau reports, either Git with LFS (Large File Storage) enabled or Bitbucket is a better option than relying on the server. Both allow commits with comments, versioning, tagging, merging and conflict resolution. Of the two, I would recommend a cloud-managed Git repository system.
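For the Git-with-LFS route, binary Tableau files are usually registered with `git lfs track "*.twbx"`, which adds a line to the repository’s `.gitattributes` file. The Python helper below only builds those lines so you can see what they look like. The list of extensions is an assumption: packaged workbooks and data sources are zip archives and `.hyper` extracts are binary, while plain `.twb` and `.tds` files are XML and diff fine in ordinary Git.

```python
# Assumed set of binary Tableau file types to keep in Git LFS.
TABLEAU_BINARY_EXTENSIONS = ["twbx", "tdsx", "hyper"]


def lfs_attributes(extensions: list[str]) -> str:
    """Build .gitattributes lines in the same form `git lfs track "*.ext"` writes."""
    return "".join(f"*.{ext} filter=lfs diff=lfs merge=lfs -text\n" for ext in extensions)


print(lfs_attributes(TABLEAU_BINARY_EXTENSIONS), end="")
# *.twbx filter=lfs diff=lfs merge=lfs -text
# *.tdsx filter=lfs diff=lfs merge=lfs -text
# *.hyper filter=lfs diff=lfs merge=lfs -text
```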

For data changes, such as values in type or lookup tables and system data that is managed outside an administration console (price changes, for example), put the change in a script that can be checked in, reviewed, and vetted in a test environment. Redgate provides some good tools for this, such as Flyway and SQL Data Compare.
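Flyway’s core idea, versioned scripts applied in order and recorded in a history table, can be sketched with Python’s built-in `sqlite3` module. This is not Flyway or SQL Data Compare: the script contents, the `change_history` table and the `migrate` helper are hypothetical, each script is assumed to hold a single statement, and the connection is assumed to be opened with `isolation_level=None` so the helper can manage its own transactions.

```python
import sqlite3

# Hypothetical data-change scripts named in Flyway's V<version>__<description>.sql style.
# Each script is assumed to contain a single SQL statement.
SCRIPTS = {
    "V1__create_price_table.sql": "CREATE TABLE price (sku TEXT PRIMARY KEY, amount REAL NOT NULL)",
    "V2__seed_prices.sql": "INSERT INTO price VALUES ('A-100', 9.99), ('B-200', 19.99)",
    "V3__raise_b200_price.sql": "UPDATE price SET amount = 21.99 WHERE sku = 'B-200'",
}


def version_of(name: str) -> int:
    """Extract the version number from a name like 'V3__raise_b200_price.sql'."""
    return int(name[1:].split("__", 1)[0])


def migrate(conn: sqlite3.Connection, scripts: dict[str, str]) -> list[str]:
    """Apply scripts not yet recorded in change_history, in version order.

    Each script and its history row run in one explicit transaction, so a failing
    script leaves neither its changes nor a history entry behind.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS change_history (version INTEGER PRIMARY KEY, script TEXT NOT NULL)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM change_history")}
    ran = []
    for name in sorted(scripts, key=version_of):
        version = version_of(name)
        if version in applied:
            continue  # already deployed to this database
        conn.execute("BEGIN")
        try:
            conn.execute(scripts[name])
            conn.execute("INSERT INTO change_history VALUES (?, ?)", (version, name))
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
        ran.append(name)
    return ran


conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
print(migrate(conn, SCRIPTS))  # all three scripts run
print(migrate(conn, SCRIPTS))  # [] because there is nothing new to apply
```

Because a failing script is rolled back together with its history row, rerunning `migrate` after fixing the script picks up exactly where the last successful deployment stopped.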

Recommendation: Azure DevOps Git for SQL Server & Power BI, GitHub Actions for Oracle & Tableau, Redgate Flyway and SQL Backup Pro for data

Kanban Board

Jira, Azure DevOps, Monday and Wrike all come highly recommended. Since the board is the central location for filing and working through requests and features, it’s worth researching which of these best suits you and your team.

Recommendation: Azure DevOps for SQL Server & Power BI, Jira or Monday for Oracle & Tableau

Lean methodology

Lean methodology rests on two pillars that provide a framework for all Lean projects: continuous improvement and respect for people. It’s less about specific tools and more about how to use the tools, people and resources at hand to create a feedback loop that improves both the process and the product for the client and the workers. It can use the tools already mentioned, adapted to the unique needs of your service. Consider a system that can be extended with plugins and that provides workflows and pipelines to automate as much of the development and deployment as possible.

Recommendation: Azure DevOps, Jira, Monday, Jenkins, GitLab … whatever fits your team and client needs best.

Conclusion

With the right tools and a structured process that involves the whole team, you can protect the core components and intellectual property of your company. DataOps lets you release changes that affect everyone efficiently and confidently. If a database, report, or data change ever causes a failure, you have a way to roll back quickly. Backups can only go so far and should be a last-ditch means of recovering from a disaster. DataOps is a methodology that keeps your data safe, available and operational while keeping your teams productive.