In the 1990s the internet opened a new means of distributing software and computer applications. People joked about it and called it a toy… but before that, you got software by dialing one computer into another, or from a friend who shared a disk. By 1996 we had a new way to automate tasks and distribute functionality across the globe!
Fast-forward 25 years and we have GPT. In 2021, people joked about it, but I saw it as another banner waving toward the future… even so, I didn’t expect it to take off the way it has.
Just like the 1990s were about distributing computer-based automation and functionality, the 2010s have been about distributing computer-based information and personality.
For years it was thought that the AI models we had been training only needed a decent training set (i.e., a few thousand examples) and that was as good as it would get. OpenAI showed that there really is a point where an enormous amount of training data can cause behavioral change with the same code, though I’m sure they’ve tweaked their model quite a bit.
I heard recently that OpenAI has hit a point of diminishing returns.
From AI engineer, entrepreneur and author Gary Marcus to Axios’s coverage of Sam Altman’s information-scaling approach, we are getting the message that large data has its limits as the supply of human-generated art and information not yet used for training diminishes. In short, the GPT model has consumed almost all the reliable information out there.
So what does that mean for the years 2025 through 2029?
“The 2010s were the age of scaling, now we’re back in the age of wonder and discovery once again,” Sutskever told Reuters.
This goes back to a visit to MCC in 1988, when I was introduced to the CycL program. Engineers on that project told me that the future of AI is not in the hands of statisticians, programmers or even computer scientists. It is in the hands of painters, sculptors, poets, musicians, psychologists and doctors. The algorithms need to be taught expression and human connection. Otherwise they cannot break the barriers inherent in soulless data.
Now is when the real magic happens. Algorithms must change to include non-standard thinking practices, universal morality, social congeniality, self-expression, and human connection to move to the next stage.
I’m not referring to the node-to-node programming that AI song bots use to derive one note after another, but to an AI that senses the direction and movement of the notes on its own, through its own experimentation and experience.
The future of AI is not in the hands of statisticians, programmers or even computer scientists. It is in the hands of painters, sculptors, poets, musicians, psychologists and doctors.
In a post-COVID world where being “social” implies sitting alone in front of a computer instead of hanging out at the mall with air-breathing friends, humans are starving for companionship. There are so many messed-up and broken forces at play keeping guys and girls from bonding in meaningful and enriching relationships, and one symptom is how people have flocked to GPT to fill that void.
I’m predicting that this starvation for meaningful social interaction will be the driving force that moves AI forward over the next 3 1/2 years as we make these models more creative.
The question remains: what type of creation will come out on the other side?
Will it be a man-of-a-machine or a machine-of-a-man? Perhaps the answer is both, as people become more mechanized and separated while machines become more human and connected. Hopefully, as we venture into training the new AI brain, we’ll find a way to meld the two and find more humanity and connection in ourselves.
It was a typical business morning for the Technology Consulting Group in early 2000. The Internet was booming, technology was growing in every possible genre, and we were hiring. The secretary saw the new recruit’s email come up on screen. Ten minutes later, all the documents on the network had been overwritten with zeros… the document files were there, but the content was erased.
What first looked like an email containing a résumé was actually a virus. Millions of dollars of research, invoices and contacts were gone. We had backups, of course, but Iomega Jaz drives turned out to be unreliable. Within a few months, the prosperous and growing company closed its doors. All it took was losing its intellectual property: its data.
Fast-Forward to Today
There’s a new trend across many companies today regarding data, databases and reporting. For the past few years, companies have begun treating data, its structure, and its presentation with the same regard that software managers treat code. This is because businesses have experienced some serious hurt over the past two decades by not applying DataOps principles.
It has cost small businesses their existence; e.g., Technology Consulting Group went bankrupt.
It has cost large businesses billions of dollars in damages; e.g., Samsung, Uber, Progressive.
It can even make large businesses go under; e.g., it nearly ended Pixar.
What is DataOps?
DataOps is a workflow process that ensures the quality, reliability, governance and security of data, schemata, queries and reports. It offers easy delivery, quick recovery, new insights, and change transparency. It captures metadata such as the person who made the change, the associate who requested the change, what exactly was changed, tickets associated with the change, and why. It amplifies feedback loops and encourages experimentation, allowing the team to learn from mistakes to achieve mastery.
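Much of that metadata can be captured with ordinary source control. Here is a minimal sketch using git, where the script name, ticket number and requester are all hypothetical:

$ git add sql/changes/DATA-482_update_premium_prices.sql
$ git commit -m "DATA-482: update premium tier prices" \
             -m "Requested by J. Doe (Finance); effective July 1 per contract renewal."

Git records who made the change automatically; the commit message carries the requester, the ticket, and the why, and the diff itself is the what.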
DataOps takes processes from three other well-known workflows in the IT industry: Agile, DevOps and Lean. The concept has been loosely applied in software engineering businesses since the late 2000s, but it wasn’t formalized until mid-2021.
The Trinity of DataOps
Agile methodology is a framework for project management that breaks work into short iterations called sprints. At the start of each sprint, a set of work items is divided amongst members of the team, or placed on a kanban board. At the end of each sprint, teams reflect on the work performed to find improvements in their strategy for the next sprint.
DevOps is the practice of sharing files through a central repository to coordinate and collaborate within and outside the team, while communicating through a common set of tools. DevOps provides additional tools to consolidate work, build the product, document, unit test, system test and, if testing is successful, deploy the product.
Lean is used to continually verify the quality of a product and the security methods that protect it. For example, definitions of data can change over time, and not having a system in place to check this allows misleading data to enter the system. Data that at one time meant “A”, now means “B” and should be handled differently. Data that could compromise the company or its clients also needs to be handled in special ways to ensure privacy.
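As one concrete illustration of that kind of check, a scheduled query can flag values that fall outside the currently documented domain. A hedged sketch, where the server, database, table and column names are all hypothetical:

$ sqlcmd -S reporting-db -d SalesDb -Q "SELECT o.StatusCode, COUNT(*) AS BadRows FROM dbo.Orders o WHERE o.StatusCode NOT IN (SELECT Code FROM dbo.OrderStatusType) GROUP BY o.StatusCode"

Any rows returned mean data is flowing in under a definition the type table no longer recognizes, and someone should look at it before it reaches a report.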
How Does DataOps Differ From DevOps?
DataOps incorporates many aspects of DevOps within its process, but several factors are unique to DataOps, and DataOps is not simply a superset of DevOps.
Feature by feature, here is how the two compare.

Sharing work on the same file?
DataOps: Reports are siloed and SQL scripts are functionally atomic, which makes splitting work on the same entity very difficult; pair programming is required to share development on a single file.
DevOps: Source code is mapped by lines that are easily split between developers; multiple developers can check out and work on the same file at the same time.

What teams are involved?
DataOps: Business Operations, Data Science, Business Intelligence, Data Governance, Data Management, Data Operations, IT Operations, Compliance.
DevOps: Engineering, IT Operations, Software Development, Quality Assurance, Security, User Experience Design, Operations.

What skills are involved?
DataOps: Data management, data science, data analysis, data integration, data quality, data security, statistics, reporting, business IT operations, data operations, application engineering, data engineering, data governance.
DevOps: Requirements gathering, application architecture, software engineering, software development, application integrations, coding, testing, quality control, quality assurance, security, IT operations, continuous integration, continuous delivery.

What is the pipeline like?
DataOps: Develop the data product; manage the data resources (ETL); test to ensure quality; release to users; manage usage; monitor usage and results.
DevOps: Design the application and its changes; develop and build the application; test to ensure quality; release to users; monitor usage and error logs.

Agile planning
DataOps: Usually Kanban-based; some work is planned up front for each sprint, but most flows through the board as requests come in. Loosely structured; more organic.
DevOps: Usually Scrum-based; all work is planned up front for each sprint. Highly structured, mechanical and organized.

Lean
DataOps: Focuses on source-of-truth and data-governance principles as cards are pulled or distributed from the board throughout the sprint.
DevOps: Focuses on DRY and SOLID coding principles after the sprint has begun and work has been assigned.
What tools can be used to apply DataOps?
DevOps Source Control
TFS, Subversion, or Git can hold source files like schemas, type-table seeds, utility scripts, views, functions, procedures, configuration files, and certain types of ETL packages and reports. Basically, anything that you can load in Notepad and read is a good candidate for Git. However, the nature of Tableau and Power BI report files requires a little more tooling.
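A hypothetical layout for such a repository might look like this (every directory name here is an assumption, not a standard):

db/schema/       # CREATE TABLE, index and constraint scripts
db/seeds/        # type-table seed data
db/views/        # view definitions
db/procedures/   # stored procedures and functions
etl/             # text-based ETL package definitions
config/          # environment configuration files
reports/         # report files (see the notes on Power BI and Tableau below)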
Power BI recently added source control to its server and it is amazing! It separates the report definition from the other components and uploads the pieces to a Git repository. You can then compare the XML of the reports’ definitions (and other components) to see what changes took place. It provides a text box for developers to comment on version changes and commit only the reports that apply.
Tableau, however, keeps only a rolling backup of the latest ten versions (the current copy plus the prior nine) and ditches the rest. So for Tableau reports, either Git with LFS enabled or Bitbucket is a better option than relying on the server. Either option allows commits with comments, versioning, tagging, merging and conflict resolution. Of the two, I would recommend a cloud-managed Git repository system.
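Setting that up is quick; a minimal sketch, assuming packaged workbooks (*.twbx) live in the repository (the file path is hypothetical):

$ git lfs install                          # one-time setup per machine
$ git lfs track "*.twbx"                   # store workbooks as LFS objects
$ git add .gitattributes reports/sales_dashboard.twbx
$ git commit -m "Track Tableau workbooks with Git LFS"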
For data changes, such as values in type or look-up tables and system data that is handled outside an administration console (price changes, for example), put the data change in a script that can be checked in, reviewed, and vetted in a test environment. Redgate provides some good tools for this, such as Flyway and SQL Data Compare.
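A hedged sketch of that script-based flow, using Flyway’s versioned-migration convention (the file name and connection details are hypothetical):

$ ls sql/migrations
V042__update_premium_tier_prices.sql
$ flyway -url=jdbc:sqlserver://test-db -user=deploy migrate

Flyway applies any V-numbered script it hasn’t seen yet and records it in its schema history table, so the same vetted change replays identically in test and in production.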
For tracking requests and work items, Jira, Azure DevOps, Monday and Wrike all come highly recommended. Since this is the central location for filing and working off requests and features, it’s best to research which of these is best suited for you and your team.
Recommendation: Azure DevOps for SQL Server & Power BI, Jira or Monday for Oracle & Tableau
Lean methodology
Lean methodology rests on two pillars that provide a framework for all Lean projects: Continuous improvement and respect for people. It’s more about how to use the tools, people and resources at hand to create a feedback loop that improves process and product for the client and the workers. It can use the tools already mentioned, but adapts for the unique needs of your service. Consider a system that allows you to extend it with plugins and contains workflows and pipelines to automate as much of the development and deployment as possible.
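As one small example of that kind of automation, a git pre-commit hook can push quality feedback to the moment a change is made. A sketch, assuming the sqlfluff linter is installed and a SQL Server dialect (both are assumptions):

#!/bin/sh
# Hypothetical .git/hooks/pre-commit: reject commits whose staged SQL fails lint.
for f in $(git diff --cached --name-only --diff-filter=ACM -- '*.sql'); do
  sqlfluff lint --dialect tsql "$f" || exit 1
done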
With tools and a structured process that involves the whole team, you can implement a process that protects the core components and intellectual property of your company. It allows you to efficiently and confidently release changes that affect everyone. If there’s ever a failure because of database, report, or data changes, you have a way to roll back quickly, as sketched below. Backups can only go so far, and should be last-ditch efforts to recover from a disaster. DataOps is a methodology that keeps your data and its availability safe and operational while keeping your teams productive.
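That quick rollback is just ordinary source control at work; a minimal sketch, assuming the bad release landed as a single commit (the sha is hypothetical):

$ git revert e4f9a2c    # creates a new commit that undoes the bad change
$ git push              # redeploy the reverted scripts through the usual pipeline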
Encountered x file(s) that should have been pointers, but weren’t
We encountered this issue when trying to merge master after someone had committed a slew of PDFs. Although we couldn’t identify the exact situation that caused the problem, the result was that nobody could merge master into their branches.
Amongst the threaded comments, it looks like there is a root issue with the file-type attributes Git uses when comparing files, and that on strange and rare occasions a unicorn fart fills the git-void with pain by changing these on the server.
After several dozen attempts, I came across this gem of a command:
$ git lfs migrate import --everything --include='*.pdf'
migrate: override changes in your working copy? [Y/n] Y
migrate: changes in your working copy will be overridden ...
migrate: Sorting commits: ..., done
migrate: Rewriting commits: 100% (5940/5940), done
migrate: Updating refs: ..., done
migrate: checkout: ..., done
The pain was almost over. Let’s see how merge does, now.
$ git merge --ff-only
fatal: Not possible to fast-forward, aborting.
Hmmm… Okay. Let’s not force a fast-forward merge; let’s just do a regular merge.
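So, dropping the flag (presumably the command at this point was simply):

$ git merge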
Then I performed a manual merge on all the files by taking the server’s file (on the right in Sublime Merge, when resolving conflicts). But Sublime Merge couldn’t actually commit the non-changes, so I went back to the git console to wrap up tackling the LFS-merge nightmare (like a Dream Warrior taking on Krueger):
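The wrap-up itself was the usual conflict-resolution finish; something like this sketch (the exact commands are assumed):

$ git add -A     # stage the manually resolved files
$ git commit     # complete the merge with the default merge message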