The Observatory
-
June 16, 2022

Caitlin Colgrove, Hex Technologies

Caitlin Colgrove, founder and CTO of Hex Technologies, and Kyle Kirwan, CEO and co-founder of Bigeye, discuss the proliferation of tools along the data pipeline and why teams are gaining the technical chops to work with them.

Data Science
Hex

Read on for a lightly edited version of the transcript.

Kyle: Welcome back to The Observatory. It's Kyle from Bigeye, where we talk with people in and around the data space to see what's going on. Today we're being joined by Caitlin Colgrove, founder of Hex. Caitlin, welcome to the show.

Caitlin: Thanks so much for having me.

Kyle: Caitlin, maybe what we can start with is pretty obvious. What is Hex? And particularly, what's this focus recently on knowledge instead of just analytics?

Caitlin: Yeah, so that's a really good question. Hex is a platform for collaborative analytics and data science. With it, you can connect to your data, analyze it with Python, SQL, or R all in one place, and then build, share, and manage beautiful reports and applications on top of that analysis. You can do this end-to-end, all inside of Hex. As for your point about knowledge, at Hex from the very start, we were really maniacally focused on helping analysts derive value at their organizations. The value that they create inside of their company is actually this idea of knowledge. It’s not just an app; an app is not useful in and of itself, really. What you're looking for is how to know things from data. And once you have dashboards or reports, they can be lost, they can go out of date, they can break… What you need are tools to actively manage the entire lifecycle of the artifact, from exploration, sharing, and updating all the way through deprecation. And that's how you build this kind of organizational knowledge base. But a lot of tools, I think, just focus on one part of that lifecycle, and not really on what the whole lifecycle looks like.

Kyle: Wait a minute, are you telling me that at some point, we actually get rid of old analytics?

Caitlin: Yeah, we actually had a product idea that was automatically archiving things that are more than three months old, but that one did not make the cut.

Kyle: I don't know. If it hasn't been looked at, I might be in favor of that feature! I'd reconsider that on your roadmap, personally. There are several different concepts here that are being rolled into one platform. I know that there's a huge conversation going on right now in general in the data space around bundling and unbundling, and that's top of mind for a lot of people. So Hex is obviously fairly heavily notebook-inspired. But I also saw notions about lineage in there. There are also some things around discovery. How do you think about the bundling and unbundling cycle? What's going on right now with the explosion of tools being marketed to data teams? And how does Hex fit into that?

Caitlin: Yeah. First of all, I think this explosion of tools has really been nothing but good for the industry. Overall, there are so many good ideas out there these days, and people are finally getting the chance to go out and build them and try them out. But it absolutely comes at a cost, because managing a lot of tools is really hard. At Hex today, a lot of what we replace are actually homegrown systems, with a bunch of different tools stitched together in ways that they were never really meant to be. So I do think you will see some consolidation, particularly around users and workflows. But that doesn't necessarily mean you end up with this one-size-fits-all megadata platform situation. Because when you do that, you lose a lot of the ability to support certain workloads really well, and you end up with an “it does everything just okay” solution. That’s fine for some teams, but it always leaves open opportunities for best-in-class products that target specific groups or workflows. I think you see this in a lot of industries, not just data. We use Linear internally, for example; it's taking on Atlassian and doing one thing very well. And I think, in practice, this is actually how a lot of innovation happens. And I don't see that part slowing down.

Kyle: You raise a great point about how solutions that are one-size-fits-all cover a lot of bases, but maybe aren’t as deep, versus what you would call best-of-breed products. A common topic that I've talked about with other folks on the show in the past has been the toolsets, the processes, and the way that we work in data changing as the company evolves. So maybe you're the first data person at a company of 25 or 30 or 50 people. The processes that you go through and the way that you work are pretty different from a company of 500 or 5,000. Do you view that preference for one-size-fits-all versus best-of-breed as a function of maturity? Or does it depend on other factors?

Caitlin: I think adoption happens very differently at a small company versus a large company. And so I do think small companies have a lot more of an opportunity to pick something that's a little bit newer, but is maybe better at the thing that they're trying to accomplish. Because if you have a team of a few data scientists, there’s a lot less overhead to go and sign up for a new tool. But I do think as the modern data stack has matured, we at Hex have seen adoption of these different tools move more and more upmarket. They have to be a little bit more established, they have to have all of their Enterprise features and security and all of that stuff in place and be able to do the company-wide top-down sales. I think it's harder to do a mix-and-match thing at a larger scale. But we are actually seeing that as these products mature, they move upmarket as well.

Kyle: Super interesting. I know that when I was on the Data Platform team at Uber, this was always the question. We could build it in house, and then it's woven into the other set of tools we have. Or we can go buy that vendor solution. And maybe it's more mature than we're gonna get over the next six months or 12 months. But that comes at the cost of having to integrate it into all the other points where that workflow might have a touchpoint. So we’ve seen that before, in practice.

Caitlin: I actually think you see a difference in generations of companies, it's really interesting. I do think companies like Uber and Airbnb, and a lot of these folks, Facebook, built out a lot of the data infrastructure before there was anything off the shelf. But the next generation of companies that are that size will have had a lot of these tools from the very beginning. So I do think you'll see a lot of different and more modern data stack type setups, you know, three to five years down the line, once they have time to grow to that scale.

Kyle: That makes a lot of sense. Fairly related, I wanted to ask: it seems like the lines are blurring a little bit between what you would call traditional analytics (I'm going to do some analysis, I'm going to build a dashboard, there's going to be a report, etc.) and these constantly online, living data products, where maybe it's a dashboard that's updating itself every five minutes, or maybe it's even interactive. As that line blurs, I imagine the role of a data scientist or an analyst is changing with it. What's your perspective on how that role is changing as the tools and the way we expect data to work are changing?

Caitlin: To start, I don't think “data scientist” is really a single role anymore, or I don't know if it ever was. This is something we learned pretty early at Hex, actually. There's a pretty big difference between someone who spends most of their time doing exploratory analysis, versus someone who does most of their work in production, machine learning models, etc. And we're honestly really excited about this trend at Hex, because we made a bet on this, that there would be a broad kind of variety of workflows. And we're building out this tool that supports a broad band of workflows, but also a pretty big spectrum of technicality, which is another thing that we're seeing in this space. We talk about it as a low floor, high ceiling. It’s easy to get into, but then basically unlimited in what you can actually accomplish. And I do think you're seeing that over time, that analyst role in particular, a subset of what would have traditionally been called a data scientist role, gets more and more technical over time. You're seeing a lot more SQL and Python and code-driven workflows. You're seeing software engineering best practices, like testing and monitoring, coming into these workflows. And this is where you do get a lot of that demand for more powerful data products from the analytics side. So really true data applications, not just dashboards or decks or spreadsheets. And before, I think that might have been confined to people on the software engineering side, but now you're seeing that come into the analytics space as people develop the skillsets that empower that.

Kyle: Totally, I agree. I see a ton of movement in general amongst folks in data. If you go back to the roots, back to 2012 or 2013, a lot of the folks that were the early people on data teams were ex-physicists, or people from econometrics or things like that. That software engineering skillset was softer. And I definitely see a change in that over time. There’s a lot of software engineering skills coming directly into the data team. And of course no conversation about the space would be complete without bringing up dbt. I think they're one of the great examples of the slow march of software engineering processes and tools into the data space. And I think that's a good trend. So Caitlin, what else do you see changing the most or most dramatically for analytics and data science teams broadly over say, the next three to five years?

Caitlin: One of the things that we think a lot about here at Hex is how to increasingly expand the circle of people who can work with data. A lot of people talk about this, but often I see it as, for lack of a better word, “We’re going to dumb down the tooling so it’s easier to learn and more friendly and approachable.” At Hex we know that people can code, and the hardest thing about coding is actually not the coding itself. It's everything around it. How do you install it? How do you run it? And how do you share it and collaborate on it? One of the biggest changes that we see at Hex over time is not just that more and more people are working with data, but that more and more people are able to come into tools like Hex and not just ask and answer questions of their data, but also interact with it at a much higher level of technicality and a higher level of power. And so what I think you'll really see is not just more and more people looking at data and doing point-and-click things, but people actually starting to acquire these technical skills, whether it's learning a little bit of SQL over time, and then being able to apply them and actually drive value in their organizations with tools like Hex.

Kyle: So you’re seeing expanded access to data, from very small targets that grow into the whole organization over time, and the level of skill that individuals are comfortable with interacting with data, right?

Caitlin: Yup!

Kyle: Well, Caitlin, we can't let anybody off the show without three quick rapidfire questions. Easy first one: are you on team R or team Python?

Caitlin: Definitely, definitely Python.

Kyle: Number two, what's one thing that people don't understand about notebooks that you wish that they did?

Caitlin: I wish more people understood that the things that they think they dislike about notebooks aren't really inherent to notebooks. They're inherent to specific implementations of notebooks. I think notebooks themselves are a great tool and a great user experience. But I think there's a lot more work and innovation to be done until they've worked out all of the kinks.

Kyle: Got it. This goes back to your comment that the code's not the hard part. It's exactly what makes it possible. Cool. Okay. And last one: would you rather fight one giant burrito or 100 tiny tacos?

Caitlin: Am I allowed to ask how big the burrito is?

Kyle: It's big. It's really big.

Caitlin: It's really big. Um, I think probably still the burrito. I kind of feel like it would be big and slow and easy to avoid. Getting swarmed by a bunch of tiny tacos seems hard, too hard to deal with.

Kyle: I feel like there's a lesson about startup roadmaps here about tackling one big thing instead of 100 small things, but I'll leave that for another time. All right, Caitlin. Thanks so much for being on the show. Really awesome. To learn more about Hex and about the way that notebooks and data science are evolving in general, there's going to be a link to their site in the description down below. Caitlin, thanks so much for being on the show today.

Caitlin: Yeah, thanks so much for having me. This was really fun. I'll see you next time.

