Former Director, Product Marketing at Confluent Inc
- Operating environment for Confluent amid the shift towards Apache Kafka
- Assessment of the Confluent platform and data-streaming capabilities
- Decelerating growth implications and trajectory with new product launches
- Outlook post-IPO and comparison to Cloudera's (NYSE: CLDR) path
SP (Specialist): As you alluded to, Confluent is a company built on top of Apache Kafka. Apache Kafka is an open-source technology created back in 2011 by the people who would go on to found Confluent, as a way to unify and streamline how disparate sources and consumers of data could be integrated into a central pipeline. This was back in the days when the founders were employees at LinkedIn. They were trying to unify their various databases and data sources, mobile applications, website applications, internal and external data sources, and bring them together in a way that their internal analytics teams, as well as application development teams, could use to enhance the user experience and derive greater business value. Since then, Apache Kafka has evolved to become one of the core pillars of many modern application architectures, in the sense that it enables a high-throughput, high-performance, highly reliable way of streaming real-time event data.
Then, in 2014, Confluent was started as a company to commercialise that open-source technology. What Confluent has provided since then is enterprise-grade features and functionality, as well as services and support, for companies looking to use Apache Kafka in a true production-ready, enterprise-grade environment. Confluent has developed two main product lines to serve that customer base. One is called Confluent Platform, which is essentially a licensed software product: a set of unique proprietary features that Confluent has built on top of Apache Kafka, along with professional services and expert support. The second product line is called Confluent Cloud, which is a fully managed SaaS product, essentially a cloud version of Apache Kafka. There too, many unique features and capabilities that are not available in the open-source product are made available for paying customers, things around security, reliability, uptime SLAs and so on, which we can get into a bit more later.
SP: I can give you a few different examples. Essentially, Kafka is super useful any time you have a use case that needs real-time data, high throughput, low latency and high availability. One good example might be fraud detection in a banking application. Think about any modern credit card, debit card or banking application you might have on your phone today. Imagine you lose your credit card, or even that you’re using your own credit card in a new place. Let’s say you’re travelling and you buy something at a store, and within 30 seconds you get a pop-up notification from your credit card application saying, “Hey, you have an unrecognised transaction. Can you confirm whether this is yours or not?” Applications like that are often powered by Kafka, and the reason it’s a good use case is that it obviously requires real-time event data, but it also requires the aggregation and processing of data from various sources. In the example I just gave you, it requires data from the transaction database, which is coming in in real time. It has to analyse that against your historical usage pattern and your profile data, based on what they know about you as a consumer, and it has to be able to send that message to your endpoint, either as a text message or a notification in your application.
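To make the fraud-detection flow concrete, here is a minimal sketch of the kind of rule a consumer might apply to each transaction event pulled off a stream. This is a hypothetical illustration: all names, thresholds and data shapes are made up, and plain Python stands in for what would in practice be a Kafka consumer or a Kafka Streams/ksqlDB application.

```python
# Hypothetical sketch of a fraud-detection rule applied to each transaction
# event. In production this logic would sit behind a Kafka consumer; here
# plain Python lists stand in for the topic so the flow is easy to follow.

def is_suspicious(event, profile):
    """Flag a transaction that deviates from the cardholder's history."""
    # A country the cardholder has never transacted in before
    if event["country"] not in profile["known_countries"]:
        return True
    # An amount far above the historical average (threshold is illustrative)
    if event["amount"] > 5 * profile["avg_amount"]:
        return True
    return False

def process_stream(events, profiles, notify):
    """Consume transaction events and push alerts to the user's device."""
    for event in events:
        profile = profiles[event["card_id"]]
        if is_suspicious(event, profile):
            notify(event["card_id"],
                   f"Unrecognised transaction of {event['amount']} "
                   f"in {event['country']}")

# Example: one normal purchase, one from an unfamiliar country
profiles = {"card-1": {"known_countries": {"US"}, "avg_amount": 40.0}}
alerts = []
process_stream(
    [{"card_id": "card-1", "amount": 35.0, "country": "US"},
     {"card_id": "card-1", "amount": 60.0, "country": "FR"}],
    profiles,
    lambda card, msg: alerts.append((card, msg)),
)
# Only the French transaction generates an alert
```

The point of the sketch is the shape of the pipeline, not the rule itself: events from one source (transactions) are joined against slower-moving data from another (the profile), and the result is pushed to a third system (the notification endpoint), which is exactly the multi-source aggregation described above.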
Kafka and, in many cases, Confluent is the backbone, what we used to call the central nervous system, behind all of that processing and data flow. When we talk about data in motion, what we really mean is that there are pieces of that use case where data is constantly changing; for the holder of that credit card, where you are geographically and what transactions you’re making would be two of the main things. That’s one pretty common use case. Another common one, in a different context, would be retail. Nowadays, omnichannel retail is super critical and every retailer wants to build a great omnichannel experience. One of the key things you have to do well with omnichannel is real-time inventory management. It would be very hard to do omnichannel if, say, someone is trying to buy something through the retailer’s website while someone in the store is looking at the exact same item, and in the time the person on the website is trying to close that sale, the item is no longer available because someone already bought it in the physical store. This used to happen a lot in the old days: you could buy something online and then a day or two later the warehouse would send a notification saying, “Sorry, this item is actually no longer in stock, we have to refund your money.” That’s obviously not a good experience for the customer, and it’s a lost sale for the company.
With Kafka, what you can do is make that experience truly real time: you can have a constant pulse on your inventory and your customer engagement at various touchpoints in the retail journey, whether in-store or online. In that same example, with something like Confluent on the back end, the person online trying to buy the item may get a notification right before they’re about to transact saying, “Sorry, someone at the store is about to check out with this item, so please check back later,” or, “It’s no longer in stock.” It may still not be the ideal outcome for the retailer, since they’re out of stock, but it at least gives a better experience for the buyer, who isn’t disappointed two days later when they thought they had bought it. Those are a couple of examples. There are a few more I could give you, but hopefully that gives you a good sense of where Kafka and Confluent fit in.
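The real-time inventory scenario above can be sketched in a few lines. Again this is a hedged illustration with invented names: in production the purchase events from every channel would flow through an ordered Kafka topic so that all channels see one consistent count; here a dict stands in for that shared state.

```python
# Hypothetical sketch of real-time omnichannel inventory: purchase events
# from every channel (store, web) are applied in order against one live
# count, so the second buyer is turned away immediately rather than
# days later. A dict stands in for the Kafka-backed inventory state.

inventory = {"sku-123": 1}  # one unit left, visible to all channels

def handle_purchase(event):
    """Apply a purchase event against the live inventory count."""
    sku = event["sku"]
    if inventory.get(sku, 0) <= 0:
        return {"status": "rejected", "reason": "out of stock"}
    inventory[sku] -= 1
    return {"status": "confirmed", "channel": event["channel"]}

# The in-store shopper checks out first; the web shopper is rejected
# at checkout time instead of by a refund email two days later.
first = handle_purchase({"sku": "sku-123", "channel": "store"})
second = handle_purchase({"sku": "sku-123", "channel": "web"})
```

The design point is ordering: because both channels’ events pass through one stream, there is a single authoritative sequence of purchases, which is what makes the instant “no longer in stock” message possible.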
SP: To answer the first part of the question, which is what’s driving that growth, there are three main things, I would say. One is just the overall acceleration of cloud and hybrid infrastructure adoption. We’ve seen this trend accelerate over the past 5-10 years, and you see that in the numbers AWS reports. Essentially, cloud adoption is forcing people to rethink how they think about data, how they use their data, and where the data is coming from and where it is going. That’s one thing causing people to rethink their architecture and driving some of them to Kafka. The second thing is that, as I mentioned, real-time applications are here to stay and only here to grow. I don’t think there’s any world in which people will say, “Actually, no, we didn’t really like this real-time stuff, let’s go back to daily or delayed updates.” As things move further into the realm of real-time experiences, technologies like Kafka will become increasingly important. The third thing is tied to the first point around cloud adoption, which is the modern approach of developing applications based on microservices and new application architectures. These require, again, coordination of data between different parts of the application, and different parts of the organisation as well, and they are really driving demand for solutions that help orchestrate the transfer of data across various producers and consumers.
On the second point, about what other competing technologies there might be, and you mentioned Pulsar: it’s interesting. Right now, Kafka is the biggest player in this space, and just in terms of underlying technology, this is not even necessarily specific to Confluent; Kafka itself is one of the hottest infrastructure technologies out there, and for good reason. It’s been proven out over almost 10 years in terms of its usefulness. Certainly, as we know historically with other database technologies, there’s always going to be something new coming up in the open-source community, and Pulsar is one of these things. The challenge with any open-source project is getting real production-level traction, and right now that traction has really been with Kafka. The stat is that 70% of Fortune 500 companies are using Kafka in one way or another, and the top 10 players in almost any industry you can think of are likely using Kafka in their production environment. Whether that’s an open-source distribution, through Confluent or through another provider remains to be seen, but Kafka is definitely rooted in many companies with forward-looking data infrastructure. I think it will be very hard to displace Kafka at this point as a core technology; at the very least, it’s not going away.
SP: I would think about the target market in two distinct groups. One, as you’re describing, is more on the legacy side: established players that have been around for a very long time, have existing IT stacks and, potentially, a lot of people supporting those stacks, and are now transitioning to modern architectures. The other category is what I like to call cloud-native companies: companies that started up in the past five, maybe up to 10, years and really built their entire business on software and cloud-driven technologies. Those are the two distinct groups. You make a good point that the first group, the legacy companies, might be hard to fully access because they have a lot of incumbent technologies and existing relationships with other vendors, and I think that’s absolutely correct. That is a challenge any non-IBM, non-Oracle, non-SAP software company would face. Those companies have multi-million-dollar, even tens or hundreds of millions of dollars in contracts with some of these large enterprises, where they’re basically providing everything under the sun. There’s certainly a cohort of companies there that are going to be resistant to any new technologies in general. However, that’s not true of every single company, and especially not of every team in those companies. The approach there really is to find the specific use case and specific team that has a need for a new technology such as Kafka, where we can really differentiate.
Take a company like Walmart. Walmart is obviously a very established company with a very diverse IT stack. At the same time, they have their whole e-commerce business, run almost as a separate division from their main retail operations, and that division is going to be looking at all the newest cloud technologies and working with the cloud providers, such as Microsoft Azure or Google Cloud Platform (GCP), to see which up-and-coming technologies will help them get an edge in the e-commerce space. From there, they may take that foundation and apply it to other parts of the organisation, maybe to their overall inventory management system, maybe to their employee management system and so on. That’s more the approach with legacy companies. I don’t think there’s a realistic sales play where Confluent, or any new technology company, comes in and does a rip-and-replace on the entire IT infrastructure of any of those companies; it has to start small, prove the use case and then grow from there. That’s the first target segment. The other segment, as I mentioned, is cloud-native companies. These companies are familiar with cloud technologies, are probably already aware of Kafka and are probably already using Kafka to a large degree. In that case, Confluent can provide a lot of value by offloading operational burden, providing more reliability and security features, and also providing functional features that save time, effort and focus for those IT and DevOps teams, which they can then spend elsewhere more productively.
SP: I actually haven’t seen that as a specific area of focus. It’s a good question and it may become one in the future, but I haven’t seen it just yet. The state of the company right now is that there is so much latent demand elsewhere that I think that’s more of a niche play. I haven’t seen many companies take that approach in general.
SP: I think directionally that sounds about right, in the sense that IT spend on data infrastructure, real-time event streaming technology and event-based data architecture is definitely accelerating, due to some of the macro drivers I talked about earlier: cloud adoption, microservices, real-time applications. On the specific number, as you said, I can’t really comment on how accurate it may be, but the way I would think about it is that in today’s world, not all of that USD 50bn will necessarily be direct vendor spend; a lot of it may be infrastructure spend to support these data architectures, some may be services spend, and some may be just employee costs. Moving forward, as Confluent as a technology gets more mature and the features become richer, I think a lot of the spending that currently goes to other types of vendors, for example infrastructure, may come back into a managed service like Confluent.
SP: I would say there are a few types of competitors as you think about the customer’s considerations. In one case, the customer is already familiar with Kafka, they’ve decided to use Kafka and now they just need to decide how to implement it. In that case, the competition is really: is the customer going to do it themselves in-house using open-source Kafka? Are they going to work with Confluent through a paid enterprise subscription? Or are they going to go with one of the hosted Kafka services, like MSK or Aiven? That’s one bucket of competition. The second bucket is non-Kafka streaming services. A lot of the cloud partners have their own version of this: AWS has Kinesis, you have Google Pub/Sub, you have Azure Event Hubs and some others. The customer has to decide which of those is the right solution for them, or whether they should actually adopt Kafka as their long-term solution. Then the third bucket of competition is perhaps against some of the truly legacy players: Cloudera, Red Hat and so on. With each of these, the dynamics are quite different. In terms of pure Kafka competition, whether someone implements Kafka in-house with open-source, goes with MSK or Aiven, or goes with Confluent, I think Confluent is by far leading the pack in functionality, capabilities, expertise, support, security and overall cost-effectiveness in terms of total cost of ownership. People often try some of the other alternatives because they look cheaper. Some people think open-source is free, but in reality we all know open-source is not free, because of all the in-house costs you have to take on: hiring, operational costs, employee costs, everything it takes to run it. It’s not actually free, but people think it is.
You have other low-cost alternatives like Aiven and MSK.
I think what we see at Confluent is that a lot of people try the low-cost route first, run it for 3-6 months and then find out, “Hey, this is actually a lot harder than we thought; we’re dedicating a whole team to keeping this thing running, reliable and secure.” Then they start thinking they need a real enterprise partner to help bring this to production, and that’s where Confluent wins at a tremendously high rate, winning over customers who felt the pain with those Kafka alternatives. That’s the first one. In terms of the cloud providers’ alternative messaging services, the ones I mentioned, Kinesis, Pub/Sub, Event Hubs, these are fundamentally different technologies from Kafka, with several limitations. (1) They are not nearly as scalable or performant as Kafka, and (2) they use proprietary protocols locked into that specific cloud provider, so for customers looking for multi-cloud or hybrid cloud, as many are, or that have standardised on Kafka as the underlying technology, these alternatives just won’t work. Certainly, there are cases when customers consciously choose one of these: maybe their use case is fairly limited, maybe they’re not streaming tons of data or don’t have a real-time application need. That sometimes happens, and then they may choose one of these cloud provider alternatives, which is fine. In those cases, they may stay there for a while, and over time, as their needs progress, they may eventually look to Kafka and to Confluent again for bigger, more mission-critical use cases.
Then the third category, some of these legacy players: I think Confluent faces the same challenges there as any other new entrant in a similar market. As I mentioned earlier, a lot of customers who work with the Oracles and IBMs of the world have multi-million-dollar, even tens or hundreds of millions of dollars in contracts with them, where they have to spend a certain amount and basically buy everything off the menu. When that’s the case, it’s obviously going to be difficult for the procurement or IT team at that potential customer to go and recommend a whole new technology on which they’ll have to spend a lot more money. Those companies may present some challenge, but over time I think Kafka and Confluent will still beat them out, simply because customers need the technology. In fact, if you go to the Cloudera website or the Red Hat website, they actually talk about how their technologies are compatible with Kafka, and that should give you a sense of how important Kafka has become in this space. Over time, the IT teams, procurement teams and CTOs of these companies will realise that some of these all-in-one players are not going to be good enough for their needs, they’ll have to look for best-in-class solutions, and they’ll likely look to Confluent when that time comes.
SP: The closest comparable offering, in what sense do you mean? I think it’s hard to compare because it’s open-source; technically anyone could download it. In fact, Confluent Platform is a community-licensed product, so anyone can actually download it for free and run it themselves. They won’t get all the enterprise features, but they’ll get some of the core Kafka-enhancing features. We actually often see self-managed open-source projects as the biggest barrier but, again, those customers tend to come back a bit later, when they realise they need more help and service.
SP: I don’t think so. Kafka is one of those projects where everyone knows about it; any DevOps or IT person has at least heard of Kafka. I think the challenge is that, very honestly, Kafka is not easy to implement, especially for a production environment. It’s a highly distributed architecture: it requires proper sizing and configuration of your clusters, it requires sizing your network and storage capacity, and there are a lot of knobs to turn to get it to work the way you want. Then, once you have it running, it requires a lot of attention to make sure it’s running optimally and you don’t have delays or disruptions in your data pipeline. What that means is that, number one, it’s actually really hard for any company to hire Kafka experts, because there are just not that many people with deep Kafka experience, and it’s especially hard for non-tech companies. I’m based in the Bay Area, and companies like Netflix or Uber or Google are known for having powerful, capable engineering and R&D teams, but there are a lot of legacy companies that struggle to hire the quantity and depth of talent they need to run every single technology system they have. Kafka is one of these very niche technologies where not every company can find the right people to run it. That’s the first challenge. The second challenge is that once they get it up and running, it’s very hard to keep it running on your own. You have to have a dedicated team looking at it all the time, which becomes very expensive and takes away from the focus the IT and DevOps teams should have on developing solutions that are core to the business.
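To give a flavour of the “knobs to turn” mentioned above, one of the first sizing decisions is the partition count for a topic. A commonly cited rule of thumb (hedged: real sizing also depends on message size, replication, retention and hardware) is to divide the target throughput by the measured per-partition throughput of producers and of consumers, and take the larger result. A minimal sketch:

```python
import math

# Rule-of-thumb partition sizing, with illustrative numbers. Real-world
# per-partition throughput must be measured on your own hardware; the
# figures below are made up for the example.

def min_partitions(target_mb_s, producer_mb_s_per_part, consumer_mb_s_per_part):
    """Rough lower bound on the partition count for a target throughput."""
    need_for_producers = target_mb_s / producer_mb_s_per_part
    need_for_consumers = target_mb_s / consumer_mb_s_per_part
    return math.ceil(max(need_for_producers, need_for_consumers))

# e.g. 100 MB/s target, producers push 10 MB/s per partition,
# consumers drain 5 MB/s per partition: consumers are the bottleneck
print(min_partitions(100, 10, 5))  # -> 20
```

And this is only one knob; the same kind of estimation has to be repeated for broker count, replication factor, retention-driven storage and network capacity, which is exactly why production Kafka sizing takes real expertise.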
What Confluent really offers is the ability to offload that need for in-house expertise. I don’t think there’s any greater concentration of Kafka experts anywhere outside the walls of Confluent, both in terms of current headcount and in terms of collective hours contributed to building and maintaining Apache Kafka, the open-source product; nor is there any company that has devoted the same amount of engineering resources to developing additional capabilities on top of it. So if you’re a head of IT or head of engineering and you want to deploy Apache Kafka, you can spend 12-18 months hiring out a team, building out the infrastructure, playing with the sizing of the clusters, the configurations and the throughput, maybe get it up and running, and then have to keep building out that team to maintain that infrastructure indefinitely. Or you can go with a vendor like Confluent, be fully up and running within 3-6 months with maybe one person managing the dashboard, and have that infrastructure scale elastically and in an unbounded way as the use case expands throughout the organisation and as the data needs grow for that single use case. It’s just a very strong value prop for anyone who’s serious about implementing Kafka in a production environment.
SP: The connectors were not an overnight effort; it took years to build out that catalogue. But I would say that’s only one component of what distinguishes Confluent’s product from potential competitors. One key thing, which we haven’t got too deep into yet, is that the Confluent Cloud product is a very unique offering that no one else can match right now, and it was actually a huge challenge to implement within the walls of Confluent. It would be, in my mind, unrealistic for some new competitor to come along and develop something better in any reasonable amount of time. Essentially, the engineering team at Confluent has re-architected Kafka for the cloud for that product: it requires a completely different way of implementing Kafka on cloud-based resources to make it completely elastic, secure and dynamic, and to let customers pay on a usage-based model rather than paying for infrastructure. The difference is that other hosted Kafka services are really just asking the customer to size up their expected throughput and data retention, selling them a compute cluster and some storage capacity, and then running open-source Kafka on it, with some bells and whistles, but essentially that’s it. If the customer wants to scale their Kafka deployment up or down, it’s actually quite hard: they have to redeploy a different cluster, transfer things over and do transition planning.
With Confluent Cloud, all of that is eliminated from the customer’s operational burden. You sign up and it works. There’s no real sizing involved and no cluster configuration involved; certainly, you have to configure it for your network and your infrastructure, but it removes a lot of the headache of planning upfront what kind of infrastructure to deploy Kafka on. That’s a very unique value prop that Confluent has, that no one else has, and it’s not easy to do.
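The economic difference between the two models described above can be illustrated with back-of-the-envelope arithmetic. All the prices and workload numbers below are invented for the example; the point is only the shape of the comparison: a self-sized hosted cluster is provisioned for peak throughput around the clock, while a usage-based service bills for data actually streamed.

```python
# Hedged illustration of provisioned vs usage-based pricing with made-up
# rates. A fixed cluster must be sized for peak load 24/7; a usage-based
# service charges per GB that actually moves.

HOURS_PER_MONTH = 730

def provisioned_cost(peak_mb_s, cost_per_mb_s_hour):
    """Cluster sized for peak throughput, paid for every hour regardless."""
    return peak_mb_s * cost_per_mb_s_hour * HOURS_PER_MONTH

def usage_based_cost(gb_streamed, cost_per_gb):
    """Pay only for the data that actually moved."""
    return gb_streamed * cost_per_gb

# A bursty workload that peaks at 50 MB/s but averages only 5 MB/s:
avg_gb = 5 * 3600 * HOURS_PER_MONTH / 1000   # GB actually streamed per month
fixed = provisioned_cost(50, 0.10)           # must be sized for the peak
elastic = usage_based_cost(avg_gb, 0.13)
# With this bursty profile, the usage-based bill comes out lower
```

The burstier the workload relative to its average, the wider this gap becomes, which is why elasticity is pitched as a cost story and not just an operational one.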
SP: It’s a good question. I think, as you said, the biggest advantage Kafka has over Pulsar right now is just the huge momentum and awareness of Kafka as the predominant technology for certain use cases, which has led to tremendous organic adoption. Kafka didn’t happen overnight, and it didn’t happen just because Confluent said, “Hey, buy Kafka.” It happened because developers and deeply technical folks found unique value in what Kafka was offering and decided to learn about it and use it, and that has played out over the past 6-7 years. Right now, the number of users, the general community size of Kafka, is probably 10-20 times the size of Pulsar’s, so that’s a pretty big gap to close. I won’t say definitively that it can’t happen; certainly, over time, things could change, but there are also some technological differences that make Kafka more attractive in many use cases. I’m probably slightly out of my depth on the underlying technical details, but I would say that for most mission-critical, real-time use cases, Kafka seems to be the winning technology, at least as of right now.
SP: That’s a tough question for me to answer. As I mentioned, I think these things take a long time to gain traction and once there is a consensus leader in this space, there is certainly a network effect in terms of continued adoption, and right now, as I mentioned, Kafka is well ahead in that consensus stage. I think people, again, recognise Kafka as the main technology and the winner here, at least for most use cases, and I think it’s just hard for someone to get deep on both and then try to develop that within their organisation. Certainly, it’s not to say that Pulsar is not the right answer for certain use cases, obviously, as you mentioned, Splunk and DataStax have chosen Pulsar for their infrastructure and so that’s great, they have certain needs where Pulsar makes sense, but I think, as far as I can see, it’s going to take quite some time for the broader community to recognise Pulsar as the more general event streaming data pipeline for more use cases.
SP: Taking those two examples, Splunk and DataStax, I think there were two main reasons they went to Pulsar. For Splunk, it was a proliferation of messaging topics, on the order of millions, and the perception is that Pulsar handles topic proliferation much better and scales further along that dimension, so they went with it. For DataStax, it was about multi-region replication and multi-region latency; they wanted a more globally distributed infrastructure. On the topics point, most use cases don’t have that level of scale in terms of topic proliferation, so that’s a pretty niche need. On multi-region replication, that’s actually something Kafka, or Confluent at least, already has: when you buy Confluent Platform, you get multi-region replication. It’s implemented a bit differently, but I think it meets the needs of most who need it. Again, it’s hard to parse exactly why someone chooses one over the other; sometimes you just have people who tried something first and really liked it, or who have been part of or invested in that community, so they push it within their organisations. Broadly speaking, looking at the macro factors and overall market trends, I think it’s going to be hard for Pulsar to overtake Kafka, especially because big players like Confluent know that Pulsar is there, and if there’s anything specific to Pulsar that customers really need, and it’s appropriate for Confluent to build it, they would definitely consider investing in those features or capabilities as well.
SP: Obviously, it has to be both. For listeners who have seen the initial filing from Confluent, they’re showing pretty good net dollar retention, or net expansion, rates, and that’s going to have to continue. The second thing is new customer growth. As I mentioned earlier, there are a ton of companies out there that have just started, in the past year or two, experimenting with Kafka and real-time use cases. We’re at an inflection point where a lot of these teams and products are coming to maturity and realising they need help. That’s what I was mentioning earlier: there are so many teams and companies already using Kafka, but they don’t know what they don’t know. When they need to take an in-house project into production, into a reliable, dependable product or experience, whether for internal or external customers, it’s often the case that they can’t, or are not ready or willing, to invest in the internal capabilities to carry that out, because it is very expensive and it is an investment. There’s such a rich field of those types of customers, and of teams within large organisations, that are going to look to Confluent to accelerate those projects and truly help them bring real-time applications to life without the pain and cost of doing it themselves.
SP: I think your intuition there is right. I think it’s definitely the intention and the vision to transition as many customers as possible and to grow the cloud segment, and that’s not just because it’s a better business but because that’s where customers are. As I mentioned, one of the biggest drivers of adoption of technologies like Kafka is increasing transition to the cloud and both multi-cloud and hybrid cloud deployments. As it happens, Confluent is actually very uniquely positioned to capture that trend because Confluent is the only company providing a Kafka solution that can be multi-cloud and can be hybrid cloud, besides doing it yourself using open-source. I think I’m very confident that the cloud revenue contribution to the overall top line for Confluent will continue to grow and accelerate just purely on the fact that more and more customers are more and more confident running core infrastructure on the cloud.
SP: A little bit, yes. I know a bit of the historical context there, yes.
SP: Certainly, we should all look at history as a way to learn the best path forward, and the story of Hadoop is well told in IT circles. Essentially, I think Confluent has learned from those mistakes. The difference is that Hadoop was strong at a specific use case but was actually very difficult and confusing to use, and they tried to position it for things it was not the strongest technology for. Confluent has not made that mistake. Confluent knows that Kafka, and Confluent itself, is positioned as the central nervous system for data in motion. As you’ve seen from all the public investor materials, data in motion is really the core tenet of what Confluent is trying to build. They’re not trying to replace traditional databases or data warehouses; Confluent knows that the core value Kafka brings is really about data in motion, streaming data and stream processing, and this is a new category. This is not saying, “We’re going to replace something you already have”; it’s saying that, based on the new technological needs of the applications and use cases you’re developing today, this is what’s going to get you there. It’s not about being a better, cheaper mousetrap to replace your old ETL streams. It’s a slightly different story, but I appreciate the historical concern, which is totally fair. There are also many companies you can see have successfully escaped that fate, companies like MongoDB or even Snowflake.
SP: On the technical side, I would definitely focus on the cloud product: what are the plans to make it available and attractive to a broader set of the market, and how do they plan to expand that revenue pool? For a cloud product, expansion has to come not only from usage, meaning customers using it more and more, but also from new features and new paid components that continue to grow the size of the basket, essentially the ASPs. To your earlier point, a good question would also be around competitive features to go up against alternative technologies like Pulsar, but also to stand up against lower-cost hosted services like MSK and Aiven.
SP: No, not really. For people wanting that, the Confluent Platform product actually released the capability to run on top of Kubernetes just earlier this year. These moves are all in line with how the company sees technology adoption in various areas.