stevenacreman's comments

https://kubedex.com/ - I write about playing with Kubernetes

When I was young, before my first IT job, I used to love playing with Xen and distcc, and watching my desktop PC scroll compiler output for days rebuilding X Windows. I fell in love with the idea of Beowulf clusters and Gestalt computing.

Kubernetes combines a lot of that, so I'm very lucky to both work on and spend my free time on what would be a hobby anyway.


What are your father's and mother's professions?


I keep a Google sheet updated with feature differences between LinkerD, LinkerD2, Consul Connect and Istio.

https://docs.google.com/spreadsheets/d/1OBaKrwR030G39i0n_47i...

From my own experience I've had some great success with LinkerD in the past on Mesos DC/OS.

Since moving companies, and switching to Kubernetes, we've yet to deploy any service mesh into production.

The blog from Jerome highlights many of the benefits already.

From my perspective, the big ones in the past were:

    - Tracing (with Zipkin)
    - Retries, which removed or fixed dodgy app reconnect logic (sketch below)
    - Time-series metrics in LinkerDViz showing real-time rates, latency and errors between services
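
For the retries point, LinkerD2 drives this via a ServiceProfile. A minimal sketch, where the service name, route and budget values are made up for illustration:

    apiVersion: linkerd.io/v1alpha2
    kind: ServiceProfile
    metadata:
      name: webapp.default.svc.cluster.local   # hypothetical service
      namespace: default
    spec:
      routes:
      - name: GET /books
        condition:
          method: GET
          pathRegex: /books
        isRetryable: true          # let the proxy retry failed calls on this route
      retryBudget:                 # cap retries so they can't amplify an outage
        retryRatio: 0.2
        minRetriesPerSecond: 10
        ttl: 10s
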
The reason we haven't used any service mesh at my current company is mostly based on stability concerns.

Istio gets all of the cool press attention and blogs written about it. Yet, you also read a lot of warnings about it always being 6 months away from being really robust. Even at version 1 we read some horror stories about obscure bugs showing up in fairly standard use cases.

Connectivity between services is too scary to gamble on. It's a similar deal with CNI (we're still on Calico despite arguably cooler stuff being out there) and Ingresses (still on ingress-nginx).

AWS have a service mesh, which is probably the one we'll trial next at work.

Improved observability and retries would definitely be of benefit on our current platform. Another driving factor is our security team wanting mutual TLS between services.


Does anyone else think mTLS on the public cloud is a waste of CPU cycles (and therefore money)?


Yes - especially if you have a sidecar which speaks insecurely to your application. Data theft happens through application issues, or employees with access stealing things - not because of unencrypted traffic in a secure network.


There's a cost to ensuring all your data is encrypted in transit, regardless of how you do it.


Even if it does cost more, it's probably worth it.


I personally don't really understand this approach. In fact, this approach annoys me.

In this situation I feel like the product has already wasted my time and now people are contacting me to waste more of my time.

The blog is really detailed and includes screenshots. You could fix the obvious issues and then follow up with an update once it's released.


In the opening paragraphs I stated my hope that this gives me a gateway to communicate with someone who isn't support and isn't the void that is their robot-response issue tracker.

That said, they generously reached out to me and appear to be genuinely concerned by my experience. I plan to keep posting as I run into issues, and as they correct my mistakes and solve my problems.


This is true of all of the services I've tried across Azure, AWS and Google cloud.

Microsoft are always in last place in terms of performance and reliability.

For Kubernetes I actually keep track of exactly how bad they are with an automated testing tool.

Results of the last test are here: https://kubedex.com/is-azure-kubernetes-aks-any-less-terribl...

I've complained about UX before in the context of the Azure portal being, in my opinion, terrible. But that's subjective and I respect that others may like it, although I'm not sure how they do.


You should do an updated test comparing AWS as well. I feel like there should be more continuous benchmarking efforts of cloud vendors. This is the last one I remember in this genre (https://www.azurefromthetrenches.com/azure-functions-signifi...). I wonder if someone could make a business doing that.

Also, in terms of UX I've personally found GCP > Azure > AWS.


This is very interesting data, thank you for that. Maybe MS has improved a bit, although I haven't felt any improvements yet. But the changes might not be rolled out completely. Performance-wise, it can only get better in my opinion.


Has anyone tried signing up for remote.com?

I just did and they wanted an email for jobs which I thought was OK.

Then I tried to add a profile and it prompted me to export my entire LinkedIn history. It says to export "The Works".

My LinkedIn is quite massive, and I've checked: the "The Works" export includes my entire contact list.

That's pretty shady. Why not just import my profile? There's an option to export only profile data. How many people will follow these default instructions and upload everything they have ever done on LinkedIn?


We should be much more transparent about this, you're right.

We're going to split this up between your profile and your contacts, and make it explicit what happens.


It's nice to see you in here commenting, but I have to weigh in a bit to offer advice. These types of issues shouldn't exist in a modern application. Particularly one that deals with a fairly sensitive subject for a lot of people (looking for work).

I hope it's not an indication of the overall data-handling approach within the application and is just an oversight for launch. It might be worthwhile to revisit your internal policies/review practices for safe harbour and be upfront about it.

Just a suggestion!


Are you actually doing anything with the contacts currently or is that data discarded?


Thank you. I never like to import my contacts because they haven't necessarily given me consent to share their contact information, but importing the other bits from a profile is certainly useful.


I felt the same, but exported just my profile from LinkedIn and the import worked all right.


It's good to see a project focussing on production-grade databases on Kubernetes. Particularly the production-grade part.

There are 33 open source operators for managing databases on Kubernetes. Out of that list only 3 claim to be production ready.

Out of the 126 Operators that I've looked into, the vast majority are abandoned and unfinished. Most state the project status as Alpha in the README.

KubeDB itself has a version number of 0.8.0 for the operator and very low version numbers for the databases, for example 0.2.0 for Redis.

Version numbers can mean anything but they are usually a good indicator of what the project owner thinks the status is.

It would be cool to see a breakdown of status and expected milestone dates for KubeDB.

For anyone interested in browsing other Operators, I keep an updated table halfway down this blog post.

https://kubedex.com/operators/

The project statuses come directly from what the authors have stated. Many beta status projects are being used in production.


Part of the problem is that Kubernetes itself is still changing rapidly and already has design-by-committee cracks in the API.

It would help if the community took a break from new features and worked on stability first so that Operators and other extensions can finally take off. Some of the things being developed now are so esoteric that it seems to be more about finding the next exciting thing to add than usability.


You're using that term in a derogatory sense. Would you rather have Google decide how everything is designed, and everyone else has to deal with it? I think you'd see a ton of GCP-specific stuff if that were the case.

I used to think the way you do about Kubernetes, because I saw just how long it took for features I really wanted to get in. Then I attended some of the SIGs, and realized that there are so many use cases out there unlike mine, and that doing what I want may break what others want. So instead of making a decision that screws over everyone but one cloud provider, what I've seen is very methodical and careful decision-making from many companies working together. This usually means that you get something that may not do exactly what you want out of the box, but there are hooks to do it if you'd like. I'd much prefer this over nothing at all.

It would be worth sitting in on a SIG you're interested in and seeing how @smarterclayton and @thockin handle these kinds of decisions. I see so much negativity on HN about k8s, and it really seems like people just don't appreciate the amount of attention that goes into each decision. I think if you spend the time to trace the history of a feature and understand why things are done, it may change your mind about how complex k8s is.


What are some of the design by committee cracks that you think should be addressed?


> Some of the things being developed now are so esoteric that it seems to be more about finding the next exciting thing to add than usability.

Or perhaps it's real ops people with particular arcane needs, each scratching their own itches?

K8s is a large FOSS project, and like most large FOSS projects, most PRs are from corporate contributors that wrote the code for their own purposes and then wanted to upstream it to avoid having to maintain a fork.


>"Part of the problem is that Kubernetes itself is still changing rapidly and already has design-by-committee cracks in the API."

Could you elaborate a bit on what those "cracks" are?


Stability of what?


What would a production-grade conformance test suite look like for K8s to get these operators to 1.0?

I am mostly a bystander, but in the k8s issues I see, it is too easy to either destroy all the pods or their volumes. Maybe this should be fixed at the k8s level.


>too easy to either destroy ... their volumes.

As someone who's started running services in Kubernetes (albeit mostly as a hobby thus far), I would recommend setting the ReclaimPolicy to Retain for any PersistentVolumes that are particularly important. The default behavior is to delete the underlying volume when the resource representing it is deleted; if you're worried that might happen accidentally, that may not be what you want. This behavior is configurable.
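
For dynamically provisioned volumes the policy comes from the StorageClass, so a rough sketch looks like this (the class name is made up and the provisioner is just a GCE example, so adjust for your cloud; an existing PersistentVolume can also be patched to Retain):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: retained-ssd                 # hypothetical class name
    provisioner: kubernetes.io/gce-pd    # example provisioner; use your cloud's
    parameters:
      type: pd-ssd
    reclaimPolicy: Retain                # PVs from this class survive PVC deletion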


> Maybe this should be fixed at the k8s level.

FWIW, it has been: RBAC allows you to strip -- or I guess pragmatically speaking, not assign -- rights at whatever level of granularity you have the patience to maintain. It is also bright enough to do that per Namespace, so going light on the ClusterRoleBindings and keeping things out of the "production-db" Namespace would likely go a long way toward addressing the risk you are describing.
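
As a rough sketch of what namespace-scoped rights look like (all names here are made up, and the verbs deliberately exclude delete):

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: production-db
      name: db-reader                    # hypothetical Role
    rules:
    - apiGroups: [""]
      resources: ["pods", "persistentvolumeclaims"]
      verbs: ["get", "list", "watch"]    # read-only, no delete
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      namespace: production-db
      name: app-team-read-only           # hypothetical binding
    subjects:
    - kind: Group
      name: app-team                     # hypothetical group from your identity provider
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: Role
      name: db-reader
      apiGroup: rbac.authorization.k8s.io

Because these are a Role and RoleBinding rather than a ClusterRole and ClusterRoleBinding, the rights only exist inside the "production-db" Namespace.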


It's still containers. On a cloud provider the Kubernetes workers are VMs which orchestrate containers. With Kata Containers you're just spawning containers inside micro-VMs.


> It's still containers.

The security profiles of containers and VMs, including kernel-based VMs, are different. VMs still have a significant edge, because the attack surface is smaller and doesn't have many competing missions.


The attack surface of a container can be massively reduced with seccomp profiles -- there was a paper a few years ago which found that the effective attack surface of a hypervisor was about the same as the attack surface of a locked-down seccomp profile of a container (and LXC/Docker/etc already have a default whitelist profile which has in practice mitigated something like 90% of kernel 0days).

And let's not forget the recent CPU exploits which found that VMs aren't very separated after all.

The fact that Kubernetes disables this (and other security features) by default should be seen as a flaw in Kubernetes. (Just as some of the flaws of Docker should be seen as Docker flaws, not containers-in-general flaws.)
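
For reference, opting a Pod into the runtime's default seccomp whitelist only takes a few lines. The syntax below assumes Kubernetes 1.19+ (older clusters used the seccomp.security.alpha.kubernetes.io annotations instead), and the pod/image names are placeholders:

    apiVersion: v1
    kind: Pod
    metadata:
      name: hardened-app                 # hypothetical
    spec:
      securityContext:
        seccompProfile:
          type: RuntimeDefault           # apply the runtime's default seccomp whitelist
      containers:
      - name: app
        image: nginx                     # placeholder image
        securityContext:
          allowPrivilegeEscalation: false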


> The attack surface of a container can be massively reduced with seccomp profiles

Yes, though as capabilities are added to the kernel, the profiles have to be updated.

That said, VM or no VM, this should be done no matter what.

> And let's not forget the recent CPU exploits which found that VMs aren't very separated after all.

This is a nil-all draw in terms of the respective security postures, though.


The economics agree: Zerodium pays as much for a VM escape as for an LPE. It does seem to be a bit of a low price though, at $50,000.


The attack surface argument is debatable, depending on how the system is designed, since virtualization introduces the hypervisor surface.


The attack surface argument certainly is debatable.

I wonder how many multi-tenant workloads are actually at risk of an escape vulnerability. I wager that the multi-tenancy described in the article in the OP is actually disparate workloads across disparate teams in a particular enterprise where it seems (to me) fairly unlikely for someone with access to run a workload to also have the willingness to compile and run malicious code to take advantage of an escape vulnerability.

On the other hand, publicly available compute (e.g. AWS, GCP, Azure) seems way more likely to be the subject of attacks from random malicious individuals seeking to take advantage of an escape vulnerability, if one existed.


The hypervisor surface can be made smaller, since its major goal is to manage hardware resources. A kernel has the same mission, but also has a mission to provide a rich API for applications.


It's not really as clear-cut as that (IMO).

Shared-kernel Linux containers can be hardened to the point where they likely have a smaller attack surface than a general-purpose hypervisor (for example, look at the approach that Nabla takes).

You then have the hybrid approach of gVisor: still containers, but with a smaller attack surface than the Linux kernel.

Of course this hardening approach can (and should) be applied to VMs too, which may tip the balance back to them; that's one reason Firecracker looks so interesting.


That is one big difference, with big implications.


Just wanted to clarify that Kubernetes is still scheduling containers, even if VMs are being used to isolate them.

It's not all-or-nothing either: containerd will support running a mix of ordinary containers and Kata Containers across workers (sketch below).

For anyone interested in this topic I wrote about some other container runtimes here: https://kubedex.com/kubernetes-container-runtimes/
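
That mixing is expressed through a RuntimeClass. A minimal sketch, assuming containerd has a "kata" runtime handler configured (the apiVersion has shifted between alpha, v1beta1 and v1 across Kubernetes releases, and the pod/image names are made up):

    apiVersion: node.k8s.io/v1beta1
    kind: RuntimeClass
    metadata:
      name: kata                         # matches the handler name configured in containerd
    handler: kata
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: untrusted-workload           # hypothetical
    spec:
      runtimeClassName: kata             # this pod runs in a micro-VM; omit for ordinary containers
      containers:
      - name: app
        image: nginx                     # placeholder image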


Note that this is not the case for Virtual Kubelet-based implementations, and your points here and above are specific to how Kata works (the article is talking more generally).


Yeah... I think I see what you're saying. I mean, the end-user interface is much the same, but this has big implications from the systems-design point of view, so it's not a small thing :)


It's even worse than that.

Knative is really an alpha piece of software at version 0.2.2 for the serving component. Riff is also at 0.2.0. What do Pivotal plan to do if both implement breaking changes (extremely likely)? Maintain a fork?

The other issue is what value is this all really providing? Kubernetes provides a standard API that abstracts infrastructure and deployments.

The benefit that Lambda brings is very simply connecting together cloud services. None of these FaaS on Kubernetes products do that. For anyone interested I looked into the current landscape on Kubernetes and gave up since it's all pretty worthless. https://kubedex.com/serverless/


"The other issue is what value is this all really providing? Kubernetes provides a standard API that abstracts infrastructure and deployments.

The benefit that Lambda brings is very simply connecting together cloud services."

Generally, the benefits of a function service are:

- scale to zero: when a function is not active it won't use any resources or incur costs (sketch at the end of this comment).

- higher level of abstraction: if a piece of software fits well into the FaaS abstraction, it should be more productive to implement and operate it at the FaaS level than at lower levels (PaaS, container, IaaS, etc.). K8s in particular is quite a complicated system for an app to target, which is why Knative was started.

If Lambda makes it easy to call other cloud services, I'd say that's a side-effect of a good FaaS implementation. Bringing this benefit to other function services should be a matter of using the right libraries.

(I work at Pivotal, but not on Riff or Knative)
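
To make the scale-to-zero point above concrete, here is a minimal sketch of a Knative Service; the name and image are made up, and the syntax assumes a recent Knative release (the 0.2.x versions discussed upthread still used alpha API groups):

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: hello                                   # hypothetical service name
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/minScale: "0"   # scale down to no pods when idle
            autoscaling.knative.dev/maxScale: "10"  # cap the burst
        spec:
          containers:
          - image: gcr.io/example/hello:latest      # placeholder image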


> If Lambda makes it easy to call other cloud services ... Bringing this benefit to other function services should be a matter of using the right libraries.

You are correct that calling out to other things is just a concern of the function itself, but the value in the 'connecting' that Lambda does comes from being _invoked_ by other cloud services by way of integration with their event systems, e.g. object storage file creation event X triggers Lambda function Y to update resource Z (resource Z isn't necessarily a cloud service, it could be a database).

This is why I'm skeptical of on-prem FaaS. It's an easy value proposition to sell when you can use Lambda as an example. But Enterprises have heterogeneous environments so Lambda-like integration into other services is far from a given, and 'scaling to zero' is a little disingenuous because there always needs to be underlying infrastructure (k8s in the case of PFS) running to handle function invocation.


> But Enterprises have heterogeneous environments so Lambda-like integration into other services is far from a given

Because it's not a walled garden. As the ecosystem grows that pain (and it's real) will ease.

> 'scaling to zero' is a little disingenuous because there always needs to be underlying infrastructure

The point is to use it more efficiently. Mixed workloads with scale-to-zero help achieve that end.


> What do Pivotal plan to do if both implement breaking changes (extremely likely), maintain a fork?

The main riff team all work for Pivotal and Pivotal was the first external partner brought into Knative by Google. We were the first to submit non-Google PRs and the first to have non-Google approvers.

It would be strange for Pivotal to be blindsided by two projects it is intimately involved in.

Source: as you probably guessed, I work for Pivotal.


As somebody who has created a product in the past and also reviewed quite a few, I've given up doing performance comparisons. This is quite sad, as comparisons help people save time and money and cut through the marketing that technical people hate.

Every time I've done a performance comparison an expert pops up and says the result is invalid because of X. It takes 10 seconds to write the comment but perhaps a few hours to redo the tests and update the blog contents.

The blogger doesn't want an inaccurate blog, and the software authors don't want bad benchmarks, which constantly crop up in search results, left up. As a blogger you feel a little duty-bound to work on updating a blog you know probably won't be re-read by the majority of people who have already opened it anyway.

My conclusion is that fault should fall on the side of the software developer in most cases. Having created a startup, I understand the time pressures and motivations driving the roadmap. There is a natural tendency to work on the differentiators and high-value, complex features. Blogs like this should act as a reminder that there is massive value in prioritising sane defaults, tests, documentation, and building logic into the application that makes incorrect settings that affect performance unlikely.

From reading this blog I get the sense the author is quite technical. A positive public relations move would be to spend your time replicating the results and then, when the problem is found, make it difficult for the next person to have the same issue. Preferably with logic in the software, but worst case with some bold text towards the top of the README so it's not buried somewhere obscure.


For those interested in a feature comparison I did one here:

https://kubedex.com/kubernetes-network-plugins/


This is pretty neat. Would be interested to see the aws-vpc-cni in there, too!


