mchmarny's comments

GPU Kubernetes is hard. Aligning kernels, drivers, container runtimes, operators, and Kubernetes versions is a version compatibility minefield. A single misconfigured component can take down an entire GPU fleet, and root cause analysis can take days. Typically, these known-good configurations live as tribal knowledge in “runbooks” and internal pipelines, not as portable, reproducible artifacts.

The recently open-sourced AI Cluster Runtime (AICR) project is designed to solve that friction. It provides optimized, validated, and reproducible configurations for a given GPU-accelerated Kubernetes cluster.


Keeping a GPU cluster healthy at scale isn't just a "nice to have"—it’s the difference between seamless training and a nightmare of idle nodes. That’s why we built NVSentinel, our open-source system designed to detect, classify, and auto-remediate hardware and software faults across Kubernetes nodes and NVSwitches.

If you have an option to containerize the app, Jib may be what you are looking for. Plugs into Maven, and the same source/content always generates the same image - https://github.com/GoogleContainerTools/jib
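To make the "same source/content always generates the same image" point concrete, here is a minimal sketch of the general idea behind reproducible layers. This is not Jib's code: `build_layer` and `layer_digest` are hypothetical names, and the assumption is simply that file order, timestamps, and ownership are normalized before hashing, so only the content can change the digest.

```python
import hashlib
import io
import tarfile
from typing import Mapping


def build_layer(files: Mapping[str, bytes]) -> bytes:
    """Pack files into an uncompressed tar with normalized metadata (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Sort names so insertion order never affects the archive bytes.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0            # fixed timestamp instead of build time
            info.uid = info.gid = 0   # no builder-specific ownership
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def layer_digest(files: Mapping[str, bytes]) -> str:
    """Content-addressed digest of the normalized layer."""
    return "sha256:" + hashlib.sha256(build_layer(files)).hexdigest()


if __name__ == "__main__":
    first = layer_digest({"app/Main.class": b"\xca\xfe", "app/lib.jar": b"jar"})
    second = layer_digest({"app/lib.jar": b"jar", "app/Main.class": b"\xca\xfe"})
    print(first == second)  # same content, different order
```

Because nothing machine- or time-specific ends up in the archive, two builds of the same inputs hash identically, and any change in content produces a different digest.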


And this is the best explanation of Jib [1], but it’s hard to find via Google. It’s how all builds for every ecosystem should work IMO.

1. https://phauer.com/2019/no-fat-jar-in-docker-image/


“But each layer here adds an element of required trust”: how often we simply gloss over that and assume. At the same time, building everything from source is neither reasonable nor 100% secure. Glad smart people like Dan are looking into this.


What's not reasonable about building from source? It really seems like a small cost to snapshot a version of third-party code into your own source control system. There really aren't (or shouldn't be) that many different things in your environment.


I worked on buildpacks for a while. Those had to track "only" a few hundred upstream dependencies and build them from source. It took a full-time team to build and maintain the infrastructure.

The problem here is economic. Because there is no body of information about software assets acting as a public good, each individual is forced to internalise the costs. Either you internalise the cost of building everything yourself, or you internalise the risk of accepting incoming assets on faith. But you bear a cost.


The number of dependencies and build-environment setup steps across all of the components, plus the sheer complexity of putting the entire stack together from scratch, would exclude a substantial portion of the target users... hence, while possible, it is not reasonable, certainly not on an ongoing basis.


Building from source seems to me the only way to get to a reasonable state when it comes to security patching.

However, these modern stacks contain a lot of small parts. The article gives kiwigrid/k8s-sidecar as an example. Also, as the article shows well, it is not as easy as copying some commands into your own Dockerfile. Look at the busybox image using glibc from another Debian image.

It's not reasonable in the sense that it is a hell of a lot of work and would require more time and effort than the average devops/sre/whatever (team) has.

It would also go against the promise that Kubernetes would make things easy because you could just do helm install stable/prometheus.


Based on the write-up, it seems the author could benefit from the fully managed version of Cloud Run on GCP (https://cloud.run). Having no cluster to manage would ease the maintenance pains, and the per-request billing model could potentially lower the overall costs given the scale-to-zero capability.


I was going to mention this as well. Any time a user has an underutilized K8S cluster of only 3 nodes, I immediately wonder if the auto-scaling (to zero) docker-running features of Cloud Run are what they’re really looking for.


Every couple of months I search for a better news reader… cycle through Flipper, Feedly, NewsBlur, & 4-8 others… once more acknowledge how good Google Reader really was, and go back to scrolling through Twitter/Reddit

I’d probably pay a large amount of money now for a news reader w/ these 4 features:
* Extract stories from my feed/lists on Twitter & Reddit and aggregate comments on these from people I follow
* Learn from my ‘more/less like this’ on individual stories
* Allow me to tag sources and create auto-tagging filters
* Clean/intuitive Web UI & mobile client


Have you tried https://bazqux.com ? It was built right after Google shut down Reader; it very much resembles Reader and improves upon it in multiple ways. Super fast too.


Hey, I'm working on a News Aggregation platform / RSS reader, aktu.io. It doesn't have the features you need, but if you'd like to give it a try anyway, I would love your feedback.

You can check out the main features here: https://aktu.io/about and here: https://medium.com/@julien.aktu/rss-less-noise-more-informat...


While there are definitely areas where it sorely can be improved, check out Inoreader.


It’s exciting to see the ecosystem build commercial products using Knative (https://github.com/knative). The developers on those platforms benefit from increased portability. Congratulations to the riff team behind the new Pivotal Function Service offering.


I wrote a little post to explain why Knative is important from the developer perspective https://medium.com/@mchmarny/build-deploy-manage-modern-serv...


Great post, really cleared up the value proposition of this project for me!


this could be interesting, especially if from the trenches, supported by real-life scars (experiences)

