More

yuvipanda · on April 23, 2019

Thanks! This is a good start.

Do you have ideas on how you will make the governance of the registry open? Who will make important decisions around policies, and how will they be made? I personally find this to be the core reason I find it hard to use startup-run package registries.

syrusakbary · on April 24, 2019

We are starting to research on governance, but to date nothing is defined (since we have just launched).

However, I would love to follow up with you personally to make sure we make the right choices from the start. Can you write me an email to syrus@wasmer.io so we continue the conversation there?

yuvipanda · on Dec 8, 2017

Chime in at https://github.com/google/kubeflow/issues/23 :)

yuvipanda · on April 4, 2017

Disclaimer: I work on the hub.

Sorry to hear you found the hub too complex. We're working on making easier-to-use hub setups that fit different use cases. Can you tell us a little more about what your use case was and (optionally) which parts of the hub setup you found too complex?

Thanks!

sandGorgon · on April 4, 2017

Thanks for replying.

So it's a bunch of different technologies - nodejs,etc. I'm kinda wondering if it can be built in Python itself. Make it part of a normal jupyter install, so just a "jupyter hub start " will work ?

EDIT: adding to that, you have built a nodejs based http proxy - can you not build it within Python (for uniformity) or nginx (for performance as well as mind share) ? Do you even need to mandate a http proxy ?

Second question is that can it run in a multiprocess - I don't want to run it in interactive mode, but just straight top to bottom. Perhaps there's huge memory savings there.

carreau · on April 4, 2017

I think yuvi wrote the CHP on nginx (https://github.com/yuvipanda/jupyterhub-nginx-chp) when we first wrote CHP, node was the only viable solution to have a dynamic websocket proxy. Nowdays Go, or Python 3 with AsyncIO may be potential contenders. It may be possible to rewrite in Python but time is limitted. I'm unsure about your second question.. run notebook top to bottom ? `nbconvert --to notebook --execute --inplace yournotebook.ipynb` ?

sandGorgon · on April 5, 2017

Well the second question was also related to multi-user deployment. From what I understand, jupyterhub will spawn multiple kernels every time someone logs in. But a lot of the time (most of the time?), you don't intend people logging into your jupyter notebook to be doing interactive stuff - maybe they just want to run the whole thing as a dashboard.

So it becomes a traditional webapp use case. Do you need all the proxy/websocket, stuff to do this ? Your nbconvert command still needs every user to spawn his own kernel right ?

About the first part - it would be great to have a simpler jupyterhub. One of the steps is to have everything in Python.

yuvipanda · on April 5, 2017

JupyterHub isn't really setup to do a 'dashboard' style web application - is purely intended for interactive use. The design choices made reflect this.

There's ongoing work on formalizing the proxy better (https://github.com/jupyterhub/jupyterhub/issues/848) - someone will probably write a pure python proxy when that gets merged :)

sandGorgon · on April 5, 2017

I just wanted to make sure you guys were aware that it is a large component of the use case. The very typical prototype in jupyter .. to ..Rewrite in production code is shortened significantly by doing this.

All the tools already exist in jupyter - except one: lightweight multiuser. I would argue that building this is going to be a fairly trivial thing for you guys (as compared to other features you build), but the end user benefit is immense.

Jupyter becomes much more than an interactive scratchpad - it becomes a full blown prototyping environment for data science and reporting. I would say, you would even go against tableau in a lot of use cases.

Please do think about it. My company will be happy to contribute to a gofundme on this.

carreau · on April 5, 2017

Reply to this plus 2 comments up. Hub spawns _servers_, not kernels. There are a lot of indirection layers, and indeed, being able to _view_ a notebook without starting a kernel is on the todo list. The multi user collaboration is in progress, it's more complicated than it looks. One of the issue is that if this is a "solved" [with many quotes] problem for static documents, as soon as you have code execution it becomes really tricky. The kernel need to run as someone, but who ? The owner of the document ? What are the permission you give to who and how ? There are some case where there are possible answers, but which are really hard to tackle in a generic way across programming languages and various kind of deployments. Ian has an already well advance JupyterLab Prototype that you can connect to Google Drive for live editing. If your company is interested in funding something like that, feel free to write to any of us privately (git log, and grep to find emails), and we can likely setup a contract with numfocus (non profit that handle our funds), the advantage will be that it will be tax deductible for your company (unlike most of gofundme campaigns).

sandGorgon · on April 5, 2017

I wish we had the kind of money to fund the full development - I mentioned gofundme because a small startup in India will not be able to do that, even though we want to. But I have a feeling that a lot of us will want to us as well.

I'm talking specifically about the usecase of code-execution (especially dashboards).

Here's a small point from me - perhaps you are overcomplicating the usecase for 90% of us. Give a proper ssl/tls+bcrypted password setup and roles: Editor and User.

I dont think you should be worried here in the context of people wanting to run a full on sagemath cloud kind of a thing.

If you can give me a low resource way of letting 100 "Users" on a dashboard form and one "Editor" (who can actually edit the underlying notebook), I'm golden. And I'm willing to bet that so will 90% of your audience.

acosmism · on April 5, 2017

check out http://gryd.us ; it looks somewhat like what your describing

yuvipanda · on April 4, 2017

You can use systemd or docker with JupyterHub, and make it as protected (or not) as you want. We could also write a spawner that spawns full fledged VMs, which would give you 'real' untrusted isolation - would that be something interesting to you?

yuvipanda · on Feb 21, 2017

Don't see anything else around that fits that bill. They also have amazing IRC bridging - I have been using Matrix bridged into Freenode as my primary 'IRC Client' for almost a year now.

kiallmacinnes · on Feb 21, 2017

IRCv3?

yuvipanda · on Sept 15, 2016

From a comment[1] elsewhere in thread, looks like they removed fluoride to avoid higher FDA scrutiny. Not sure what that means for labeling.

[1]: https://news.ycombinator.com/item?id=12507965

Bartweiss · on Sept 15, 2016

That may well be the only 'active' ingredient in some toothpastes, since the rest is mostly just a scrubbing aid.

yuvipanda · on July 26, 2016

Have you considered adding kubernetes support?

Titanous · on July 26, 2016

Kubernetes didn't exist when we started, so we ended up building our own scheduling system and management APIs. So it wouldn't really be "adding" support, instead we'd need to switch several layers of the system to be Kubernetes instead of our code.

We're watching Kubernetes, it's possible that doing that change will make sense at some point in the future.

nathan_f77 · on July 26, 2016

That's what Deis are doing right now with their new version.

yuvipanda · on May 16, 2016

Should be able to!

yuvipanda · on May 16, 2016

<3

My only question is if it is required that they mention 'I have cramps' explicitly publicly. Not sure if their menstrual cycle is anyone's business. There's also no reason for it to be taboo, but not sure where the balance of 'it is nbd, deal with it!' vs 'I do not really want to broadcast this information to my teammates' is.

deobald · on May 17, 2016

Good question, yuvi! We've answered this in the Q&A at the bottom of the article. The short answer is "no". :)

yuvipanda · on April 24, 2016

https://grafana.wikimedia.org/dashboard/db/varnish-http-erro... (and grafana.wikimedia.org in general) have more stats. It stated '13.38 Million req/min), which I think is ~230,000 req/s

jsmthrowaway · on April 24, 2016

Ah. So my hunch was that the published number was what is making it through cache, and that's where my estimate comes from too. That sounds about what I'd expect for the cached side.

Nice find!

_joe · on April 24, 2016

The total number of requests that get through the cache to the application layer can be seen here

https://ganglia.wikimedia.org/latest/stacked.php?m=ap_rps&c=...

which you can see is not showing any substantial increase due to the passing of Prince. The big hole of two days that ends just before the news broke is due to wikimedia switching traffic to a second datacenter for two days, see http://blog.wikimedia.org/2016/04/18/wikimedia-server-switch...