One important reason is that it helps democratize ML by making it accessible to everybody. Even a casual hobby programmer can write awesome applications, and anybody - especially non-technical users - can use them. It makes ML - especially inference - as easy as opening Facebook.
Right now, just consuming ML models - whether as an application developer or an end user - requires some combination of special skills spanning a spectrum of complexity: from something relatively simple like installing a system package, an environment like Anaconda, or a pip package, to something much more complex and time-consuming like building TF or Caffe.
I still don't understand -- do you have an example application in mind?
You seem to be conflating producing ML models -- i.e. doing machine learning -- with "consuming" ML models -- i.e. asking the learned models to make a prediction. You don't need any ML in the browser to do the latter. And I can't see why you'd do ML in the browser to do the former...
[Update: Oh boy, I realized this is a gigantically long post after posting it. Sorry about that, but I hope my step-by-step explanation convinces you.]
Not conflating - the benefits apply equally to both learning and inference, but since there are orders of magnitude more potential consumers of inference than of learning, I emphasized it.
It's true that one doesn't need any of this, but my point is that not having them in the browser puts up barriers - of complexity, cost, privacy, and effort - for developers and end users.
I'll use face recognition as a walkthrough example, but this applies to absolutely any ML use case if you think about all the steps involved in taking it from idea to development to deployment to end use.
Take a problem I've worked on a bit - intelligently searching through personal photos and videos. Most people have at least a few hundred GBs of photos and videos - family photos, pets, travels - in aggregate across all their devices. Some may feel the need for search software that can answer questions like "find me that photo with Alice (the user's daughter) playing with Scooby (the user's dog) from 10 years ago".
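(For the curious: under the hood, such a search typically boils down to embedding each detected face as a vector once, then matching by similarity. A minimal sketch in plain Javascript - the index layout, the embedding arrays, and the 0.7 threshold are illustrative assumptions, not any particular library's API:)

    // Cosine similarity between two face-embedding vectors (plain arrays).
    function cosineSimilarity(a, b) {
      let dot = 0, na = 0, nb = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
      }
      return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // index: [{photoId, embedding}], built once per library scan.
    // queryEmbedding: e.g. the mean embedding of faces the user labeled "Alice".
    function findPhotosOf(index, queryEmbedding, threshold = 0.7) {
      return index
        .filter(({embedding}) => cosineSimilarity(embedding, queryEmbedding) > threshold)
        .map(({photoId}) => photoId);
    }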
In a world without browser ML, how would a developer design, develop and deploy this with maximum convenience for both development and end use? Maybe like this...
- Dev starts off by deciding they don't want to mess with any of that ML stuff. Their skills lie in front-end design and usability.
They decide to go with Amazon's or Google's face recognition service (IDK if AMZ/GOO actually have such a service, but if they do, it's reasonable that a dev would look at them as the first option).
- But they soon find out it's just shifting the complexity elsewhere.
Now, they have to provide a way for users to upload their hundreds of GBs of media to S3 or GCS.
Which means more APIs to learn and integrate. More costs for storage. Usability barriers and privacy suspicions for users.
Security aspects have to be looked into.
Looks like it'll have to become a paid service now.
- The service by itself is not enough. Dev still has to provide the front-end (which they are skilled at) for users to select photos, crop faces, apply labels, and send it all to the service's transfer learning API.
- After all that, some users complain that accuracy is not good enough because it couldn't find many photos.
Dev has no way to tweak the models because those are behind another company's opaque service.
It's increasingly looking like a custom backend is necessary.
- So version 2. Dev learns some ML. Then downloads a pre-trained model that can do face detection and recognition - say FaceNet or OpenFace.
- They have to deploy it server-side for training and indexing. They learn a bit of Nginx and WSGI, and deploy it. They don't know how many users will show up or how much data will be uploaded, so they have to plan automated scaling for that. EC2 or GCE? More stuff and more APIs to learn, and more costs.
- Dev still has to provide the front-end for users to select photos, crop faces, apply labels, and upload to their learning service. Dev has to implement per-user transfer learning and store per-user transfer data and models.
- Dev has to implement all the required provisioning for inference and transfer learning - be it raw GPU servers or docker or K8s or whatever. More costs.
- For an end user, the need to upload hundreds of GBs of personal media to a 3rd party is also a barrier - takes time, loses privacy and likely incurs bandwidth costs.
- So version 3. Dev says forget the server-side.
The user already has GBs of photos on their hard disk.
Instead of bringing their photos to us and managing them, let's take the software to them.
Let's just package up everything and allow user to download and use the entire thing on their local machines.
Maybe as platform-specific installables. Or as a platform-neutral Docker image.
Reduces costs and complexity for the developer.
Can even be free, since the developer no longer incurs server costs.
Android is still a problem since it can't run Docker, and the dev doesn't know Android app development.
- The end user benefits too, with far better privacy and usability.
However, they still have to install a package - sounds easy, but in a world of "user does not have administrative privileges" and "sudo", there are still potential barriers to cross.
And Android is still a no-go because the dev doesn't know it.
Now, in a world with browser ML, you can see how those remaining problems can be solved too. Javascript ML is write once, run on any browser - even Android's.
User does not have to install anything.
Dev does not have to write anything specifically for a different platform.
All the transfer learning and inference can happen in the user's browser.
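For concreteness, here's a rough sketch of what that in-browser transfer learning could look like with TF.js. The model URL, the 'embedding' layer name, and the 128-dim embedding size are placeholders I made up, not a real hosted model:

    import * as tf from '@tensorflow/tfjs';

    // Fine-tune a small per-user head on top of a frozen, pretrained face
    // embedder - entirely in the browser, so no photo ever leaves the machine.
    async function buildPersonClassifier(faceCrops, labels, numPeople) {
      // Load a pretrained network and truncate it at its embedding layer.
      const base = await tf.loadLayersModel('https://example.com/face-embedder/model.json');
      const embedder = tf.model({
        inputs: base.inputs,
        outputs: base.getLayer('embedding').output,  // hypothetical layer name
      });

      // Small trainable head on top of the frozen embeddings.
      const head = tf.sequential({
        layers: [
          tf.layers.dense({inputShape: [128], units: 32, activation: 'relu'}),
          tf.layers.dense({units: numPeople, activation: 'softmax'}),
        ],
      });
      head.compile({optimizer: 'adam', loss: 'categoricalCrossentropy'});

      // Embed the user's labeled face crops (<img>/<canvas> elements).
      const xs = tf.concat(faceCrops.map(img =>
        embedder.predict(
          tf.browser.fromPixels(img)
            .resizeBilinear([160, 160])
            .toFloat()
            .div(255)
            .expandDims(0))));
      const ys = tf.oneHot(tf.tensor1d(labels, 'int32'), numPeople).toFloat();

      await head.fit(xs, ys, {epochs: 20});
      return {embedder, head};
    }

Inference is then just embedder.predict followed by head.predict on each new face crop, again without anything leaving the user's machine.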
The browser environment still presents some barriers - such as not being able to access local photos directly without the user selecting them, and limited local storage for models. But both can be worked around with some creative batching and with solutions like emscripten's in-memory virtual file system (I'm not sure if TF.js uses the latter, but other frameworks like OpenCV.js do).
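To make those workarounds concrete: files the user picks through a standard file input can be processed in batches, and TF.js can persist models locally via an indexeddb:// scheme so they survive page reloads. The 'face-search-head' name below is just a placeholder:

    import * as tf from '@tensorflow/tfjs';

    // The user picks photos with a plain <input type="file" multiple> element;
    // the page never sees any file the user didn't explicitly select.
    async function indexSelectedPhotos(fileInput, embedder) {
      for (const file of fileInput.files) {
        const bitmap = await createImageBitmap(file);  // decoded locally
        const pixels = tf.browser.fromPixels(bitmap);
        // ...detect faces, embed them, add them to the search index...
        pixels.dispose();  // keep GPU memory bounded while batching
        bitmap.close();
      }
    }

    // Persist the per-user head in IndexedDB so it survives page reloads
    // instead of being retrained or re-downloaded on every visit.
    async function saveHead(head) {
      await head.save('indexeddb://face-search-head');
    }
    async function loadHead() {
      return tf.loadLayersModel('indexeddb://face-search-head');
    }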
The user pays some cost in reduced usability, which they may be OK with since they may see the alternative options as worse. And the privacy is unmatched.
All this is applicable to any ML use case. Anything involving a user's private data - speech recognition, document scanning/OCR - gets exactly the same benefits for both developers and users.
Sorry for the late response, but I want to thank you for the in-depth post! I agree with you that version 3 is way better, but I'm very cautious about advocating for browsers as de facto operating systems. If we want better cross-platform systems for sandboxing and running programs, I'd prefer to develop those directly instead of giving more power to browsers and browser vendors.
Let's not exaggerate; using something like pip is incredibly easy compared to actually using the ML package. The hardest thing about ML is not setting it up...
It's platform-independent. Most deep learning libraries have absurd dependencies. They are difficult to install and only support specific OSes and GPUs. Javascript will run literally anywhere.
Think more from the inference angle. There are a lot of use cases for ML, and many of them currently live in native or desktop apps. Browser ML lets you run them on the web, which significantly broadens the number of people who can use them.
So can you think of use cases for using ML in an app (native or desktop)? Many of those would be good to have in the browser as part of a web app too.