This is crazy and needs to stop. Indeed, it's like you have:
- A degree (in which you've proven you can understand these algorithms and spent four years studying). I'm not going to redo all of that in one week for your stupid puzzle!
- Experience (that has to be worth something; it's not like everyone is lying about it)
- You may have open source contributions
But no, some companies won't even start to look at that, or won't look at all, before they ask you this stupid puzzle.
Personally, I now filter those companies out. I mean, 20 years of experience, contributions to major open source projects... if you can't recognize that, why would I interview?
Where I work now, we give code assignments. While they take longer to complete, you can actually see structure, which I agree with the author is the top quality I'm looking for. They are also less stressful for the candidate.
> A degree (In which you've proven you can understand these algos and spent 4 years studying)
Unfortunately this means next to nothing anymore. I have a bachelor's degree in CS and math, and I found it very useful. However, I host a fairly large community and provide tutorials on coding projects, and I've had several people working on a master's thesis in CS reach out to me because they're following one of my tutorials for their thesis. They then ask a very basic question that indicates they don't know how a package manager works, or how to look up documentation for a library. Those two tasks are about the most basic programming skills there are, and if you can make it through 5-6 years of college in CS and still not have the most basic grasp of them, then a degree means absolutely nothing anymore.
Some more anecdata: I've had several friends who graduated with me and can hardly code. It's unfortunate, but I think degrees are just expensive pieces of paper that tell you absolutely nothing about an individual's abilities.
This. We'll always need a certain number of cars for deliveries and for people with disabilities, but the majority of people can walk or bicycle in cities. It would do us some good, in fact: not just the additional exercise, but also bringing back a culture that is less in a hurry, less time-enslaved, and less disconnected from the outside.
Our cultural disconnect from reality is as much at fault regarding global warming as our technology is.
I work in an ML ecosystem at the moment, and concurrency is a major problem in Python:
- Threads can't be used efficiently because of the GIL
- multiprocessing has to serialize everything through a single thread, often killing performance (unless you use shared-memory techniques, but those are less than ideal compared to threads)
- You can't use multiprocessing while inside a multiprocessing executor. This makes building things on top of frameworks/libs that use multiprocessing a nightmare... e.g. try to put a web server over something like Keras.
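To make the GIL point above concrete, here's a minimal sketch (illustrative numbers, not a benchmark): pure-Python CPU-bound work gains nothing from threads, because only one thread can execute Python bytecode at a time.

```python
from concurrent.futures import ThreadPoolExecutor

def cpu_bound(n):
    # Pure-Python arithmetic; the GIL is held for the whole loop,
    # so four threads running this take about as long as one.
    total = 0
    for i in range(n):
        total += i * i
    return total

# Four "parallel" workers, but the GIL serializes their execution.
with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(cpu_bound, [100_000] * 4))

# A ProcessPoolExecutor would sidestep the GIL here, at the cost of
# pickling every argument and result between processes.
print(results[0] == sum(i * i for i in range(100_000)))  # True
```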
Those are the top reasons I don't like Python, but if you've got an appetite for more:
- The dependency ecosystem is a PITA: between Python versions, pinned or unpinned package versions, requirements.txt, pipenv, poetry, conda... pick one and you're still sure to run into issues with other tools needing one system or another, or packages working a bit differently under conda, etc. (I use poetry, with conda or pyenv.)
- The "let's write code easily" culture is good for getting started, but it becomes a problem because people, especially perhaps in DS, don't go further than that... and you end up with bad practices all over the place, untestable code (the test systems are also a pain to navigate), copy-pasted blobs, etc. Reading the code of some major libraries doesn't inspire confidence, especially compared to the likes of Java, C++, or Go.
And one last note: I've seen far better emacs setups for Python in other presentations. This one is OK as it is, but I would not call it "a Jimi Hendrix of Python" like one comment did...
Could you give examples of where exactly in the ML process/lifecycle you're hitting these issues?
For example: "When training a [type] model with X characteristics, the GIL causes Y, which makes it impossible to do Z".
We're building our machine learning platform[0] to solve problems we have faced shipping ML products to enterprises, and we're interested in your problems as well.
For example, we've faced the environment/dependencies/"runs on my machine" problems and have addressed them with Docker images. Our users can spin up a notebook server with near real-time collaboration to work with others, and with no setup, because the environment is already there.
The same goes for training jobs: they can click a button and schedule a long-running notebook that runs against a specific environment, avoiding "just yesterday I had X accuracy on my machine". The runs, models, parameters, and metrics are tracked automatically, because if we relied on the notebook author to do it, they might forget, or it would force a context switch and add cognitive load.
Some of the problems we faced were during deployment, too, where a data scientist writes a notebook to train a model and we then had to deploy that model by reading their notebook or digging into its dependencies. Now they can click a button and deploy whichever model they want. The old way really was hindering us, because they had to ask for help from someone who may have been working on something else.
I've been building a program that heavily uses multiprocessing for the past few months. It works quite well, but it did take me a little while to figure out the best way to work with it.
> - Threads can't be used efficiently because of the GIL
Python's threads are real OS threads, but for CPU-bound code the GIL makes them behave more like cooperative fibers: only one thread can execute Python bytecode at a time. Once you shift your thought process toward that, it's easy enough to work with them. Async is often a better solution, though, because threads aren't smart about when they switch between themselves; async makes the switch points explicit.
But if you want actual parallelism, multiprocessing's "processes" are separate OS processes, each with its own interpreter and GIL.
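For illustration, here's a minimal asyncio sketch of that smarter switching: suspension points are explicit (await), so the event loop only switches control where you let it.

```python
import asyncio

async def fetch(i):
    # await marks an explicit switch point; the event loop runs
    # other coroutines while this one sleeps (simulated I/O).
    await asyncio.sleep(0.01)
    return i * 2

async def main():
    # gather() runs all the coroutines concurrently on one thread
    # and returns their results in submission order.
    return await asyncio.gather(*(fetch(i) for i in range(5)))

results = asyncio.run(main())
print(results)  # [0, 2, 4, 6, 8]
```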
> - multiprocesses has to serialize everything in a single thread often killing performance. (Unless you use shared memory space techniques, but that's less than ideal compared to threads)
I'm not quite sure what you mean. Multiprocessing's processes each have their own GIL and start out single-threaded, but you can still spawn threads and more processes from them, as well as use async.
Or are you talking about using the Manager and namespaces to communicate between processes? That is a little slow, yes; high-speed code should probably use something else. Most programs will be fine with it, but it is way slower than rolling your own solution. It does work easily, though, so that's something to be said for it. Shared-memory techniques work too, but they are a little obtuse. Personally, I rolled my own data structures using the multiprocessing primitives; you have to set them up ahead of time, but they're insanely fast. Or you can use Redis pub/sub for IPC, or write to a memory-mapped file.
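A small sketch of what "set them up ahead of time" can look like, using multiprocessing.Array as the shared primitive (I'm assuming the "fork" start method here, so this is Unix-only as written):

```python
from multiprocessing import get_context

def square_in_place(arr):
    # The child writes directly into the shared buffer;
    # nothing is pickled back to the parent.
    for i in range(len(arr)):
        arr[i] = arr[i] * arr[i]

# "fork" lets the child inherit the function without pickling it;
# with "spawn" (Windows, macOS default) you'd need a __main__ guard.
ctx = get_context("fork")
shared = ctx.Array("d", [1.0, 2.0, 3.0, 4.0])  # allocated up front

p = ctx.Process(target=square_in_place, args=(shared,))
p.start()
p.join()
print(list(shared))  # [1.0, 4.0, 9.0, 16.0]
```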
> - You can't use multiprocess while inside a multiprocess executor. This makes building things on top of frameworks/libs that use multiprocess a nightmare... e.g try to use a web server like over something like Keras...
I'm not sure what you mean. Multiprocessing simply spawns other Python processes, and you can spawn processes from processes, so I don't know why you would have issues. Perhaps communication is the problem?
If you use numba (or Cython, C extensions, etc.) you can make them run without holding the GIL, and then they can run in parallel. Here's an example that should keep the CPU pegged at 100% utilization for a while:
    import numba as nb
    from concurrent.futures import ThreadPoolExecutor
    from multiprocessing import cpu_count

    @nb.jit(nogil=True)
    def slow_calculation(x):
        out = 0
        for i in range(x):
            out += i**0.01
        return out

    ex = ThreadPoolExecutor(max_workers=cpu_count())
    futures = [ex.submit(slow_calculation, 100_000_000_000 + i) for i in range(cpu_count())]
Even without requiring the GIL, these are still child threads of the main process, correct? And because of that, wouldn't the OS keep them all on the same core? And if that's the case, would ProcessPoolExecutor solve that problem?
I'm going to approach this as a pure social thing.
To me emacs fits because it matches the kind of misfit/hacker mentality I had back when I started playing with computers and software.
I thought emacs and Linux were the coolest things in the world while everyone else thought computers were somehow for unwanted geeks in a basement.
And honestly that did not fill me with a need to be cool, wanted, or popular.
Instead I developed a sense that learning things the hard way, failing and continuing until I got it right, without regard for what others thought, had super high value.
And to me that's what emacs symbolizes: it doesn't need to be popular or cool. Maybe it will die at some point, and while that would be unfortunate, it will have been central to a generation of free-thinking, intelligent programmers.
And now it's 2020, it's cool to be a programmer, and we're awash with "learn to be a dev the easy way" seminars and "here's how you can deploy your app in two clicks"...
And that's good but it doesn't mean emacs has to be that way.
I think it is way, way easier to instead create a new editor with the same free-software mentality. Maybe that's Theia under the Eclipse Foundation, maybe it's something else. But migrating emacs toward a VSCode-like bid for popularity is crazy; you just can't migrate all of that.
Full disclosure: I worked on GDB (using emacs) and Theia (a VSCode-like editor), and used emacs for 15+ years, mainly doing C/C++ and org-mode.
Isn't it strange, all the work we put into securing networks and so on, while we're engineers working from home, and all it would take is someone figuring out where I live via LinkedIn or whatnot... and all of this goes away, because my physical security is pretty much non-existent.
Sure, but the vast majority of computer-focused attackers are unwilling to show up in person. They want to operate from behind a computer, so you're not at risk of a physical attack from them unless they hire someone to conduct it. And they tend to be bad at hiring physical attackers; see how DPR hired hitmen who turned out to be scammers.
If you're facing a government though, you do need to be wary of physical attacks.
I find this somewhat funny, since even asking this question highlights the time people waste in JS/Python etc. testing things that could be checked automatically by a simple compiler.
I do find that as some of these people migrate to TypeScript they write too many tests, since they're just not used to having some checks already done for them.
But to those who come from compiled/typed languages, that's a question you'd never ask :)