So I see a link at the bottom: "Andrew Ng's Coursera course provides a good introduction to deep learning" which links to his "Machine Learning" class. This leads me to believe that "deep learning" is a synonym for "machine learning." Honest question: is that the case? Just a rebranding?
"Deep learning" is used a couple ways. In the loose sense employed by some journalists, deep learning is probably synonymous with both machine learning and AI. It's at the cutting edge of the current hype.
In the strict sense, deep learning refers to neural networks with more than one hidden layer. The depth of a neural network is the length of the longest path between its input and output nodes. "Shallow" networks, like a simple autoencoder with a single hidden layer, aren't considered deep or part of deep learning. But if you stack them together, you get a deep net; e.g. several stacked restricted Boltzmann machines form a deep-belief network. [1]
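A rough numpy sketch of the distinction (the layer sizes, activations, and random untrained weights are all just placeholders to show the structure):

    import numpy as np

    def layer(x, n_out):
        # one fully connected layer with a sigmoid nonlinearity
        # (weights are random here; a real net would learn them)
        W = np.random.randn(x.shape[1], n_out) * 0.01
        b = np.zeros(n_out)
        return 1.0 / (1.0 + np.exp(-(x @ W + b)))

    x = np.random.randn(32, 784)   # e.g. a batch of flattened images

    # "shallow": a single hidden layer between input and output
    h = layer(x, 128)
    shallow_out = layer(h, 10)

    # "deep": stack several hidden layers; the depth is the length of
    # the input-to-output path through the stack
    h = x
    for n in (512, 256, 128):
        h = layer(h, n)
    deep_out = layer(h, 10)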
As @paulsutter mentioned, one aspect of having several hidden, or intermediate, layers in a neural network is that you can combine relatively simple, granular features (like individual pixels or words) into more complex combinations. Neural networks recombine simple features automatically, and then learn, through the backpropagation of error, which combinations should be treated as significant signals.
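A toy numpy sketch of that idea (the data, shapes, and learning rate are made up, and a real network would be far larger): the hidden layer forms combinations of the raw inputs, and backpropagation decides which combinations matter.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((64, 20))   # 64 samples of 20 raw "pixel-level" features
    y = (X[:, :2].sum(axis=1) > 0).astype(float).reshape(-1, 1)  # toy target

    W1 = rng.standard_normal((20, 16)) * 0.1  # raw features -> learned combinations
    W2 = rng.standard_normal((16, 1)) * 0.1   # combinations -> prediction
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for step in range(500):
        h = sigmoid(X @ W1)   # hidden layer: combinations of the raw inputs
        p = sigmoid(h @ W2)   # output probability
        # backpropagate the error (cross-entropy loss with a sigmoid output),
        # so the gradient determines which combinations get weighted as signal
        d_out = (p - y) / len(X)
        grad_W2 = h.T @ d_out
        d_h = (d_out @ W2.T) * h * (1 - h)
        grad_W1 = X.T @ d_h
        W1 -= 1.0 * grad_W1   # plain gradient descent updates
        W2 -= 1.0 * grad_W2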
They're attracting all this hype because they actually do something amazing, albeit through brute-force computation. Many in AI scoff at neural networks because they've been around a long time and have no particular elegance, but we're now in a historical moment where we have the hardware to make them work, and they're breaking records on almost every data type: images, sound, time series, etc.
So no, it's not a rebranding, it's a thing. We can now replicate the human faculty of perception with machines in many domains, and that's going to make the future quite weird.
Machine learning is a broad field of techniques; deep learning is an area of machine learning that, as I understand it, focuses mainly on neural networks, and usually rather large ones at that. Machine learning on its own doesn't necessarily mean neural networks are used at all. Nor does deep learning exclusively mean neural networks, but "deep networks" are usually a part of it.
Deep learning is machine learning where the system develops ("learns") its own feature set, generally via a multilayer neural network. Selecting features is typically the aspect of machine learning that requires the most domain expertise, and in deep learning that part is done by an algorithm.
The features can be layered. For example, speech recognition could have layers something like phonemes, morphemes, words, concepts, etc. Building such feature sets by hand is challenging; in deep learning the system learns useful features by detecting patterns.
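A schematic contrast of the two approaches, not a real speech pipeline (the hand-picked features and the weight matrices W1/W2, which would come from training, are hypothetical):

    import numpy as np

    # Hand-engineered features: a human decides what to compute from the raw signal.
    def hand_features(audio):
        return np.array([audio.mean(), audio.std(), np.abs(np.diff(audio)).mean()])

    # Learned features: the raw signal goes in, and each layer's trained weights
    # define the feature set instead, roughly low-level patterns first and then
    # higher-level combinations of them.
    def learned_features(audio, W1, W2):
        h1 = np.tanh(audio @ W1)
        h2 = np.tanh(h1 @ W2)
        return h2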
Interesting that the most challenging technical task is among the first to be replaced by an algorithm ;)
Speaking of their academic origins, deep learning is a subset of machine learning. It just also happens to be the most exciting one, with the biggest publicity and results in recent years, so it's often (mis)labeled to describe the entire machine learning field.
Andrew Ng's course still contains fundamental knowledge necessary to understand the motivations and reasoning behind deep learning (along with a lot of background ML knowledge that isn't needed), though, so I think it's a good resource to link.
To the extent that it's a new umbrella term that mostly encompasses a set of techniques that have been around for a couple decades now, yeah, it's just a rebranding.
But I think the rebranding makes some sense because it calls attention to a common characteristic shared by all the techniques that fall under the brand: They tend to learn their own feature transformations, which is cool because it means you don't have to put nearly so much effort into figuring out how to curate the input.
To get an idea of a machine-learning course not focused on deep learning, compare Andrew Ng's other course, the one he teaches at Stanford, which is exclusively focused on other aspects of ML: http://cs229.stanford.edu/schedule.html
Yeah, it would be more accurate to say that Ng's course (as best I remember it) provides a good introduction to machine learning and covers unsupervised learning, for which deep learning is one approach.
Theano has the infrastructure in place for OpenCL, but not all 'operations' are implemented, which (for any decent calculation) means that it's a no-go [1].
Unfortunately, the focus at the Montreal lab that has a huge influence on its development seems to be (a) 'blocks' for a high-level DNN environment (which is very cool) and (b) CUDA-to-the-max (which is understandable, given Nvidia actively seeds research labs with freebie cards, and -- as evidenced by the article -- is putting a lot of effort into supporting deep learning).
rant start:
It's a shame that OpenCL doesn't get more love. Just the other day there was a cool Clojure GPU project (based on OpenCL) announced on HN, and one of the comments was 'will you be building this for CUDA too?'. Rather than pressure open source writers to support closed systems, it would be better to pressure Nvidia to provide up-to-date OpenCL drivers. Newer Nvidia cards are only at OpenCL 1.2, and the (somewhat old) OpenCL drivers are always there in an Nvidia install. But does Nvidia ever talk about that? No. It's entirely in Nvidia's interest to encourage everyone to talk CUDA-only. But on a GFLOPs/$ basis, and for the cause of Free, CUDA isn't the right way to go.
Disclaimer: it's my project, but I run an open source project called deeplearning4j, whose algorithms have a hardware abstraction layer, nd4j, built into them. You get numpy on the JVM, with the hardware backend shipped as a jar file. Deeplearning4j itself is built on top of that. I would love to help spread deep learning to different runtimes.
You'd need to run empirical benchmarks; CUDA is usually faster, particularly on data-transfer latency [1]. I'd like to run my own benchmarks with nd4j, though. We have our own benchmark setup that works for every backend, which lets us do some interesting things.
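A rough sketch of the kind of timing harness I mean (a numpy-only CPU stand-in; the real comparison would swap in each backend's matrix multiply and would also need to time host-to-device transfers, since that latency is where CUDA usually wins):

    import time
    import numpy as np

    def bench(matmul, n=2048, repeats=5):
        # time a square matrix multiply, keeping the best of several runs
        A = np.random.rand(n, n).astype(np.float32)
        B = np.random.rand(n, n).astype(np.float32)
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            matmul(A, B)
            times.append(time.perf_counter() - start)
        return min(times)

    print("cpu matmul:", bench(np.dot), "s")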
Looking forward to running these ourselves once our OpenCL support kicks in (only the kernels are written so far =/)
I plan on basing the OpenCL work on our CUDA work, which is fairly well established at this point (mainly doing optimizations, not much change in architecture).
I wouldn't be surprised if this turned out not to be accidental. I mean, it wouldn't hurt Nvidia for OpenCL to continue to be seen as the "slower option". So I'm sure efforts to improve their OpenCL implementation aren't considered as important from a business point of view.
True, though if you compare roughly equivalent Nvidia and AMD GPUs, my impression is that the CUDA implementation on Nvidia still outperforms OpenCL on AMD for deep learning. Is this right?