Yes, it has been done with the ESP8266. By itself it can produce shitty audio, and only when the sender is close to the device with no interference. Otherwise you need a dedicated DAC, and then it works as a simple web radio with a mono audio output.
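For context on the "by itself" part: a chip with no DAC can still get audio out by oversampling and emitting a 1-bit delta-sigma stream on a digital pin, which an RC low-pass filter then smooths back into an analog signal. Below is a rough sketch of a first-order modulator. It assumes signed 16-bit PCM input and 32x oversampling (one 32-bit word per sample). The function name and word packing are hypothetical, and this is not the code from ESP8266_MP3_DECODER.

```c
#include <stdint.h>

/* Hypothetical helper: turns one signed 16-bit PCM sample into 32 one-bit
 * outputs with a first-order delta-sigma (accumulate-and-overflow) modulator.
 * The density of 1 bits in the returned word tracks the sample amplitude.
 * `acc` is the integrator state and must persist between calls.
 * The first output bit ends up in the MSB. */
static uint32_t delta_sigma_32(int16_t sample, int32_t *acc)
{
    uint32_t bits = 0;
    /* Shift the input into the unsigned range 0..65535. */
    int32_t target = (int32_t)sample + 32768;

    for (int i = 0; i < 32; i++) {
        *acc += target;
        bits <<= 1;
        if (*acc >= 65536) {   /* accumulator overflowed: emit a 1 */
            *acc -= 65536;     /* subtract full scale (the feedback step) */
            bits |= 1u;
        }
    }
    return bits;
}
```

Silence (0) comes out as an even mix of 0s and 1s, and full-scale samples push the stream toward all 0s or nearly all 1s. The quality limit people hear ("shitty audio") comes largely from how low the oversampling ratio and filter order are on such a setup.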
With the ESP32, my bet is that it can handle stereo fluently without any other dedicated components.
Yeah, I built the ESP8266 version with external SPI SRAM and an I2S codec, and it works well. The ESP32 might have enough internal RAM to not need the external SPI RAM, and its hardware PWMs might be usable for audio output (or do we have a usable DAC now?).
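If the hardware PWM route were used, each PCM sample would be scaled down to the PWM's duty-cycle resolution, and a stereo stream would need mixing to mono for a single output pin. A minimal sketch, assuming signed 16-bit PCM; the helper names are hypothetical and no specific ESP32 PWM driver API is implied.

```c
#include <stdint.h>

/* Hypothetical helper: maps a signed 16-bit PCM sample to a PWM duty value
 * with `bits` of resolution (1..16). The most negative sample gives duty 0,
 * the most positive gives (1 << bits) - 1, and silence lands at mid-scale. */
static uint32_t pcm_to_duty(int16_t sample, unsigned bits)
{
    uint32_t u = (uint32_t)((int32_t)sample + 32768); /* 0..65535 */
    return u >> (16u - bits);                         /* keep the top bits */
}

/* Hypothetical helper: averages a stereo pair into one mono sample for a
 * single-channel output such as one PWM pin. Widening to 32 bits avoids
 * overflow when adding the two channels. */
static int16_t stereo_to_mono(int16_t left, int16_t right)
{
    return (int16_t)(((int32_t)left + (int32_t)right) / 2);
}
```

The design trade-off is resolution versus carrier frequency: a counter-based PWM with N bits running from a clock f_clk has a carrier of f_clk / 2^N. For example, an 80 MHz counter clock gives 312.5 kHz at 8 bits, well above the audio band, but only about 1.2 kHz at 16 bits. So in practice you settle for fewer bits of duty resolution.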
Read more here: https://github.com/espressif/ESP8266_MP3_DECODER