One question I have: could they use a cheaper kind of RAM and still be perfectly usable for large ML models? Say 4 GB of GDDR plus 128 GB of cheap DRAM? I do realize, as others are saying, that this would be a new kind of card, so it would take time to develop. But would it work?
Not without a redesigned memory controller, or an off-chip one. You'd probably just want the host's memory to be directly accessible over PCIe, or something faster like NVLink. Such solutions already exist, just not in the consumer space.
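For what it's worth, CUDA's unified memory already gives you a limited form of this: a managed allocation can exceed physical VRAM, and pages migrate between host RAM and device memory over PCIe on demand (oversubscription needs a Pascal-or-newer GPU on Linux). A minimal sketch — the kernel and sizes here are just illustrative:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: multiply every element by a scalar.
__global__ void scale(float *x, size_t n, float a) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    // A 16 GiB managed allocation -- this can be larger than the GPU's VRAM.
    // The driver pages data between host RAM and device memory over PCIe
    // as the CPU and GPU touch it, instead of failing at allocation time.
    size_t n = (16ULL << 30) / sizeof(float);
    float *x = nullptr;
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "managed allocation failed\n");
        return 1;
    }
    for (size_t i = 0; i < n; ++i) x[i] = 1.0f;  // first touched on the host
    scale<<<(unsigned)((n + 255) / 256), 256>>>(x, n, 2.0f);  // then on the GPU
    cudaDeviceSynchronize();
    printf("x[0] = %f\n", x[0]);
    cudaFree(x);
    return 0;
}
```

The catch is exactly the bandwidth problem: once the working set spills past VRAM, throughput is bounded by the PCIe link, which is why NVLink-attached host memory (e.g. on Grace Hopper systems) performs so much better at this than consumer cards.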