Daniel Woods's new open-source inference engine, flash-moe, has run a 400B-parameter large language model (LLM) on an iPhone 17 Pro, a device with just 12GB of RAM. The project leverages ...
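The 12GB figure is easier to appreciate with some back-of-envelope arithmetic. The sketch below estimates how much storage the weights of a 400B-parameter model need at a few precisions. The bit-widths are illustrative assumptions, not the precision flash-moe actually uses. The estimate also counts weights only (no KV cache, activations, or OS overhead) and uses decimal gigabytes.

```python
# Back-of-envelope weight-storage arithmetic for a 400B-parameter model.
# Assumptions (not from the article): 1 GB = 1e9 bytes, weights only
# (no KV cache, activations, or OS overhead), and the bit-widths below are
# illustrative quantization levels, not necessarily what flash-moe uses.

PARAMS = 400e9      # 400B parameters, per the article
DEVICE_RAM_GB = 12  # iPhone 17 Pro RAM, per the article


def weight_footprint_gb(params: float, bits_per_param: float) -> float:
    """Return the storage needed for the weights, in decimal GB."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for bits in (16, 8, 4, 2):
        gb = weight_footprint_gb(PARAMS, bits)
        ratio = gb / DEVICE_RAM_GB
        print(f"{bits:>2}-bit weights: {gb:6.0f} GB ({ratio:5.1f}x device RAM)")
```

Even at an aggressive 2 bits per parameter, the weights alone come to about 100GB, roughly 8x the phone's RAM. So the full model cannot be resident in memory at once.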