Maxwell compatibility could possibly be achieved by modifying the fp32 blocks in the SM so that they fully support Maxwell.
This sort of solution would only need to be applied to 4 SMs (64 shaders in each), covering the 256 Maxwell shaders. The thing about Ampere is that it has more fp32 shaders than it can use at one time, so its theoretical flops are rarely achieved. Turing has half the fp32 shaders per SM, and this is why Turing is so much faster per flop.
Given this type of solution, even 6 SMs with 256 shaders locked away under Maxwell compatibility (we don't know whether they would actually be unusable) would result in 1.5 TFLOPs with efficiency closer to Turing, perhaps even better per-flop performance. This would be a slight downgrade from having all the shaders be full Ampere, but it could be a solution to backwards compatibility.
If this is an 8 SM part, you wouldn't normally be able to run the GPU at 1.5GHz without a very high power envelope (that would be 3 TFLOPs, by the way). But with this solution, where 4 SMs only have half their shaders available, it would be 2.3 TFLOPs of Ampere. You could then likely run at 1.5GHz, because 256 of the 1024 shaders would never have a load. It would also give you performance similar to 3 TFLOPs, because with 25% of the shaders unavailable, Ampere would come closer to hitting its theoretical flops.
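For anyone who wants to check the numbers, here is a rough sketch of the arithmetic behind those figures. It assumes 128 fp32 shaders per Ampere SM and 2 flops per shader per clock (one fused multiply-add), and it assumes the 6 SM case also runs at 1.5GHz, which I didn't state outright. The function name and layout are just for illustration.

```python
# Peak fp32 throughput sketch. Assumptions (not measurements):
#   - 128 fp32 shaders per Ampere SM
#   - 2 flops per shader per clock (one fused multiply-add)
#   - 1.5GHz clock in every case, including the 6 SM one
SHADERS_PER_SM = 128
FLOPS_PER_SHADER_PER_CLOCK = 2


def peak_tflops(sm_count, clock_ghz, locked_shaders=0):
    """Theoretical TFLOPs with some shaders locked away for Maxwell mode."""
    active_shaders = sm_count * SHADERS_PER_SM - locked_shaders
    gflops = active_shaders * FLOPS_PER_SHADER_PER_CLOCK * clock_ghz
    return gflops / 1000.0


if __name__ == "__main__":
    cases = [
        ("8 SM, all shaders", 8, 0),
        ("8 SM, 256 locked", 8, 256),
        ("6 SM, 256 locked", 6, 256),
    ]
    for label, sms, locked in cases:
        print(f"{label}: {peak_tflops(sms, 1.5, locked):.2f} TFLOPs")
```

These are theoretical peaks only. They say nothing about how much of that throughput a real game would actually reach, which is the whole per-flop efficiency point above.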
Another solution would be to just add an emulation layer to the shader cores, which is basically hardware emulation. Given how compatible Nvidia's architectures are with each other, compatibility is likely to be very high, but emulation is rarely perfect, and there could be games that have graphical errors or don't run without a patch.
I just wanted to give some ideas. You don't need to shove a whole Maxwell GPU into Dane to get backwards compatibility. This is not the 90s: these are programmable shaders that can manage a lot more than old fixed-function hardware, and the key here is that Nvidia knows its architectures and what would be required to make a seamless solution.