There is still the same amount of flash memory. If there is a problem, someone just needs to write better code ;-) The only thing I would be slightly worried about is the smaller amount of RAM. It could limit the size of the layout definitions you can have loaded at the same time, how fast you can swap between them, and the length of the macros you can record. I don't know too much about this endpoint business, but there are 4 on the U2 compared to 6 on the U4. There is one "larger" endpoint on the U4 as well; I don't know if that is needed for NKEY rollover. Maybe you'll only be able to get 32-key rollover or something, I have no idea. The normal 6KRO is still there at least =)
I didn't moan about flash, did I?!
The 'larger' endpoint isn't needed for HID stuff.
One endpoint is used up by a required 'control' endpoint, so the comparison is 3 vs 5.
Total buffer memory for USB is a miserly 176 bytes on the '32U2 vs an exorbitant 832 bytes on the '32U4.
(Endpoint buffer memory is separate from other RAM).
Endpoints can be single or double buffered. Obviously, the latter uses twice the buffer memory, but can be useful.
The biggest endpoints in my code are the debug and config (in and out) endpoints - altogether 384 bytes used.
(These provide useful features, but maybe they only need to be single buffered).
Control endpoint size is set at 32 in my code, but I'm not sure whether it uses twice that, since it's bidirectional.
The NKRO endpoint is 2x32 bytes.
The 6KRO endpoint is 2x8 bytes.
To also fit a mouse endpoint on the '32U4 I'll need to combine my debug and config endpoints, and write my own version of hid_listen - even that chip's causing some pain!
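To put rough numbers on the endpoint figures above, here is a minimal sketch in C that tallies buffer memory against each chip's total. It is not taken from my firmware: the struct, the function names and the way the debug and config endpoints are lumped into one 384-byte entry are all assumptions made for illustration, and the control endpoint is counted once at 32 bytes even though it may use more.

```c
#include <stddef.h>
#include <stdint.h>

/* USB endpoint buffer totals quoted in the post, in bytes. */
#define DPRAM_32U2 176u
#define DPRAM_32U4 832u

/* One endpoint's buffer requirement (hypothetical type, for illustration). */
typedef struct {
    const char *name;
    uint16_t    size;   /* bytes per bank */
    uint8_t     banks;  /* 1 = single buffered, 2 = double buffered */
} endpoint_t;

/* Sum the buffer bytes used by the first `count` endpoints. */
static uint16_t dpram_needed(const endpoint_t *eps, size_t count)
{
    uint16_t total = 0;
    for (size_t i = 0; i < count; i++)
        total = (uint16_t)(total + eps[i].size * eps[i].banks);
    return total;
}

/* Non-zero if the first `count` endpoints fit in `available` bytes. */
static int dpram_fits(const endpoint_t *eps, size_t count, uint16_t available)
{
    return dpram_needed(eps, count) <= available;
}

/* Sizes from the post. Debug and config are lumped together because
 * only their combined total (384 bytes) is given. */
static const endpoint_t layout[] = {
    { "control",      32,  1 },  /* assumption: counted once */
    { "6KRO",         8,   2 },
    { "NKRO",         32,  2 },
    { "debug+config", 384, 1 },
};
```

With these assumptions, `dpram_needed(layout, 4)` comes to 496 bytes, which fits the '32U4 but is nowhere near fitting the '32U2. Dropping the debug and config entry leaves 112 bytes, which would fit in 176 even if the control endpoint turns out to use double what I counted.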
RAM gets used up fairly quickly if any tables are dynamic (as in my code). If they're in flash then it's less of a problem (as in hasu's code). Allow maybe 128 bytes for global variables and stack etc, one byte per key for individual debouncing, buffers for preparing USB reports... if that's all there is then 1k is sufficient.
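As a back-of-envelope check of that RAM estimate, here is a small sketch in C. The function name and the example figures (104 keys, 40 bytes of report buffers, i.e. one 8-byte and one 32-byte report) are assumptions for illustration only, and it deliberately ignores any dynamic tables.

```c
#include <stdint.h>

/* Allowance for global variables and stack, as suggested in the post. */
#define GLOBALS_AND_STACK 128u

/* The "1k" figure from the post, in bytes. */
#define RAM_1K 1024u

/* Rough RAM estimate: fixed allowance, one debounce byte per key,
 * plus whatever is needed for preparing USB reports. */
static uint16_t ram_estimate(uint16_t keys, uint16_t report_buffers)
{
    return (uint16_t)(GLOBALS_AND_STACK + keys + report_buffers);
}
```

For example, `ram_estimate(104, 40)` gives 272 bytes, which leaves roughly 750 bytes of a 1k part free. That headroom is what dynamic tables would eat into if they are kept in RAM rather than flash.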