Looks like I'm the next in line to pick up this torch and try to carry it forward!
I'm still wrapping my head around QMK, but it seems the critical thing for dual-role keys is that they can only be properly disambiguated by the next key UP event, and only by an up event whose matching key down happened at or after the dual-role key's own down.
For example, if key0 is dual-role as SHIFT and 'a', and key1 is just a normal 'b' key, here are some sequences and how they should be interpreted:
key0 down, key0 up -> a down, a up
key0 down, key1 down, key0 up, key1 up -> a down, b down, a up, b up
key0 down, key1 down, key1 up, key0 up -> SHIFT down, b down, b up, SHIFT up
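To make that rule concrete, here is a minimal standalone C sketch (not QMK code) that replays key events through a tiny resolver and records what it would emit. Everything in it is hypothetical: `key_event_t`, `resolver_t`, `process_event()` and `run()` are illustrative names, key 0 is assumed to be the SHIFT/'a' dual-role key and key 1 a plain 'b'. Presses that arrive while key 0 is ambiguous are held back, and whichever up event comes first (key 0's own, or that of one of the held-back keys) decides tap versus hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical setup mirroring the example: key 0 is dual-role
 * (tap = 'a', hold = SHIFT), key 1 is a plain 'b' key. */
#define DUAL_KEY 0
#define HOLD_NAME "SHIFT"
#define MAX_PENDING 10 /* presses that can pile up while key 0 is ambiguous */
#define MAX_OUT 32

static const char *const KEY_NAMES[] = {"a", "b"};

typedef struct {
    int key;
    bool pressed;
} key_event_t;

typedef struct {
    bool pending;              /* dual-role key is down and still ambiguous */
    bool held;                 /* dual-role key was resolved as SHIFT */
    int buffered[MAX_PENDING]; /* keys pressed after the dual-role key */
    int nbuffered;
    char out[MAX_OUT][16];     /* emitted events, e.g. "b down" */
    int nout;
} resolver_t;

static void emit(resolver_t *r, const char *name, bool down) {
    assert(r->nout < MAX_OUT);
    snprintf(r->out[r->nout++], sizeof r->out[0], "%s %s", name, down ? "down" : "up");
}

/* Replay the presses that were held back while the dual-role key was ambiguous. */
static void flush_buffered(resolver_t *r) {
    for (int i = 0; i < r->nbuffered; i++)
        emit(r, KEY_NAMES[r->buffered[i]], true);
    r->nbuffered = 0;
}

static bool is_buffered(const resolver_t *r, int key) {
    for (int i = 0; i < r->nbuffered; i++)
        if (r->buffered[i] == key) return true;
    return false;
}

static void process_event(resolver_t *r, key_event_t ev) {
    if (ev.key == DUAL_KEY) {
        if (ev.pressed) {        /* start buffering; nothing emitted yet */
            r->pending = true;
            r->held = false;
            r->nbuffered = 0;
        } else if (r->pending) { /* released first: it was a tap */
            emit(r, KEY_NAMES[DUAL_KEY], true);
            flush_buffered(r);
            emit(r, KEY_NAMES[DUAL_KEY], false);
            r->pending = false;
        } else if (r->held) {    /* end of a resolved hold */
            emit(r, HOLD_NAME, false);
            r->held = false;
        }
        return;
    }
    if (r->pending && ev.pressed) {
        /* Still ambiguous: hold the press back. This sketch just asserts
         * the bound instead of forcing a decision when it is exceeded. */
        assert(r->nbuffered < MAX_PENDING);
        r->buffered[r->nbuffered++] = ev.key;
        return;
    }
    if (r->pending && is_buffered(r, ev.key)) {
        /* A key pressed after the dual-role key was released first: hold. */
        emit(r, HOLD_NAME, true);
        flush_buffered(r);
        r->pending = false;
        r->held = true;
    }
    /* Pass through: not pending, or the up of a key pressed before key 0. */
    emit(r, KEY_NAMES[ev.key], ev.pressed);
}

/* Feed a whole sequence through a fresh resolver. */
static void run(resolver_t *r, const key_event_t *evs, int n) {
    memset(r, 0, sizeof *r);
    for (int i = 0; i < n; i++)
        process_event(r, evs[i]);
}
```

Feeding the third sequence, `{{0, true}, {1, true}, {1, false}, {0, false}}`, through `run()` leaves `SHIFT down`, `b down`, `b up`, `SHIFT up` in `r.out`, while the second sequence yields `a down`, `b down`, `a up`, `b up`. In real firmware the `emit()` calls would presumably become QMK's `register_code()`/`unregister_code()`, but that wiring is an assumption this sketch doesn't attempt.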
Because we have process_record_user(), I think we can pull this off: it lets us completely override what QMK does with keystrokes (returning false from it stops QMK from processing that key any further). We just need to buffer ambiguous keys until the next upstroke disambiguates them.
For a user with 10 fingers, the most we should need to buffer before we can start disambiguating is 10 downstrokes and 1 upstroke, so RAM consumption shouldn't be a problem. I've started throwing together a little ring buffer to do this.
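A fixed-size ring buffer along those lines might look like the following minimal C sketch. The names (`buffered_event_t`, `event_ring_t`, `ring_init()`, `ring_push()`, `ring_pop()`) are hypothetical, and the event struct is a simplified stand-in for QMK's own `keyrecord_t`, whose event identifies the key by matrix row/column. The capacity of 11 comes straight from the 10 downstrokes + 1 upstroke estimate, so the whole buffer costs only a few dozen bytes of RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 10 fingers -> at most 10 buffered downstrokes plus the 1 upstroke
 * that disambiguates them. */
#define EVENT_BUF_SIZE 11

typedef struct {
    uint8_t key;  /* hypothetical key index; QMK itself uses row/col */
    bool pressed; /* true for a downstroke, false for an upstroke */
} buffered_event_t;

typedef struct {
    buffered_event_t events[EVENT_BUF_SIZE];
    uint8_t head;  /* index of the oldest stored event */
    uint8_t count; /* number of stored events */
} event_ring_t;

static void ring_init(event_ring_t *rb) {
    rb->head = 0;
    rb->count = 0;
}

/* Append at the tail; refuses (returns false) instead of overwriting when full. */
static bool ring_push(event_ring_t *rb, buffered_event_t ev) {
    if (rb->count == EVENT_BUF_SIZE) return false;
    uint8_t tail = (uint8_t)((rb->head + rb->count) % EVENT_BUF_SIZE);
    rb->events[tail] = ev;
    rb->count++;
    assert(rb->count <= EVENT_BUF_SIZE);
    return true;
}

/* Remove the oldest event into *out; returns false if the buffer is empty. */
static bool ring_pop(event_ring_t *rb, buffered_event_t *out) {
    if (rb->count == 0) return false;
    *out = rb->events[rb->head];
    rb->head = (uint8_t)((rb->head + 1) % EVENT_BUF_SIZE);
    rb->count--;
    return true;
}
```

`ring_push()` refuses a 12th event rather than silently dropping the oldest one; if that ever happens, the caller would need to resolve the pending dual-role key before buffering more. Tracking `head` plus `count`, rather than separate head and tail indices, avoids the usual full-versus-empty ambiguity.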
I'll post more as I actually get something closer to working.