Consider the USB HID report descriptor for (say) a QMK keyboard with NKRO enabled:
0x05,0x01, // Usage Page: Generic Desktop Ctrls
0x09,0x06, // Usage: Keyboard
0xa1,0x01, // Collection: Application
0x85,0x03, // Report ID: 3
// ----------------------------------------------------------------------
0x05,0x07, // Usage Page: Kbrd/Keypad
0x19,0xe0, // Usage Minimum: 0xe0
0x29,0xe7, // Usage Maximum: 0xe7
0x15,0x00, // Logical Minimum: 0
0x25,0x01, // Logical Maximum: 1
0x95,0x08, // Report Count: 8
0x75,0x01, // Report Size: 1
0x81,0x02, // Input: Data,Var,Abs,No Wrap,Linear,Preferred State,No Null Position
// ----------------------------------------------------------------------
0x05,0x07, // Usage Page: Kbrd/Keypad
0x19,0x00, // Usage Minimum: 0x00
0x29,0xef, // Usage Maximum: 0xef
0x15,0x00, // Logical Minimum: 0
0x25,0x01, // Logical Maximum: 1
0x95,0xf0, // Report Count: 240 (unsigned; a decoder that treats the byte as signed prints -16)
0x75,0x01, // Report Size: 1
0x81,0x02, // Input: Data,Var,Abs,No Wrap,Linear,Preferred State,No Null Position
// ----------------------------------------------------------------------
0x05,0x08, // Usage Page: LEDs
0x19,0x01, // Usage Minimum: Num Lock
0x29,0x05, // Usage Maximum: Kana
0x95,0x05, // Report Count: 5
0x75,0x01, // Report Size: 1
0x91,0x02, // Output: Data,Var,Abs,No Wrap,Linear,Preferred State,No Null Position,Non-volatile
0x95,0x01, // Report Count: 1
0x75,0x03, // Report Size: 3
0x91,0x01, // Output: Const,Array,Abs,No Wrap,Linear,Preferred State,No Null Position,Non-volatile
0xc0, // End Collection
recovered empirically on Ubuntu by running:
hexdump -e '16/1 "%02x " "\n"' /sys/bus/hid/devices/0003:FFFF:3000.001F/report_descriptor
where FFFF and 3000 are the VID/PID and the 001F part seems to change often (so you must look it up beforehand, e.g. by running: ls -l /sys/bus/hid/devices).
Notice that (unless I'm counting wrong) this yields 256-bit (32-byte) HID input reports (8 bits of report ID + 8 modifier bits + 240 key bits), which is a whopping half of the 64-byte maximum that a USB FS Interrupt Transfer even allows (see:
https://beyondlogic.org/usbnutshell/usb4.shtml#Interrupt).
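To double-check that count, here is a rough Python sketch that walks the short items of the descriptor and adds up Report Count x Report Size for every Input and Output item. The function name report_bytes is made up for this post, and it assumes the simple case shown here: short items only and a single Report ID. Report Count is read as unsigned, so 0xf0 counts as 240 fields.

```python
# Bytes copied from the descriptor dump above, with the comments dropped.
DESCRIPTOR = bytes.fromhex(
    "05 01 09 06 a1 01 85 03"
    " 05 07 19 e0 29 e7 15 00 25 01 95 08 75 01 81 02"
    " 05 07 19 00 29 ef 15 00 25 01 95 f0 75 01 81 02"
    " 05 08 19 01 29 05 95 05 75 01 91 02 95 01 75 03 91 01"
    " c0"
)

def report_bytes(desc):
    """Return report sizes in bytes, assuming short items and one Report ID."""
    size = count = 0
    bits = {"input": 0, "output": 0}
    has_report_id = False
    i = 0
    while i < len(desc):
        prefix = desc[i]
        n = (0, 1, 2, 4)[prefix & 0x03]          # data length encoded in the low 2 bits
        value = int.from_bytes(desc[i + 1:i + 1 + n], "little")  # unsigned
        tag = prefix & 0xFC                      # tag + type, without the length bits
        if tag == 0x74:                          # Report Size
            size = value
        elif tag == 0x94:                        # Report Count
            count = value
        elif tag == 0x84:                        # Report ID
            has_report_id = True
        elif tag == 0x80:                        # Input
            bits["input"] += size * count
        elif tag == 0x90:                        # Output
            bits["output"] += size * count
        i += 1 + n
    id_bits = 8 if has_report_id else 0          # the Report ID byte prefixes each report
    return {kind: (b + id_bits + 7) // 8 for kind, b in bits.items()}

sizes = report_bytes(DESCRIPTOR)
print(sizes)  # {'input': 32, 'output': 2}
```

So the 32 bytes include the Report ID byte; the LED output report is 2 bytes.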
Compare that with a "boot protocol keyboard" HID report, which (I think) is only 8 bytes (i.e. 4x smaller).
In my tests, every byte matters over USB Full Speed (or maybe my STM32 is too slow?).
(E.g. sending a 1-byte report can be done *way* faster than sending an 8-byte report.)
This may (or may not) be unimportant for regular typing, but it seems (unless I'm missing something) quite important for automation-oriented things like macros (sending macros as fast as possible), Unicode, emojis, and whatnot. (Imagine if copy-pasting the text inside a 10 KB document took 3 seconds! But sending 10,000 HID keystrokes over USB FS at 1000 Hz takes at least 10 seconds at one report per 1 ms poll, and more if each keystroke needs separate press and release reports.)
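A back-of-the-envelope version of that estimate, as a sketch. The assumptions are mine: the host accepts at most one report per poll, and each keystroke costs two reports (one press, one release). seconds_to_type is just a name I picked.

```python
def seconds_to_type(keystrokes, poll_hz=1000, reports_per_keystroke=2):
    """Lower bound on the time to send keystrokes at one report per poll."""
    return keystrokes * reports_per_keystroke / poll_hz

print(seconds_to_type(10_000))                            # 20.0
print(seconds_to_type(10_000, reports_per_keystroke=1))   # 10.0
```

Either way, it is tens of seconds rather than the sub-second a paste would take.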
E.g. if you want to send the flag of England emoji (U+1F3F4 U+E0067 U+E0062 U+E0065 U+E006E U+E0067 U+E007F), that's seven Unicode codepoints of up to 20 bits each (almost the max size), which in practice occupy 28 bytes in UTF-8, UTF-16 or UTF-32, so your (very practical) macro that spams 100 flags of England now runs 7x slower.
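The codepoint and byte counts are easy to check in Python (little-endian codecs are used so that no BOM gets added to the totals):

```python
# Flag of England: black flag + tag letters "gbeng" + cancel tag.
flag = "\U0001F3F4\U000E0067\U000E0062\U000E0065\U000E006E\U000E0067\U000E007F"

codepoints = len(flag)
widest_bits = max(ord(ch).bit_length() for ch in flag)
encoded = {enc: len(flag.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

print(codepoints, widest_bits)  # 7 20
print(encoded)                  # {'utf-8': 28, 'utf-16-le': 28, 'utf-32-le': 28}
```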
So I'm wondering, is NKRO even needed?
Even for steno, if each finger only presses at most 2 keys at a time, that's a maximum of 20 keys? But 20-key chords should be very rare, if any even exist.
And for regular, non-steno, non-chorded typing, theoretically you'd only press 10 keys at a time at most, but in practice more like 4 or 6?
So I always thought NKRO was this "must have feature" that all mechanical keyboards must have, but now that I'm actually working on one, I'm thinking, why not 16KRO, save 16 (or 15) bytes, and call it a day?
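For a rough size comparison, here is a sketch of the two report layouts. The layouts are my assumptions, not anything QMK ships as-is: an array-style report is an optional Report ID byte, one modifier byte, an optional reserved byte, and one byte per key slot; a bitmap-style report is an optional Report ID byte, one modifier byte, and one bit per usage.

```python
def array_report_bytes(n_keys, report_id=True, reserved_byte=False):
    """Array-style report: one byte per simultaneously pressed key."""
    return int(report_id) + 1 + int(reserved_byte) + n_keys

def bitmap_report_bytes(n_usages, report_id=True):
    """Bitmap-style (NKRO) report: one bit per usage."""
    return int(report_id) + 1 + (n_usages + 7) // 8

nkro = bitmap_report_bytes(240)                                   # 32
boot = array_report_bytes(6, report_id=False, reserved_byte=True) # 8
kro16 = array_report_bytes(16)                                    # 18
print(nkro, boot, kro16, nkro - kro16)  # 32 8 18 14
```

Under these assumptions the saving comes out at about 14 bytes, so the exact figure depends on which overhead bytes you count.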
Am I missing something?