For an exact measurement you would probably need a specialized setup consisting of a modified firmware doing two things: 1.) measuring the clock cycles that pass between the keystroke being registered and the moment the USB transfer is sent to the bus, and 2.) generating two or more USB transfers with a known delay in between, carrying their timestamps as payload. On the host side you would then need a dedicated user-space driver to evaluate these transfers, syncing on the first one so you can measure the deviation on the following ones. Even setting aside the considerable effort of writing specialized firmware and a driver, such a setup would still be affected by the kernel's scheduling.
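To make the host-side evaluation concrete, here is a minimal sketch of the arithmetic involved. All names and numbers are illustrative assumptions, not part of any real driver: the firmware is assumed to embed device-clock timestamps in the payload, and the host records arrival times of the two transfers.

```python
# Hypothetical sketch of the host-side evaluation described above.
# device_ts: timestamps the firmware put into the payload (device clock ticks)
# host_ts:   host-side arrival times of the two transfers (microseconds)
# Sync on the first transfer; the deviation of the second one then shows
# how much delay/jitter the USB path and host stack added.

def jitter_us(device_ts, host_ts, ticks_per_us):
    device_delta = (device_ts[1] - device_ts[0]) / ticks_per_us
    host_delta = host_ts[1] - host_ts[0]
    return host_delta - device_delta  # > 0: host saw it later than expected

# Example: a device clocked at 48 MHz sent the transfers 10 000 us apart,
# but the host received them 10 250 us apart -> 250 us of added delay.
print(jitter_us((0, 480_000), (0, 10_250), 48))  # -> 250.0
```

Note that a single pair can only show relative deviation; averaging over many such pairs would also let you estimate the drift between the device clock and the host clock.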
High-speed cameras won't get you far either: their time resolution is limited to under 1000 FPS on most models, and ordinary monitors refresh far more slowly than that.
The most practical approach that comes to my mind would be to capture some of your USB traffic with a tool like Wireshark. The USB protocol is always host-controlled, following a request-response pattern. So all you have to do is check your capture file for matching transfers and look up the delay between them. Even if this is not exactly the latency, it still gives you an idea of how long your keyboard takes to respond.
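As a rough sketch of that analysis: on Linux, usbmon captures tag each URB as a submit ("S") or complete ("C"), and you can export timestamps from Wireshark (e.g. with `tshark -T fields -e frame.time_epoch -e usb.urb_type`). The sample data below is made up; only the pairing logic is the point.

```python
# Pair each submitted interrupt-IN URB ("S") with the following completion
# ("C") and report the delay — i.e. how long the keyboard took to respond.
# Timestamps are fabricated example values, not from a real capture.

packets = [
    (0.000000, "S"),  # host submits interrupt-IN URB
    (0.000875, "C"),  # keyboard completes it with the HID report
    (0.008000, "S"),
    (0.009100, "C"),
]

def response_delays(packets):
    delays = []
    pending = None
    for ts, urb_type in packets:
        if urb_type == "S":
            pending = ts
        elif urb_type == "C" and pending is not None:
            delays.append(ts - pending)
            pending = None
    return delays

for d in response_delays(packets):
    print(f"{d * 1000:.3f} ms")
```

Keep in mind that for interrupt endpoints the host polls at the interval the device requested in its endpoint descriptor, so part of the measured delay is simply waiting for the next poll.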
@bondonin: Without knowing the nVidia tool in detail, I would assume they are more interested in the rendering latency, meaning the time it takes to turn drawing commands sent to the GPU into pixels shown on your monitor. What's going on inside a USB host controller, its kernel-mode driver, or even inside a HID device is usually beyond the scope and accessibility of such a tool.