This is a really cool idea. It reminds me a little of the talking piano art project.
An interesting start would be building a database of keystroke waveforms. The problem is that the "sound" of a keystroke depends on the keycap, the switch, the lube, and any O-rings. Still, you might be able to find consistent qualities in each of these variables (for example, a certain switch filling out a part of the sound that a certain keycap lacks) and match them up.
You'd also have to consider plate material, case, and mounting, and how they affect the keyboard as a whole. Those things make a huge difference; they're basically the body of the violin.
Basically: this sounds really cool, but there are so many variables that you may need to narrow your choices and accept a smaller range of possible results. Recording waveforms for each component and sorting them by tone, pitch, and quality would be a super cool (and huge) data science project.
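If you want a feel for what "sorting by tone" could look like in code, here's a minimal Python sketch under some loud assumptions. Everything in it is hypothetical: `KeystrokeSample`, `spectral_centroid`, `sort_by_brightness`, and `tone` are made-up names, the "recordings" are synthetic sine waves rather than real keystrokes, and it uses the spectral centroid (the magnitude-weighted mean frequency) as a single stand-in for "brightness". That's just one common audio feature, not a full measure of tone, pitch, or quality. A real project would load actual audio files and use a proper FFT library instead of the naive DFT here.

```python
import cmath
import math
from dataclasses import dataclass


@dataclass
class KeystrokeSample:
    """One recorded keystroke for a (hypothetical) switch/keycap combination."""
    switch: str
    keycap: str
    sample_rate: int       # samples per second
    samples: list[float]   # mono waveform, amplitudes roughly in [-1, 1]


def spectral_centroid(samples: list[float], sample_rate: int) -> float:
    """Magnitude-weighted mean frequency in Hz.

    Uses a naive O(n^2) DFT, which is only reasonable for very short clips.
    """
    n = len(samples)
    weighted = 0.0
    total = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total else 0.0


def sort_by_brightness(db: list[KeystrokeSample]) -> list[KeystrokeSample]:
    """Order samples from darkest (low centroid) to brightest (high centroid)."""
    return sorted(db, key=lambda s: spectral_centroid(s.samples, s.sample_rate))


def tone(freq_hz: float, sample_rate: int = 800, n: int = 80) -> list[float]:
    """Synthetic placeholder 'recording': a pure sine wave."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


# Hypothetical database: two fake switches paired with the same keycap.
db = [
    KeystrokeSample("switch-B", "keycap-X", 800, tone(300)),
    KeystrokeSample("switch-A", "keycap-X", 800, tone(100)),
]

for s in sort_by_brightness(db):
    print(s.switch, s.keycap, round(spectral_centroid(s.samples, s.sample_rate)))
```

The same idea extends to the "match them up" part: compute a few features per component and look for combinations whose features complement each other, though figuring out which features actually track what people hear is the hard (and interesting) part.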