There are loads of ways it can happen.
No matter how it's done, the basic idea of debouncing is that you take a glitchy signal and watch it until it stops glitching. There are lots of ways of doing this, but they all involve somehow detecting the "edges" where a signal goes from being bouncy to being solid and back again.
The simplest way to do this is to take two samples a short time apart and see if they are the same. The problem with this is deciding what period to leave between the samples. Sampling theory (Nyquist) says that you have to sample at more than twice the highest frequency in a signal to map the waveform accurately. Key bounce, though, can be anything from a few tens of hertz up to tens of megahertz, and you can't accurately sample a range like that with two samples at any one fixed period. You'll probably detect keypresses eventually, but you'll almost certainly get a "bouncy" result and miss keypresses if the user doesn't hold the keys down for long enough.
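To make the fixed-period problem concrete, here's a minimal sketch in Python, chosen purely for readability. It models the switch very crudely as a square wave bouncing at a single frequency. `square_bounce` and `two_sample_debounce` are made-up names for illustration, not from any real firmware.

```python
from typing import Callable, Optional

Signal = Callable[[float], int]  # switch level (0 or 1) at time t, in seconds


def square_bounce(freq_hz: float) -> Signal:
    """Crude, hypothetical model of a contact bouncing as a square wave at freq_hz."""
    def level(t: float) -> int:
        return 1 if (t * freq_hz) % 1.0 < 0.5 else 0
    return level


def two_sample_debounce(signal: Signal, t0: float, gap_s: float) -> Optional[int]:
    """Read the switch at t0 and t0 + gap_s; trust the level only if both reads agree.

    Returns the agreed level, or None if the reads differ (treated as "bouncy").
    """
    first = signal(t0)
    second = signal(t0 + gap_s)
    return first if first == second else None
```

Take a 1ms gap and 1kHz bounce: both reads land at the same point in the bounce cycle, so `two_sample_debounce(square_bounce(1000.0), 0.0001, 0.001)` returns 1 and the bouncing switch looks clean. Halve the gap and the same bounce is caught (`None`). A 2kHz bounce would then alias in exactly the same way, which is why no single fixed gap works.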
"Okay", says the intelligent developer, "I'll accept that I probably can't accurately sample the signal, but if I sample continuously over a period of time, surely probability means I'll get a more-or-less accurate result if I jigger about with the time period and number of samples". And, indeed, this is the case. The trade-offs here are:
- number of samples in a time period: more is better for accurately detecting high-frequency (HF) bounce. The downside is the memory and processing time required, which might be tight on a keyboard controller.
- time period: again, longer is better for detecting bounce (Cherry reckon 5ms). However, longer periods increase keyboard latency and, again, the possibility of missing keypresses. We usually sample one column at a time, and a 102-key keyboard might have 18 columns. At a 5ms debounce time it therefore takes 18 x 5ms = 90ms to get back round to the same column, so you could well miss keypresses shorter than about 100ms, or even longer ones (see below).
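Here's a back-of-envelope sketch of those two trade-offs, assuming each column gets its own fixed window, one after the other. `ScanBudget` and its fields are hypothetical names, and the numbers are illustrative rather than measured on any real board.

```python
from dataclasses import dataclass


@dataclass
class ScanBudget:
    """Rough numbers for the fixed-window approach (hypothetical, illustrative only)."""
    columns: int             # columns scanned one after another
    window_us: int           # debounce window spent on each column, in microseconds
    samples_per_window: int  # readings taken during that window

    @property
    def sample_rate_hz(self) -> float:
        return self.samples_per_window * 1_000_000 / self.window_us

    @property
    def nyquist_hz(self) -> float:
        # Only bounce below half the sample rate can be resolved.
        return self.sample_rate_hz / 2

    @property
    def cycle_ms(self) -> float:
        # A column waits for every other column's window before it is sampled again.
        return self.columns * self.window_us / 1000
```

Take the 18-column, 5ms case with 100 samples per window (`ScanBudget(18, 5000, 100)`). That only resolves bounce below 10kHz, a long way short of megahertz ringing, while each column still waits 90ms between windows. Doubling the samples raises the ceiling but doubles the work per window. Doubling the window doubles the cycle time.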
The "fixed time interval" aspect of this approach also increases the possibility of missing keypresses. Let's assume we're sampling accurately enough to detect all bounciness in the keyswitches, over a period of 5ms. Now assume that, for a single keypress, the signal goes "clean" in sample period 1 just after the very first sample. We have one "zero" and a number (perhaps thousands) of "one"s. The signal is, correctly, treated as "bouncy" and ignored. We go off and scan our other columns, and the next time round, as the key is still pressed, we start sampling "one"s. Now, even if the key *stays* pressed, we've added 90+ms of latency to the keypress. That's bad enough, but now assume that the user lets go of the key *just* as we're about to stop sampling. This time we get thousands of "one"s and a single "zero". Again, that's a "bouncy" signal, and we've missed a keypress of 90+ms length.
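That worst case can be reproduced with a small simulation. It assumes an idealised key with no bounce at all: it reads 1 while pressed, 0 otherwise. The key's column gets its 5ms window at t = 0, 90ms, 180ms and so on, with 100 samples per window. `fixed_window_scan` is a hypothetical helper written just for this illustration.

```python
from typing import List, Optional


def fixed_window_scan(press_start_us: int, press_end_us: int, windows: int,
                      columns: int = 18, window_us: int = 5000,
                      samples_per_window: int = 100) -> List[Optional[int]]:
    """Debounce result for one idealised key over successive fixed windows.

    The key reads 1 while press_start_us <= t < press_end_us. Its column's window
    starts every columns * window_us. Each window yields 1 or 0 if all samples
    agree, or None if it looked "bouncy" and was ignored.
    """
    cycle_us = columns * window_us
    step_us = window_us // samples_per_window
    results: List[Optional[int]] = []
    for k in range(windows):
        start = k * cycle_us
        samples = [1 if press_start_us <= start + i * step_us < press_end_us else 0
                   for i in range(samples_per_window)]
        results.append(samples[0] if len(set(samples)) == 1 else None)
    return results
```

`fixed_window_scan(10, 94_930, 3)` gives `[None, None, 0]`. A key held for nearly 95ms, pressed 10us after the first sample and released just before the last sample of the next window, is never reported as down. Start the same press at t = 0 and you get `[1, None, 0]` instead.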
This fixed time interval also introduces *ordering* issues. We might have two (or more, up to one per row) keypresses detected on a single column in a single scan/debounce cycle. But what order should we report them in? One might have started 90+ms ago, and the other a nanosecond ago, but we have no way of knowing which. So we report them in "row order" and hope for the best.
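Here's a tiny illustration of why "row order" is all you get, assuming 8 rows per column. Everything that went down during one window is folded into a single bitmask, so the real press times are gone by the time you report. Both function names are hypothetical.

```python
from typing import Dict, List


def row_order_report(press_times_ms: Dict[int, float]) -> List[int]:
    """Fixed-window reporting: all keys that went down in this column's window end
    up in one bitmask, so the best we can do is walk rows 0-7 in order."""
    mask = 0
    for row in press_times_ms:
        mask |= 1 << row
    return [row for row in range(8) if (mask >> row) & 1]


def true_order(press_times_ms: Dict[int, float]) -> List[int]:
    """What actually happened: rows sorted by when each key went down."""
    return sorted(press_times_ms, key=lambda row: press_times_ms[row])
```

Suppose row 5 went down right at the start of the 90ms cycle and row 2 at the very end (`{5: 0.0, 2: 89.999}`). They get reported as `[2, 5]` even though the user typed them as `[5, 2]`.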
In short, better, but not nearly good enough.
Next is the "sliding window" approach. It's similar to the above, but rather than sampling continuously for a period, we cycle through the columns taking one sample at a time. The results go into something like a push-down queue or a circular buffer (per column, obviously). Then, as soon as a single "row" reading goes "clean", we report that change of state. This largely does away with the ordering issues and, given enough samples, most of the missed keys. Tuning is done by changing the inter-sample time (i.e. a delay loop at the end of each sampling cycle) and/or the number of samples.
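A minimal sketch of the per-column circular buffer follows. It assumes each raw column reading is a bitmask with one bit per row, and that a row changes state only once all buffered samples agree (so releases are debounced too). `ColumnHistory` and its methods are hypothetical names, not taken from any particular firmware.

```python
from typing import List, Optional, Tuple


class ColumnHistory:
    """Sliding window of the last `depth` raw readings for one column (hypothetical).

    Each reading is a bitmask with one bit per row (bit 0 = row 0). The buffer
    starts full of zeros, i.e. "nothing pressed".
    """

    def __init__(self, depth: int, rows: int = 8) -> None:
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.depth = depth
        self.rows = rows
        self.buf = [0] * depth
        self.pos = 0      # slot holding the oldest reading, overwritten next
        self.state = 0    # debounced state, one bit per row

    def clean_level(self, row: int) -> Optional[int]:
        """1 or 0 if every buffered sample for this row agrees, else None (bouncing)."""
        bits = {(reading >> row) & 1 for reading in self.buf}
        return bits.pop() if len(bits) == 1 else None

    def scan(self, reading: int) -> List[Tuple[int, int]]:
        """Push one reading; return (row, new_level) for each row whose clean level changed."""
        self.buf[self.pos] = reading
        self.pos = (self.pos + 1) % self.depth
        changes = []
        for row in range(self.rows):
            level = self.clean_level(row)
            if level is not None and level != (self.state >> row) & 1:
                self.state ^= 1 << row
                changes.append((row, level))
        return changes
```

With `depth=4`, feeding four readings of `0b0100` reports `(2, 1)` on the fourth. Four readings of `0` then report `(2, 0)`.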
Let's assume a 4-sample window taken over the classic 5ms period, with 4 bits (one per row) per sample. "s" is the sample number (in hex, so a = 10, b = 11 and so on); 1, 2, 3 & 4 are the 4 bits sampled for this "column".
s : 1 2 3 4
1 : 0 0 0 0 
2 : 0 0 1 0
3 : 0 0 0 0
4 : 0 0 1 1
5 : 0 0 1 0
6 : 1 0 1 1
7 : 0 0 1 1
8 : 1 0 1 1
9 : 1 0 1 1
a : 1 0 0 1
b : 1 0 0 0
c : 1 0 0 0
d : 0 0 0 0
At s = 1, no keys are pressed.
At s = 2, key 3 is pressed and starts bouncing.
At s = 4, key 3 goes "clean" and key 4 starts bouncing.
At s = 6, key 4 goes clean, and key 1 starts bouncing.
At s = 7, key 3 has 4 "clean" samples (s = 4, 5, 6, 7), so we send a "key down" message (5ms latency from the first "clean" sample).
At s = 8, key 1 goes clean.
At s = 9, key 4 has 4 clean samples (s = 6, 7, 8, 9), so we send a key down message.
At s = a, key 3 samples 0 and no longer has 4 clean samples, so we send a key up message.
At s = b, key 1 has 4 clean samples (s = 8, 9, a, b), so we send a key down message. Key 4 samples 0, so we send a key up message.
At s = d, key 1 samples 0, so we send a key up message.
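As a sanity check on that walk-through, here's a replay of the table using the same rule the example uses. A key goes down once its last 4 samples are all 1, and goes up on the very first 0. `TABLE` is just the table above typed in; `replay` is a hypothetical helper.

```python
from typing import List, Tuple

# The table above: (sample label, bits for keys 1-4 in this column)
TABLE = [
    ("1", "0000"), ("2", "0010"), ("3", "0000"), ("4", "0011"),
    ("5", "0010"), ("6", "1011"), ("7", "0011"), ("8", "1011"),
    ("9", "1011"), ("a", "1001"), ("b", "1000"), ("c", "1000"),
    ("d", "0000"),
]


def replay(table: List[Tuple[str, str]], window: int = 4) -> List[Tuple[str, int, str]]:
    """Key down after `window` consecutive 1s; key up on the first 0 (no release debounce)."""
    keys = range(1, len(table[0][1]) + 1)
    history = {key: [] for key in keys}
    down = {key: False for key in keys}
    events = []
    for label, bits in table:
        for key, ch in zip(keys, bits):
            sample = int(ch)
            history[key] = (history[key] + [sample])[-window:]  # keep the last `window` samples
            if not down[key] and len(history[key]) == window and all(history[key]):
                down[key] = True
                events.append((label, key, "down"))
            elif down[key] and sample == 0:
                down[key] = False
                events.append((label, key, "up"))
    return events
```

`replay(TABLE)` produces key 3 down at 7, key 4 down at 9, key 3 up at a, key 1 down and key 4 up at b, and key 1 up at d, matching the list above.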
You'll note that there is latency on the key down messages, but not on the key up messages. This can be rectified by requiring n consecutive zero samples before changing state, at the cost of possibly missing "quick" key down-up-down movements.
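Here's a sketch of that trade-off using a per-key counter, which is a swapped-in alternative to the buffer. It has separate, hypothetical `press_n` and `release_n` thresholds. `FLICK` is an invented sample stream: a press, a 2-sample lift, a second press, then a release.

```python
from typing import List, Tuple


def debounce_key(samples: List[int], press_n: int, release_n: int) -> List[Tuple[int, str]]:
    """Report "down" after press_n consecutive 1s and "up" after release_n consecutive 0s.

    Returns (sample index, event) pairs. release_n=1 behaves like the worked example
    (immediate key up); a larger release_n debounces releases as well.
    """
    down = False
    run = 0  # consecutive samples disagreeing with the current debounced state
    events: List[Tuple[int, str]] = []
    for i, sample in enumerate(samples):
        if sample == (1 if down else 0):
            run = 0
            continue
        run += 1
        if run >= (release_n if down else press_n):
            down = not down
            run = 0
            events.append((i, "down" if down else "up"))
    return events


# Hypothetical input: press, brief 2-sample lift, press again, release.
FLICK = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
```

With `release_n=1` the flick is reported as two separate presses. With `release_n=3` the 2-sample lift is swallowed and the whole thing becomes one long press, with its key up arriving 2 samples later.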
Typically, we'd integrate over 8, 16 or 32 samples, with, of course, 8 rows per column. More samples means more memory and processing overhead (although the processing is all bitwise ops); fewer means more chance of missing keys and bounces.
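To show what "all bitwise ops" can look like, here's a sketch that updates all 8 rows of a column at once from its sample history. It assumes one byte per column per sample, and the rule that a row turns on when every sample is 1 and off when every sample is 0 (i.e. releases debounced too). Both function names are hypothetical.

```python
from functools import reduce
from operator import and_, or_
from typing import List

ROW_MASK = 0xFF  # 8 rows per column, one bit each


def debounced_state(history: List[int], state: int) -> int:
    """Update a column's debounced 8-row state from its last N raw readings.

    A row turns on if it read 1 in every sample, off if it read 0 in every
    sample, and otherwise keeps its previous state.
    """
    all_ones = reduce(and_, history, ROW_MASK)      # rows that were 1 in every sample
    all_zeros = ~reduce(or_, history, 0) & ROW_MASK  # rows that were 0 in every sample
    return (state | all_ones) & ~all_zeros & ROW_MASK


def history_ram_bytes(columns: int, samples: int) -> int:
    """RAM for the raw history at one byte (8 rows) per column per sample."""
    return columns * samples
```

Each update is N ANDs and N ORs per column, whatever the number of rows. Memory is where the sample count bites: 18 columns at 32 samples is 576 bytes of history.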
The above is largely the approach taken by yuri's firmware, but I'm almost certain it's only integrating over 3 samples, which would probably explain the missing keys issue: 3 samples isn't enough.
On top of all that, I don't think the TE has a full NKRO matrix. The firmware appears to be checking for ghosts, although the PCB appears to have diodes for every key. So either that's a mistake in reversing the firmware, or it's a historical artefact in the firmware; who knows? Either way, you might have the firmware deliberately ignoring keypresses, and combined with slightly flaky debouncing, that could lead to missed keys as well.
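I've no idea what the TE firmware's ghost check actually looks like. A common form of the check flags any two columns that share two or more pressed rows. In a matrix without per-key diodes, three keys on the corners of a rectangle can make the fourth corner read as pressed, so the firmware can't tell which one is real. Here's a sketch assuming one bitmask per column; `may_ghost` is a hypothetical name.

```python
from typing import List


def may_ghost(columns: List[int]) -> bool:
    """True if any two columns share two or more pressed rows.

    Without per-key diodes such a pattern might include a phantom key, so a
    firmware using this check would ignore the keys involved.
    """
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            shared = columns[i] & columns[j]
            if shared & (shared - 1):  # clears the lowest set bit; non-zero means 2+ bits
                return True
    return False
```

On a board with a diode on every key, ghosts can't occur. A check like this would then only ever reject genuine rectangle combinations, which is one way the firmware could end up ignoring keypresses.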