(...)
Now, I'm not sure what sample the algorithm used to generate this layout. What I am seeing here is that the sample may not include many of the people who cared enough to give feedback in this thread. This may or may not be a problem (...). What I worry about is that the type of people used to train the algorithm may not necessarily be the type of people who are willing to try alternative layouts, which is the majority of the people I know.
(...)
Now we are talking.
The big question any data scientist has to keep asking him- or herself is: is the dataset biased in any way? That will deeply affect the algorithm, the optimization and, of course, the final results. So the data science method leads us to ask:
- What biases could the dataset have?
- How would these biases affect the final results, and to what extent?
- If a bias impacts the results significantly, what correction or restriction to the dataset would be required to yield more credible results?
Analyzing trends among the subjects, we found that our dataset had three biases:
- The great majority of subjects were right-handed
- Most injuries were caused by repetitive hand movements, but a significant portion were caused by accidents
- Many of the subjects had significant nerve, joint, bone or ligament lesions before our research took place, some of them with records of surgical intervention on the wrist or hands
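The first of the three questions (which biases could the dataset have?) usually starts with counting how the sample splits on each attribute. The sketch below is illustrative only: the record fields (`handedness`, `injury_cause`, `prior_lesion`), the toy records and the 0.7 threshold are hypothetical, not the project's real data or schema. A skewed attribute is only a candidate bias; whether it actually matters is what the second and third questions are for.

```python
from collections import Counter

# Hypothetical subject records; field names are illustrative, not the project's real schema.
subjects = [
    {"handedness": "right", "injury_cause": "repetitive", "prior_lesion": True},
    {"handedness": "right", "injury_cause": "accident", "prior_lesion": False},
    {"handedness": "left", "injury_cause": "repetitive", "prior_lesion": True},
    {"handedness": "right", "injury_cause": "repetitive", "prior_lesion": False},
]

def majority_share(records, field):
    """Return (most common value, its share of the records) for one attribute."""
    counts = Counter(r[field] for r in records)
    value, count = counts.most_common(1)[0]
    return value, count / len(records)

def flag_skewed(records, fields, threshold=0.7):
    """Return the attributes whose most common value exceeds `threshold` of the sample."""
    flagged = {}
    for field in fields:
        value, share = majority_share(records, field)
        if share > threshold:
            flagged[field] = (value, share)
    return flagged

print(flag_skewed(subjects, ["handedness", "injury_cause", "prior_lesion"]))
# {'handedness': ('right', 0.75), 'injury_cause': ('repetitive', 0.75)}
```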
However, there were some qualities of our dataset we could be certain about:
- None of the subjects were children or adolescents with still-developing joints. All subjects were adults with completely formed joints, muscles and bones.
- No subject was under the effect of an anesthetic or drug that could affect their response.
- All of the subjects already had access to computers or mobile devices with a keyboard-type input, meaning we didn't teach anyone to use a keyboard. This is important because one might suspect we taught them the wrong way to use the keyboard, affecting how they perceive the peripheral.
We then worked out how each of these biases could affect the dataset (and hence the optimization):
Having most of the data come from right-handed subjects proved not to be very impactful on usability. I responded to this in detail in a previous comment:
Thanks for spoon-feeding me the info! I actually took time to read the full post today. First thing--I'm truly sorry for the loss of Tom... It's a great thing you're doing making this board a part of his legacy. I didn't quite grok that the alternative layout option was showing that the two bottom rows could be swapped for arrow keys--and now I feel even sillier knowing that the arrows are part of the Sagittarius name!
I was also wondering about the differences in the stagger between right and left, but I see you've answered that. I suppose that this actually makes the board right-handed? (I'm a righty, so that's fine for me, but I just wanted to check).
It's also too bad that the last thing to be open-sourced will be the optimization algorithm, as that is the part that interests me the most!
This is an interesting question too: is Sagittarius left- or right-handed?
The optimization algorithm did take into account a "left-handed or right-handed" parameter. We took samples and measurements from a wide array of people, so we had data from both orientations. The problem is that the majority of people (that's statistical, not personal opinion) are right-handed, so we eventually asked whether our data was biased towards right-handed people (which it clearly was). Since I was the data scientist, I experimented with running the algorithm on left-handed data only, and also on a 50/50 split.
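The post doesn't say how the left-only and 50/50 datasets were built. A common way is to filter by group, or to downsample the majority group to the size of the minority group; the sketch below assumes downsampling, and the record layout and toy data are hypothetical. Each variant would then be fed to the optimizer separately.

```python
import random
from collections import Counter

def balanced_subsample(records, field="handedness", seed=0):
    """Downsample every group of `field` to the size of the smallest group (e.g. 50/50 left/right)."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    groups = {}
    for r in records:
        groups.setdefault(r[field], []).append(r)
    n = min(len(g) for g in groups.values())
    sample = []
    for g in groups.values():
        sample.extend(rng.sample(g, n))  # draw n records without replacement from each group
    return sample

def only_group(records, value, field="handedness"):
    """Keep only the records whose `field` equals `value` (e.g. left-handed only)."""
    return [r for r in records if r[field] == value]

# Toy data: 8 right-handed and 2 left-handed subjects (hypothetical).
records = [{"id": i, "handedness": "right"} for i in range(8)]
records += [{"id": 100 + i, "handedness": "left"} for i in range(2)]

variants = {
    "all data": records,
    "50/50": balanced_subsample(records),
    "left only": only_group(records, "left"),
}
for name, data in variants.items():
    print(name, Counter(r["handedness"] for r in data))
```

Downsampling discards most of the right-handed data; if the optimizer accepts per-record weights, reweighting the minority group is the usual alternative that keeps every record.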
The results were very interesting. They showed that the left-handed-oriented layout was VERY harmful to right-handed people; that is, right-handed coordination tends to be less flexible to orientation changes, whereas the right-handed-biased layout we had was not so bad for left-handed people.
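One way to put numbers on "VERY harmful" versus "not so bad" is a cross-evaluation table: score each layout on each group's data and compare it with the layout tuned for that group. The post doesn't say which cost metric the team used, so `cost` here stands for any hypothetical lower-is-better typing-effort score, and the values are dummy numbers that only exercise the function; they are not Sagittarius results.

```python
def cross_penalty(cost):
    """
    cost[layout][group]: lower-is-better typing cost of `layout` evaluated on `group`'s data.
    Layouts tuned on a single group are keyed by that group's name ("left", "right");
    extra rows such as "all" (tuned on the whole dataset) are allowed.
    Returns {(layout, group): relative extra cost vs. the layout tuned for `group`}.
    """
    penalty = {}
    for layout, row in cost.items():
        for group, c in row.items():
            own = cost[group][group]  # the group's own layout, scored on the group itself
            penalty[(layout, group)] = c / own - 1.0
    return penalty

# Dummy numbers for illustration only; they are not measured results.
cost = {
    "left":  {"left": 2.0, "right": 2.5},
    "right": {"left": 2.2, "right": 2.0},
}
for (layout, group), p in sorted(cross_penalty(cost).items()):
    print(f"layout tuned on {layout!r}, used by {group!r}: {p:+.0%}")
```

An asymmetric table (a large penalty in one off-diagonal cell, a small one in the other) is the kind of pattern the finding above describes.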
Upon concluding this, we took it to Tom's advisor -- a big rehabilitation surgeon and researcher -- and he told us the result is completely understandable: since the big majority of people are right-handed, everything in our world tends to be designed for righties, so righties don't have to adapt. Lefties, on the other hand, are more tolerant of right-handed appliances because they have had to train their physiology (muscles and muscle memory) to use right-oriented appliances.
Our ultimate decision was to keep the layout trained on the whole dataset. After running the optimization and the in loco testing (we did get a dozen people to use the layout), the left-handed testers told us they were comfortable using the layout as well.
(...)
As for bias number two -- the fact that some of the subjects were recovering from accident-induced lesions -- we concluded that it didn't make sense to optimize the layout for that group, simply because the focus of this keyboard is not physiotherapy, but offering a more comfortable option to those with the opposite kind of lesion, induced by repetitive hand movements. Hence we removed the subjects with accident-induced lesions from the dataset.
Finally, as for the third bias -- subjects who already had significant lesions before the research took place -- we considered their history statistically enriching, because it allows us to train the algorithm to accommodate people with these conditions. We don't know whether the person typing on a Sagittarius has or had previous lesions, so having that influence in the dataset is welcome. So we kept those subjects in.
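The two dataset decisions for biases two and three boil down to one filtering rule. A minimal sketch, assuming hypothetical record fields `injury_cause` and `prior_lesion` (the project's real schema isn't given):

```python
def clean_dataset(records):
    """
    Apply the two decisions for biases two and three:
    - drop subjects whose injury was caused by an accident (out of scope for this keyboard);
    - keep subjects with prior lesions or surgery (the layout should accommodate them too).
    Returns (kept records, number dropped).
    """
    kept = [r for r in records if r["injury_cause"] != "accident"]
    return kept, len(records) - len(kept)

# Hypothetical records; field names are illustrative only.
records = [
    {"id": 1, "injury_cause": "repetitive", "prior_lesion": False},
    {"id": 2, "injury_cause": "accident",   "prior_lesion": False},
    {"id": 3, "injury_cause": "repetitive", "prior_lesion": True},
]
kept, dropped = clean_dataset(records)
print([r["id"] for r in kept], dropped)  # [1, 3] 1
```

Note that `prior_lesion` is deliberately not used as a filter: subject 3 stays in.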
These were our thoughts and methods. If you have an opinion on them, please share it.
As for your comments:
(...)
1. There are clusters of typing patterns.
2. For each cluster, there may be an optimal layout
(...)
You are completely right, and we did think of that. The main motivation for statistics is that, obviously, every single individual has their own optimal solution, but it's unfeasible to give each and every person their particular optimal solution. So we adopt criteria and methods to ensure that one or a couple of solutions are optimal for everyone within a given degree of certainty.
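Picking one layout for several typing-pattern clusters means choosing a criterion. The post doesn't say which one Sagittarius used, so the sketch below swaps in two standard ones: minimizing the population-weighted average cost, and minimizing the worst-case regret (extra cost versus each cluster's own best candidate). The function name, cluster names and numbers are hypothetical.

```python
def pick_layout(cost, weights, rule="mean"):
    """
    cost[layout][cluster]: cost of a candidate layout for a typing-pattern cluster (lower is better).
    weights[cluster]: share of users in that cluster (should sum to 1).
    rule="mean":   minimize the population-weighted average cost.
    rule="regret": minimize the worst-case extra cost vs. each cluster's best candidate.
    """
    clusters = list(weights)
    best = {c: min(cost[layout][c] for layout in cost) for c in clusters}

    def score(layout):
        if rule == "mean":
            return sum(weights[c] * cost[layout][c] for c in clusters)
        if rule == "regret":
            return max(cost[layout][c] - best[c] for c in clusters)
        raise ValueError(f"unknown rule: {rule}")

    return min(cost, key=score)

# Hypothetical candidates: A suits cluster c1, B suits c2, C is a compromise.
cost = {
    "A": {"c1": 1.0, "c2": 3.0},
    "B": {"c1": 3.0, "c2": 1.0},
    "C": {"c1": 1.5, "c2": 1.5},
}
weights = {"c1": 0.8, "c2": 0.2}
print(pick_layout(cost, weights, "mean"))    # A
print(pick_layout(cost, weights, "regret"))  # C
```

The two criteria can disagree: the weighted mean favors the majority cluster, while minimax regret favors the compromise layout that nobody finds badly off.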
Those are fancy words, so let us translate them to the problem at hand. The problem is: knowing that there are different typing patterns, how do we make a commercially viable layout that single-handedly serves all of them? Take, for instance, the mouse market. As you might know, there are three main types of mouse grip: palm, fingertip and claw. This means that a particular mouse might feel awful to you but stupendously comfortable to a friend; you should get a mouse that fits your particular grip, and there are tens if not hundreds of options to choose from. However, we can't make two or three different layouts for Sagittarius, as that would probably require three different plates, three different PCBs and three different cases -- that is, three different keyboards altogether. Not only is this a logistical nightmare, it also divides the GB units, making the cases more expensive or even failing to meet the MOQ. To help with this, we tried to add as many layout compatibilities as we could: ISO option, arrow options, split shift, split spaces, and so on.
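The MOQ (minimum order quantity) point is simple arithmetic: the same number of buyers split across several variants may leave some variants below the manufacturer's minimum. A minimal sketch with purely hypothetical numbers (buyer count, MOQ and split are made up for illustration):

```python
def variants_meeting_moq(total_orders, split_percent, moq):
    """
    Split a group buy's orders across layout variants and check each variant against the MOQ.
    split_percent maps variant name -> integer percentage of buyers choosing it (should sum to 100).
    Returns {variant: (units, meets_moq)}.
    """
    result = {}
    for variant, pct in split_percent.items():
        units = total_orders * pct // 100  # integer arithmetic, whole units only
        result[variant] = (units, units >= moq)
    return result

# Hypothetical numbers: 300 buyers, a case MOQ of 150 units.
print(variants_meeting_moq(300, {"single layout": 100}, moq=150))
# {'single layout': (300, True)}
print(variants_meeting_moq(300, {"layout A": 50, "layout B": 30, "layout C": 20}, moq=150))
# {'layout A': (150, True), 'layout B': (90, False), 'layout C': (60, False)}
```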
If you have a good idea on how to help us with this, I'd really appreciate it.
Another point where you are completely right is:
(...) So, I doubt that there's any scientific study having a big sample of mechanical keyboard hobbyists. (...)
This is indeed correct; our dataset comes from subjects with little to no familiarity with this community. However, please note that the data available to me from this community is close to nonexistent. I would love to do this with people from the hobby, but even leaving all of these requirements aside, the MKBR community is so ridiculously small that I can safely assume the number of subjects we collected data from exceeds the membership of our entire Discord server, let alone the members near me. I did, however, run the "pseudo-placebo" tests on five friends of mine who participate in the community, so I think I have some validation there.
Also, please bear in mind that these are health-related tests, and in this field any kind of research or development has really tight requirements to even be considered scientific at all. Take, for instance, the COVID-19 vaccine research underway: the tests and interviews have to be done in a very specific environment, under the in loco guidance of a very specific and highly trained set of professionals, using very controlled and precise measurements from hundred-thousand-dollar machines, on a strict control group that is closely monitored for weeks, with very specific care in the treatment of the data, and using an equally specific method. As a statistician, you can vouch for the care and delicacy with which we have to treat results from the pharmacology/medicine industries and academia, and I can't make all of that work remotely.
Since we are past that point and already have some solid results, in the future I'd love to get very deep feedback from members and have them help us fine-tune the layout, without the intention of publishing papers or doing hard science. But as Sagittarius stands today, I intend to leave it as it is (perhaps with some minimal changes due to the feedback we are receiving, or significant changes if the reasoning is sound), not only out of respect for the gargantuan amount of effort I put into it but also out of respect for the other people who worked on this, especially Tom.
I apologise if this reply is too long or sounds arrogant in any way; I tried to break down and explain my train of thought using very basic concepts. You said you were a graduate-level statistician, but I have to reply to everyone reading this, not only those with a technical background.