O indeed! Fixed now
Ian, what's the main Corpus you use for your layouts?
And what are your thoughts on the changes I've made?
1. more fixing needed :-)
Error with eNNe AS 83.86.en.json : eNNe AS 83.86, duplicate character on keyboard: ; (59)
Error with eNNe AS 83.86.en.json : eNNe AS 83.86, characters not on keyboard: ^
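For anyone wanting to pre-check a layout before submitting, here is a rough sketch of the two checks those messages describe. It is not the actual validator: `check_layout` is a made-up name, and it assumes the layout has already been flattened into a plain list of characters rather than read from the real JSON.

```python
from collections import Counter

def check_layout(chars_on_keys, required_chars):
    """Return (duplicates, missing) for a layout given as a flat list of characters.

    Hypothetical helper: the real tool reads the layout JSON, which is not modelled here.
    """
    counts = Counter(chars_on_keys)
    # Characters assigned to more than one key position.
    duplicates = sorted(c for c, n in counts.items() if n > 1)
    # Characters the test text needs but the layout cannot produce.
    missing = sorted(set(required_chars) - set(counts))
    return duplicates, missing

# Toy layout: ';' appears twice and '^' is absent.
layout = list("abc;;")
required = "abc;^"
dups, missing = check_layout(layout, required)
for c in dups:
    print(f"duplicate character on keyboard: {c} ({ord(c)})")
if missing:
    print("characters not on keyboard: " + " ".join(missing))
```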
Re the corpus, I did discuss it on Den's site a while back... basically it dawned on me that most if not all of the texts we were using were not good... they may be "real world" but their letter frequency does not match English (or Dutch, for that matter). That applies especially to the assorted word lists.
So I went on a searchabout... if I found stuff online (news or blog posts) that looked like it might have the correct frequency, I cleaned it up and ran it through a checker. Most failed.
Did the same with assorted old books on Gutenberg.
My checker tries to find the texts that most closely match English letter frequency. I was doing "similarity checks" against the list of characters in descending order of frequency, but existing similarity algos didn't work the way I needed them to, so I rolled my own. Ended up with three, each slightly different.
Two check all characters, the other just checks the 11 most common letters (as in: your home row and space bar.... or somesuch).
At the moment I only have 8 texts that have the top 10 or 11 characters in the correct order (score 66 or 65 in the Top11 column on the spreadsheet). I use this metric because letter distribution in a text follows some sort of Zipf distribution curve, and the most used letters are the most important. The FreqMatch column is the most important one, lower == better. For the Similarity column, higher == better.
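To give an idea of what a Top11-style check might look like, here is a minimal sketch. It is not the actual checker: the position-weighted scoring (11 points for the first rank down to 1 for the eleventh, so 66 for a perfect match and 65 when only the eleventh is off) is a guess that happens to reproduce the 66/65 figures, and `REFERENCE_TOP11` is an assumed letters-only reference order that may differ from the one used for the spreadsheet.

```python
from collections import Counter

# Assumption: letters-only reference order, taken from the well-known "etaoin shrdlu" sequence.
REFERENCE_TOP11 = "etaoinshrdl"

def letter_ranking(text):
    """Letters a-z found in text, sorted by descending count (ties broken alphabetically)."""
    counts = Counter(c for c in text.lower() if "a" <= c <= "z")
    return "".join(sorted(counts, key=lambda c: (-counts[c], c)))

def top_n_score(text, reference=REFERENCE_TOP11):
    """Sum of position weights for every rank where the text agrees with the reference."""
    ranking = letter_ranking(text)
    n = len(reference)
    return sum(
        n - i
        for i, ref in enumerate(reference)
        if i < len(ranking) and ranking[i] == ref
    )

# Synthetic sample whose letter counts follow the reference order exactly.
sample = "".join(ch * (20 - i) for i, ch in enumerate(REFERENCE_TOP11))
print(top_n_score(sample))  # 66
```

Weighting by rank means a mismatch near the top costs far more than one near the bottom, which fits the point that the most used letters matter most.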
I uploaded them to Den's site the other day, might as well put it here too. Note that most of them are from a blog by John Ward, it just so happens that he occasionally churns out pieces that have the right frequency distribution. They are probably copyrighted but I guess we can plead some sort of fair use case. Two are old books, and one an op-ed piece from Russia Today. Note I'm not concerned with what they say, just the letter frequency ..... :-)
Texts and analysis attached. The .csv is tab-delimited.
I don't know what to do about "programming" inputs... original idea was a mishmash of languages from RosettaCode. Should probably limit it to "popular" languages, but that's a moving target and the well known lists are heavily criticised. There's still tons of COBOL and Fortran being written/maintained but those programmers don't need to ask Google or StackOverflow for help....
So at the moment I'm just going with these English tests, and sometimes the KLE home page for "web dev stuff"... but that overuses certain things like the letter k and double quotes etc.
As for your layouts, will take a look. At the moment trying to do a round-up of recent new/updated layouts to do another round of testing. So if anyone reading this has a new layout, feel free to submit.
You can check most of what I already have here:
https://www.keyboard-design.com/internet-letter-layout-db.html
I have a bunch of others to add as well (and probably some others from Den):
May 6 2018 YPHINAFU.txt
May 11 2018 beakl9-ansi-shifted-mod.txt
Dec 10 2017 'eNNe KLA3 P 109.54.json.txt'
Jul 19 2018 phynnboi.txt
Feb 11 2019 quesnel.mtgap.beakl.ansi.txt
May 6 2018 snarfangel-UP_OIANY_4.txt
May 6 2018 spindle.txt
Aug 13 07:50 FinalBestKeyboard07252020.json
Aug 13 07:48 KFOU_DHAI_Y_Kinesis.json
Aug 13 07:31 Qwicker-KFLY.kla.json
Aug 13 07:31 Qwicker-Mod-H.kla.json
Aug 13 07:32 Qwickly-Mod-B.kla.json
Aug 9 22:31 astarte.en.ansi.json
Mar 2 09:05 balanced13-iso-shifted-final-mod-ian.json
Mar 2 07:58 balanced13-iso-shifted-final.json
Mar 2 07:57 balanced13-iso-shifted-no-spacefn.json
Mar 4 10:24 balanced13-iso-shifted-v2-mod-ian.json
Mar 4 09:34 balanced13-iso-shifted-v2.json
May 31 11:47 beakl-15.en.matrix.json
Feb 11 2019 beakl15ModPqIntl3.kla.json
May 31 22:28 beakl19.matrix.json
Jun 5 08:05 beaklArr29k1.matrix.json
Mar 11 2019 code.en.ansi.json
Aug 13 10:47 'eNNe AS 83.86.en.json'
Dec 12 2017 'eNNe KLA3 P 108.27.json'
Dec 10 2017 'eNNe KLA3 P 109.54.json'
Dec 10 2017 'eNNe KLA3 P 110.97.json'
Dec 7 2017 'eNNe KLA3 P 111.69.json'
Aug 12 23:21 hycis.en.ansi.json
Jun 9 15:00 'kla BEAKL-19 ergodox.json'
Jun 9 16:31 'kla X1 Atreus-Ergodox 44-keys.json'
Aug 10 23:13 ntsc.en.matrix.json
Aug 10 16:32 ough.en.ergolinear.json
Aug 11 09:44 ough.en.matrix.json
Aug 11 10:07 ougw.en.matrix.json
Dec 29 2017 power.en.ansi.json
Aug 12 21:33 shz.en.ansi.json
Aug 12 10:14 shz.en.matrix.json
Aug 12 13:38 shz.en.matrix.zkq-dblequote.json
Aug 13 08:42 vflm.en.ansi.json
Aug 11 09:43 wiea.en.matrix.json
Dec 30 2017 zx-1.en.ansi.json
Last present: an ANSI version (30 keys, not fully optimized) of my most recent layout... scores well.
Cheers, Ian