Character and Keycodes

For proper handling of keyboard events, several codes are of importance:

ASCII Code

The "American Standard Code for Information Interchange" (ASCII) is a widely used mapping between numbers and printable characters (e.g., digits, letters, whitespace, and several special characters). Originally, 7 bits were intended per character, while today 8 bits (one byte) are used basically everywhere. The following table shows an excerpt from the ASCII table:

Character	ASCII Code
`(`	40
`0`	48
`1`	49
`2`	50
`A`	65
`B`	66
`a`	97

Commonly, characters and strings are stored in ASCII or in its bigger brother UTF-8, which is briefly introduced in the next section.
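Because the codes of digits and letters are contiguous, conversions between characters and numbers reduce to simple arithmetic. The following is a minimal sketch; the helper names `digitValue` and `toUpper` are illustrative and not part of StuBS, and the code assumes an ASCII-compatible execution character set.

```cpp
// Numeric value of an ASCII digit ('0' = 48 ... '9' = 57), or -1 for any other character.
constexpr int digitValue(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

// Upper- and lowercase letters differ by exactly 32 ('A' = 65, 'a' = 97).
constexpr char toUpper(char c) {
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - ('a' - 'A')) : c;
}

// Values taken from the excerpt of the ASCII table above.
static_assert('(' == 40 && '0' == 48 && 'A' == 65 && 'a' == 97,
              "execution character set is not ASCII-compatible");
```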

UTF-8 vs. ASCII

UTF-8 is a superset of ASCII, that is, ASCII-encoded strings are valid UTF-8 strings. While every ASCII character is made up of exactly one byte, UTF-8 characters have a varying length ranging from 1 to 4 bytes. The table below describes the structure of UTF-8 characters: Single-byte UTF-8 characters always have a leading 0 bit, which makes their encoding identical to the seven-bit ASCII encoding. If a character is encoded using multiple bytes, on the other hand, its first byte encodes the total number of bytes n by setting its n most significant bits to 1, followed by a single 0. All following bytes have bits 7 and 6 set to 1 and 0, respectively (i.e., they start with 10). This way, it can easily be detected whether a given byte starts a UTF-8 character or continues one.

For further insights into UTF-8, refer to the UTF-8 entry in the Literature section.

Number of bytes	Bits for code point	Byte 1	Byte 2	Byte 3	Byte 4
1	7	0⬚⬚⬚⬚⬚⬚⬚
2	11	110⬚⬚⬚⬚⬚	10⬚⬚⬚⬚⬚⬚
3	16	1110⬚⬚⬚⬚	10⬚⬚⬚⬚⬚⬚	10⬚⬚⬚⬚⬚⬚
4	21	11110⬚⬚⬚	10⬚⬚⬚⬚⬚⬚	10⬚⬚⬚⬚⬚⬚	10⬚⬚⬚⬚⬚⬚
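The bit patterns in the table can be checked with a few masks. The sketch below classifies a single byte; the function names are made up for illustration, and the code deliberately does not reject overlong encodings or code points outside the Unicode range.

```cpp
#include <cstdint>

// Number of bytes in the UTF-8 sequence introduced by `first`.
// Returns 0 for continuation bytes and for bytes matching none of the patterns.
constexpr int utf8SequenceLength(std::uint8_t first) {
    return (first & 0x80) == 0x00 ? 1   // 0xxxxxxx
         : (first & 0xE0) == 0xC0 ? 2   // 110xxxxx
         : (first & 0xF0) == 0xE0 ? 3   // 1110xxxx
         : (first & 0xF8) == 0xF0 ? 4   // 11110xxx
         : 0;
}

// True if `byte` is a follow-up byte of a multi-byte character (10xxxxxx).
constexpr bool isUtf8Continuation(std::uint8_t byte) {
    return (byte & 0xC0) == 0x80;
}
```

For example, "ä" is encoded as the two bytes 0xC3 0xA4: the first byte reports a length of 2, and the second one is recognized as a continuation byte.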

Scancode

Every key on the keyboard is assigned a unique number, its scancode. The scancode identifies every key, regardless of whether it represents a printable character or not (e.g., arrow keys). Keep in mind, however, that scancodes do not distinguish between uppercase and lowercase letters, as both are reachable using the same key on your keyboard.

Key	Scancode
A	30
S	31
D	32
⬆	72
⬇	80

In the history of PC development, there have been different keyboards with varying numbers and meanings of keys. Function and special keys in particular have varying, non-standardized scancodes. As PC keyboards have only a few more than 100 keys, 7 bits are sufficient for representing all keys on such keyboards.

Make- and Breakcodes

Programs not only need to be able to detect which "ordinary" key was pressed, but also whether and which of the shift, control, or alt keys were held while the key was pressed. Therefore, the keyboard does not send a single scancode, but one or more makecodes for every key press and one or more breakcodes for every key release. When a key is held down for a certain period of time, the keyboard sends additional, repeated makecodes. For most keys, the makecode is equal to the scancode and the breakcode is equal to the scancode with bit 7 (counting from 0) set. For historical reasons, pressing or releasing some keys issues multiple make- and breakcodes. The keyboard driver (implemented as Keyboard::prologue() as part of exercise 3) needs to derive the intended character from the make- and breakcodes received from the keyboard.
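For the common case of a key with a single make- and breakcode, the relation to the scancode can be sketched as follows. The type `KeyEvent` and the function `decodeSimpleCode` are hypothetical names for illustration and not the decoder provided with StuBS; keys that issue multiple codes are not handled here.

```cpp
#include <cstdint>

constexpr std::uint8_t BREAK_BIT = 0x80;  // bit 7 marks a key release

struct KeyEvent {
    std::uint8_t scancode;  // lower seven bits of the received code
    bool released;          // true for a breakcode, false for a makecode
};

// Split a single received byte into scancode and press/release information.
constexpr KeyEvent decodeSimpleCode(std::uint8_t code) {
    return KeyEvent{static_cast<std::uint8_t>(code & 0x7F),
                    (code & BREAK_BIT) != 0};
}
```

With the scancode table above, pressing A yields 30 (0x1e), and releasing it yields 30 with bit 7 set (0x9e).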

Note: As interpreting make- and breakcodes is quite cumbersome, boring, and non-informative, we provide the decoder implementation. However, it is possible that our implementation does not properly detect all characters present on your keyboard, especially special ones such as German umlauts. In that case, you either need to accept a few wrong characters or adapt the tables used in the decoder.

Flow when a key is being pressed

Pressing a key on a PC keyboard connects two crossing wires within the keyboard's scan matrix. From this connection, the keyboard processor (8042 for PC/XT keyboards, 8048 for AT and MF II keyboards) determines the pressed key's location and, from that, the scancode. This scancode is then sent to the PC over a serial connection.

Every PC motherboard houses a PS/2 controller (also known as keyboard controller) that is connected to and communicates with the keyboard using one output and one input port. The PS/2 controller is programmed via registers that can be read from and written to using the `in` and `out` instructions.

Port	Register	Meaning
`0x60` (read)	output buffer	Make/break code from keyboard
`0x60` (write)	input buffer	Commands sent to the keyboard processor (e.g., toggle LEDs)
`0x64` (write)	control register	Commands sent to the PS/2 controller
`0x64` (read)	status register	PS/2 controller's state (e.g., output buffer is full?)

Whenever the keyboard controller writes a byte to its output buffer, it signals the availability of data by sending an interrupt request to the CPU. The CPU is then required to read the byte from the output buffer. Once the output buffer is empty again, the controller updates the status register to indicate this. Now, new characters can be received from the keyboard. When using the keyboard in polling mode, bit 0 of the status register (HAS_OUTPUT) can be used to check whether there is a character in the output buffer. Conversely, when sending command codes to the keyboard, it is mandatory to wait until the keyboard controller's input buffer is empty (i.e., bit 1, the INPUT_PENDING bit, is 0) prior to writing new commands.

Since the mouse is also connected to the PS/2 controller, data from both mouse and keyboard end up in the output buffer. To differentiate between the two sources, bit 5 of the status register (IS_MOUSE) indicates whether the byte is from the keyboard (0) or the mouse (1).

Bit	Mask	Name (in StuBS)	Meaning
0	`0x01`	HAS_OUTPUT	Set iff the output buffer contains a character to be read (i.e., the buffer is not empty)
1	`0x02`	INPUT_PENDING	Set iff the input buffer contains pending characters (i.e., characters not yet fetched by the controller)
5	`0x20`	IS_MOUSE	Set iff the value in the output buffer originates from the mouse
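Polling the keyboard with these status bits could look roughly like the sketch below. Port access is passed in as a callable `inb` so that the logic can be tried outside a kernel; `pollKeyboard` and `PollResult` are hypothetical names, and a real driver would use its own port-input primitive instead.

```cpp
#include <cstdint>

constexpr std::uint16_t DATA_PORT = 0x60;  // read: output buffer
constexpr std::uint16_t CTRL_PORT = 0x64;  // read: status register

constexpr std::uint8_t HAS_OUTPUT = 0x01;
constexpr std::uint8_t IS_MOUSE   = 0x20;

// Result of one polling attempt.
struct PollResult {
    bool valid;         // true if a byte from the keyboard was fetched
    std::uint8_t code;  // make- or breakcode (only meaningful if valid)
};

// `inb` is any callable that takes a port number and returns the byte read from it.
template <typename InB>
PollResult pollKeyboard(InB inb) {
    const std::uint8_t status = inb(CTRL_PORT);
    if ((status & HAS_OUTPUT) == 0) {
        return {false, 0};                     // output buffer is empty
    }
    const std::uint8_t data = inb(DATA_PORT);  // reading empties the output buffer
    if ((status & IS_MOUSE) != 0) {
        return {false, 0};                     // byte originates from the mouse: discard
    }
    return {true, data};
}
```

Note that the byte is read even if it came from the mouse; otherwise it would stay in the output buffer and block subsequent keyboard data.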

Programming the keyboard processor

The keyboard processor is configured by sending command codes (see the table below) to the input buffer, that is, by writing them to the data port. A clean solution would then wait for the keyboard to respond with an acknowledge byte (ACK, 0xfa) in the output buffer. However, it is non-trivial to achieve a fully standard-conforming implementation; correctly waiting for an ACK is especially difficult, as it may be interwoven with or squashed by subsequent key presses. Therefore, it is okay to simply ignore the acknowledgment byte.

Out of the 20 commands supported by PS/2 keyboards, we will only use two:

Command Code	Name	Description
`0xed`	KEYBOARD_SET_LED	Enable/Disable keyboard LEDs. Subsequent to the (acknowledged) command code, a second byte needs to be written to control the LEDs' states. The structure is detailed in the tables below.
`0xf3`	KEYBOARD_SET_SPEED	Repeat rate and delay can be modified by setting a second byte (following the command byte). The structure is detailed in the tables below.

The following table contains the structure of the control byte used by KEYBOARD_SET_LED to set keyboard LEDs.

Bits 3-7	Bit 2	Bit 1	Bit 0
Always 0	Caps Lock	Num Lock	Scroll Lock
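Putting the pieces together, setting the LEDs could be sketched as follows. This is only an illustration under the simplification allowed above (ACK bytes are ignored); the names `ledByte`, `sendToKeyboard`, and `setLeds` are hypothetical, and port access is again passed in as the callables `inb` and `outb`.

```cpp
#include <cstdint>

constexpr std::uint16_t DATA_PORT = 0x60;  // write: input buffer
constexpr std::uint16_t CTRL_PORT = 0x64;  // read: status register

constexpr std::uint8_t INPUT_PENDING    = 0x02;
constexpr std::uint8_t KEYBOARD_SET_LED = 0xed;

constexpr std::uint8_t LED_SCROLL_LOCK = 1 << 0;
constexpr std::uint8_t LED_NUM_LOCK    = 1 << 1;
constexpr std::uint8_t LED_CAPS_LOCK   = 1 << 2;

// Compose the control byte; bits 3-7 always stay 0.
constexpr std::uint8_t ledByte(bool caps, bool num, bool scroll) {
    return static_cast<std::uint8_t>((caps ? LED_CAPS_LOCK : 0) |
                                     (num ? LED_NUM_LOCK : 0) |
                                     (scroll ? LED_SCROLL_LOCK : 0));
}

// Wait until the input buffer is empty, then write one byte to the data port.
template <typename InB, typename OutB>
void sendToKeyboard(InB inb, OutB outb, std::uint8_t value) {
    while ((inb(CTRL_PORT) & INPUT_PENDING) != 0) {
        // busy-wait until the controller has fetched the previous byte
    }
    outb(DATA_PORT, value);
}

// Command code first, then the control byte with the LED states.
template <typename InB, typename OutB>
void setLeds(InB inb, OutB outb, std::uint8_t leds) {
    sendToKeyboard(inb, outb, KEYBOARD_SET_LED);
    sendToKeyboard(inb, outb, leds);  // ACK bytes (0xfa) are ignored in this sketch
}
```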

The following two tables illustrate the structure of the control byte used by KEYBOARD_SET_SPEED. The repeat rate is encoded in bits 0 to 4, the delay in bits 5 and 6. For the repeat rate, higher values indicate a lower repeat rate. More details are provided, for instance, in the PS/2 keyboard references listed in the Literature section.

Bits 0-4 (hex)	Repeat rate (characters per second)
`0x00`	30
`0x02`	25
`0x04`	20
`0x08`	15
`0x0c`	10
`0x10`	7
`0x14`	5

Bits 5 and 6 (hex)	Delay (in seconds)
`0x00`	0.25
`0x01`	0.5
`0x02`	0.75
`0x03`	1.0
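Since the delay values in the table are given as the raw contents of the two-bit field, they have to be shifted into bits 5 and 6 when the byte is composed. A minimal sketch, with `speedByte` being a made-up helper name:

```cpp
#include <cstdint>

// rate:  value for bits 0-4 (0x00 = fastest, higher values = slower)
// delay: value for bits 5-6 (0 = 0.25 s ... 3 = 1.0 s)
// Both arguments are masked to their field width, so bit 7 is never set.
constexpr std::uint8_t speedByte(std::uint8_t rate, std::uint8_t delay) {
    return static_cast<std::uint8_t>(((delay & 0x03) << 5) | (rate & 0x1f));
}
```

For example, 10 characters per second (0x0c) with a delay of 0.5 seconds (0x01) results in the byte 0x2c.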

Once the keyboard receives a configuration byte, it again responds with an ACK byte (0xfa).

Literature

Messmer, Hans Peter: PC-Hardwarebuch - Aufbau, Funktionsweise, Programmierung. Addison-Wesley 1994
UTF-8
The AT keyboard controller
The PS/2 Mouse/Keyboard Protocol
The PS/2 Keyboard Interface
American Standard Code for Information Interchange