StuBS
Details about the PC keyboard

Character and Keycodes

For proper handling of keyboard events, several codes are of importance:

ASCII Code

The "American Standard Code for Information Interchange" (ASCII) is a widely used mapping between numbers and characters (e.g., digits, alphabetic characters, whitespace, and several special characters). Originally, 7 bits were intended per character, while today 8 bits (one byte) are used practically everywhere. The following table shows an excerpt from the ASCII table:

Character ASCII Code
( 40
0 48
1 49
2 50
A 65
B 66
a 97
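
As a small illustration of the table above, the following sketch checks a few of the listed codes at compile time and uses them for simple arithmetic. It assumes a compiler with an ASCII-compatible execution character set; the helper names digit_to_char and to_upper_ascii are hypothetical and not part of StuBS.

```cpp
// Assumption: the execution character set is ASCII-compatible.
static_assert('(' == 40, "ASCII code of '('");
static_assert('0' == 48, "ASCII code of '0'");
static_assert('A' == 65, "ASCII code of 'A'");
static_assert('a' == 97, "ASCII code of 'a'");

// Hypothetical helper: converts a digit value (0 to 9) to its ASCII character.
// The digits occupy consecutive codes starting at 48 ('0').
constexpr char digit_to_char(int digit) {
    return static_cast<char>('0' + digit);
}

// Hypothetical helper: maps 'a'..'z' to 'A'..'Z'; all other characters are
// returned unchanged. Upper- and lowercase letters differ by 97 - 65 = 32.
constexpr char to_upper_ascii(char c) {
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - ('a' - 'A')) : c;
}
```

For instance, digit_to_char(2) yields the character '2' (code 50), and to_upper_ascii('a') yields 'A'.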

Characters and strings are commonly stored in ASCII or in its bigger brother UTF-8, which is briefly introduced in the section below.

UTF-8 vs. ASCII

UTF-8 is a superset of ASCII, that is, ASCII-encoded strings are valid UTF-8 strings. While every ASCII character occupies exactly one byte, UTF-8 characters have a varying length of 1 to 4 bytes. The table below describes the structure of UTF-8 characters: Single-byte UTF-8 characters always have a leading 0 bit, which makes their encoding identical to the seven-bit ASCII encoding. If a character is encoded using multiple bytes, on the other hand, its first byte encodes the total number of bytes n by setting the n most significant bits to 1, followed by a single 0. All following bytes start with bits 7 and 6 set to 10. This way, it is easy to detect whether a given byte starts a UTF-8 character or continues one.

For further insights into UTF-8, refer to documentation on the internet.

Number of bytes Bits for code point Byte 1 Byte 2 Byte 3 Byte 4
1 7 0⬚⬚⬚⬚⬚⬚⬚
2 11 110⬚⬚⬚⬚⬚ 10⬚⬚⬚⬚⬚⬚
3 16 1110⬚⬚⬚⬚ 10⬚⬚⬚⬚⬚⬚ 10⬚⬚⬚⬚⬚⬚
4 21 11110⬚⬚⬚ 10⬚⬚⬚⬚⬚⬚ 10⬚⬚⬚⬚⬚⬚ 10⬚⬚⬚⬚⬚⬚
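
The bit patterns in the table can be checked with a few mask operations. The following is a minimal sketch; the function names utf8_is_continuation and utf8_sequence_length are hypothetical, and the code only classifies a single byte, it does not validate complete sequences.

```cpp
#include <cstdint>

// Returns true if the byte is a continuation byte (bit pattern 10xxxxxx).
constexpr bool utf8_is_continuation(std::uint8_t byte) {
    return (byte & 0xC0) == 0x80;
}

// Returns the total length in bytes (1 to 4) of the UTF-8 character that
// starts with `first`, or 0 if `first` cannot start a character.
constexpr int utf8_sequence_length(std::uint8_t first) {
    return (first & 0x80) == 0x00 ? 1   // 0xxxxxxx
         : (first & 0xE0) == 0xC0 ? 2   // 110xxxxx
         : (first & 0xF0) == 0xE0 ? 3   // 1110xxxx
         : (first & 0xF8) == 0xF0 ? 4   // 11110xxx
         : 0;                           // continuation byte or invalid
}
```

As an example, the German umlaut "ä" is encoded as the two bytes 0xC3 0xA4: utf8_sequence_length(0xC3) returns 2, and utf8_is_continuation(0xA4) returns true.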

Scancode

Every key on the keyboard is assigned a unique number – a scancode. The scancode enables identification of keys that both do and do not represent printable characters (e.g., arrow keys). Keep in mind, however, that scancodes do not distinguish between uppercase and lowercase letters, as both are reachable using the same key on your keyboard.

Key Scancode
A 30
S 31
D 32
↑ 72
↓ 80

In the history of PC development, there have been different keyboards with varying numbers and meanings of keys. Function and special keys in particular have varying, non-standardized scancodes. As PC keyboards have only a few more than 100 keys, 7 bits are sufficient for representing all keys on such keyboards.

Make- and Breakcodes

Programs not only need to be able to detect which "ordinary" key was pressed, but also whether and which of the shift, control, or alt keys were held while the key was pressed. Therefore, the keyboard does not send a single scancode, but one or more makecodes for every key press and one or more breakcodes for every key release. When a key is held down for a certain period of time, the keyboard sends additional, repeated makecodes. For most keys, the makecode is equal to the scancode, and the breakcode is equal to the scancode with bit 7 (counting from 0) set. For historical reasons, pressing or releasing some keys issues multiple make- and breakcodes. The keyboard driver (implemented as Keyboard::prologue() as part of exercise 3) needs to derive the intended character from the make- and breakcodes received from the keyboard.
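
For the common single-byte case, the relation between make-, break-, and scancode can be sketched as follows. The type KeyEvent and the function split_code are hypothetical names chosen for illustration; this is not the decoder we provide, and it deliberately ignores the keys that issue multiple codes.

```cpp
#include <cstdint>

// Hypothetical result type for a single received byte.
struct KeyEvent {
    std::uint8_t scancode;  // bits 0 to 6 of the received byte
    bool released;          // true for a breakcode, false for a makecode
};

// Bit 7 distinguishes breakcodes (set) from makecodes (clear).
constexpr std::uint8_t BREAK_BIT = 0x80;

// Splits one received byte into the scancode and the press/release flag.
constexpr KeyEvent split_code(std::uint8_t code) {
    return KeyEvent{static_cast<std::uint8_t>(code & 0x7f),
                    (code & BREAK_BIT) != 0};
}
```

With the scancode table above, split_code(30) describes pressing the A key, while split_code(0x9e) (30 with bit 7 set) describes releasing it.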

Note
As interpreting make- and breakcodes is quite cumbersome, boring, and not very instructive, we provide the decoder implementation. However, our implementation may not properly detect all characters present on your keyboard, especially special ones such as German umlauts. In that case, you either need to accept a few wrong characters or adapt the tables used in the decoder.

Flow when a key is being pressed

Pressing a key on a PC keyboard connects two crossing wires within the keyboard's scan matrix. From this connection, the keyboard processor (8042 for PC/XT-, 8048 for AT and MF II keyboards) determines the pressed key's location and, from that, the scancode. This scancode is then sent to the PC using a serial connection.

Every PC motherboard houses a PS/2 controller (also known as keyboard controller) that is connected to and communicates with the keyboard using one output and one input port. The PS/2 controller is programmed via control registers that can be read from and written to using in and out instructions.

Port Register Meaning
0x60 (read) output buffer Make/break code from keyboard
0x60 (write) input buffer Commands sent to the keyboard processor (e.g., toggle LEDs)
0x64 (write) control register Commands sent to the PS/2 controller
0x64 (read) status register PS/2 controller's state (e.g., output buffer is full?)

Whenever the keyboard controller writes a byte to its output buffer, it signals the availability of data by sending an interrupt request to the CPU. The CPU is then required to read the byte from the output buffer. Once the output buffer is empty again, the controller updates the status register to indicate this. Now, new characters can be received from the keyboard. When using the keyboard in polling mode, bit 0 of the status register (HAS_OUTPUT) can be used to check whether there is a character in the output buffer. Conversely, when sending command codes to the keyboard, it is mandatory to wait until the keyboard controller's input buffer is empty (i.e., bit 1, the INPUT_PENDING bit, is 0) before writing a new command.

Since the mouse is also connected to the PS/2 controller, data from both mouse and keyboard end up in the output buffer. To differentiate between the two potential sources, bit 5 in the status register (IS_MOUSE) indicates whether the byte is from the keyboard (0) or the mouse (1).

Bit Mask Name (in StuBS) Meaning
0 0x01 HAS_OUTPUT Set iff the output buffer contains a character to be read (i.e., the buffer is not empty)
1 0x02 INPUT_PENDING Set iff the input buffer contains pending characters (i.e., characters not yet fetched by the controller)
5 0x20 IS_MOUSE Set iff the value in the output buffer originates from the mouse
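
The status bits above can be combined into a single polling step, sketched below. The names PollResult, poll_keyboard, and input_buffer_free are hypothetical. The port access is passed in as a callable inb (an assumption made here so that the logic can be tried out with a fake instead of real hardware); in a kernel it would wrap the in instruction. The functions are marked constexpr only to allow such compile-time experiments.

```cpp
#include <cstdint>

constexpr std::uint16_t DATA_PORT = 0x60;  // read: output buffer
constexpr std::uint16_t CTRL_PORT = 0x64;  // read: status register

constexpr std::uint8_t HAS_OUTPUT    = 0x01;
constexpr std::uint8_t INPUT_PENDING = 0x02;
constexpr std::uint8_t IS_MOUSE      = 0x20;

// Hypothetical result of one polling step.
struct PollResult {
    bool valid;         // true if `code` holds a byte sent by the keyboard
    std::uint8_t code;  // make- or breakcode, only meaningful if valid
};

// True if a new byte may be written to the controller's input buffer.
constexpr bool input_buffer_free(std::uint8_t status) {
    return (status & INPUT_PENDING) == 0;
}

// Polls the controller once. `inb` is a hypothetical callable that reads
// one byte from the given I/O port.
template <typename InB>
constexpr PollResult poll_keyboard(InB inb) {
    const std::uint8_t status = inb(CTRL_PORT);
    if ((status & HAS_OUTPUT) == 0) {
        return {false, 0};                     // output buffer is empty
    }
    const std::uint8_t data = inb(DATA_PORT);  // reading empties the buffer
    if ((status & IS_MOUSE) != 0) {
        return {false, 0};                     // byte came from the mouse
    }
    return {true, data};
}
```

Note that the byte is read from the data port even if it originates from the mouse. Otherwise the output buffer would stay full, and no further keyboard data could arrive.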

Programming the keyboard processor

The keyboard processor is configured by sending command codes (see the table below) to its input buffer by writing to the data port. A clean solution would then wait for the keyboard to respond by sending an acknowledge byte (0xfa) to its output buffer (ACK). However, it is non-trivial to achieve a fully standard-conforming implementation; especially correctly waiting for an ACK is difficult, as it may be interwoven with or squashed by subsequent key presses. Therefore it is okay to simply ignore the acknowledgment byte.

Out of the 20 commands supported by PS/2 keyboards, we will only use two:

Command Code Name Description
0xed KEYBOARD_SET_LED Enable/Disable keyboard LEDs. Subsequent to the (acknowledged) command code, a second byte needs to be written to control the LEDs' states. The structure is detailed in the tables below.
0xf3 KEYBOARD_SET_SPEED Repeat rate and delay can be modified by setting a second byte (following the command byte). The structure is detailed in the tables below.
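
Sending one of these two-byte commands can be sketched as follows. This is a simplified illustration under several assumptions: Ports is a hypothetical type providing inb(port) and outb(port, value), the function names are made up for this example, and the acknowledge bytes are simply not awaited, as discussed above. The busy-wait loop has no timeout, which a more robust implementation would probably add.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint16_t DATA_PORT = 0x60;  // write: input buffer
constexpr std::uint16_t CTRL_PORT = 0x64;  // read: status register

constexpr std::uint8_t INPUT_PENDING = 0x02;

constexpr std::uint8_t KEYBOARD_SET_LED   = 0xed;
constexpr std::uint8_t KEYBOARD_SET_SPEED = 0xf3;

// Waits until the controller has fetched the previous byte from its input
// buffer, then writes `value` to the data port.
template <typename Ports>
void write_when_ready(Ports &ports, std::uint8_t value) {
    while ((ports.inb(CTRL_PORT) & INPUT_PENDING) != 0) {
        // busy-wait: the input buffer is still occupied
    }
    ports.outb(DATA_PORT, value);
}

// Sends a command code followed by its parameter byte.
template <typename Ports>
void send_keyboard_command(Ports &ports, std::uint8_t command,
                           std::uint8_t parameter) {
    // Only the two commands described here take exactly one parameter byte.
    assert(command == KEYBOARD_SET_LED || command == KEYBOARD_SET_SPEED);
    write_when_ready(ports, command);
    write_when_ready(ports, parameter);
}
```

Passing the port access as a parameter is merely a design choice for this sketch; it allows trying out the sequence against a fake object that records the written bytes.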

The following table contains the structure of the control byte used by KEYBOARD_SET_LED to set keyboard LEDs.

Bits 3-7 Bit 2 Bit 1 Bit 0
Always 0 Caps Lock Num Lock Scroll Lock
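
Composing this control byte is a matter of setting the three lowest bits. The following sketch uses hypothetical names (led_byte and the LED_* constants) and leaves bits 3 to 7 at 0, as required by the table.

```cpp
#include <cstdint>

constexpr std::uint8_t LED_SCROLL_LOCK = 1 << 0;
constexpr std::uint8_t LED_NUM_LOCK    = 1 << 1;
constexpr std::uint8_t LED_CAPS_LOCK   = 1 << 2;

// Builds the byte that follows the KEYBOARD_SET_LED command code.
// Each argument states whether the corresponding LED should be lit.
constexpr std::uint8_t led_byte(bool caps_lock, bool num_lock,
                                bool scroll_lock) {
    return static_cast<std::uint8_t>((caps_lock ? LED_CAPS_LOCK : 0) |
                                     (num_lock ? LED_NUM_LOCK : 0) |
                                     (scroll_lock ? LED_SCROLL_LOCK : 0));
}
```

For example, led_byte(true, true, false) yields 0x06, which switches on the Caps Lock and Num Lock LEDs and switches off Scroll Lock.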

The following two tables illustrate the structure of the second byte used by KEYBOARD_SET_SPEED. The repeat rate is encoded in bits 0 to 4, the delay in bits 5 and 6. For the repeat rate, higher values indicate a lower repeat rate.

Bits 0-4 (hex) Repeat rate (characters per second)
0x00 30
0x02 25
0x04 20
0x08 15
0x0c 10
0x10 7
0x14 5

Bits 5 and 6 (hex) Delay (in seconds)
0x00 0.25
0x01 0.5
0x02 0.75
0x03 1.0
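
Both fields can be packed into the byte with a shift and a bitwise OR. The function name speed_byte is hypothetical; bit 7 is left at 0 here, which is an assumption since the tables above do not mention it.

```cpp
#include <cstdint>

// Builds the byte that follows the KEYBOARD_SET_SPEED command code.
// rate_code:  value for bits 0 to 4 (0x00 = fastest, see first table)
// delay_code: value for bits 5 and 6 (0x00 to 0x03, see second table)
// Out-of-range bits of both arguments are masked off.
constexpr std::uint8_t speed_byte(std::uint8_t rate_code,
                                  std::uint8_t delay_code) {
    return static_cast<std::uint8_t>(((delay_code & 0x03) << 5) |
                                     (rate_code & 0x1f));
}
```

According to the tables, speed_byte(0x0c, 0x01) requests 10 characters per second after a delay of 0.5 seconds and evaluates to 0x2c.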

Once the keyboard processor receives a configuration byte, it responds with an ACK byte (0xfa).

Literature