Since our first post announcing our ETW USB Keylogger from our Ruxcon talk, we've been getting a lot of questions about our work (which we greatly appreciate!). One of the questions we've been asked is how to extract keystrokes from USB data using ETW, so we wanted to share some of our notes and disclose some of the "lessons learned" during our analysis and development.
USB Keyboard Basics
If you are new to the USB protocol (like us), the first thing you'll likely notice is that USB keyboards (and USB devices in general) are VERBOSE. There will be a huge number of USB "packets" or URBs (USB Request Blocks: https://msdn.microsoft.com/en-us/library/windows/hardware/ff537056(v=vs.85).aspx) coming from a keyboard at a rate defined by the device. This is known as the keyboard's polling rate. It's important to remember that USB keyboards are not interrupt-based; instead, the host polls them for data. Polling frequencies in USB keyboards vary but are usually somewhere around 125Hz, which means that every 8ms or so, depending on the device, the keyboard will transmit its state, which equates to 300-400 bytes of event data (when using ETW).
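As a rough back-of-the-envelope check of these numbers, the arithmetic can be sketched as below. The 125Hz rate and the 300-400 byte figure come from the paragraph above; treating every poll as producing exactly one event of that size is our simplifying assumption for illustration, and the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic for keyboard polling, using the figures quoted above.
POLL_RATE_HZ = 125          # typical polling rate mentioned in the text
EVENT_BYTES = (300, 400)    # approximate size of one event as seen through ETW


def polling_interval_ms(rate_hz: float) -> float:
    """Time between two polls, in milliseconds."""
    return 1000.0 / rate_hz


def bytes_per_second(rate_hz: float, event_bytes: int) -> float:
    """Event volume per second, ASSUMING one event per poll (illustration only)."""
    return float(rate_hz * event_bytes)


if __name__ == "__main__":
    print(polling_interval_ms(POLL_RATE_HZ))  # 8.0
    low, high = (bytes_per_second(POLL_RATE_HZ, size) for size in EVENT_BYTES)
    print(low, high)  # 37500.0 50000.0
```

Under that assumption a single keyboard would produce roughly 37-50 KB of event data per second, which is why filtering early matters.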
Turning URBs into Keystrokes
Once we understood the basics of how USB keyboards work, the next step in our work was to take the data we were seeing from ETW and turn it into something meaningful. At a high level, this means picking out the URBs we are interested in, extracting their payload, then mapping the payload via the Human Interface Device (HID) specification to determine which key is pressed. This process is typically handled in the kernel via the USB driver stack and is a fairly involved process, certainly more than we wanted to implement, especially in userland. This is where the TraceEvent library (https://www.nuget.org/packages/Microsoft.Diagnostics.Tracing.TraceEvent) really shines. By letting us simply query each incoming URB for the fields we need, it lets us "cheat" and parse only the URBs we are interested in, as opposed to inspecting every single URB that comes off the USB bus.
To start this process we can use the TraceEvent library to set up our application to act as an Event Tracing Controller so that we can start an ETW session and enable the following providers: Microsoft-Windows-USB-UCX and Microsoft-Windows-USB-USBPORT. Once enabled, the providers send our application TraceEvent event objects through a callback function, where we can begin parsing the URBs for the data we are interested in. For the purpose of keylogging we only care about a single type of URB known as BULK_OR_INTERRUPT_TRANSFER. These URBs contain the USB HID data that represents keystrokes. The only problem is that there are two providers, one for each of the USB driver stacks. By parsing the event object and searching for the desired payload names, we can filter for fid_USBPORT_URB_BULK_INTERRUPT_TRANSFER or fid_UCX_URB_BULK_INTERRUPT_TRANSFER, corresponding to USB 1.0/2.0 and USB 3.0, respectively, and effectively gain access to keystrokes on both the USB 1.0/2.0 and 3.0 buses. Below is an overview of the process.
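The payload-name filter described above can be illustrated with a small language-neutral sketch. This is not TraceEvent code: each event is modelled as a plain dictionary from payload name to value, which is our assumption for illustration, and `fid_SomethingElse` is a made-up field name standing in for any unrelated event. Only the two `fid_..._BULK_INTERRUPT_TRANSFER` names are taken from the text.

```python
from typing import Iterable, Iterator, Mapping

# Payload names quoted in the text; the presence of either one marks an
# event that carries a BULK_OR_INTERRUPT_TRANSFER URB.
TRANSFER_FIELDS = (
    "fid_USBPORT_URB_BULK_INTERRUPT_TRANSFER",  # USB 1.0/2.0 driver stack
    "fid_UCX_URB_BULK_INTERRUPT_TRANSFER",      # USB 3.0 driver stack
)


def is_bulk_or_interrupt(payload_names: Iterable[str]) -> bool:
    """True if any of the event's payload names is one we care about."""
    names = set(payload_names)
    return any(field in names for field in TRANSFER_FIELDS)


def filter_transfers(
    events: Iterable[Mapping[str, object]],
) -> Iterator[Mapping[str, object]]:
    """Yield only the events that look like bulk/interrupt transfers."""
    for event in events:
        if is_bulk_or_interrupt(event.keys()):
            yield event


if __name__ == "__main__":
    # Hypothetical stand-ins for event objects (payload name -> value).
    sample = [
        {"fid_SomethingElse": 1},
        {"fid_USBPORT_URB_BULK_INTERRUPT_TRANSFER": b"..."},
        {"fid_UCX_URB_BULK_INTERRUPT_TRANSFER": b"..."},
    ]
    print(len(list(filter_transfers(sample))))  # 2
```

Checking names before touching any payload bytes keeps the per-event cost low, which matters given how chatty the bus is.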
Though ETW abstracts the details of accessing these structures (a nice added benefit which greatly sped up our research), under the hood these structures are fairly simple, and their definition can be seen below. For our purposes (keylogging), we are only really interested in the TransferBuffer and TransferBufferLength fields, where the payload data and the payload data size are stored, respectively.
To help speed up our processing of keyboard data we apply an additional check in order to limit our logging only to keyboard devices. This is accomplished by checking for an unused reserved byte and the length of the URB's payload buffer, which can be found in the fid_URB_TransferBuffer and fid_URB_TransferBufferLength fields. A keyboard has a transfer buffer length of 8 bytes, as defined in the USB HID specification (http://www.usb.org/developers/hidpage/HID1_11.pdf), and has the following layout:
|0||Modifier keys – Ctrl (1), Shift (2), Alt (4)|
|1||Reserved/unused, must be 0x00|
|2-7||Key codes of the keys currently pressed (up to six)|
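The keyboard check described above (an 8-byte buffer whose reserved byte is zero) can be sketched roughly as follows. This is a minimal illustration, not our actual implementation: the `KeyboardReport` type and `parse_report` function are hypothetical names, only the three modifier bits listed in the table are decoded, and the remaining six bytes are treated as key codes, which is how the HID boot keyboard report is laid out.

```python
from typing import NamedTuple, Optional, Tuple

REPORT_LENGTH = 8  # transfer buffer length of a keyboard report

# Modifier bits as listed in the table above (other modifier bits are ignored here).
MODIFIER_BITS = {0x01: "Ctrl", 0x02: "Shift", 0x04: "Alt"}


class KeyboardReport(NamedTuple):
    modifiers: Tuple[str, ...]  # names of the modifier bits that are set
    keycodes: Tuple[int, ...]   # bytes 2-7 of the report


def parse_report(buffer: bytes) -> Optional[KeyboardReport]:
    """Return a parsed report, or None if the buffer does not look like a keyboard's."""
    if len(buffer) != REPORT_LENGTH:
        return None  # wrong payload size
    if buffer[1] != 0x00:
        return None  # reserved byte must be zero
    modifiers = tuple(name for bit, name in MODIFIER_BITS.items() if buffer[0] & bit)
    return KeyboardReport(modifiers, tuple(buffer[2:8]))


if __name__ == "__main__":
    print(parse_report(bytes([0x02, 0x00, 0x04, 0, 0, 0, 0, 0])))
    print(parse_report(bytes([0x00, 0xFF, 0x04, 0, 0, 0, 0, 0])))  # None
```

Both checks are cheap, so they work well as a first gate before any further filtering.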
Additional checks on the payload data can then be added to reduce the number of events we have to process by further filtering:
- Key rollover errors, which result from pressing 7+ keys at the same time.
- Empty keyboard data bytes, which result from the 8ms polling interval and tend to generate a ton of useless events.
- Strange out-of-order data. Keystrokes are populated from lowest byte array index to highest. We can ignore oddities that do not meet this condition, which might be sourced from non-keyboard devices.
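The three filters above can be sketched as simple predicates over the six key bytes of a report. This is an illustrative sketch with hypothetical function names; it assumes the key codes sit in bytes 2-7 of the 8-byte report and that a rollover error is signalled by the HID ErrorRollOver usage (0x01) in every key slot. Note that the "empty" check as written also drops reports that carry only modifier keys.

```python
ERROR_ROLLOVER = 0x01  # HID usage ID a keyboard reports in every slot on rollover


def is_rollover_error(keys: bytes) -> bool:
    """Too many keys pressed at once: every key slot holds ErrorRollOver."""
    return len(keys) > 0 and all(k == ERROR_ROLLOVER for k in keys)


def is_empty(keys: bytes) -> bool:
    """No key is down, e.g. an idle report produced by polling."""
    return not any(keys)


def is_out_of_order(keys: bytes) -> bool:
    """Keys fill from the lowest index up, so a key after an empty slot is suspect."""
    seen_empty_slot = False
    for k in keys:
        if k == 0:
            seen_empty_slot = True
        elif seen_empty_slot:
            return True
    return False


def should_skip(report: bytes) -> bool:
    """Apply all three filters to an 8-byte keyboard report."""
    keys = report[2:8]
    return is_empty(keys) or is_rollover_error(keys) or is_out_of_order(keys)


if __name__ == "__main__":
    print(should_skip(bytes(8)))                                # True (empty)
    print(should_skip(bytes([0, 0, 1, 1, 1, 1, 1, 1])))         # True (rollover)
    print(should_skip(bytes([0, 0, 0, 0x04, 0, 0, 0, 0])))      # True (out of order)
    print(should_skip(bytes([0, 0, 0x04, 0x05, 0, 0, 0, 0])))   # False
```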
Once all the checks are complete, and we are only capturing keyboard-generated data, we can cherry pick the last 8 bytes of the event object and map the data to its corresponding value according to the HID specification:
As can be seen below in the HID Usage Table (http://www.usb.org/developers/hidpage/Hut1_12v2.pdf), this payload data represents an 'a' on a compliant USB keyboard.
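To make the mapping step concrete, here is a small sketch that takes the last 8 bytes of an event and translates the key codes using a subset of the HID Usage Table (letters, digits, Enter and Space). It is deliberately incomplete: the function names are hypothetical, only the Shift bit from the table earlier in this post is honoured, and shifted digits and punctuation are not handled.

```python
import string


def build_usage_map() -> dict:
    """Subset of the HID keyboard usage page: a-z, 1-9, 0, Enter, Space."""
    table = {}
    for offset, letter in enumerate(string.ascii_lowercase):
        table[0x04 + offset] = letter      # 0x04..0x1D -> 'a'..'z'
    for offset, digit in enumerate("1234567890"):
        table[0x1E + offset] = digit       # 0x1E..0x27 -> '1'..'9', '0'
    table[0x28] = "\n"                     # Enter
    table[0x2C] = " "                      # Space
    return table


USAGE_MAP = build_usage_map()
SHIFT_BIT = 0x02  # Shift modifier bit, as listed in the report layout table


def decode_report(event_bytes: bytes) -> str:
    """Decode the trailing 8-byte keyboard report of an event into characters."""
    report = event_bytes[-8:]
    shifted = bool(report[0] & SHIFT_BIT)
    chars = []
    for code in report[2:8]:
        char = USAGE_MAP.get(code)
        if char is None:
            continue  # empty slot or a usage outside this small table
        chars.append(char.upper() if shifted and char.isalpha() else char)
    return "".join(chars)


if __name__ == "__main__":
    print(repr(decode_report(bytes([0x00, 0x00, 0x04, 0, 0, 0, 0, 0]))))  # 'a'
    print(repr(decode_report(bytes([0x02, 0x00, 0x0B, 0, 0, 0, 0, 0]))))  # 'H'
```

A fuller implementation would also need to track which keys were already down in the previous report so that a held key is not recorded on every poll.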
After mapping the data to its corresponding HID value we're finished and we can record the keystrokes!
It should be noted that our solution is merely a "good enough" approach and is far from a full, "to specification" implementation of the steps required to obtain keystrokes from "raw" USB data. That being said, we'd love any feedback, comments, or contributions to improving our approach!
Download our keylogger code (as well as the source for our other demos) from GitHub:
SRT [at] cyberpointllc.com