Handling TV User Input in a React App: Part 1

Dec 8, 2021

<TLDR>A React app listens to user input from a remote control as keyboard events. Since an app listens for keyboard input at its top level only (as opposed to each component directly listening to click or touch events), we must create a system to route keyboard presses to the right component in the React tree.</TLDR>

I’ve been building TV apps with React for over 5 years now, first as a UX prototyper at Netflix, and currently as a prototyper at HBO Max.

The biggest difference I’ve found between building TV interfaces vs building desktop or mobile apps is handling user input. Desktop apps use mouse events to navigate. Mobile and tablet devices use touch events.

Users navigate TV interfaces, however, with remote controls. The common buttons available across TV devices are UP, LEFT, RIGHT, DOWN, SELECT, and BACK. When these button presses are piped into a React app, they arrive as keyboard events.
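To make that concrete, here is a minimal sketch of translating keyboard events into the six remote buttons. It assumes the platform reports the directional pad as the standard arrow-key values and SELECT as Enter; treating Backspace and Escape as BACK is purely an assumption, since each TV platform reports BACK in its own way. The names RemoteButton and toRemoteButton are hypothetical.

```typescript
// The six buttons listed above as common across TV remotes.
type RemoteButton = "UP" | "LEFT" | "RIGHT" | "DOWN" | "SELECT" | "BACK";

// Assumed mapping from KeyboardEvent.key values to remote buttons.
// The arrow keys and Enter are standard key values. Backspace and Escape
// standing in for BACK is an assumption: check what your target device sends.
const KEY_TO_BUTTON = new Map<string, RemoteButton>([
  ["ArrowUp", "UP"],
  ["ArrowLeft", "LEFT"],
  ["ArrowRight", "RIGHT"],
  ["ArrowDown", "DOWN"],
  ["Enter", "SELECT"],
  ["Backspace", "BACK"],
  ["Escape", "BACK"],
]);

// Returns the remote button for a key value, or null for keys we ignore.
function toRemoteButton(key: string): RemoteButton | null {
  return KEY_TO_BUTTON.get(key) ?? null;
}
```

In a browser, a single top-level listener could feed this function, for example `window.addEventListener("keydown", (e) => toRemoteButton(e.key))`.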

Now, the great thing about a mouse or touch event is that it occurs directly on the component that responds to it. If you use a mouse to click on an app’s <SearchBar />, the <SearchBar /> is the first responder for that event. No other component needs to be involved; the <SearchBar /> receives the event, and the <SearchBar /> can respond with the appropriate logic.

This is not the case for TV apps.

A keypress is not targeted at any specific component. Instead, you have one keypress listener at the top level of your app. So when a user presses DOWN, what exactly are they trying to press DOWN on?

Are they trying to scroll the homepage? Move through an episode list? Move vertically from one component to another? There’s no out-of-the-box way to know. All we got was a DOWN keypress, not targeted to any specific component.

So, it’s up to us to track which component is in focus at any given time. We can then route keyboard input to that focused element.

The problem is, a production app can have hundreds or even thousands of components in its tree at once. How is your top-level component—we’ll call it <App />—supposed to track which of its thousand descendants is in focus?

My approach is a system where components only track focus of their direct children. Then, those direct children track focus of their own direct children, and so on until you reach the end of your tree.

The benefit of this approach is that <App /> doesn’t have to think about routing keyboard input to its hundreds of descendants. It’s only responsible for knowing about its few direct children.
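As a rough sketch of that idea, each node below stores only the index of its focused direct child. Following those indices from the root recovers the full path to the focused leaf, even though no single node knows that path. The FocusNode shape, the focusPath helper, and the component names in the example are all hypothetical.

```typescript
// Each node knows its direct children and which one of them is focused.
interface FocusNode {
  name: string;
  children: FocusNode[];
  // Index into `children`, or null when nothing below this node is focused.
  focusedIndex: number | null;
}

// Walks from the root to the focused leaf, one direct child at a time.
function focusPath(root: FocusNode): string[] {
  const path: string[] = [root.name];
  let node: FocusNode = root;
  while (true) {
    const index = node.focusedIndex;
    if (index === null) break;
    const child = node.children[index];
    if (child === undefined) break; // stale index: stop rather than crash
    node = child;
    path.push(node.name);
  }
  return path;
}

// Example tree with made-up component names.
const tile: FocusNode = { name: "Tile", children: [], focusedIndex: null };
const otherTile: FocusNode = { name: "OtherTile", children: [], focusedIndex: null };
const row: FocusNode = { name: "Row", children: [otherTile, tile], focusedIndex: 1 };
const app: FocusNode = { name: "App", children: [row], focusedIndex: 0 };
```

Here `focusPath(app)` yields App, Row, Tile, yet App itself only ever stored "my first child is focused".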

At a high level, it works like this:

1. Component A receives a keypress event.
2. Should Component A delegate this event further?

a) If it has a focused child that can handle this keypress, yes. Pass the keyboard event to that focused child.
b) Otherwise: No more delegation. Component A handles the keyboard event itself.
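The decision above can be sketched as a small recursive function. This is an illustration under assumptions, not the implementation from the demo: the InputNode interface, canHandle, handle, and dispatch are hypothetical names, and canHandle is assumed to answer for a component together with its focused descendants.

```typescript
type RemoteButton = "UP" | "LEFT" | "RIGHT" | "DOWN" | "SELECT" | "BACK";

interface InputNode {
  name: string;
  // The focused direct child, or null if this node has none.
  focusedChild: InputNode | null;
  // Assumed contract: true if this node (or something focused beneath it)
  // can act on the button.
  canHandle(button: RemoteButton): boolean;
  // The node's own response to the button.
  handle(button: RemoteButton): void;
}

// Routes a button press and returns the name of the node that handled it.
function dispatch(node: InputNode, button: RemoteButton): string {
  const child = node.focusedChild;
  if (child !== null && child.canHandle(button)) {
    return dispatch(child, button); // case (a): delegate to the focused child
  }
  node.handle(button); // case (b): no more delegation
  return node.name;
}

// Example: B only understands SELECT, so A keeps every other button.
const handled: string[] = [];
const nodeB: InputNode = {
  name: "B",
  focusedChild: null,
  canHandle: (button) => button === "SELECT",
  handle: (button) => { handled.push(`B:${button}`); },
};
const nodeA: InputNode = {
  name: "A",
  focusedChild: nodeB,
  canHandle: () => true,
  handle: (button) => { handled.push(`A:${button}`); },
};
```

With this setup, `dispatch(nodeA, "SELECT")` is handled by B, while `dispatch(nodeA, "DOWN")` stops at A.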

Let’s look at an example. On the Roku search page, a user presses DOWN to move from the letter O to the letter U:

1. Your top-level App component receives the DOWN keypress.
2. It has a focused child (Search) that says it can handle a DOWN keypress.
3. App delegates the event down to Search.
4. Search has a focused child (KeyboardGrid) that says it can handle a DOWN keypress.
5. Search delegates the event down to KeyboardGrid.
6. KeyboardGrid has a focused child (Letter), but Letter says it cannot handle the DOWN keypress.
7. So, KeyboardGrid handles the DOWN event itself: it updates its focused child from “O” to “U”.
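The final step, where KeyboardGrid moves its own focus, might look like the sketch below. It assumes a six-column alphabetical layout, which is consistent with O sitting directly above U but is not something the text confirms. The names COLUMNS, letterBelow, grid, and handleDown are hypothetical.

```typescript
// Assumption: letters are laid out alphabetically, six per row.
const COLUMNS = 6;
const LETTERS: string[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".split("");

// Returns the letter one row below, or null if there is no row below
// (or the letter is not on the grid).
function letterBelow(letter: string): string | null {
  const index = LETTERS.indexOf(letter);
  if (index === -1) return null;
  return LETTERS[index + COLUMNS] ?? null;
}

// KeyboardGrid only tracks which of its direct children (letters) is focused.
const grid = { focusedLetter: "O" };

// Handles DOWN inside the grid. Returns false when focus cannot move,
// so a parent could decide what to do with the keypress instead.
function handleDown(): boolean {
  const next = letterBelow(grid.focusedLetter);
  if (next === null) return false;
  grid.focusedLetter = next;
  return true;
}
```

Calling `handleDown()` once moves `grid.focusedLetter` from O to U. App and Search never see that detail; they only passed the event along.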

A component only needs to know about itself and its direct children. Your App component stays blissfully unaware of how KeyboardGrid should handle a DOWN event.

Let’s dive deeper by building a tiny TV app. You can play with a live version of this demo here.

Our app shows a user 3 genres. Within each genre are 3 movies. And within each movie are 2 CTAs (“calls to action”, also known as buttons).
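The content described above can be modeled as plain nested data. This is only a sketch for orientation: the Genre, Movie, and Cta types, the buildCatalog helper, and the placeholder labels are hypothetical and are not taken from the demo’s source.

```typescript
interface Cta {
  label: string;
}

interface Movie {
  title: string;
  ctas: Cta[];
}

interface Genre {
  name: string;
  movies: Movie[];
}

// Builds 3 genres, each with 3 movies, each with 2 CTAs.
function buildCatalog(): Genre[] {
  return [1, 2, 3].map((g) => ({
    name: `Genre #${g}`,
    movies: [1, 2, 3].map((m) => ({
      title: `Movie #${m}`,
      ctas: [{ label: "CTA #1" }, { label: "CTA #2" }],
    })),
  }));
}

const catalog = buildCatalog();
const movieCount = catalog.reduce((sum, genre) => sum + genre.movies.length, 0);
const ctaCount = catalog.reduce(
  (sum, genre) => sum + genre.movies.reduce((n, movie) => n + movie.ctas.length, 0),
  0,
);
```

Even this tiny app has 3 genres, 9 movies, and 18 CTAs on screen, which is why no single component should try to track all of them.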

Our React component tree looks like this:

To build up some intuition, let’s look at a visual example of what happens when a user has Genre #1 focused, then presses RIGHT. Our starting state will look like this: