Extreme HTML5 Video Interactivity: Sending WebSocket Messages with Popcorn.js

One of our most popular demos at Kaazing uses a web browser on a smartphone to control a physical toy truck from continents away. The truck has a Raspberry Pi attached to it, which connects to a WebSocket server and listens for control messages: drive forward or backward, turn left or right, or switch the headlight on and off. You can learn more about the project in our blog post “Remote Controlling a Car over the Web. Ingredients: Smartphone, WebSocket, and Raspberry Pi”.

The most interesting way to demonstrate the truck is by having a remote person control the truck, and join in over a video conference. Here’s the recording of us doing just this. Fast forward to 4:08 for the truck demo.

Now, there are circumstances in which running Skype or another live video chat app is not an option. You may be off the grid, or simply not have anybody on hand to control your truck remotely.

To address this challenge, we wanted to create a self-contained environment that presents the same dialog and experience without the dependencies mentioned above. To achieve this, we decided to record a video of someone operating the remote control, which the presenter could use as the “Skyped-in” portion of the presentation. There are a number of ways we could hack this, for example by having someone in the room secretly operate the controls. How cool would it be to instead have the recorded video actually trigger the remote controls using WebSocket messages? Instead of a real person controlling the car in real time, we could have the video control the car in real time.

First, we recorded the video. In the video, David Witherspoon operates the remote control. (Aside: David is a software engineer at Kaazing who, along with colleague Prashant Khanal, was instrumental in dreaming up and building the truck.) David followed the script of the dialog very precisely. Knowing the script was not sufficient: we also had to match the exact timing specified for the actual demo.

After processing the video, I embedded it in a web page and overlaid it with a live video feed from the presenter’s laptop camera. This is an important step to make the experience more realistic; after all, every video conference does this.

Here’s the HTML code:

[code language=”html”]
<head>
<script src="http://popcornjs.org/code/dist/popcorn-complete.min.js"></script>
<script src="http://demo.kaazing.com/lib/client/javascript/StompJms.js"></script>
<script>document.addEventListener( "DOMContentLoaded", function() {doConnect();}, false );</script>
<link rel="stylesheet" href="css/truck.css">
</head>
<body>
<video id="selfVideo" autoplay width="256"></video>
<video id="truckVideo" width="1024">
<!-- <source src="videos/PeterTruck.mp4" type="video/mp4"> -->
<source src="http://localhost/videos/DavidTruck.mp4" type="video/mp4">
</video>
<script src="js/truck.js"></script>
</body>
[/code]

And here is the JavaScript code:

[code language=”javascript”]
// Log the error if camera access is denied or no camera is available.
var errorCallback = function(e) {
  console.log('Reeeejected!', e);
};

// Fall back to vendor-prefixed implementations of getUserMedia.
navigator.getUserMedia = navigator.getUserMedia ||
  navigator.webkitGetUserMedia ||
  navigator.mozGetUserMedia ||
  navigator.msGetUserMedia;

var video = document.getElementById('selfVideo');

if (navigator.getUserMedia) {
  navigator.getUserMedia({audio: false, video: true}, function(stream) {
    // Prefer srcObject where supported; older browsers need an object URL.
    if ('srcObject' in video) {
      video.srcObject = stream;
    } else {
      video.src = window.URL.createObjectURL(stream);
    }
  }, errorCallback);
} else {
  video.src = 'somevideo.webm'; // fallback.
}
[/code]
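The callback-style, vendor-prefixed API above reflects the browsers available when this was written. In current browsers the same step is usually done with the promise-based `navigator.mediaDevices.getUserMedia` and the `srcObject` property. Here is a minimal sketch of that variant; `attachSelfVideo` is a hypothetical helper name that is not part of the original source, and the media devices object is passed in as a parameter so the function can also be exercised outside a browser.

```javascript
// Hypothetical helper (not from the original source): attach the camera
// stream to a <video> element using the promise-based API.
// Resolves to true if the camera stream was attached, false otherwise.
function attachSelfVideo(videoEl, mediaDevices, fallbackSrc) {
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== 'function') {
    videoEl.src = fallbackSrc; // no camera API available: use a file instead
    return Promise.resolve(false);
  }
  return mediaDevices.getUserMedia({ audio: false, video: true })
    .then(function (stream) {
      videoEl.srcObject = stream;
      return true;
    })
    .catch(function (err) {
      // Permission denied or no camera found.
      console.log('Camera request rejected:', err);
      videoEl.src = fallbackSrc;
      return false;
    });
}

// In a browser, this would be called roughly like so:
// attachSelfVideo(document.getElementById('selfVideo'),
//                 navigator.mediaDevices, 'somevideo.webm');
```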

The simple CSS snippet ensures that the presenter’s “self” video overlays the remote person’s (recorded) video.

[code language=”css” gutter=”0″]
#selfVideo
{position:fixed;
top:30px;
left:30px;}
[/code]

Browsers are also required to prompt for permission before web apps can start using the built-in camera. You have to select Allow first for the “little video” in the top left corner to appear. Here’s what the permission request bar looks like in Chrome:

And here’s the end result:

Whenever my actor friend in the main video uses his remote control, we must trigger a corresponding WebSocket message. The messages are sent by the Web app hosting the video at the exact time when the control is touched in the video. I used popcorn.js, an open source media library, to get the timing right:

Popcorn.js is an HTML5 media framework written in JavaScript for filmmakers, web developers, and anyone who wants to create time-based interactive media on the web. Popcorn.js is part of Mozilla’s Popcorn project.

I created an array with the timing and the messages that needed to be sent. The timing is measured in seconds.

[code language=”javascript” gutter=”0″]
var davidTruckMsgs = [
[33,"frontlight;on"],
[35,"frontlight;off"],
[36,"frontlight;on"],
[38,"frontlight;off"],
[42,"steering;right : thrust;off"],
[43,"steering;left : thrust;off"],
[44,"steering;right : thrust;off"],
[45,"steering;left : thrust;off"],
[48,"steering;off : thrust;forward"],
[50,"steering;off : thrust;backward"],
[51,"steering;off : thrust;forward"],
[52,"steering;off : thrust;backward"],
[86,"steering;left : thrust;forward"]
];
[/code]
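Each entry pairs a cue time with a message string, and the strings appear to follow a `name;value` pattern, with several pairs joined by a colon. The sketch below shows how such a string could be split into its parts. Both the `parseTruckMsg` name and the format rules are assumptions inferred from the sample data, not taken from the truck's actual receiver code.

```javascript
// Assumption: a message is one or more "name;value" pairs separated by ":".
// parseTruckMsg is a hypothetical helper, not part of the original source.
function parseTruckMsg(msg) {
  var commands = {};
  msg.split(':').forEach(function (part) {
    var pair = part.trim().split(';'); // e.g. ["steering", "right"]
    commands[pair[0]] = pair[1];
  });
  return commands;
}

var sample = [42, "steering;right : thrust;off"];
var parsed = parseTruckMsg(sample[1]);
// parsed.steering is "right" and parsed.thrust is "off"
console.log(sample[0], parsed);
```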

Next, we schedule the WebSocket messages defined in the array above. Note: the array above is called davidTruckMsgs, while the code below iterates over truckMsgs. As you can see in the complete source code, I have multiple arrays for the various videos and actors; whichever one is in use at the moment is referenced as truckMsgs.

[code language=”javascript” gutter=”0″]
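// pop is assumed to be the Popcorn instance for the main video,
// e.g. Popcorn("#truckVideo").
for (var i = 0; i < truckMsgs.length; i++) {
  var obj = truckMsgs[i];
  // obj[0] is the cue time in seconds, obj[1] is the message to send.
  pop.cue( obj[0], makeCallback( obj ) );
}
pop.play();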
[/code]

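The makeCallback function returns the function that actually sends the WebSocket message. If you’re wondering why this is needed in the first place: a JavaScript variable declared with var is function-scoped, so a callback defined directly inside the loop would see whatever value obj holds when the cue fires (the last array entry), not the value it had when the cue was registered. Check out this question on Stack Overflow for more details.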

[code language=”javascript” gutter=”0″]
function makeCallback(obj) {
return function() {
doSend(session.createTextMessage(obj[1]));
};
}
[/code]
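The pitfall that makeCallback avoids can be reproduced without Popcorn.js or a WebSocket connection. The following is a minimal sketch; `buildShared`, `makeGetter`, and `buildWithFactory` are hypothetical names used only for illustration.

```javascript
// Without a factory: every callback shares the same function-scoped `i`.
function buildShared(msgs) {
  var callbacks = [];
  for (var i = 0; i < msgs.length; i++) {
    callbacks.push(function () { return msgs[i]; });
  }
  return callbacks;
}

// With a factory: each call creates a fresh `msg` binding for its closure.
function makeGetter(msg) {
  return function () { return msg; };
}

function buildWithFactory(msgs) {
  var callbacks = [];
  for (var i = 0; i < msgs.length; i++) {
    callbacks.push(makeGetter(msgs[i]));
  }
  return callbacks;
}

var msgs = ["frontlight;on", "frontlight;off"];
console.log(buildShared(msgs)[0]());      // undefined, because i is already 2
console.log(buildWithFactory(msgs)[0]()); // "frontlight;on"
```

Declaring the loop variable with `let` would also give each iteration its own binding in browsers that support it; the factory function works regardless of that support.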

For usability, I added pause/continue functionality that triggers whenever the main video is clicked. This gives the presenter more control, allowing them to preload the page with the main video paused.

[code language=”javascript” gutter=”0″]
var vid = document.getElementById('truckVideo');
// Toggle playback whenever the main video is clicked.
vid.addEventListener('click', function() {
  vid.paused ? vid.play() : vid.pause();
});
[/code]

For the WebSocket communication, we used the JMS edition of the Kaazing WebSocket Gateway, which allows us to leverage simple pub/sub messaging concepts. With the help of popcorn.js, the HTML5 web app publishes WebSocket messages to a so-called topic, and whoever is interested in (read: subscribed to) that topic receives them.
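To make the topic idea concrete, here is a minimal in-memory sketch of publish/subscribe. It is not the Kaazing Gateway or JMS API, and nothing in it goes over the network; `createBroker` and the `/topic/truck` name are hypothetical and serve only to illustrate how publishers and subscribers are decoupled through a topic.

```javascript
// In-memory illustration of pub/sub; NOT the Kaazing or JMS client API.
function createBroker() {
  var topics = {}; // topic name -> array of subscriber callbacks
  return {
    subscribe: function (topic, handler) {
      (topics[topic] = topics[topic] || []).push(handler);
    },
    publish: function (topic, message) {
      // Deliver to every subscriber of this topic; other topics are untouched.
      (topics[topic] || []).forEach(function (handler) {
        handler(message);
      });
    }
  };
}

var broker = createBroker();
var received = [];

// The truck side subscribes to the (hypothetical) topic...
broker.subscribe('/topic/truck', function (msg) { received.push(msg); });

// ...and the web app hosting the video publishes to it.
broker.publish('/topic/truck', 'frontlight;on');
broker.publish('/topic/other', 'ignored'); // no subscribers, so it is dropped

console.log(received); // only "frontlight;on" was delivered
```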

This way the video is driving the truck, simply by having the WebSocket messages sent out properly timed to the pre-recorded video.
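The timing idea can be sketched independently of Popcorn.js: keep the cues sorted by time and, each time the playback position advances, fire every cue that has become due. The code below is a simplified stand-in under that assumption and does not reflect how Popcorn.js works internally. `createCueRunner` is a hypothetical name, and the sketch does not handle seeking backward.

```javascript
// Simplified cue scheduling; an illustration, not Popcorn.js internals.
// cues: array of [timeInSeconds, message]; send: called with each due message.
function createCueRunner(cues, send) {
  var sorted = cues.slice().sort(function (a, b) { return a[0] - b[0]; });
  var next = 0; // index of the next cue that has not fired yet
  return function onTimeUpdate(currentTime) {
    while (next < sorted.length && sorted[next][0] <= currentTime) {
      send(sorted[next][1]);
      next++;
    }
  };
}

var sent = [];
var tick = createCueRunner(
  [[33, "frontlight;on"], [35, "frontlight;off"]],
  function (msg) { sent.push(msg); }
);

tick(10);   // nothing is due yet
tick(33.2); // fires "frontlight;on"
tick(36);   // fires "frontlight;off"
console.log(sent);
```

In a browser, `tick` could be driven by the video element's `timeupdate` event, passing in the element's `currentTime`.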

Here’s the end result. Isn’t it awesome?

You can see the entire source code on GitHub.