Hey there! I'm the lead on multi-device support right now.
Broadly speaking, the Leap's processing chain is broken up into four stages: image acquisition, image processing, geometry reconstruction, and tracking. For multi-device support, we intend to run the first three stages in parallel, once per device; it's only at the tracking stage that the question arises of what to do with the simultaneous inputs.
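To make the shape of that concrete, here is a minimal sketch, assuming each device gets its own independent run of the first three stages and only the tracking stage sees all devices at once. The stage functions (`acquire_image`, `process_image`, `reconstruct_geometry`, `track`) are hypothetical placeholders operating on strings, not our actual internals.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the first three stages; the real stages
# operate on camera images, not strings.
def acquire_image(device_id):
    return f"raw[{device_id}]"

def process_image(raw):
    return f"processed({raw})"

def reconstruct_geometry(processed):
    return f"geometry({processed})"

def per_device_pipeline(device_id):
    """Run acquisition, processing, and reconstruction for one device."""
    return reconstruct_geometry(process_image(acquire_image(device_id)))

def track(geometries):
    """Placeholder tracking stage: this is where the simultaneous inputs
    meet, and where a multi-device policy has to be chosen."""
    return list(geometries)

def run_frame(device_ids):
    # Stages 1-3 run independently, one task per device.
    with ThreadPoolExecutor(max_workers=len(device_ids)) as pool:
        geometries = list(pool.map(per_device_pipeline, device_ids))
    # Stage 4 receives every device's output at once, in device order.
    return track(geometries)

print(run_frame(["A", "B"]))
```

Everything interesting about multi-device support lives inside `track`; the stages before it don't need to know that other devices exist.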
Internally, our service uses a context-based dependency injection framework as its higher-level architecture. The benefit of this approach is that we can insert filter layers between components at runtime to alter their behavior, which means that whichever approach we choose for implementing multi-device support doesn't lock us into any specific architectural decisions. As you intuited, it's a lot simpler to assume multiple devices with non-overlapping fields of view than to attempt any kind of integration between them.
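As a rough illustration of what "inserting a filter layer at runtime" means, here is a toy dependency-injection context in which components are resolved by name and any registered filters wrap the bound component. The `Context` class, the `"reconstruction"` binding, and the `tag_device` filter are all hypothetical; this is not our framework, just the general pattern.

```python
from typing import Any, Callable, Dict, List

Component = Callable[[Any], Any]

class Context:
    """Toy dependency-injection context: components are looked up by name,
    and filters registered at runtime wrap whatever component is bound."""

    def __init__(self) -> None:
        self._components: Dict[str, Component] = {}
        self._filters: Dict[str, List[Callable[[Component], Component]]] = {}

    def bind(self, name: str, component: Component) -> None:
        self._components[name] = component

    def add_filter(self, name: str, wrap: Callable[[Component], Component]) -> None:
        self._filters.setdefault(name, []).append(wrap)

    def resolve(self, name: str) -> Component:
        component = self._components[name]
        # Apply filters in registration order; the last one added is outermost.
        for wrap in self._filters.get(name, []):
            component = wrap(component)
        return component

ctx = Context()
ctx.bind("reconstruction", lambda frame: {"hands": frame})

# Hypothetical filter sitting between reconstruction and tracking,
# e.g. tagging each output with the device it came from.
def tag_device(inner: Component) -> Component:
    return lambda frame: {**inner(frame), "device": "A"}

ctx.add_filter("reconstruction", tag_device)
print(ctx.resolve("reconstruction")(["h1"]))
```

Because consumers only ever call `resolve`, swapping one multi-device strategy for another amounts to registering a different filter rather than rewriting the components on either side.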
The problem, however, doesn't necessarily lie with how we intend to operate the tracking module. Instead, it's more about the API that we provide to you, the developer. Once we add a second device, we invalidate the implicit assumptions that a single frame corresponds to a single user and that the objects in that frame are globally unique.
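A small sketch of how the uniqueness assumption breaks, assuming (hypothetically) that each device numbers its hands independently and that we naively concatenate per-device frames into one frame. The `Hand` type and `merge_naively` function are illustrative, not part of our API.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Hand:
    id: int    # unique only within the device that produced it
    x: float   # horizontal position in that device's own coordinates

def merge_naively(*frames: List[Hand]) -> List[Hand]:
    """Concatenate per-device frames as if they were a single frame."""
    merged: List[Hand] = []
    for frame in frames:
        merged.extend(frame)
    return merged

device_a = [Hand(id=1, x=-50.0)]
device_b = [Hand(id=1, x=40.0)]   # numbered independently, so the ids collide

merged = merge_naively(device_a, device_b)
ids = [h.id for h in merged]
print(ids, len(set(ids)) == len(ids))
```

Code written against a single device would reasonably treat `id` as a key, and here two different hands share one. Nothing in the merged frame says whether they belong to one user or two.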
We could, of course, generalize our API. We could provide a secondary interface that lets you enumerate the available devices and then create a controller object for each of them, but the complexity of this gets to runaway levels awfully fast. How do we tell you when a device is detached? How should your application respond when that happens? Does a controller abstraction still make sense when it's bound to a specific device? What should happen when a previously identified device is reattached? It's not that these questions don't have answers; rather, the answers involve invalidating prior assumptions about how those interfaces are supposed to work.
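To show how quickly those questions pile up, here is a minimal sketch of what such a generalized interface might look like. `DeviceManager`, `DeviceController`, `attach`, `detach`, and `on_detach` are hypothetical names, not Leap SDK calls, and the reattachment policy below (reuse the old controller) is only one of several defensible answers.

```python
from typing import Callable, Dict, List

class DeviceController:
    """Hypothetical controller bound to one physical device."""
    def __init__(self, serial: str) -> None:
        self.serial = serial
        self.connected = True

class DeviceManager:
    """Hypothetical device-enumeration API."""
    def __init__(self) -> None:
        self._controllers: Dict[str, DeviceController] = {}
        self.on_detach: List[Callable[[DeviceController], None]] = []

    def attach(self, serial: str) -> DeviceController:
        existing = self._controllers.get(serial)
        if existing is not None:
            # Reattachment: reuse the old controller or issue a new one?
            # This sketch reuses it; either choice changes app semantics.
            existing.connected = True
            return existing
        controller = DeviceController(serial)
        self._controllers[serial] = controller
        return controller

    def detach(self, serial: str) -> None:
        controller = self._controllers[serial]
        controller.connected = False
        # The app must now decide what to do with state tied to this device.
        for callback in self.on_detach:
            callback(controller)

    def devices(self) -> List[DeviceController]:
        return [c for c in self._controllers.values() if c.connected]

manager = DeviceManager()
detached: List[str] = []
manager.on_detach.append(lambda c: detached.append(c.serial))

a = manager.attach("A")
manager.attach("B")
manager.detach("A")
print([c.serial for c in manager.devices()])  # ['B']
print(manager.attach("A") is a)               # True: same controller reused
```

Every method here encodes a policy decision (callbacks vs. polling, identity across reattachment, what a "connected" controller means) that a single-device API never had to expose.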
So, if we want to preserve the existing interfaces, we have three options for how to proceed:
The most obvious and easiest is Strict Redundant Nonoverlapping. Here, the Leaps are used in settings where their fields of view strictly do not overlap, and we treat each one as a redundant source of user input. As a result, a hand positioned squarely over one Leap generates the same API output as a hand positioned squarely over the other. This is fine for testing, but it will obviously yield a really bad user experience if the devices are only almost nonoverlapping rather than strictly nonoverlapping, since a hand in the small shared region would then be reported by both devices at once.
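A minimal sketch of this mode, assuming hand positions are `(x, y, z)` tuples in millimetres in each device's local frame, with y pointing up. The `redundant_merge` function is hypothetical; it simply passes every device's hands through without any positional correction, which is the whole point of this mode.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

def redundant_merge(per_device: Dict[str, List[Point]]) -> List[Point]:
    """Strict Redundant Nonoverlapping, sketched: every device's hand
    positions are passed through in that device's own local coordinates,
    with no correction for where the device physically sits."""
    merged: List[Point] = []
    for device_id in sorted(per_device):
        merged.extend(per_device[device_id])
    return merged

# A hand 200 mm above device A vs. a hand 200 mm above device B.
over_a = redundant_merge({"A": [(0.0, 200.0, 0.0)], "B": []})
over_b = redundant_merge({"A": [], "B": [(0.0, 200.0, 0.0)]})
print(over_a == over_b)  # True: the API output is identical

# Devices that overlap slightly: one physical hand in the shared region
# shows up once per device, at two unrelated positions.
straddle = redundant_merge({"A": [(150.0, 200.0, 0.0)], "B": [(-150.0, 200.0, 0.0)]})
print(len(straddle))     # 2
```

The second case is the bad user experience described above: one hand becomes two, and they sit 300 mm apart in API space.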
The next option is Fixed Spatial Relationship. The user (or a tool, or an OEM, etc.) is responsible for entering the spatial relationship between the two devices directly into a configuration file; we then read that offset and apply a correction between the reconstruction and tracking stages. Assuming that the spatial relationship is very precisely defined and doesn't change while the devices are in use, this is a pretty easy way to use the devices to extend the field of view. Unfortunately, that relationship really does have to be very precisely defined, which in practice would require a custom-made jig to hold the devices, or the user would have to epoxy them to some surface and then run calibration as a separate step. Again, this is useful for testing, but not for much beyond niche applications.
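Here is a sketch of the correction step, assuming a hypothetical JSON configuration that stores each secondary device's pose relative to a reference device as a 3x3 rotation matrix plus a translation in millimetres. The file format and the `load_transform` / `to_reference` names are illustrative only; the math is the standard rigid transform `p_ref = R * p_local + t`.

```python
import json
from typing import List, Tuple

Point = Tuple[float, float, float]

# Hypothetical config: device B's pose expressed in device A's frame,
# as a row-major 3x3 rotation matrix and a translation in millimetres.
CONFIG = """
{
  "reference": "A",
  "devices": {
    "B": {
      "rotation": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
      "translation_mm": [300.0, 0.0, 0.0]
    }
  }
}
"""

def load_transform(text: str, device_id: str):
    entry = json.loads(text)["devices"][device_id]
    return entry["rotation"], entry["translation_mm"]

def to_reference(p: Point, rotation: List[List[float]], translation: List[float]) -> Point:
    """Map a point from a device's local frame into the reference frame:
    p_ref = R * p_local + t."""
    return tuple(
        sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

R, t = load_transform(CONFIG, "B")
# Applied to every point, once per frame, between reconstruction and tracking.
print(to_reference((0.0, 200.0, 0.0), R, t))  # (300.0, 200.0, 0.0)
```

The transform is read once and never updated, which is exactly why the physical mounting has to stay put: any drift in the real relationship shows up directly as positional error.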
The final option, and the one we ultimately want to achieve, is Dynamic Spatial Relationship. In this model, overlapping fields of view are detected dynamically at runtime by a Recombinator stage, which then updates the relative offsets between devices on a frame-by-frame basis. This is perhaps the only implementation that would actually satisfy the requirements of a viable product, but each of the previous modes is a waypoint that gets us closer to this final objective. I expect that we will implement them in the order listed, though whether we release each mode or support it as a public setting will depend on timelines and community interest.
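A heavily simplified sketch of a frame-by-frame offset update, under assumptions that are mine and not part of the design above: both devices share an orientation (so only a translation is estimated), and hands seen by both devices in the overlap region have already been matched into pairs. Under those assumptions, the least-squares translation is the difference of the two centroids, and an exponential moving average damps per-frame noise. `TranslationRecombinator` and its `smoothing` parameter are hypothetical names.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]

class TranslationRecombinator:
    """Translation-only offset estimator, updated once per frame.
    Assumes matched hand pairs and identical device orientation."""

    def __init__(self, smoothing: float = 0.2) -> None:
        self.smoothing = smoothing           # weight given to the newest frame
        self.offset: Optional[Point] = None  # estimated B-to-A translation

    def update(self, pairs: List[Tuple[Point, Point]]) -> Optional[Point]:
        """pairs holds (position seen by A, position seen by B) for the same hand."""
        if not pairs:
            return self.offset  # no overlap this frame: keep the last estimate
        n = len(pairs)
        # Least-squares translation between matched points = difference of centroids.
        measured = tuple(sum(a[i] - b[i] for a, b in pairs) / n for i in range(3))
        if self.offset is None:
            self.offset = measured
        else:
            # Exponential moving average to damp per-frame jitter.
            self.offset = tuple(
                (1 - self.smoothing) * o + self.smoothing * m
                for o, m in zip(self.offset, measured)
            )
        return self.offset

r = TranslationRecombinator(smoothing=0.5)
print(r.update([((300.0, 200.0, 0.0), (0.0, 200.0, 0.0))]))  # (300.0, 0.0, 0.0)
print(r.update([((310.0, 200.0, 0.0), (0.0, 200.0, 0.0))]))  # (305.0, 0.0, 0.0)
print(r.update([]))                                          # (305.0, 0.0, 0.0)
```

A real Recombinator would also have to estimate rotation and decide which observations belong to the same hand, and those are the hard parts. This sketch only shows why the offset can track a changing relationship, which the Fixed Spatial Relationship mode cannot.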