Sure. You'd just need to edit the sample code to prevent additional skeletons from being processed. However, you might run into a problem where the boxes are detected FIRST, and then the actual person is ignored. So instead, you might want to check each skeleton's joints against some threshold of motion and ignore any skeleton that isn't moving. Keep in mind that even a static object will appear to have SOME motion, due to the (in)accuracy of the depth sensor on a per-frame basis, so the threshold has to sit above that jitter.
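Here's a rough sketch of the motion-threshold idea, in Python just to keep it short. Everything in it is an assumption on my part: the function names are made up, the joints are plain (x, y, z) tuples in meters rather than anything from the SDK, and the 0.02 threshold is a placeholder you'd have to tune against the jitter you actually see.

```python
import math

def mean_joint_motion(frames):
    """Average distance each joint moves between consecutive frames.

    `frames` is a list of frames; each frame is a list of (x, y, z)
    joint positions for one skeleton (hypothetical layout, in meters).
    """
    total = 0.0
    count = 0
    for prev, curr in zip(frames, frames[1:]):
        for p, c in zip(prev, curr):
            total += math.dist(p, c)  # Euclidean distance, Python 3.8+
            count += 1
    return total / count if count else 0.0

def is_moving(frames, threshold=0.02):
    """True if the skeleton moves more than the (assumed) noise threshold."""
    return mean_joint_motion(frames) > threshold

# A "box": joints only jitter by a few millimeters from frame to frame.
box = [
    [(0.0, 0.0, 2.0), (0.1, 0.5, 2.0)],
    [(0.003, 0.0, 2.004), (0.1, 0.503, 2.0)],
    [(0.0, 0.0, 2.0), (0.1, 0.5, 2.0)],
]

# A "person": joints shift 5 cm sideways every frame.
person = [
    [(0.0, 0.0, 2.0), (0.1, 0.5, 2.0)],
    [(0.05, 0.0, 2.0), (0.15, 0.5, 2.0)],
    [(0.10, 0.0, 2.0), (0.20, 0.5, 2.0)],
]

print(is_moving(box), is_moving(person))
```

You'd feed it the last second or so of frames per skeleton ID and skip any skeleton where `is_moving` comes back false. A person standing very still could fall under the threshold too, so you may want a longer window before you drop someone.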
Alternatively, you could look at the depth of the skeleton (distance from the camera), and ignore any skeletons that fall outside your "play area". I don't know if this would work for your project, though.
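The depth check could look something like this. Again, this is only a sketch under my own assumptions: the names are hypothetical, each skeleton is reduced to a single (x, y, z) position in meters with z being the distance from the camera, and the 1.0 to 3.0 meter range is just an example play area.

```python
def in_play_area(position, near=1.0, far=3.0):
    """True if the skeleton's distance from the camera is within [near, far]."""
    _, _, z = position
    return near <= z <= far

def filter_by_depth(skeletons, near=1.0, far=3.0):
    """Keep only the skeletons whose position lies inside the play area.

    `skeletons` maps a tracking ID to an (x, y, z) position (assumed layout).
    """
    return {
        sid: pos
        for sid, pos in skeletons.items()
        if in_play_area(pos, near, far)
    }

skeletons = {
    1: (0.0, 0.0, 2.0),   # player standing 2 m from the camera
    2: (0.5, 0.0, 4.2),   # boxes against the back wall
}

print(filter_by_depth(skeletons))
```

This only helps if the boxes sit clearly in front of or behind where the player stands. If they share the same depth range, you'd need the motion check or a limit on x as well.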