Emotion perception and interpretation is one of the key capabilities desired of assistive robots, as it could greatly enhance the quality and naturalness of human-robot interaction. According to psychological studies, bodily communication plays an important role in human social behaviour. However, modelling such affective bodily expressions is very challenging, especially in naturalistic settings, given the variety of expressive patterns and the difficulty of acquiring reliable data. In this paper, we investigate the problem of spontaneous dimensional emotion prediction in a child-robot interaction scenario. We describe the emotion elicitation, data acquisition, 3D skeletal representation, feature design and machine learning algorithms used. Experimental results show good predictive performance on the variation trends of the emotional dimensions, especially arousal.