The aim of this paper is to create an interface for human-robot interaction. Specifically, musical performance parameters (e.g. vibrato expression) of the Waseda Flutist Robot No.4 Refined IV (WF-4RIV) are to be manipulated. Our research focuses on enabling the WF-4RIV to interact with human players (musicians) in a natural way. In this paper, as a first approach, we developed a vision processing algorithm that tracks the 3D orientation and position of a musical instrument. The robot acquires image data through two cameras attached to its head. Using color histogram matching and a particle filter, the positions of the musician's hands on the instrument are tracked. Analysis of these data determines the orientation and location of the instrument. These parameters are then mapped to the musical expression of the WF-4RIV, more specifically to its sound vibrato and volume values. We present preliminary experiments to determine whether the robot can dynamically change musical parameters (e.g. vibrato) while interacting with a human player. From the experimental results, we could confirm the feasibility of the interaction during a performance, although further research must be carried out to consider the physical constraints of the flutist robot.
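The abstract does not give implementation details of the tracker, so the following is only a minimal sketch of how color histogram matching can be combined with a particle filter to follow a colored region in a single 2D image. All function names and parameters (`color_histogram`, `bhattacharyya`, `particle_filter_step`, `half`, `motion_std`, `sigma`) are hypothetical, and the Bhattacharyya coefficient, the random-walk motion model, and multinomial resampling are assumptions rather than the method used on the WF-4RIV. The stereo step that recovers 3D position and orientation from two cameras is not covered.

```python
import numpy as np

def color_histogram(patch, bins=8):
    """Normalized joint RGB histogram of an (h, w, 3) uint8 patch."""
    idx = (patch.reshape(-1, 3).astype(np.int64) * bins) // 256  # per-channel bin
    flat = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    hist = np.bincount(flat, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def bhattacharyya(p, q):
    """Similarity of two normalized histograms (1.0 means identical)."""
    return float(np.sum(np.sqrt(p * q)))

def particle_filter_step(image, particles, ref_hist, rng,
                         half=5, motion_std=3.0, sigma=0.2):
    """One predict / weight / resample cycle; particles are (x, y) rows."""
    h, w = image.shape[:2]
    # Predict: random-walk motion model (an assumption), kept inside the image.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    particles[:, 0] = np.clip(particles[:, 0], half, w - half - 1)
    particles[:, 1] = np.clip(particles[:, 1], half, h - half - 1)
    # Weight: compare the patch histogram around each particle to the reference.
    weights = np.empty(len(particles))
    for i, (x, y) in enumerate(particles.astype(int)):
        patch = image[y - half:y + half + 1, x - half:x + half + 1]
        dist2 = 1.0 - bhattacharyya(color_histogram(patch), ref_hist)
        weights[i] = np.exp(-dist2 / (2.0 * sigma ** 2))
    weights /= weights.sum()
    estimate = weights @ particles  # weighted mean position (x, y)
    # Resample: multinomial resampling proportional to the weights.
    keep = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[keep], estimate

# Synthetic test image: gray background with one red square (the "hand" marker).
rng = np.random.default_rng(0)
image = np.full((120, 160, 3), 50, dtype=np.uint8)
image[50:70, 70:90] = (200, 30, 30)
ref_hist = color_histogram(image[50:70, 70:90])

# Start with particles spread uniformly over the image, then iterate.
particles = rng.uniform([0, 0], [160, 120], size=(300, 2))
for _ in range(20):
    particles, estimate = particle_filter_step(image, particles, ref_hist, rng)
```

In this sketch, `estimate` holds the tracked (x, y) image position after the last iteration. Tracking both hands would mean running one such filter per hand; the instrument's orientation and location could then be derived from the two estimated positions.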