Visual servoing, also known as vision-based robot control and abbreviated VS, is a technique that uses feedback information extracted from a vision sensor (visual feedback[1]) to control the motion of a robot. One of the earliest papers to discuss visual servoing came from the SRI International Labs in 1979.[2]
There are two fundamental configurations of the robot end-effector (hand) and the camera: eye-in-hand, in which the camera is mounted on the robot's end-effector and moves with it, and eye-to-hand, in which the camera is fixed in the workspace and observes the end-effector and the target.
Visual servoing control techniques are broadly classified into the following types:[3][4] image-based visual servoing (IBVS), position-based (pose-based) visual servoing (PBVS), and hybrid approaches.
IBVS was proposed by Weiss and Sanderson.[5] The control law is based on the error between current and desired features on the image plane, and does not involve any estimate of the pose of the target. The features may be the coordinates of visual features, lines or moments of regions. IBVS has difficulties[6] with motions involving very large rotations, a problem which has come to be called camera retreat.
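As a concrete illustration of the image-based control law (not drawn from the cited papers), the following sketch computes the classic proportional IBVS command v = -λ L⁺ (s - s*) for point features; the feature coordinates, depths and gain are hypothetical:

```python
import numpy as np

def point_interaction_matrix(x, y, Z):
    """Interaction matrix of one normalized image point (x, y) at depth Z."""
    return np.array([
        [-1.0 / Z, 0.0,       x / Z, x * y,       -(1 + x ** 2),  y],
        [0.0,      -1.0 / Z,  y / Z, 1 + y ** 2,  -x * y,        -x],
    ])

def ibvs_velocity(features, desired, depths, gain=0.5):
    """Classic IBVS law: camera velocity v = -gain * pinv(L) @ (s - s*)."""
    error = (features - desired).reshape(-1)          # stacked 2N feature error
    L = np.vstack([point_interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(features, depths)])
    return -gain * np.linalg.pinv(L) @ error          # (vx, vy, vz, wx, wy, wz)

# Hypothetical example: four point features driven toward a desired pattern.
s = np.array([[0.1, 0.1], [0.1, -0.1], [-0.1, -0.1], [-0.1, 0.1]])
s_star = 0.5 * s
v = ibvs_velocity(s, s_star, depths=[1.0] * 4)
```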
PBVS is a model-based technique (with a single camera). This is because the pose of the object of interest is estimated with respect to the camera, and then a command is issued to the robot controller, which in turn controls the robot. In this case the image features are extracted as well, but are additionally used to estimate 3D information (the pose of the object in Cartesian space); hence it is servoing in 3D.
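For illustration only, a simplified proportional PBVS step on a pose error (translation t_err and rotation R_err of the current camera frame with respect to the desired one) might look as follows; the exact error definition and frames differ between PBVS formulations, so this is a sketch under those assumptions rather than the method of any cited paper:

```python
import numpy as np

def pbvs_velocity(t_err, R_err, gain=0.5):
    """Proportional law on a pose error: translation t_err (3-vector) and
    rotation R_err (3x3) of the current frame w.r.t. the desired frame."""
    # Axis-angle (theta * u) extracted from the rotation error matrix.
    theta = np.arccos(np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        thetau = np.zeros(3)
    else:
        thetau = theta / (2.0 * np.sin(theta)) * np.array([
            R_err[2, 1] - R_err[1, 2],
            R_err[0, 2] - R_err[2, 0],
            R_err[1, 0] - R_err[0, 1],
        ])
    # Drive both translation and rotation errors to zero at the same rate.
    return np.concatenate([-gain * np.asarray(t_err, float), -gain * thetau])
```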
Hybrid approaches use some combination of 2D and 3D servoing. There have been a few different approaches to hybrid servoing, including 2-1/2-D servoing, motion-partitioned approaches and task-partitioned approaches, which are discussed below.
The following description of the prior work is divided into three parts: a survey of major visual servoing techniques, work on the choice of image features, and work on error and stability analysis.
Visual servo systems, also called servoing, have been around since the early 1980s,[8] although the term visual servo itself was only coined in 1987.[9][3][4] Visual servoing is, in essence, a method for robot control where the sensor used is a camera (visual sensor). Servoing consists primarily of two techniques:[4] one uses information from the image to directly control the degrees of freedom (DOF) of the robot, and is referred to as image-based visual servoing (IBVS); the other involves the geometric interpretation of the information extracted from the camera, such as estimating the pose of the target and the parameters of the camera (assuming some basic model of the target is known). Other servoing classifications exist based on the variations in each component of a servoing system:[3] based on the location of the camera, the two kinds are eye-in-hand and hand–eye configurations; based on the control loop, the two kinds are end-point open-loop and end-point closed-loop; and based on whether the control is applied to the joints (or DOF) directly or as a position command to a robot controller, the two types are direct servoing and dynamic look-and-move.

In one of the earliest works,[10] the authors proposed a hierarchical visual servo scheme applied to image-based servoing. The technique relies on the assumption that a good set of features can be extracted from the object of interest (e.g. edges, corners and centroids) and used as a partial model along with global models of the scene and robot. The control strategy is applied to a simulation of a two- and three-DOF robot arm.
Feddema et al.[11] introduced the idea of generating the task trajectory with respect to the feature velocity. This is to ensure that the sensors are not rendered ineffective (stopping the feedback) for any of the robot motions. The authors assume that the objects are known a priori (e.g. a CAD model) and that all the features can be extracted from the object.

The work by Espiau et al.[12] discusses some of the basic questions in visual servoing. The discussions concentrate on modeling of the interaction matrix, the camera, and the visual features (points, lines, etc.).

In [13] an adaptive servoing system was proposed with a look-and-move servoing architecture. The method used optical flow along with SSD to provide a confidence metric, and a stochastic controller with Kalman filtering for the control scheme. The system assumes (in the examples) that the plane of the camera and the plane of the features are parallel.

[14] discusses an approach to velocity control using the Jacobian relationship ṡ = J v̇. In addition, the author uses Kalman filtering, assuming that the extracted position of the target has inherent errors (sensor errors). A model of the target velocity is developed and used as a feed-forward input in the control loop. The paper also mentions the importance of looking into kinematic discrepancy, dynamic effects, repeatability, settling-time oscillations and lag in response.
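A generic sketch of such a feature-rate control loop is given below; the gain and the feed-forward target-velocity term are illustrative and not taken from [14]:

```python
import numpy as np

def feature_rate_control(J, s, s_desired, s_dot_target=None, gain=1.0):
    """Velocity command from the relation s_dot = J * v: the commanded
    velocity is v = pinv(J) @ (gain * (s* - s) + feed-forward feature rate)."""
    if s_dot_target is None:
        s_dot_target = np.zeros_like(s)
    s_dot_ref = gain * (s_desired - s) + s_dot_target   # desired feature rate
    return np.linalg.pinv(J) @ s_dot_ref                # commanded velocity screw
```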
Corke [15] poses a set of very critical questions on visual servoing and tries to elaborate on their implications. The paper primarily focuses on the dynamics of visual servoing. The author tries to address problems like lag and stability, while also talking about feed-forward paths in the control loop. The paper also tries to seek justification for trajectory generation, the methodology of axis control and the development of performance metrics.
Chaumette in [16] provides good insight into the two major problems with IBVS: first, servoing to a local minimum, and second, reaching a Jacobian singularity. The author shows that image points alone do not make good features due to the occurrence of singularities. The paper continues by discussing possible additional checks to prevent singularities, namely the condition numbers of J_s and Ĵ_s⁺, and checking the null space of Ĵ_s and J_sᵀ. One main point that the author highlights is the relation between local minima and unrealizable image feature motions.
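As a rough illustration of such checks (the thresholds and tolerances here are hypothetical), one can monitor the singular values of the interaction matrix before applying a control law:

```python
import numpy as np

def jacobian_health(L, cond_threshold=1e3, rank_tol=1e-9):
    """Simple conditioning / singularity checks on an interaction matrix."""
    sigma = np.linalg.svd(L, compute_uv=False)
    cond = sigma[0] / sigma[-1] if sigma[-1] > rank_tol else np.inf
    rank = int(np.sum(sigma > rank_tol))
    near_singular = (cond > cond_threshold) or (rank < min(L.shape))
    return cond, rank, near_singular
```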
Over the years many hybrid techniques have been developed.[9] These involve computing a partial or complete pose from epipolar geometry using multiple views or multiple cameras. The values are obtained by direct estimation or through a learning or statistical scheme. Others have used a switching approach that changes between image-based and position-based control based on a Lyapunov function.[9]

The early hybrid techniques that used a combination of image-based and pose-based (2D and 3D information) approaches for servoing required either a full or partial model of the object in order to extract the pose information, and used a variety of techniques to extract the motion information from the image. [17] used an affine motion model estimated from the image motion, in addition to a rough polyhedral CAD model, to extract the object pose with respect to the camera in order to servo onto the object (along the lines of PBVS).
2-1/2-D visual servoing, developed by Malis et al.,[18] is a well known technique that breaks down the information required for servoing in an organized fashion which decouples rotations and translations. The papers assume that the desired pose is known a priori. The rotational information is obtained from partial pose estimation via a homography (essentially 3D information), giving an axis of rotation and an angle (by computing the eigenvalues and eigenvectors of the homography). The translational information is obtained from the image directly by tracking a set of feature points. The only conditions are that the feature points being tracked never leave the field of view and that a depth estimate be predetermined by some off-line technique. 2-1/2-D servoing has been shown to be more stable than the techniques that preceded it. Another interesting observation with this formulation is that the authors claim the visual Jacobian will have no singularities during the motions.

The hybrid technique developed by Corke and Hutchinson,[19][20] popularly called the partitioned approach, partitions the visual (or image) Jacobian into motions (both rotations and translations) relating to the X and Y axes and motions relating to the Z axis. [20] outlines the technique of breaking out the columns of the visual Jacobian that correspond to the Z-axis translation and rotation (namely, the third and sixth columns). The partitioned approach is shown to handle the Chaumette Conundrum discussed in [21]. This technique requires a good depth estimate in order to function properly (a schematic sketch of the column partitioning is given at the end of this section).

[22] outlines a hybrid approach where the servoing task is split into two, namely a main task and a secondary task. The main task is to keep the features of interest within the field of view, while the secondary task is to mark a fixation point and use it as a reference to bring the camera to the desired pose. The technique does need a depth estimate from an off-line procedure. The paper discusses two examples for which depth estimates are obtained from robot odometry and by assuming that all features lie on a plane. The secondary task is achieved by using the notion of parallax. The features that are tracked are chosen by an initialization performed on the first frame, and are typically points.

[23] carries out a discussion on two aspects of visual servoing: feature modeling and model-based tracking. The primary assumption made is that a 3D model of the object is available. The authors highlight the notion that ideal features should be chosen such that the DOF of motion can be decoupled by a linear relation. The authors also introduce an estimate of the target velocity into the interaction matrix to improve tracking performance. The results are compared to well known servoing techniques even when occlusions occur.
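The following sketch illustrates the column-partitioning idea only; it assumes that the Z-axis translational and rotational velocities have already been computed separately (as done in [20] from dedicated image measures), and the variable names and gain are hypothetical:

```python
import numpy as np

def partitioned_ibvs(L, error, vz, wz, gain=0.5):
    """Schematic partitioned step: the interaction-matrix columns for Z
    translation and Z rotation (third and sixth) are handled separately;
    the remaining four DOF compensate for their contribution."""
    Lz  = L[:, [2, 5]]                       # Z-axis translation / rotation columns
    Lxy = L[:, [0, 1, 3, 4]]                 # remaining columns
    vz_part = np.array([vz, wz])             # Z-axis velocities computed elsewhere
    rest = np.linalg.pinv(Lxy) @ (-gain * error - Lz @ vz_part)
    v = np.zeros(6)
    v[[0, 1, 3, 4]] = rest                   # vx, vy, wx, wy
    v[[2, 5]] = vz_part                      # vz, wz
    return v
```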
This section discusses the work done in the field of visual servoing from the point of view of the features used. Most of the work has used image points as visual features. The formulation of the interaction matrix in [3] assumes that points in the image are used to represent the target. There is also a body of work that deviates from the use of points and instead uses feature regions, lines, image moments and moment invariants.[24]

In [25] the authors discuss an affine-based tracking of image features. The image features are chosen based on a discrepancy measure, which is based on the deformation that the features undergo. The features used were texture patches. One of the key points of the paper was that it highlighted the need to look at features for improving visual servoing.

In [26] the authors look into the choice of image features (the same question was also discussed in [3] in the context of tracking). The effect of the choice of image features on the control law is discussed with respect to just the depth axis. The authors consider the distance between feature points and the area of an object as features. These features are used in the control law with slightly different forms to highlight the effects on performance. It was noted that better performance was achieved when the servo error was proportional to the change along the depth axis.

[27] provides one of the early discussions of the use of moments. The authors provide a new, albeit complicated, formulation of the interaction matrix using the velocity of the moments in the image. Even though moments are used, they are moments of the small change in the location of contour points, computed with the use of Green's theorem. The paper also tries to determine the set of features (on a plane) for a 6-DOF robot.

[28] discusses the use of image moments to formulate the visual Jacobian. This formulation allows for decoupling of the DOF based on the type of moments chosen. The simple case of this formulation is notionally similar to 2-1/2-D servoing.[28] The time variation of the moments (ṁ_ij) is determined using the motion between two images and Green's theorem. The relation between ṁ_ij and the velocity screw v is given as ṁ_ij = L_m_ij v. This technique avoids camera calibration by assuming that the objects are planar and by using a depth estimate. The technique works well in the planar case but tends to be complicated in the general case. The basic idea is based on the work in [4].

Moment invariants have been used in [29]. The key idea is to find the feature vector that decouples all the DOF of motion. Some observations made were that centralized moments are invariant to 2D translations. A complicated polynomial form is developed for 2D rotations. The technique follows teaching-by-showing, hence requiring the values of the desired depth and area of the object (assuming that the plane of the camera and the object are parallel, and that the object is planar). Other parts of the feature vector are the invariants R3 and R4. The authors claim that occlusions can be handled.

[30] and [31] build on the work described in [27][29][30]. The major difference is that the authors use a technique similar to [14], where the task is broken into two (in the case where the features are not parallel to the camera plane); a virtual rotation is performed to bring the features parallel to the camera plane. [32] consolidates the work done by the authors on image moments.
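As a minimal illustration of the moment features themselves (not of the interaction matrices L_m_ij derived in [27][28]), the raw moments of a binary image region and the simplest features derived from them could be computed as follows; the function is illustrative:

```python
import numpy as np

def raw_moments(mask, order=2):
    """Raw image moments m_ij = sum over region pixels of x**i * y**j.
    Area (m_00) and centroid (m_10/m_00, m_01/m_00) are the simplest moment
    features used in moment-based servoing; assumes a non-empty binary mask."""
    ys, xs = np.nonzero(mask)
    m = {(i, j): float(np.sum((xs ** i) * (ys ** j)))
         for i in range(order + 1) for j in range(order + 1) if i + j <= order}
    area = m[(0, 0)]
    centroid = (m[(1, 0)] / area, m[(0, 1)] / area)
    return m, area, centroid
```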
Espiau in [33] showed from purely experimental work that image-based visual servoing (IBVS) is robust to calibration errors. The author used a camera with no explicit calibration, along with point matching and without pose estimation. The paper looks at the effect of errors and uncertainty on the terms in the interaction matrix from an experimental approach. The targets used were points and were assumed to be planar.
A similar study was done in [34], where the authors carry out an experimental evaluation of a few uncalibrated visual servo systems that were popular in the 1990s. The major outcome was experimental evidence of the effectiveness of visual servo control over conventional control methods.

Kyrki et al.[35] analyze servoing errors for position-based and 2-1/2-D visual servoing. The technique involves determining the error in extracting image position and propagating it to pose estimation and servoing control. Points from the image are mapped to points in the world a priori to obtain a mapping (which is basically the homography, although not explicitly stated in the paper). This mapping is broken down into pure rotations and translations. Pose estimation is performed using a standard technique from computer vision. Pixel errors are transformed to the pose, and these are propagated to the controller. An observation from the analysis shows that errors in the image plane are proportional to the depth, and that error along the depth axis is proportional to the square of the depth (a back-of-the-envelope illustration of these scalings is given at the end of this section).

Measurement errors in visual servoing have been looked into extensively. Most error functions relate to two aspects of visual servoing: the steady-state error (once servoed) and the stability of the control loop. Other servoing errors that have been of interest are those that arise from pose estimation and camera calibration. In [36] the authors extend the work done in [37] by considering global stability in the presence of intrinsic and extrinsic calibration errors. [38] provides an approach to bound the task-function tracking error. In [39] the authors use a teaching-by-showing visual servoing technique, where the desired pose is known a priori and the robot is moved from a given pose. The main aim of the paper is to determine the upper bound on the positioning error due to image noise using a convex-optimization technique.

[40] provides a discussion on stability analysis with respect to the uncertainty in depth estimates. The authors conclude the paper with the observation that for unknown target geometry a more accurate depth estimate is required in order to limit the error.

Many of the visual servoing techniques [19][20][41] implicitly assume that only one object is present in the image and that the relevant features for tracking, along with the area of the object, are available. Most techniques require either a partial pose estimate or a precise depth estimate of the current and desired pose.
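One simple way to see the depth scalings reported by Kyrki et al. is to propagate pixel errors through a pinhole camera model with depth obtained by triangulation; this is only an illustrative calculation under those assumptions, not the derivation used in [35]:

```python
def lateral_error(pixel_err, Z, f):
    """Pinhole model x = f*X/Z, so X = x*Z/f: a pixel error maps to a
    lateral (image-plane direction) error that grows linearly with depth Z."""
    return pixel_err * Z / f

def depth_error(match_err, Z, f, baseline):
    """Depth from triangulation Z = f*b/d: a matching (disparity) error maps
    to a depth error that grows with the square of the depth Z."""
    return match_err * Z ** 2 / (f * baseline)
```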