Geometric feature learning explained

Geometric feature learning is a technique that combines machine learning and computer vision to solve visual tasks. Its main goal is to find a set of representative geometric features that describe an object, by collecting geometric features from images and learning them with efficient machine learning methods. Humans solve visual tasks and respond quickly to their environment by extracting perceptual information from what they see, and researchers simulate this human ability to recognise objects in order to solve computer vision problems. For example, M. Mata et al. (2002)[1] applied feature learning techniques to mobile robot navigation in order to avoid obstacles, using genetic algorithms to learn features and recognise objects (figures). Geometric feature learning methods can not only solve recognition problems but also predict subsequent actions by analysing a sequence of input sensory images, usually through features extracted from those images. Through learning, a set of hypotheses about the next action is generated, and the most probable action is chosen according to the probability of each hypothesis. The technique is widely used in artificial intelligence.

Introduction

Geometric feature learning methods extract distinctive geometric features from images. Geometric features are features of objects constructed from a set of geometric elements such as points, lines, curves or surfaces. They include corner features, edge features, blobs, ridges, salient points, image texture and so on, all of which can be detected by feature detection methods.

Geometric features

Primitive features
Compound features[3]

A geometric compound feature is a combination of several primitive features; it always consists of more than two primitive features, such as edges, corners or blobs. The geometry of the compound feature is computed relative to a reference point: starting from the first primitive, the location, orientation and scale of each subsequent primitive are obtained from those of its predecessor, as shown below:

$$x_i = x_{i-1} + \sigma_{i-1} d_i \begin{bmatrix} \cos(\theta_{i-1} + \phi_i) \\ \sin(\theta_{i-1} + \phi_i) \end{bmatrix}$$

$$\theta_i = \theta_{i-1} + \Delta\theta_i$$

$$\sigma_i = \sigma_{i-1} \Delta\sigma_i$$

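where $x_i$ is the location of the i-th primitive feature, $\theta_i$ its orientation and $\sigma_i$ its intrinsic scale. The parameters $d_i$ and $\phi_i$ give the distance (in units of the previous feature's scale) and the direction (relative to the previous feature's orientation) from the (i-1)-th feature to the i-th, while $\Delta\theta_i$ and $\Delta\sigma_i$ are the relative orientation and the relative scale factor.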
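The recursion for a geometric compound feature can be traced numerically. The following is a minimal Python sketch, assuming 2-D image coordinates and angles in radians; the `Pose` class and the `chain_poses` function are hypothetical names introduced for illustration, not part of any cited implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """Location (x, y), orientation theta (radians) and intrinsic scale sigma."""
    x: float
    y: float
    theta: float
    sigma: float


def chain_poses(ref, steps):
    """Place each primitive of a compound feature relative to its predecessor.

    ref:   Pose of the first primitive (the reference point).
    steps: one (d, phi, dtheta, dsigma) tuple per further primitive.
    """
    poses = [ref]
    for d, phi, dtheta, dsigma in steps:
        prev = poses[-1]
        # Direction is measured from the previous primitive's orientation,
        # and the distance d is in units of the previous primitive's scale.
        angle = prev.theta + phi
        poses.append(Pose(
            x=prev.x + prev.sigma * d * math.cos(angle),
            y=prev.y + prev.sigma * d * math.sin(angle),
            theta=prev.theta + dtheta,
            sigma=prev.sigma * dsigma,
        ))
    return poses
```

For example, a reference primitive at the origin with scale 2 and a step of `(3.0, math.pi / 2, math.pi / 4, 0.5)` places the second primitive at approximately (0, 6) with orientation pi/4 and scale 1. Because every step is expressed relative to the previous pose, rotating or rescaling the reference pose moves the whole configuration consistently.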

A Boolean compound feature consists of two sub-features, each of which can be a primitive feature or a compound feature. There are two types of Boolean features: a conjunctive feature, whose value is the product of the values of its two sub-features, and a disjunctive feature, whose value is the maximum of the values of its two sub-features.
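The two Boolean combination rules are simple enough to state directly in code. This minimal Python sketch assumes that sub-feature values are non-negative responses (for example, similarity scores in [0, 1]); the function names and the sample values are hypothetical.

```python
def conjunctive(a: float, b: float) -> float:
    """Conjunctive feature: the product of the two sub-feature values."""
    return a * b


def disjunctive(a: float, b: float) -> float:
    """Disjunctive feature: the maximum of the two sub-feature values."""
    return max(a, b)


# Sub-features may themselves be compound, so the operators nest.
# Hypothetical responses of three primitive features at one location:
edge, corner, blob = 0.9, 0.5, 0.2
value = conjunctive(edge, disjunctive(corner, blob))  # 0.9 * max(0.5, 0.2) = 0.45
```

With values in [0, 1], the product behaves like a soft AND (it is high only when both sub-features respond) and the maximum like a soft OR (it is high when either does).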

Feature space

Feature space was first considered in computer vision by Segen,[4] who used a multilevel graph to represent the geometric relations among local features.

Learning algorithms

Many learning algorithms can be applied to find the distinctive features of objects in an image. Learning can be incremental, meaning that new object classes can be added at any time.

Geometric feature extraction methods

Feature learning algorithm

1. Acquire a new training image I.

2. Run the recognition algorithm on I and evaluate the result. If the result is correct, new object classes are recognised.

The key point of the recognition algorithm is to find the most distinctive feature $f_{\max}$ among all features of all classes, i.e. the feature that maximises

$$I_{\max} = \max_{f} \max_{C} I(C, F_f)$$

$$I(C, F_f) = -\sum_{C} \sum_{F_f} BEL(F_f, C) \log \frac{BEL(F_f)\, BEL(C)}{BEL(F_f, C)}$$

where $I(C, F_f)$ is the mutual information between the class $C$ and the value $F_f$ of feature $f$.

To measure the value of a feature in an image $I$ and localise it, the feature is evaluated at every location and the best response is kept:

$$f^{(p)}(I) = \max_{x \in I} f^{(p)}(x)$$

where $f^{(p)}(x)$ is defined as

$$f^{(p)}(x) = \max\left\{0, \frac{\left(f^{(p)}\right)^{T} f(x)}{\left\|f^{(p)}\right\| \left\|f(x)\right\|}\right\}$$

that is, the cosine similarity between the vectors $f^{(p)}$ and $f(x)$, clipped at zero.

After the features are recognised, the results are evaluated to determine whether the classes can be recognised. There are five evaluation categories for recognition results: correct, wrong, ambiguous, confused and ignorant. When the evaluation is correct, a new training image is added and trained on. If recognition fails, the feature nodes should maximise their distinctive power, which is defined by the Kolmogorov-Smirnov distance (KSD).

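$$\mathrm{KSD}_{a,b}(X) = \max_{\alpha} \left| \mathrm{cdf}(\alpha \mid a) - \mathrm{cdf}(\alpha \mid b) \right|$$

where $\mathrm{cdf}(\alpha \mid a)$ and $\mathrm{cdf}(\alpha \mid b)$ are the cumulative distribution functions of the feature value $X$ for classes $a$ and $b$.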

3. Feature learning algorithm: after a feature has been recognised, it is applied to a Bayesian network to recognise the image, and the feature learning algorithm is used to test it.
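Two quantities used in step 2 can be sketched in Python with NumPy: the feature value obtained by matching the vector $f^{(p)}$ against every image location, and the Kolmogorov-Smirnov distance used to judge distinctive power. This is a minimal illustration, not Piater's implementation; the function names `feature_value` and `ksd`, and the assumption that an image is supplied as an (H, W, D) array of per-pixel feature vectors, are hypothetical.

```python
import numpy as np


def feature_value(prototype, feature_map):
    """Return (f^(p)(I), location): the best clipped cosine similarity between
    the vector f^(p) and the feature vectors f(x) of image I.

    feature_map: array of shape (H, W, D) holding a D-dimensional feature
    vector f(x) for every pixel x (an assumed layout).
    """
    prototype = np.asarray(prototype, dtype=float)
    flat = np.asarray(feature_map, dtype=float).reshape(-1, prototype.size)
    denom = np.linalg.norm(flat, axis=1) * np.linalg.norm(prototype)
    # f^(p)(x) = max{0, (f^(p))^T f(x) / (||f^(p)|| ||f(x)||)}; zero vectors give 0.
    cos = np.divide(flat @ prototype, denom, out=np.zeros(len(flat)), where=denom > 0)
    per_pixel = np.maximum(0.0, cos)
    best = int(np.argmax(per_pixel))  # f^(p)(I) = max over x in I
    row, col = np.unravel_index(best, np.shape(feature_map)[:2])
    return float(per_pixel[best]), (int(row), int(col))


def ksd(values_a, values_b):
    """Kolmogorov-Smirnov distance max_alpha |cdf(alpha|a) - cdf(alpha|b)|
    between the empirical distributions of a feature's values in classes a and b."""
    a = np.sort(np.asarray(values_a, dtype=float))
    b = np.sort(np.asarray(values_b, dtype=float))
    # Both empirical CDFs are step functions that only jump at observed values,
    # so evaluating at every observed value finds the maximum gap.
    alphas = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, alphas, side="right") / len(a)
    cdf_b = np.searchsorted(b, alphas, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))
```

A feature whose values separate two classes perfectly has a KSD of 1 (for example, `ksd([1, 2, 3], [4, 5, 6])`), while identical value distributions give 0, so larger values indicate more distinctive features.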

PAC model based feature learning algorithm

Learning framework

D. Roth et al. (2002) applied the probably approximately correct (PAC) model to computer vision problems by developing a distribution-free learning theory based on it.[5] The theory relies heavily on a feature-efficient learning approach. The goal of the algorithm is to learn an object represented by geometric features in an image: the input is a feature vector, and the output is 1 if the object is detected and 0 otherwise. The approach collects representative elements that describe the object through a function, and tests them by recognising the object in images, seeking a representation that is correct with high probability. The learning algorithm predicts whether the learned target concept $f_T(X)$ belongs to a class, where $X$ is the instance space of parameters, and then tests whether the prediction is correct.

Evaluation framework

After the features are learned, the learning algorithms need to be evaluated. D. Roth et al. applied two learning algorithms:

1. Sparse Network of Winnows (SNoW) system

Initially, the set of features $F_t$ linked to target $t$ is empty ($F_t = \emptyset$) for all $t \in T$, where $T$ is the set of object targets $t_1$ to $t_k$. A target $t$ is evaluated on an example $e$, given as its list of active features, by comparing $\sum_{i \in e} w_i^t$ with $\theta_t$, where $w_i^t$ is the weight on the connection from feature $i$ to target $t$ and $\theta_t$ is the threshold for target $t$. The weights are updated in two cases: predicted positive on a negative example ($\sum_{i \in e} w_i^t > \theta_t$ while $t$ is not in the list of active features) and predicted negative on a positive example ($\sum_{i \in e} w_i^t \leq \theta_t$ while $t$ is in the list of active features).
2. Support vector machines (SVM)

The main purpose of an SVM is to find a hyperplane that separates the set of samples $(x_i, y_i)$, where $x_i \in \mathbb{R}^N$ is an input vector consisting of a selection of features and $y_i$ is the label of $x_i$. The resulting decision function has the following form:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} y_i \alpha_i k(x, x_i) + b\right) = \begin{cases} 1, & \text{positive inputs} \\ -1, & \text{negative inputs} \end{cases}$$

where $l$ is the number of training samples, $\alpha_i$ are the learned coefficients, $b$ is the bias, and

$$k(x, x_i) = \phi(x) \cdot \phi(x_i)$$

is a kernel function.
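The SVM decision function can be evaluated directly once training has produced the support vectors, coefficients and bias. The following minimal Python sketch assumes those values are already available; the Gaussian (RBF) kernel is one common choice substituted here for illustration, the function names are hypothetical, and mapping a score of exactly 0 to +1 is an assumed tie-breaking convention.

```python
import numpy as np


def rbf_kernel(x, xi, gamma=0.5):
    """k(x, xi) = exp(-gamma * ||x - xi||^2), one common kernel (an assumed choice)."""
    diff = np.asarray(x, dtype=float) - np.asarray(xi, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def svm_decision(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """f(x) = sgn(sum_i y_i * alpha_i * k(x, x_i) + b), returned as +1 or -1.

    support_vectors, labels (+1/-1), alphas and b are assumed to come from an
    already trained SVM; a score of exactly 0 is mapped to +1 here.
    """
    score = sum(y * a * kernel(x, xi)
                for xi, y, a in zip(support_vectors, labels, alphas))
    return 1 if score + b >= 0 else -1
```

Passing a plain dot product as `kernel` recovers a linear SVM in the input space; any other valid kernel makes the classifier linear in the implicit feature space of $\phi$ instead.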

Both algorithms separate the training data by finding a linear function; for the SVM, the function is linear in the feature space induced by the kernel.
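The SNoW target node described in item 1 is one such linear threshold function. A minimal Python sketch of a single target node trained with Winnow-style multiplicative updates follows. The promotion factor `alpha`, the demotion factor `beta`, the initial weight, linking features only on positive examples and the winner-take-all prediction across targets are assumptions based on the standard Winnow algorithm rather than details given here, and the class and function names are hypothetical.

```python
class WinnowTarget:
    """A simplified SNoW-style target node t: a linear threshold unit over the
    features linked to t, trained with Winnow's multiplicative updates."""

    def __init__(self, theta=1.0, alpha=2.0, beta=0.5, init_weight=1.0):
        self.theta = theta              # threshold theta_t
        self.alpha = alpha              # promotion factor (> 1), assumed value
        self.beta = beta                # demotion factor (< 1), assumed value
        self.init_weight = init_weight  # weight given to a newly linked feature
        self.weights = {}               # w_i^t; F_t starts empty

    def activation(self, active):
        """Sum of w_i^t over the active features i in example e."""
        return sum(self.weights.get(i, 0.0) for i in active)

    def predict(self, active):
        return self.activation(active) > self.theta

    def train(self, active, is_positive):
        if is_positive:
            # Link features that co-occur with t in a positive example.
            for i in active:
                self.weights.setdefault(i, self.init_weight)
        predicted = self.predict(active)
        if is_positive and not predicted:
            # Predicted negative on a positive example: promote active weights.
            for i in active:
                self.weights[i] *= self.alpha
        elif predicted and not is_positive:
            # Predicted positive on a negative example: demote active linked weights.
            for i in active:
                if i in self.weights:
                    self.weights[i] *= self.beta


def snow_predict(targets, active):
    """Winner-take-all: name of the target node with the highest activation."""
    return max(targets, key=lambda name: targets[name].activation(active))
```

Training calls `train` on each example for every target node; `snow_predict` then selects the dominant target for a new list of active features. Multiplicative updates let Winnow cope with very large, sparse feature sets, which is the setting SNoW is designed for.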

Applications

References

  1. M. Mata, J. M. Armingol, A. de la Escalera, and M. A. Salichs, "Learning visual landmarks for mobile robot navigation," in Proceedings of the 15th World Congress of the International Federation of Automatic Control, 2002.
  2. K. Cho and S. M. Dunn, "Learning shape classes," IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(9), 1994, pp. 882–888.
  3. J. H. Piater, "Visual feature learning," Electronic Doctoral Dissertations for UMass Amherst, Paper AAI3000331, January 1, 2001.
  4. J. Segen, "Learning graph models of shape," in Proceedings of the 5th International Conference on Machine Learning (Ann Arbor, June 12–14, 1988), J. Laird, Ed., Morgan Kaufmann, 1988.
  5. D. Roth, M.-H. Yang, and N. Ahuja, "Learning to recognize three-dimensional objects," Neural Computation, 14(5), 2002, pp. 1071–1104.
  6. M. Mata and J. M. Armingol, "Learning visual landmarks for mobile robot navigation," Division of Systems Engineering and Automation, Madrid, Spain, 2002.
  7. I. A. Rybak, "BMV: Behavioral model of visual perception and recognition," in Human Vision, Visual Processing, and Digital Display IV.
  8. P. Fitzpatrick, G. Metta, L. Natale, S. Rao, and G. Sandini, "Learning about objects through action: initial steps towards artificial cognition," in IEEE International Conference on Robotics and Automation, 2003, pp. 3140–3145.
  9. J. M. Ferryman, A. D. Worrall, and S. J. Maybank, "Learning enhanced 3D models for vehicle tracking," in Proceedings of the British Machine Vision Conference, 1998.