M-tree explained

In computer science, M-trees are tree data structures similar to R-trees and B-trees. An M-tree is constructed using a metric and relies on the triangle inequality for efficient range and k-nearest neighbor (k-NN) queries. While M-trees can perform well in many conditions, the tree can also have large overlap, and there is no clear strategy for avoiding it. In addition, an M-tree can only be used with distance functions that satisfy the triangle inequality, which many advanced dissimilarity functions used in information retrieval do not.^[1]
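The role of the triangle inequality can be made concrete with the usual pruning argument for a range query. The notation is an assumption of this sketch: Q is the query object, r_Q the query radius, O_r a routing object with covering radius r(O_r), and O_j any object stored in the subtree of O_r, so that d(O_r, O_j) ≤ r(O_r).

```latex
\begin{align*}
d(Q, O_r) &\le d(Q, O_j) + d(O_j, O_r)
  && \text{(triangle inequality)} \\
d(Q, O_j) &\ge d(Q, O_r) - d(O_r, O_j) \;\ge\; d(Q, O_r) - r(O_r)
  && \text{(since } d(O_r, O_j) \le r(O_r)\text{)} \\
d(Q, O_r) > r_Q + r(O_r) &\;\Longrightarrow\; d(Q, O_j) > r_Q
  && \text{(the whole subtree can be skipped)}
\end{align*}
```

If the distance function violated the triangle inequality, the first step would fail and skipping the subtree could discard valid results.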

Overview

As in any tree-based data structure, the M-tree is composed of nodes and leaves. Each node holds a data object that identifies it uniquely and a pointer to a sub-tree where its children reside. Every leaf holds several data objects. For each node there is a radius r that defines a ball in the desired metric space. Thus, every node n and leaf l residing in a particular node N is at most distance r from N, and every node n and leaf l with parent node N stores its distance from N.
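The ball property can be illustrated with a minimal sketch. It assumes Euclidean distance on tuples; the helper names `covering_radius` and `in_ball` are hypothetical and chosen only for this example.

```python
import math


def covering_radius(center, members, dist=math.dist):
    """Smallest radius r such that every member is within distance r of center."""
    return max((dist(center, m) for m in members), default=0.0)


def in_ball(center, radius, obj, dist=math.dist):
    """True if obj lies inside the closed ball around center."""
    return dist(center, obj) <= radius


# Assumed toy data: one node object and the objects stored beneath it.
center = (0.0, 0.0)
members = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]

r = covering_radius(center, members)
inside = all(in_ball(center, r, m) for m in members)
```

Any metric can be passed as `dist`; the Euclidean default is only a convenient stand-in.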

M-tree construction

Components

An M-tree has these components and sub-components:

Non-leaf nodes
1. A set of routing objects N_RO.
2. Pointer to Node's parent object O_p.
Leaf nodes
1. A set of objects N_O.
2. Pointer to Node's parent object O_p.
Routing Object
1. (Feature value of) routing object O_r.
2. Covering radius r(O_r).
3. Pointer to covering tree T(O_r).
4. Distance of O_r from its parent object d(O_r,P(O_r)).
Object
1. (Feature value of the) object O_j.
2. Object identifier oid(O_j).
3. Distance of O_j from its parent object d(O_j,P(O_j)).
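The components listed above could be laid out as follows. This is a sketch under stated assumptions rather than a reference layout: the class and field names are hypothetical, objects are tuples compared with Euclidean distance, and the `range_query` function is added only to show how the covering radius is used. The stored parent distances are kept as fields but not exploited here.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class ObjectEntry:
    value: Point                  # (feature value of the) object O_j
    oid: int                      # object identifier oid(O_j)
    dist_to_parent: float = 0.0   # d(O_j, P(O_j))


@dataclass
class RoutingEntry:
    value: Point                  # (feature value of) routing object O_r
    radius: float                 # covering radius r(O_r)
    subtree: "Node"               # covering tree T(O_r)
    dist_to_parent: float = 0.0   # d(O_r, P(O_r))


@dataclass
class Node:
    is_leaf: bool
    entries: List = field(default_factory=list)    # N_O (leaf) or N_RO (non-leaf)
    parent_entry: Optional[RoutingEntry] = None     # parent object O_p


def range_query(node, q, r_q, dist=math.dist):
    """Return the oids of all objects within distance r_q of q."""
    hits = []
    for e in node.entries:
        d = dist(q, e.value)
        if node.is_leaf:
            if d <= r_q:
                hits.append(e.oid)
        elif d <= r_q + e.radius:   # otherwise the ball cannot contain a hit
            hits.extend(range_query(e.subtree, q, r_q, dist))
    return hits


# A hand-built two-level tree (assumed toy data).
leaf1 = Node(True, [ObjectEntry((0.0, 0.0), 1, 0.0), ObjectEntry((1.0, 0.0), 2, 1.0)])
leaf2 = Node(True, [ObjectEntry((10.0, 10.0), 3, 0.0), ObjectEntry((11.0, 10.0), 4, 1.0)])
root = Node(False, [RoutingEntry((0.0, 0.0), 1.0, leaf1),
                    RoutingEntry((10.0, 10.0), 1.0, leaf2)])

near_origin = range_query(root, (0.5, 0.0), 1.0)
```

In this example the second routing entry is skipped without visiting its leaf, because the query ball and the covering ball cannot intersect.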

Insert

The main idea is first to find a leaf node N where the new object O_n belongs. If N is not full, then just attach O_n to N. If N is full, then invoke a method to split N. The algorithm is as follows:

Input: Node N of M-tree MT, entry O_n
Output: A new instance of MT containing all entries in the original MT plus O_n

N_e ← N's routing objects or objects
if N is not a leaf then
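The descent to a leaf and the split on overflow can be sketched as below. Several parts are assumptions of this example, not taken from the text: the subtree choice (prefer a ball that already contains the object, otherwise the smallest radius enlargement), the split policy (promote the two entries farthest apart and assign each entry to the nearer one), the class names, and the Euclidean default metric. Parent distances are omitted for brevity.

```python
import math


class Entry:
    """Leaf entry (subtree is None) or routing entry (subtree is a Node)."""
    def __init__(self, value, radius=0.0, subtree=None):
        self.value, self.radius, self.subtree = value, radius, subtree


class Node:
    def __init__(self, is_leaf, entries=None):
        self.is_leaf = is_leaf
        self.entries = entries or []


class MTree:
    def __init__(self, capacity=4, dist=math.dist):
        self.capacity, self.dist = capacity, dist
        self.root = Node(is_leaf=True)

    def insert(self, obj):
        split = self._insert(self.root, obj)
        if split:  # the root overflowed: grow the tree by one level
            self.root = Node(is_leaf=False, entries=list(split))

    def _insert(self, node, obj):
        """Insert obj below node; return two routing entries if node was split."""
        if node.is_leaf:
            node.entries.append(Entry(obj))
        else:
            fits = [e for e in node.entries if self.dist(e.value, obj) <= e.radius]
            if fits:  # some ball already contains obj: take the nearest center
                best = min(fits, key=lambda e: self.dist(e.value, obj))
            else:     # otherwise enlarge the ball that needs the least growth
                best = min(node.entries,
                           key=lambda e: self.dist(e.value, obj) - e.radius)
                best.radius = self.dist(best.value, obj)
            split = self._insert(best.subtree, obj)
            if split:  # child was split: replace its entry by the two new ones
                node.entries.remove(best)
                node.entries.extend(split)
        if len(node.entries) > self.capacity:
            return self._split(node)
        return None

    def _split(self, node):
        es = node.entries
        # Assumed promotion policy: the two entries that are farthest apart.
        a, b = max(((x, y) for i, x in enumerate(es) for y in es[i + 1:]),
                   key=lambda p: self.dist(p[0].value, p[1].value))
        left, right = [], []
        for e in es:  # b always goes right, so neither group is empty
            nearer_a = self.dist(e.value, a.value) <= self.dist(e.value, b.value)
            (left if e is not b and nearer_a else right).append(e)
        return (self._routing(a.value, left, node.is_leaf),
                self._routing(b.value, right, node.is_leaf))

    def _routing(self, center, entries, is_leaf):
        # d(center, e) + r(e) bounds every object below e by the triangle inequality.
        radius = max(self.dist(center, e.value) + e.radius for e in entries)
        return Entry(center, radius, Node(is_leaf, entries))


points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6), (9, 0), (9, 1)]
tree = MTree(capacity=3)
for p in points:
    tree.insert(p)
```

Other promotion and partition policies are possible, and the choice affects how much the resulting balls overlap.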

Notes and References

1. Ciaccia, Paolo; Patella, Marco; Zezula, Pavel (1997). "M-tree: An Efficient Access Method for Similarity Search in Metric Spaces". Proceedings of the 23rd VLDB Conference, Athens, Greece, 1997. IBM Almaden Research Center: Very Large Databases Endowment Inc. pp. 426–435. Retrieved 2010-09-07.