Sign language recognition

Sign language recognition (commonly abbreviated as SLR) is a computational task that involves recognizing actions, i.e. signs, from sign languages.[1] Solving it is essential, especially in the digital world, for bridging the communication gap faced by people with hearing impairments.

Solving the problem usually requires not only annotated color (RGB) video data but also other modalities, such as depth maps and sensor data, which are often useful.

Isolated sign language recognition

Isolated sign language recognition (ISLR), also known as word-level SLR, is the task of recognizing an individual sign, or token called a gloss, from a given segment of a signing video clip. It is commonly treated as a classification problem when the clips are already isolated, but real-time applications additionally require handling tasks such as video segmentation, as sketched below.
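
The following is a minimal sketch of the classification formulation in PyTorch, assuming pre-segmented clips of shape (batch, frames, channels, height, width). The architecture, layer sizes, and gloss vocabulary are illustrative assumptions, not a method prescribed by the source.

    # A toy ISLR classifier: encode each frame with a 2D CNN,
    # average the features over time, and predict a gloss label.
    import torch
    import torch.nn as nn

    class IsolatedSignClassifier(nn.Module):
        def __init__(self, num_glosses: int):
            super().__init__()
            # Per-frame feature extractor (deliberately small; illustrative only).
            self.frame_encoder = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),  # -> (N, 64, 1, 1)
            )
            self.classifier = nn.Linear(64, num_glosses)

        def forward(self, clip: torch.Tensor) -> torch.Tensor:
            b, t, c, h, w = clip.shape
            feats = self.frame_encoder(clip.view(b * t, c, h, w)).view(b, t, 64)
            pooled = feats.mean(dim=1)      # average features over time
            return self.classifier(pooled)  # gloss logits

    # Usage: classify two 16-frame RGB clips into a 100-gloss vocabulary.
    model = IsolatedSignClassifier(num_glosses=100)
    logits = model(torch.randn(2, 16, 3, 112, 112))
    predicted_gloss = logits.argmax(dim=1)

In practice the simple temporal average is often replaced by a stronger temporal model (e.g., a 3D CNN or a recurrent network), but the overall framing, one label per clip, stays the same.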

Continuous sign language recognition

Continuous sign language recognition (CSLR), also known as sign language transcription, is the task of predicting the full sequence of signs (glosses) in an unsegmented signing video. This formulation is better suited to real-world transcription of sign languages. Depending on the approach, it can also be viewed as an extension of the ISLR task.
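
Connectionist Temporal Classification (CTC) is one common way to train such models when frame-level gloss alignments are unavailable. The sketch below assumes pre-extracted frame features (for instance from a per-frame encoder like the one above); all dimensions and names are illustrative assumptions.

    # A toy CSLR model: a temporal model produces per-frame gloss
    # probabilities, which the CTC loss aligns to the gloss sequence.
    import torch
    import torch.nn as nn

    class ContinuousSignRecognizer(nn.Module):
        def __init__(self, feat_dim: int, num_glosses: int):
            super().__init__()
            self.temporal = nn.LSTM(feat_dim, 128, batch_first=True,
                                    bidirectional=True)
            # +1 output class for the CTC blank symbol (index 0 here).
            self.head = nn.Linear(256, num_glosses + 1)

        def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
            out, _ = self.temporal(frame_feats)    # (B, T, 256)
            return self.head(out).log_softmax(-1)  # (B, T, num_glosses + 1)

    model = ContinuousSignRecognizer(feat_dim=64, num_glosses=100)
    feats = torch.randn(2, 40, 64)            # 40 frames of features per video
    log_probs = model(feats).transpose(0, 1)  # CTC expects (T, B, C)

    targets = torch.randint(1, 101, (2, 5))   # 5 glosses per video (1..100)
    loss = nn.CTCLoss(blank=0)(
        log_probs,
        targets,
        input_lengths=torch.full((2,), 40),
        target_lengths=torch.full((2,), 5),
    )

At inference time, the per-frame predictions are decoded (e.g., by collapsing repeats and removing blanks) to recover the gloss sequence.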

Continuous sign language translation

Sign language translation refers to the problem of translating a sequence of signs (glosses) into a target spoken language. It is generally modeled as an extension of the CSLR problem.
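
One way to realize this extension is as sequence-to-sequence learning over the glosses produced by a CSLR model. The sketch below is a minimal gloss-to-text translator; the vocabulary sizes, dimensions, and model choices are illustrative assumptions, not a specific published system.

    # A toy gloss-to-text translator: encode gloss IDs, decode target
    # words autoregressively with a small Transformer.
    import torch
    import torch.nn as nn

    class GlossToTextTranslator(nn.Module):
        def __init__(self, gloss_vocab: int, text_vocab: int, d_model: int = 128):
            super().__init__()
            self.gloss_embed = nn.Embedding(gloss_vocab, d_model)
            self.text_embed = nn.Embedding(text_vocab, d_model)
            self.transformer = nn.Transformer(
                d_model=d_model, nhead=4,
                num_encoder_layers=2, num_decoder_layers=2,
                batch_first=True,
            )
            self.out = nn.Linear(d_model, text_vocab)

        def forward(self, glosses, text_prefix):
            # Causal mask so each word attends only to earlier words.
            mask = self.transformer.generate_square_subsequent_mask(
                text_prefix.size(1))
            hidden = self.transformer(
                self.gloss_embed(glosses),
                self.text_embed(text_prefix),
                tgt_mask=mask,
            )
            return self.out(hidden)  # next-word logits per position

    model = GlossToTextTranslator(gloss_vocab=1000, text_vocab=5000)
    glosses = torch.randint(0, 1000, (2, 6))  # recognized gloss IDs
    text = torch.randint(0, 5000, (2, 8))     # shifted target sentence
    logits = model(glosses, text)             # (2, 8, 5000)

Note that glosses and spoken-language sentences differ in order and grammar, which is why translation is treated as a separate step on top of recognition rather than a relabeling of the glosses.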

Notes and References

  1. Cooper, Helen; Holt, Brian; Bowden, Richard (2011). "Sign Language Recognition". In Visual Analysis of Humans: Looking at People. Springer. pp. 539–562. doi:10.1007/978-0-85729-997-0_27. ISBN 978-0-85729-996-3. https://link.springer.com/chapter/10.1007/978-0-85729-997-0_27