Digital automated identification system (DAISY) | |
---|---|
Developer | Mark A. O'Neill |
Latest release version | 2.1.0 |
Programming language | C |
Operating system | Linux |
Platforms | IA-32, x86-64, ARM |
Language | English |
License | Proprietary commercial software |
Digital automated identification system (DAISY) is an automated species identification system optimised for the rapid screening of invertebrates (e.g. insects) by non-experts (e.g. parataxonomists).
It was developed by Dr. Mark O'Neill during the mid-1990s. Development was supported by funding from the Darwin Initiative in 1997[1] and the BBSRC.[2] O'Neill's company, Tumbling Dice Ltd, acquired the intellectual property rights in February 2000,[3] at the end of the grant-funded Darwin Project. Further development produced a web-accessible exemplar that can cope in near real time with groups containing several hundred taxa (e.g. hawk moths). On medium- to high-end PC server hardware (e.g. a blade server), an identification within a 300-taxon group takes under a second.
Parallelising the critical DAISY classifier code, using either bespoke FPGA technology or general-purpose GPU programming technology such as CUDA, is expected to give an order-of-magnitude increase in performance. This would allow DAISY to make real-time identifications within groups containing thousands of taxa (e.g. true flies).
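The section does not say which classification algorithm DAISY uses, so the following C sketch should be read only as an illustration of why classifier code of this kind parallelises well. It assumes a plain nearest-neighbour match over stored reference patterns, which is a stand-in and not DAISY's documented method; the data layout and the functions `sq_distance` and `identify_nearest` are hypothetical. The cost of one identification grows linearly with the number of stored reference patterns, and every comparison is independent of the others, which is the property that GPU (CUDA) or FPGA offload exploits.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical data layout (not DAISY's): n_refs reference patterns, each a
 * feature vector of length dim, stored row-major in refs[]; labels[i] is the
 * taxon index of reference pattern i. */

/* Squared Euclidean distance between two feature vectors of length dim. */
static double sq_distance(const double *a, const double *b, size_t dim)
{
    double sum = 0.0;
    for (size_t k = 0; k < dim; ++k) {
        double d = a[k] - b[k];
        sum += d * d;
    }
    return sum;
}

/* Return the taxon label of the reference pattern closest to query.
 * Each iteration of the outer loop is independent of the others, so the
 * loop could be split across GPU threads or FPGA pipelines; only the final
 * minimum has to be combined. */
int identify_nearest(const double *query, const double *refs,
                     const int *labels, size_t n_refs, size_t dim)
{
    assert(n_refs > 0 && dim > 0);
    double best = DBL_MAX;
    int best_label = labels[0];
    for (size_t i = 0; i < n_refs; ++i) {
        double d = sq_distance(query, refs + i * dim, dim);
        if (d < best) {
            best = d;
            best_label = labels[i];
        }
    }
    return best_label;
}
```

For example, with two-dimensional references at (0, 0), (10, 10) and (0, 10) labelled 1, 2 and 3, a query at (9, 9.5) is assigned label 2. On a GPU, each thread would typically compute one distance, and a parallel reduction would then pick the minimum.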
| Dataset | Image type | Structure | Training images | Species | Success (%) |
|---|---|---|---|---|---|
| | RGB | wing | 705 | 58 | 97 |
| Xylophanes sp. | RGB | wing | 543 | 30 | 99 |
| | mono | wing | 559 | 47 | 95 |
| UK butterflies | RGB | wing | 818 | 57 | 98 |
| UK macro moths | RGB | wing | 744 | 37 | 98 |
| Caterpillars | RGB | head | 91 | 7 | 93 |
| Caterpillars | RGB | body | 508 | 10 | 99 |
| Soft fruit pests | RGB | body | 2634 | 23 | 91 |
| Dataset | Image type | Structure | Training images | Classes | Success (%) |
|---|---|---|---|---|---|
| Food cans | RGB | label | 31 | 5 | 100 |
| Industrial objects | RGB | unposed object | 155 | 14 | 100 |
| Foraminifera tests | RGB | unposed object | 198 | 8 | 95 |
| Pollen grains | RGB | unposed object | 6601 | 12 | 99 |
| Spiders | mono | genitalia | 102 | 6 | 91 |
| Human faces | mono | unposed face | 400 | 41 | 99 |
DAISY has been used in several research projects by O'Neill[4] and others, and has featured in popular science television programmes and magazine articles. The project has also been the subject of an article in Science.[5]
In 2011, the first DAISY installation capable of scaling to hundreds of taxa was set up at the Natural History Museum in London. This server offered both VNC and web-service interfaces and could offload compute-intensive pattern-matching operations onto an NVIDIA GPU programmed using CUDA. It could identify a specimen to species level against a dataset of more than 300 taxa in under a second while serving multiple users.
More recently, with funding from Innovate UK, DAISY has been extensively modified to meet the needs of upstream activities in the oil and gas sector, in particular biostratigraphy. The resulting system, GeoDAISY, represents a significant technological advance over DAISY: it supports deep learning, knowledge encapsulation, pattern-based data mining and image-based content search, and can efficiently handle training sets of millions of patterns on commodity hardware by combining smart data caching with OpenMP. Further details of GeoDAISY, and the rationale for developing it, are available as white papers on the Tumbling Dice LinkedIn page.
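The paragraph above names smart data caching and OpenMP without saying how GeoDAISY uses them. A minimal C sketch, assuming the expensive step is again matching many query patterns against a large reference set (an assumption, not something the source states), shows one common way to combine the two: loop tiling, so that each block of reference patterns is reused by every query while it is still in cache, and an OpenMP loop that shares the queries out across CPU cores. `identify_batch` and the tile size `TILE_REFS` are hypothetical names and values, not part of GeoDAISY.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical tile size; in practice it would be tuned to the cache size. */
#define TILE_REFS 1024

/* Label every query in a batch with the taxon of its nearest reference
 * pattern (squared Euclidean distance). References are visited in tiles of
 * TILE_REFS so that each tile is reused by all queries while it is still
 * cache-resident (loop tiling), and queries are shared out across threads
 * with OpenMP. Without OpenMP support the pragma is ignored and the code
 * runs serially with the same result. */
void identify_batch(const double *queries, size_t n_queries,
                    const double *refs, const int *labels, size_t n_refs,
                    size_t dim, int *out_labels, double *best_dist)
{
    assert(n_refs > 0 && dim > 0);
    for (size_t q = 0; q < n_queries; ++q) {
        best_dist[q] = DBL_MAX;
        out_labels[q] = labels[0];
    }
    for (size_t start = 0; start < n_refs; start += TILE_REFS) {
        size_t end = (start + TILE_REFS < n_refs) ? start + TILE_REFS : n_refs;
        /* Each thread owns distinct queries, so no locking is needed. */
        #pragma omp parallel for schedule(static)
        for (long q = 0; q < (long)n_queries; ++q) {
            const double *x = queries + (size_t)q * dim;
            for (size_t i = start; i < end; ++i) {
                const double *r = refs + i * dim;
                double d = 0.0;
                for (size_t k = 0; k < dim; ++k) {
                    double diff = x[k] - r[k];
                    d += diff * diff;
                }
                if (d < best_dist[q]) {
                    best_dist[q] = d;
                    out_labels[q] = labels[i];
                }
            }
        }
    }
}
```

Because each thread updates only the entries for its own queries, no locks are needed, and building without OpenMP (for example, omitting `-fopenmp` with GCC) still yields a correct serial program.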