Machine vision is the technology and methods used to provide imaging-based automatic inspection and analysis for such applications as automatic inspection, process control, and robot guidance, usually in industry. Machine vision refers to many technologies, software and hardware products, integrated systems, actions, methods and expertise. Machine vision as a systems engineering discipline can be considered distinct from computer vision, a form of computer science. It attempts to integrate existing technologies in new ways and apply them to solve real world problems. The term is the prevalent one for these functions in industrial automation environments but is also used for these functions in other environments such as vehicle guidance.
The overall machine vision process includes planning the details of the requirements and project, and then creating a solution. During run-time, the process starts with imaging, followed by automated analysis of the image and extraction of the required information.
Definitions of the term "machine vision" vary, but all include the technology and methods used to extract information from an image on an automated basis, as opposed to image processing, where the output is another image. The information extracted can be a simple good-part/bad-part signal, or a more complex set of data such as the identity, position and orientation of each object in an image. The information can be used for such applications as automatic inspection and robot and process guidance in industry, and for security monitoring and vehicle guidance.[1] [2] This field encompasses a large number of technologies, software and hardware products, integrated systems, actions, methods and expertise.[3] [4] Machine vision is practically the only term used for these functions in industrial automation applications; the term is less universal for these functions in other environments such as security and vehicle guidance. Machine vision as a systems engineering discipline can be considered distinct from computer vision, a form of basic computer science; machine vision attempts to integrate existing technologies in new ways and apply them to solve real world problems in a way that meets the requirements of industrial automation and similar application areas.[5] The term is also used in a broader sense by trade shows and trade groups such as the Automated Imaging Association and the European Machine Vision Association. This broader definition also encompasses products and applications most often associated with image processing.[4] The primary uses for machine vision are automatic inspection and industrial robot/process guidance.[6] In more recent times the terms computer vision and machine vision have converged to a greater degree.[7] See glossary of machine vision.
The primary uses for machine vision are imaging-based automatic inspection and sorting and robot guidance; in this section the former is abbreviated as "automatic inspection". The overall process includes planning the details of the requirements and project, and then creating a solution.[8] This section describes the technical process that occurs during the operation of the solution.
The first step in the automatic inspection sequence of operation is acquisition of an image, typically using cameras, lenses, and lighting that has been designed to provide the differentiation required by subsequent processing.[9] [10] MV software packages and programs developed in them then employ various digital image processing techniques to extract the required information, and often make decisions (such as pass/fail) based on the extracted information.[11]
The components of an automatic inspection system usually include lighting, a camera or other imager, a processor, software, and output devices.[12]
The imaging device (e.g. camera) can either be separate from the main image processing unit or combined with it, in which case the combination is generally called a smart camera or smart sensor.[13] [14] Inclusion of the full processing function into the same enclosure as the camera is often referred to as embedded processing.[15] When separated, the connection may be made to specialized intermediate hardware, a custom processing appliance, or a frame grabber within a computer using either an analog or standardized digital interface (Camera Link, CoaXPress).[16] [17] [18] MV implementations also use digital cameras capable of direct connections (without a frame grabber) to a computer via FireWire, USB or Gigabit Ethernet interfaces.[18] [19]
While conventional (2D visible light) imaging is most commonly used in MV, alternatives include multispectral imaging, hyperspectral imaging, imaging various infrared bands,[20] line scan imaging, 3D imaging of surfaces and X-ray imaging.[21] Key differentiations within MV 2D visible light imaging are monochromatic vs. color, frame rate, resolution, and whether or not the imaging process is simultaneous over the entire image, making it suitable for moving processes.[22]
Though the vast majority of machine vision applications are solved using two-dimensional imaging, machine vision applications utilizing 3D imaging are a growing niche within the industry.[23] [24] The most commonly used method for 3D imaging is scanning-based triangulation, which utilizes motion of the product or image during the imaging process. A laser line is projected onto the surfaces of an object and viewed by a camera from a different angle; the deviation of the line represents shape variations. In machine vision the scanning motion is accomplished either by moving the workpiece or by moving the camera and laser imaging system. Lines from multiple scans are assembled into a depth map or point cloud. Stereoscopic vision is used in special cases involving unique features present in both views of a pair of cameras.[25] Other 3D methods used for machine vision are time of flight and grid based.[25] [23] One grid-based method uses a pseudorandom structured light system, as employed by the Microsoft Kinect system circa 2012.[26] [27]
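The triangulation steps described above can be illustrated with a minimal Python sketch. It is not a calibrated implementation: it assumes an idealized orthographic camera whose optical axis makes a known angle with the laser plane, so that surface height equals the observed line shift divided by the sine of that angle. The function names, the `mm_per_px` scale factor, and the `reference_row` (the image row where the line falls on a flat reference surface) are hypothetical and chosen only for illustration.

```python
import math


def line_shift_to_height(shift_px, mm_per_px, camera_angle_deg):
    """Convert the observed shift of the laser line (pixels) to surface height (mm).

    Assumption: orthographic camera whose optical axis makes
    camera_angle_deg with the laser plane, so height = shift / sin(angle).
    """
    return shift_px * mm_per_px / math.sin(math.radians(camera_angle_deg))


def assemble_depth_map(scans, reference_row, mm_per_px, camera_angle_deg):
    """Build a depth map (list of rows) from successive line scans.

    Each scan lists, per image column, the row at which the laser line
    was detected. One scan becomes one row of the depth map.
    """
    depth_map = []
    for line_rows in scans:
        depth_map.append([
            line_shift_to_height(reference_row - row, mm_per_px, camera_angle_deg)
            for row in line_rows
        ])
    return depth_map


# Three scans of a flat surface carrying a raised step in the middle columns.
scans = [
    [100, 100, 100, 100],
    [100, 95, 95, 100],
    [100, 100, 100, 100],
]
depth_map = assemble_depth_map(
    scans, reference_row=100, mm_per_px=0.1, camera_angle_deg=30.0
)
```

In this sketch each scan corresponds to one position of the workpiece (or of the camera and laser), so the row index of the depth map plays the role of the scanning axis. A real system would replace the single angle and scale factor with a calibration that accounts for perspective and lens distortion.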
After an image is acquired, it is processed.[28] Central processing functions are generally done by a CPU, a GPU, an FPGA or a combination of these.[15] Deep learning training and inference impose higher processing performance requirements.[29] Multiple stages of processing are generally used in a sequence that ends up as a desired result. A typical sequence might start with tools such as filters which modify the image, followed by extraction of objects, then extraction (e.g. measurements, reading of codes) of data from those objects, followed by communicating that data, or comparing it against target values to create and communicate "pass/fail" results. Machine vision image processing methods include:
A common output from automatic inspection systems is pass/fail decisions.[11] These decisions may in turn trigger mechanisms that reject failed items or sound an alarm. Other common outputs include object position and orientation information for robot guidance systems.[21] Additionally, output types include numerical measurement data, data read from codes and characters, counts and classification of objects, displays of the process or results, stored images, alarms from automated space monitoring MV systems, and process control signals.[41] [10] This also includes user interfaces, interfaces for the integration of multi-component systems and automated data interchange.[42]
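The typical processing sequence and the pass/fail output described above can be sketched in a few lines of Python. This is an illustrative toy, not a production pipeline: a fixed threshold stands in for the filtering stage, objects are extracted as 4-connected blobs, the only measurement is pixel area, and the target values (`min_area`, `max_area`, `expected_count`) are hypothetical parameters.

```python
from collections import deque


def threshold(image, level):
    """Filter stage: turn a grayscale image (list of rows) into a binary mask."""
    return [[1 if px >= level else 0 for px in row] for row in image]


def extract_blobs(mask):
    """Object extraction: 4-connected components as lists of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                queue = deque([(r, c)])
                seen[r][c] = True
                blob = []
                while queue:
                    y, x = queue.popleft()
                    blob.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                blobs.append(blob)
    return blobs


def inspect(image, level, min_area, max_area, expected_count):
    """Measure each blob's area and compare against target values."""
    blobs = extract_blobs(threshold(image, level))
    areas = [len(blob) for blob in blobs]
    passed = (len(areas) == expected_count
              and all(min_area <= a <= max_area for a in areas))
    return {"areas": areas, "result": "pass" if passed else "fail"}


# A 2x2 bright feature plus a single stray bright pixel.
image = [
    [10, 10, 10, 10, 10, 10],
    [10, 200, 200, 10, 10, 10],
    [10, 200, 200, 10, 210, 10],
    [10, 10, 10, 10, 10, 10],
]
report = inspect(image, level=128, min_area=3, max_area=5, expected_count=2)
```

Here the undersized second blob causes a "fail" result, which in a deployed system might trigger a reject mechanism or an alarm as described above. The measured areas are returned alongside the decision because numerical measurement data is itself a common output.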
The term "deep learning" has variable meanings, most of which can be applied to techniques used in machine vision for over 20 years. However, the usage of the term in machine vision began in the later 2010s with the advent of the capability to successfully apply such techniques to entire images in the industrial machine vision space.[43] Conventional machine vision usually requires the "physics" phase of a machine vision automatic inspection solution to create reliable simple differentiation of defects. An example of "simple" differentiation is that the defects are dark and the good parts of the product are light. A common reason why some applications were not feasible was that this "simple" differentiation was impossible to achieve; deep learning removes this requirement, in essence "seeing" the object more as a human does, making it now possible to accomplish those automatic applications.[43] The system learns from a large number of images during a training phase and then executes the inspection during run-time use, which is called "inference".[43]
Machine vision commonly provides location and orientation information to a robot to allow the robot to properly grasp the product. This capability is also used to guide motion systems that are simpler than robots, such as a 1- or 2-axis motion controller.[21] The overall process includes planning the details of the requirements and project, and then creating a solution. This section describes the technical process that occurs during the operation of the solution. Many of the process steps are the same as with automatic inspection except with a focus on providing position and orientation information as the result.[21]
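One common way to obtain position and orientation from a segmented object is through image moments; the Python sketch below illustrates the idea. It is an assumption-laden example rather than a description of any particular product: the object is taken to be already isolated in a binary mask, the function name `blob_pose` is hypothetical, and the result is expressed in image coordinates (x along columns, y down rows), so a real system would still need a camera-to-robot calibration to convert it into robot coordinates.

```python
import math


def blob_pose(mask):
    """Return ((cx, cy), angle_deg) for the foreground pixels of a binary mask.

    The centroid comes from first-order moments. The orientation is the
    principal-axis angle from second-order central moments,
    0.5 * atan2(2 * mu11, mu20 - mu02), measured from the x (column) axis
    in image coordinates.
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask contains no foreground pixels")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    angle = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return (cx, cy), math.degrees(angle)


# A short horizontal bar: centroid at column 2, row 1, aligned with the x axis.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
center, angle_deg = blob_pose(mask)
```

The principal-axis angle is ambiguous by 180 degrees and undefined for rotationally symmetric shapes, so applications that need an unambiguous grasp direction typically add a further feature (for example a notch or a printed mark) to resolve it.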
As recently as 2006, one industry consultant reported that MV represented a $1.5 billion market in North America.[44] However, the editor-in-chief of an MV trade magazine asserted that "machine vision is not an industry per se" but rather "the integration of technologies and products that provide services or applications that benefit true industries such as automotive or consumer goods manufacturing, agriculture, and defense."