List of datasets in computer vision and image processing

This is a list of datasets for computer vision and image processing research, part of the broader list of datasets for machine-learning research. These datasets consist primarily of images or videos and support tasks such as object detection, facial recognition, and multi-label classification.

Object detection and recognition

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Ego 4DA massive-scale, egocentric dataset and benchmark suite collected across 74 worldwide locations and 9 countries, with over 3,670 hours of daily-life activity video.Object bounding boxes, transcriptions, labeling.3,670 video hoursvideo, audio, transcriptionsMultimodal first-person task2022[1] K. Grauman et al.
Visual GenomeImages and their description108,000images, textImage captioning2016[2] R. Krishna et al.
Berkeley 3-D Object Dataset849 images taken in 75 different scenes. About 50 different object classes are labeled.Object bounding boxes and labeling.849labeled images, textObject recognition2014[3] [4] A. Janoch et al.
Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500)500 natural images, explicitly separated into disjoint train, validation and test subsets + benchmarking code. Based on BSDS300.Each image segmented by five different subjects on average.500Segmented imagesContour detection and hierarchical image segmentation2011[5] University of California, Berkeley
Microsoft Common Objects in Context (COCO)complex everyday scenes of common objects in their natural context.Object highlighting, labeling, and classification into 91 object types.2,500,000Labeled images, textObject recognition2015[6] [7] [8] T. Lin et al.
SUN DatabaseVery large scene and object recognition database.Places and objects are labeled. Objects are segmented.131,067Images, textObject recognition, scene recognition2014[9] [10] J. Xiao et al.
ImageNetLabeled object image database, used in the ImageNet Large Scale Visual Recognition ChallengeLabeled objects, bounding boxes, descriptive words, SIFT features 14,197,122Images, textObject recognition, scene recognition2009 (2014)[11] [12] J. Deng et al.
Open ImagesA large set of images listed as having CC BY 2.0 license with image-level labels and bounding boxes spanning thousands of classes.Image-level labels, Bounding boxes9,178,275Images, textClassification, Object recognition2017 (V7 : 2022)[13]
TV News Channel Commercial Detection DatasetTV commercials and news broadcasts.Audio and video features extracted from still images.129,685TextClustering, classification2015[14] [15] P. Guha et al.
Statlog (Image Segmentation) DatasetThe instances were drawn randomly from a database of 7 outdoor images and hand-segmented to create a classification for every pixel.Many features calculated.2310TextClassification1990[16] University of Massachusetts
Caltech 101Pictures of objects.Detailed object outlines marked.9146ImagesClassification, object recognition2003[17] [18] F. Li et al.
Caltech-256Large dataset of images for object classification.Images categorized and hand-sorted.30,607Images, TextClassification, object detection2007[19] [20] G. Griffin et al.
COYO-700MImage–text-pair dataset10 billion pairs of alt-text and image sources in HTML documents in CommonCrawl746,972,269Images, TextClassification, Image-Language2022
SIFT10M DatasetSIFT features of Caltech-256 dataset.Extensive SIFT feature extraction.11,164,866TextClassification, object detection2016[21] X. Fu et al.
LabelMeAnnotated pictures of scenes.Objects outlined.187,240Images, textClassification, object detection2005[22] MIT Computer Science and Artificial Intelligence Laboratory
PASCAL VOC DatasetLarge number of images for classification tasks.Labeling, bounding box included500,000Images, textClassification, object detection2010[23] [24] M. Everingham et al.
CIFAR-10 DatasetMany small, low-resolution, images of 10 classes of objects.Classes labelled, training set splits created.60,000ImagesClassification2009[25] [26] A. Krizhevsky et al.
CIFAR-100 DatasetLike CIFAR-10, above, but 100 classes of objects are given.Classes labelled, training set splits created.60,000ImagesClassification2009A. Krizhevsky et al.
CINIC-10 DatasetCombines CIFAR-10 with downsampled ImageNet images; 10 classes and 3 equal splits. Larger than CIFAR-10.Classes labelled, training, validation, test set splits created.270,000ImagesClassification2018[27] Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey
Fashion-MNISTA MNIST-like fashion product databaseClasses labelled, training set splits created.60,000ImagesClassification2017Zalando SE
notMNISTSome publicly available fonts and extracted glyphs from them to make a dataset similar to MNIST. There are 10 classes, with letters A–J taken from different fonts.Classes labelled, training set splits created.500,000ImagesClassification2011[28] Yaroslav Bulatov
Linnaeus 5 datasetImages of 5 classes of objects.Classes labelled, training set splits created.8000ImagesClassification2017[29] Chaladze & Kalatozishvili
11K Hands11,076 hand images (1600 x 1200 pixels) of 190 subjects aged between 18 and 75, for gender recognition and biometric identification.None11,076 hand imagesImages and (.mat, .txt, and .csv) label filesGender recognition and biometric identification2017[30] M Afifi
CORe50Designed specifically for continual/lifelong learning and object recognition; a collection of more than 500 videos (30 fps) of 50 domestic objects belonging to 10 categories.Classes labelled; training set splits created based on a 3-way, multi-run benchmark.164,866RGB-D images (.png or .pkl) and (.pkl, .txt, .tsv) label filesClassification, Object recognition2017[31] V. Lomonaco and D. Maltoni
OpenLORIS-ObjectLifelong/continual robotic vision dataset collected by real robots mounted with multiple high-resolution sensors; includes 121 object instances (1st version of the dataset: 40 categories of daily objects under 20 scenes). The dataset rigorously considers 4 environmental factors under different scenes (illumination, occlusion, object pixel size, and clutter) and explicitly defines difficulty levels for each factor.Classes labelled; training/validation/testing splits created by benchmark scripts.1,106,424RGB-D images (.png and .pkl) and (.pkl) label filesClassification, Lifelong object recognition, Robotic vision2019[32] Q. She et al.
THz and thermal video data setThis multispectral data set includes terahertz, thermal, visual, near infrared, and three-dimensional videos of objects hidden under people's clothes.3D lookup tables are provided that allow you to project images onto 3D point clouds.More than 20 videos. The duration of each video is about 85 seconds (about 345 frames).AP2JExperiments with hidden object detection2019[33] [34] Alexei A. Morozov and Olga S. Sushkova
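Bounding-box annotations such as those listed above are conventionally scored with intersection over union (IoU). As an illustrative aside (not tied to any particular dataset's tooling), a minimal self-contained sketch:

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle: max of the top-left corners, min of the bottom-right.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Detection benchmarks typically count a predicted box as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.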

Object detection and recognition for autonomous vehicles

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Cityscapes DatasetStereo video sequences recorded in street scenes, with pixel-level annotations. Metadata also included.Pixel-level segmentation and labeling25,000Images, textClassification, object detection2016[35] Daimler AG et al.
German Traffic Sign Detection Benchmark DatasetImages from vehicles of traffic signs on German roads. These signs comply with UN standards and therefore are the same as in other countries.Signs manually labeled900ImagesClassification2013[36] [37] S. Houben et al.
KITTI Vision Benchmark DatasetAutonomous vehicles driving through a mid-size city captured images of various areas using cameras and laser scanners.Many benchmarks extracted from data.>100 GB of dataImages, textClassification, object detection2012[38] [39] A. Geiger et al.
FieldSAFEMulti-modal dataset for obstacle detection in agriculture including stereo camera, thermal camera, web camera, 360-degree camera, lidar, radar, and precise localization.Classes labelled geographically.>400 GB of dataImages and 3D point cloudsClassification, object detection, object localization2017[40] M. Kragh et al.
Daimler Monocular Pedestrian Detection datasetIt is a dataset of pedestrians in urban environments. Pedestrians are box-wise labeled.Labeled part contains 15560 samples with pedestrians and 6744 samples without. Test set contains 21790 images without labels.ImagesObject recognition and classification2006[41] [42] [43] Daimler AG
CamVidThe Cambridge-driving Labeled Video Database (CamVid) is a collection of videos.The dataset is labeled with semantic labels for 32 semantic classes.over 700 imagesImagesObject recognition and classification2008[44] [45] [46] Gabriel J. Brostow, Jamie Shotton, Julien Fauqueur, Roberto Cipolla
RailSem19RailSem19 is a dataset for understanding scenes for vision systems on railways.The dataset is labeled semantically and box-wise.8500ImagesObject recognition and classification, scene recognition2019[47] [48] Oliver Zendel, Markus Murschitz, Marcel Zeilinger, Daniel Steininger, Sara Abbasi, Csaba Beleznai
BOREASBOREAS is a multi-season autonomous driving dataset. It includes data from a Velodyne Alpha-Prime (128-beam) lidar, a FLIR Blackfly S camera, a Navtech CIR304-H radar, and an Applanix POS LV GNSS-INS.The data is annotated with 3D bounding boxes.350 km of driving dataImages, Lidar and Radar dataObject recognition and classification, scene recognition2023[49] [50] Keenan Burnett, David J. Yoon, Yuchen Wu, Andrew Zou Li, Haowei Zhang, Shichen Lu, Jingxing Qian, Wei-Kang Tseng, Andrew Lambert, Keith Y.K. Leung, Angela P. Schoellig, Timothy D. Barfoot
Bosch Small Traffic Lights DatasetA dataset of traffic lights.The labeling includes bounding boxes of traffic lights together with their state (active light).5000 images for training and a video sequence of 8334 frames for evaluationImagesTraffic light recognition2017[51] [52] Karsten Behrendt, Libor Novak, Rami Botros
FRSignA dataset of French railway signals.The labeling includes bounding boxes of railway signals together with their state (active light).More than 100,000ImagesRailway signal recognition2020[53] [54] Jeanine Harb, Nicolas Rébéna, Raphaël Chosidow, Grégoire Roblin, Roman Potarusov, Hatem Hajri
GERALDA dataset of German railway signals.The labeling includes bounding boxes of railway signals together with their state (active light).5000ImagesRailway signal recognition2023[55] [56] Philipp Leibner, Fabian Hampel, Christian Schindler
Multi-cue pedestrianMulti-cue onboard dataset for pedestrian detection.The dataset is labeled box-wise.1092 image pairs with 1776 boxes for pedestriansImagesObject recognition and classification2009[57] Christian Wojek, Stefan Walk, Bernt Schiele
RAWPEDRAWPED is a dataset for detection of pedestrians in the context of railways. The dataset is labeled box-wise.26000ImagesObject recognition and classification2020[58] [59] Tugce Toprak, Burak Belenlioglu, Burak Aydın, Cuneyt Guzelis, M. Alper Selver
OSDaR23OSDaR23 is a multi-sensory dataset for detection of objects in the context of railways.The dataset is labeled box-wise.16874 framesImages, Lidar, Radar and InfraredObject recognition and classification2023[60] [61] DZSF, Digitale Schiene Deutschland, and FusionSystems
ArgoverseArgoverse is a multi-sensory dataset for detection of objects in the context of roads.The dataset is annotated box-wise.320 hours of recordingData from 7 cameras and LiDARObject recognition and classification, object tracking2022[62] [63] Argo AI, Carnegie Mellon University, Georgia Institute of Technology

Facial recognition

In computer vision, face images have been used extensively to develop systems for facial recognition, face detection, and many other projects that use images of faces.

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Aff-Wild298 videos of 200 individuals, ~1,250,000 manually annotated images: annotated in terms of dimensional affect (valence-arousal); in-the-wild setting; color database; various resolutions (average = 640x360)the detected faces, facial landmarks and valence-arousal annotations~1,250,000 manually annotated imagesvideo (visual + audio modalities)affect recognition (valence-arousal estimation)2017CVPR[64] IJCV[65] D. Kollias et al.
Aff-Wild2558 videos of 458 individuals, ~2,800,000 manually annotated images: annotated in terms of i) categorical affect (7 basic expressions: neutral, happiness, sadness, surprise, fear, disgust, anger); ii) dimensional affect (valence-arousal); iii) action units (AUs 1,2,4,6,12,15,20,25); in-the-wild setting; color database; various resolutions (average = 1030x630)the detected faces, detected and aligned faces and annotations~2,800,000 manually annotated imagesvideo (visual + audio modalities)affect recognition (valence-arousal estimation, basic expression classification, action unit detection)2019BMVC[66] FG[67] D. Kollias et al.
FERET (facial recognition technology)11338 images of 1199 individuals in different positions and at different times.None.11,338ImagesClassification, face recognition2003[68] [69] United States Department of Defense
Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)7,356 video and audio recordings of 24 professional actors. 8 emotions each at two intensities.Files labelled with expression. Perceptual validation ratings provided by 319 raters.7,356Video, sound filesClassification, face recognition, voice recognition2018[70] [71] S.R. Livingstone and F.A. Russo
SCFaceColor images of faces at various angles.Location of facial features extracted. Coordinates of features given.4,160Images, textClassification, face recognition2011[72] [73] M. Grgic et al.
Yale Face DatabaseFaces of 15 individuals in 11 different expressions.Labels of expressions.165ImagesFace recognition1997[74] [75] J. Yang et al.
Cohn-Kanade AU-Coded Expression DatabaseLarge database of images with labels for expressions.Tracking of certain facial features.500+ sequencesImages, textFacial expression analysis2000[76] [77] T. Kanade et al.
JAFFE Facial Expression Database213 images of 7 facial expressions (6 basic facial expressions + 1 neutral) posed by 10 Japanese female models.Images are cropped to the facial region. Includes semantic ratings data on emotion labels.213Images, textFacial expression cognition1998[78] [79] Lyons, Kamachi, Gyoba
FaceScrubImages of public figures scrubbed from image searching.Name and m/f annotation.107,818Images, textFace recognition2014[80] [81] H. Ng et al.
BioID Face DatabaseImages of faces with eye positions marked.Manually set eye positions.1521Images, textFace recognition2001[82] [83] BioID
Skin Segmentation DatasetRandomly sampled color values from face images.B, G, R, values extracted.245,057TextSegmentation, classification2012[84] [85] R. Bhatt.
Bosphorus3D Face image database.34 action units and 6 expressions labeled; 24 facial landmarks labeled.4652Images, textFace recognition, classification2008[86] [87] A Savran et al.
UOY 3D-Faceneutral face, 5 expressions: anger, happiness, sadness, eyes closed, eyebrows raised.labeling.5250Images, textFace recognition, classification2004[88] [89] University of York
CASIA 3D Face DatabaseExpressions: Anger, smile, laugh, surprise, closed eyes.None.4624Images, textFace recognition, classification2007[90] [91] Institute of Automation, Chinese Academy of Sciences
CASIA NIRExpressions: anger, disgust, fear, happiness, sadness, surprise.None.480Annotated visible-spectrum and near-infrared video captured at 25 frames per secondFace recognition, classification2011[92] Zhao, G. et al.
BU-3DFEneutral face, and 6 expressions: anger, happiness, sadness, surprise, disgust, fear (4 levels). 3D images extracted.None.2500Images, textFacial expression recognition, classification2006[93] Binghamton University
Face Recognition Grand Challenge DatasetUp to 22 samples for each subject. Expressions: anger, happiness, sadness, surprise, disgust, puffy. 3D Data.None.4007Images, textFace recognition, classification2004[94] [95] National Institute of Standards and Technology
GavabdbUp to 61 samples for each subject. Expressions neutral face, smile, frontal accentuated laugh, frontal random gesture. 3D images.None.549Images, textFace recognition, classification2008[96] [97] King Juan Carlos University
3D-RMAUp to 100 subjects, expressions mostly neutral. Several poses as well.None.9971Images, textFace recognition, classification2004[98] [99] Royal Military Academy (Belgium)
SoF112 persons (66 males and 46 females) wearing glasses under different illumination conditions.A set of synthetic filters (blur, occlusions, noise, and posterization) at different levels of difficulty.42,592 (2,662 original images × 16 synthetic variants)Images, Mat fileGender classification, face detection, face recognition, age estimation, and glasses detection2017[100] [101] Afifi, M. et al.
IMDb-WIKIIMDb and Wikipedia face images with gender and age labels.None523,051ImagesGender classification, face detection, face recognition, age estimation2015[102] R. Rothe, R. Timofte, L. V. Gool
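Dimensional affect annotations such as the valence-arousal labels in Aff-Wild and Aff-Wild2 are commonly evaluated with the concordance correlation coefficient (CCC), which rewards both correlation and agreement in scale. A minimal illustrative sketch (assumes the inputs are not both constant):

```python
def ccc(x, y):
    """Concordance correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Population variances and covariance of the two sequences.
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # CCC penalizes both low correlation and mean/scale disagreement.
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

Perfect agreement gives 1, perfect negative agreement with matched means gives -1, and any bias or scale mismatch pulls the value toward 0.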

Action recognition

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
TV Human Interaction Dataset | Videos from 20 different TV shows for prediction of social actions: handshake, high five, hug, kiss and none. | None. | 6,766 video clips | Video clips | Action prediction | 2013 | [103] | Patron-Perez, A. et al.
Berkeley Multimodal Human Action Database (MHAD) | Recordings of a single person performing 12 actions | MoCap pre-processing | 660 action samples | 8 PhaseSpace motion capture, 2 stereo cameras, 4 quad cameras, 6 accelerometers, 4 microphones | Action classification | 2013 | [104] | Ofli, F. et al.
THUMOS Dataset | Large video dataset for action classification. | Actions classified and labeled. | 45M frames of video | Video, images, text | Classification, action detection | 2013 | [105] [106] | Y. Jiang et al.
MEXAction2 | Video dataset for action localization and spotting. | Actions classified and labeled. | 1000 | Video | Action detection | 2014 | [107] | Stoian et al.

Handwriting and character recognition

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Artificial Characters DatasetArtificially generated data describing the structure of 10 capital English letters.Coordinates of lines drawn given as integers. Various other features.6000TextHandwriting recognition, classification1992[108] H. Guvenir et al.
Letter DatasetUpper-case printed letters.17 features are extracted from all images.20,000TextOCR, classification1991[109] [110] D. Slate et al.
CASIA-HWDBOffline handwritten Chinese character database. 3755 classes in the GB 2312 character set.Gray-scaled images with background pixels labeled as 255. 1,172,907Images, TextHandwriting recognition, classification2009[111] CASIA
CASIA-OLHWDBOnline handwritten Chinese character database, collected using Anoto pen on paper. 3755 classes in the GB 2312 character set.Provides the sequences of coordinates of strokes.1,174,364Images, TextHandwriting recognition, classification2009[112] CASIA
Character Trajectories DatasetLabeled samples of pen tip trajectories for people writing simple characters.3-dimensional pen tip velocity trajectory matrix for each sample2858TextHandwriting recognition, classification2008[113] [114] B. Williams
Chars74K DatasetCharacter recognition in natural images of symbols used in both English and Kannada74,107Character recognition, handwriting recognition, OCR, classification2009[115] T. de Campos
EMNIST datasetHandwritten characters from 3,600 contributors.Derived from NIST Special Database 19; converted to 28x28 pixel images, matching the MNIST dataset.800,000ImagesCharacter recognition, classification, handwriting recognition2016[116] [117] [118] Gregory Cohen, et al.
UJI Pen Characters DatasetIsolated handwritten charactersCoordinates of pen position as characters were written given.11,640TextHandwriting recognition, classification2009[119] [120] F. Prat et al.
Gisette DatasetHandwriting samples from the often-confused 4 and 9 characters.Features extracted from images, split into train/test, handwriting images size-normalized.13,500Images, textHandwriting recognition, classification2003[121] Yann LeCun et al.
Omniglot dataset1623 different handwritten characters from 50 different alphabets.Hand-labeled.38,300Images, text, strokesClassification, one-shot learning2015[122] American Association for the Advancement of Science
MNIST databaseDatabase of handwritten digits.Hand-labeled.60,000Images, textClassification1994[123] [124] National Institute of Standards and Technology
Optical Recognition of Handwritten Digits DatasetNormalized bitmaps of handwritten data.Size normalized and mapped to bitmaps.5620Images, textHandwriting recognition, classification1998[125] E. Alpaydin et al.
Pen-Based Recognition of Handwritten Digits DatasetHandwritten digits on electronic pen-tablet.Feature vectors extracted to be uniformly spaced.10,992Images, textHandwriting recognition, classification1998[126] [127] E. Alpaydin et al.
Semeion Handwritten Digit DatasetHandwritten digits from 80 people.All handwritten digits have been normalized for size and mapped to the same grid.1593Images, textHandwriting recognition, classification2008[128] T. Srl
HASYv2Handwritten mathematical symbolsAll symbols are centered and of size 32px x 32px.168233Images, textClassification2017[129] Martin Thoma
Noisy Handwritten Bangla DatasetIncludes Handwritten Numeral Dataset (10 classes) and Basic Character Dataset (50 classes); each dataset has three types of noise: white Gaussian, motion blur, and reduced contrast.All images are centered and of size 32x32.Numeral Dataset: 23,330; Character Dataset: 76,000Images, textHandwriting recognition, classification2017[130] M. Karki et al.
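MNIST-family datasets such as MNIST and EMNIST are distributed as IDX binary files: two zero bytes, a type code, a dimension count, then one big-endian 32-bit size per dimension, followed by the raw pixel data. A minimal header parser, demonstrated on a synthetic buffer rather than a real download:

```python
import struct

def read_idx_header(data):
    """Parse the header of an IDX-format buffer (the format used by MNIST).

    Returns (type_code, dims); 0x08 denotes unsigned bytes.
    """
    zero1, zero2, dtype, ndim = struct.unpack_from(">BBBB", data, 0)
    if zero1 != 0 or zero2 != 0:
        raise ValueError("not an IDX buffer")
    dims = struct.unpack_from(">" + "I" * ndim, data, 4)
    return dtype, dims

# A synthetic header for a stack of two 28x28 unsigned-byte images.
header = struct.pack(">BBBBIII", 0, 0, 0x08, 3, 2, 28, 28)
```

The pixel values follow the header row-major, so a full loader would read `dims[0] * dims[1] * dims[2]` bytes starting at offset `4 + 4 * ndim`.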

Aerial images

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
iSAID: Instance Segmentation in Aerial Images DatasetPrecise instance-level annotation carried out by professional annotators, cross-checked and validated by expert annotators complying with well-defined guidelines.655,451 (15 classes)Images, jpg, jsonAerial Classification, Object Detection, Instance Segmentation2019[131] [132] Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai

Aerial Image Segmentation Dataset80 high-resolution aerial images with spatial resolution ranging from 0.3 m to 1.0 m.Images manually segmented.80ImagesAerial Classification, object detection2013[133] [134] J. Yuan et al.
KIT AIS Data SetMultiple labeled training and evaluation datasets of aerial images of crowds.Images manually labeled to show paths of individuals through crowds.~ 150Images with pathsPeople tracking, aerial tracking2012[135] [136] M. Butenuth et al.
Wilt DatasetRemote sensing data of diseased trees and other land cover.Various features extracted.4899ImagesClassification, aerial object detection2014[137] [138] B. Johnson
MASATI datasetMaritime scenes of optical aerial images from the visible spectrum. It contains color images in dynamic marine environments, each image may contain one or multiple targets in different weather and illumination conditions. Object bounding boxes and labeling. 7389ImagesClassification, aerial object detection2018[139] [140] A.-J. Gallego et al.
Forest Type Mapping DatasetSatellite imagery of forests in Japan.Image wavelength bands extracted.326TextClassification2015[141] [142] B. Johnson
Overhead Imagery Research Data SetAnnotated overhead imagery. Images with multiple objects.Over 30 annotations and over 60 statistics that describe the target within the context of the image.1000Images, textClassification2009[143] [144] F. Tanner et al.
SpaceNetSpaceNet is a corpus of commercial satellite imagery and labeled training data.GeoTiff and GeoJSON files containing building footprints.>17533ImagesClassification, Object Identification2017[145] [146] [147] DigitalGlobe, Inc.
UC Merced Land Use DatasetThese images were manually extracted from large images from the USGS National Map Urban Area Imagery collection for various urban areas around the US.This is a 21 class land use image dataset meant for research purposes. There are 100 images for each class.2,100Image chips of 256x256, 30 cm (1 foot) GSDLand cover classification2010[148] Yi Yang and Shawn Newsam
SAT-4 Airborne DatasetImages were extracted from the National Agriculture Imagery Program (NAIP) dataset.SAT-4 has four broad land cover classes: barren land, trees, grassland, and a class consisting of all land cover other than these three.500,000ImagesClassification2015[149] [150] S. Basu et al.
SAT-6 Airborne DatasetImages were extracted from the National Agriculture Imagery Program (NAIP) dataset.SAT-6 has six broad land cover classes: barren land, trees, grassland, roads, buildings, and water bodies.405,000ImagesClassification2015S. Basu et al.

Underwater images

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
SUIM Dataset | Images rigorously collected during oceanic explorations and human-robot collaborative experiments, annotated by human participants. | Images with pixel annotations for eight object categories: fish (vertebrates), reefs (invertebrates), aquatic plants, wrecks/ruins, human divers, robots, and sea-floor. | 1,635 | Images | Segmentation | 2020 | [151] | Md Jahidul Islam et al.
LIACI Dataset | Images collected during underwater ship inspections, annotated by human domain experts. | Images with pixel annotations for ten object categories: defects, corrosion, paint peel, marine growth, sea chest gratings, overboard valves, propeller, anodes, bilge keel and ship hull. | 1,893 | Images | Segmentation | 2022 | [152] | Waszak et al.
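Pixel-level segmentation datasets like these are commonly scored by mean per-class intersection over union. An illustrative sketch over flat lists of integer class labels (a real evaluation would operate on image arrays, but the arithmetic is the same):

```python
def mean_iou(pred, gt, num_classes):
    """Mean per-class IoU for flat, equal-length lists of integer class labels."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

Averaging per class rather than per pixel keeps rare categories (such as a dataset's small defect or anode classes) from being drowned out by dominant ones like background.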

Other images

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
NRC-GAMMAA novel benchmark gas meter image datasetNone28,883Image, LabelClassification2021[153] [154] A. Ebadi, P. Paul, S. Auer, & S. Tremblay
The SUPATLANTIQUE datasetImages of scanned official and Wikipedia documentsNone4908TIFF/pdfSource device identification, forgery detection, Classification,..2020[155] C. Ben Rabah et al.
Density functional theory quantum simulations of grapheneLabelled images of raw input to a simulation of grapheneRaw data (in HDF5 format) and output labels from density functional theory quantum simulation60744 test and 501473 training filesLabeled imagesRegression2019K. Mills & I. Tamblyn
Quantum simulations of an electron in a two dimensional potential wellLabelled images of raw input to a simulation of 2d Quantum mechanicsRaw data (in HDF5 format) and output labels from quantum simulation1.3 million imagesLabeled imagesRegression2017[156] K. Mills, M.A. Spanner, & I. Tamblyn
MPII Cooking Activities DatasetVideos and images of various cooking activities.Activity paths and directions, labels, fine-grained motion labeling, activity class, still image extraction and labeling.881,755 framesLabeled video, images, textClassification2012[157] [158] M. Rohrbach et al.
FAMOS Dataset5,000 unique microstructures; all samples have been acquired 3 times with two different cameras.Original PNG files, sorted per camera and then per acquisition; MATLAB data files with one 16384 × 5000 matrix per camera per acquisition.30,000Images and .mat filesAuthentication2012[159] S. Voloshynovskiy, et al.
PharmaPack Dataset1,000 unique classes with 54 images per class.Class labeling, many local descriptors (such as SIFT and AKAZE) and local feature aggregators (such as Fisher Vector).54,000Images and .mat filesFine-grain classification2017[160] O. Taran and S. Rezaeifar, et al.
Stanford Dogs DatasetImages of 120 breeds of dogs from around the world.Train/test splits and ImageNet annotations provided.20,580Images, textFine-grain classification2011[161] [162] A. Khosla et al.
StanfordExtra Dataset2D keypoints and segmentations for the Stanford Dogs Dataset.2D keypoints and segmentations provided.12,035Labelled images3D reconstruction/pose estimation2020[163] B. Biggs et al.
The Oxford-IIIT Pet Dataset37 categories of pets with roughly 200 images of each.Breed labeled, tight bounding box, foreground-background segmentation.~ 7,400Images, textClassification, object detection2012[164] O. Parkhi et al.
Corel Image Features Data SetDatabase of images with features extracted.Many features, including color histogram, co-occurrence texture, and color moments.68,040TextClassification, object detection1999[165] [166] M. Ortega-Bindenberger et al.
Online Video Characteristics and Transcoding Time Dataset.Transcoding times for various different videos and video properties.Video features given.168,286TextRegression2015[167] T. Deneke et al.
Microsoft Sequential Image Narrative Dataset (SIND)Dataset for sequential vision-to-languageDescriptive caption and storytelling given for each photo, and photos are arranged in sequences81,743Images, textVisual storytelling2016[168] Microsoft Research
Caltech-UCSD Birds-200-2011 DatasetLarge dataset of images of birds.Part locations for birds, bounding boxes, 312 binary attributes given11,788Images, textClassification2011[169] [170] C. Wah et al.
YouTube-8MLarge and diverse labeled video datasetYouTube video IDs and associated labels from a diverse vocabulary of 4800 visual entities8 millionVideo, textVideo classification2016[171] [172] S. Abu-El-Haija et al.
YFCC100MLarge and diverse labeled image and video datasetFlickr Videos and Images and associated description, titles, tags, and other metadata (such as EXIF and geotags)100 millionVideo, Image, TextVideo and Image classification2016[173] [174] B. Thomee et al.
Discrete LIRIS-ACCEDEShort videos annotated for valence and arousal.Valence and arousal labels.9800VideoVideo emotion elicitation detection2015[175] Y. Baveye et al.
Continuous LIRIS-ACCEDELong videos annotated for valence and arousal while also collecting Galvanic Skin Response.Valence and arousal labels.30VideoVideo emotion elicitation detection2015[176] Y. Baveye et al.
MediaEval LIRIS-ACCEDEExtension of Discrete LIRIS-ACCEDE including annotations for violence levels of the films.Violence, valence and arousal labels.10900VideoVideo emotion elicitation detection2015[177] Y. Baveye et al.
Leeds Sports PoseArticulated human pose annotations in 2000 natural sports images from Flickr.Rough crop around single person of interest with 14 joint labels2000Images plus .mat file labelsHuman pose estimation2010[178] S. Johnson and M. Everingham
Leeds Sports Pose Extended TrainingArticulated human pose annotations in 10,000 natural sports images from Flickr.14 joint labels via crowdsourcing10000Images plus .mat file labelsHuman pose estimation2011[179] S. Johnson and M. Everingham
MCQ Dataset6 different real multiple choice-based exams (735 answer sheets and 33,540 answer boxes) to evaluate computer vision techniques and systems developed for multiple choice test assessment systems.None735 answer sheets and 33,540 answer boxesImages and .mat file labelsDevelopment of multiple choice test assessment systems2017[180] [181] Afifi, M. et al.
Surveillance VideosReal surveillance videos cover a large surveillance time (7 days with 24 hours each).None19 surveillance videos (7 days with 24 hours each).VideosData compression2016[182] Taj-Eddin, I. A. T. F. et al.
LILA BCLabeled Information Library of Alexandria: Biology and Conservation. Labeled images that support machine learning research around ecology and environmental science.None~10M imagesImagesClassification2019[183] LILA working group
Can We See Photosynthesis?32 videos for eight live and eight dead leaves recorded under both DC and AC lighting conditions. None32 videosVideosLiveness detection of plants2017[184] Taj-Eddin, I. A. T. F. et al.
Mathematical Mathematics MemesCollection of 10,000 memes on mathematics.None~10,000ImagesVisual storytelling, object detection.2021[185] Mathematical Mathematics Memes
Flickr-Faces-HQ DatasetCollection of images, each containing a face, crawled from FlickrPruned with "various automatic filters", cropped and aligned to faces, and with images of statues, paintings, or photos of photos removed via crowdsourcing70,000ImagesFace Generation2019[186] Karras et al.
Fruits-360 datasetDatabase with images of 131 fruits, vegetables and nuts.100x100 pixels, white background.90483Images (jpg)Classification2017–2024[187] Mihai Oltean
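Several of the pose datasets above (e.g. Leeds Sports Pose) distribute their labels as MATLAB .mat files holding per-image joint coordinates. The sketch below shows how such annotations are typically consumed once loaded; the 14-joint ordering and the (x, y, visible) triple layout are assumptions for illustration rather than the datasets' documented schema, and a real release would first be read with a MATLAB-file reader such as scipy.io.loadmat.

```python
# Hypothetical sketch of consuming LSP-style pose labels: 14 joints per image,
# each an (x, y, visible) triple. The joint names and ordering below are an
# assumption for illustration; check the dataset's own README before relying
# on them.

JOINT_NAMES = [
    "right ankle", "right knee", "right hip", "left hip", "left knee",
    "left ankle", "right wrist", "right elbow", "right shoulder",
    "left shoulder", "left elbow", "left wrist", "neck", "head top",
]

def visible_joints(annotation):
    """Map joint name -> (x, y) for the joints flagged as visible.

    `annotation` is a sequence of 14 (x, y, visible) triples for one image.
    """
    return {
        name: (x, y)
        for name, (x, y, visible) in zip(JOINT_NAMES, annotation)
        if visible
    }

# Toy annotation with only the neck and head top marked visible.
toy = [(0.0, 0.0, 0)] * 12 + [(50.0, 20.0, 1), (52.0, 5.0, 1)]
print(visible_joints(toy))
```

The extended training set uses the same per-image layout, so a loader written this way would cover both collections.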

References

  1. Grauman, Kristen; Westbury, Andrew; Byrne, Eugene; et al. "Ego4D: Around the World in 3,000 Hours of Egocentric Video." 2022. arXiv:2110.07058 [cs.CV].
  2. Krishna, Ranjay; Zhu, Yuke; Groth, Oliver; Johnson, Justin; Hata, Kenji; Kravitz, Joshua; Chen, Stephanie; Kalantidis, Yannis; Li, Li-Jia; Shamma, David A.; Bernstein, Michael S.; Fei-Fei, Li. "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations." International Journal of Computer Vision 123 (2017): 32–73. doi:10.1007/s11263-016-0981-7. arXiv:1602.07332.
  3. Karayev, S., et al. "A category-level 3-D object dataset: putting the Kinect to work." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2011.
  4. Tighe, Joseph, and Svetlana Lazebnik. "Superparsing: scalable nonparametric image parsing with superpixels." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 352–365.
  5. Arbelaez, P.; Maire, M.; Fowlkes, C.; Malik, J. "Contour Detection and Hierarchical Image Segmentation." IEEE Transactions on Pattern Analysis and Machine Intelligence 33.5 (May 2011): 898–916. doi:10.1109/tpami.2010.161.
  6. Lin, Tsung-Yi; Maire, Michael; Belongie, Serge; Bourdev, Lubomir; Girshick, Ross; Hays, James; Perona, Pietro; Ramanan, Deva; Lawrence Zitnick, C.; Dollár, Piotr. "Microsoft COCO: Common Objects in Context." 2014. arXiv:1405.0312 [cs.CV].
  7. Russakovsky, Olga; et al. "ImageNet large scale visual recognition challenge." International Journal of Computer Vision 115.3 (2015): 211–252. doi:10.1007/s11263-015-0816-y. arXiv:1409.0575.
  8. Web site: COCO – Common Objects in Context. cocodataset.org.
  9. Xiao, Jianxiong, et al. "Sun database: Large-scale scene recognition from abbey to zoo." Computer vision and pattern recognition (CVPR), 2010 IEEE conference on. IEEE, 2010.
  10. Donahue, Jeff; Jia, Yangqing; Vinyals, Oriol; Hoffman, Judy; Zhang, Ning; Tzeng, Eric; Darrell, Trevor. "DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition." 2013. arXiv:1310.1531 [cs.CV].
  11. Deng, Jia, et al. "Imagenet: A large-scale hierarchical image database."Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.
  12. Russakovsky, Olga; Deng, Jia; Su, Hao; Krause, Jonathan; Satheesh, Sanjeev; Ma, Sean; Huang, Zhiheng; Karpathy, Andrej; Khosla, Aditya; Bernstein, Michael; Berg, Alexander C.; Fei-Fei, Li. "ImageNet Large Scale Visual Recognition Challenge." International Journal of Computer Vision 115.3 (11 April 2015): 211–252. doi:10.1007/s11263-015-0816-y. arXiv:1409.0575.
  13. Ivan Krasin, Tom Duerig, Neil Alldrin, Andreas Veit, Sami Abu-El-Haija, Serge Belongie, David Cai, Zheyun Feng, Vittorio Ferrari, Victor Gomes, Abhinav Gupta, Dhyanesh Narayanan, Chen Sun, Gal Chechik, Kevin Murphy. "OpenImages: A public dataset for large-scale multi-label and multi-class image classification." 2017. Available from https://github.com/openimages.
  14. Vyas, Apoorv, et al. "Commercial Block Detection in Broadcast News Videos." Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing. ACM, 2014.
  15. Hauptmann, Alexander G., and Michael J. Witbrock. "Story segmentation and detection of commercials in broadcast news video." Research and Technology Advances in Digital Libraries, 1998. ADL 98. Proceedings. IEEE International Forum on. IEEE, 1998.
  16. Tung, Anthony KH, Xin Xu, and Beng Chin Ooi. "Curler: finding and visualizing nonlinear correlation clusters." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. ACM, 2005.
  17. Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.
  18. Lazebnik, Svetlana.
  19. Griffin, G., A. Holub, and P. Perona. Caltech-256 object category dataset California Inst. Technol., Tech. Rep. 7694, 2007. Available: http://authors.library.caltech.edu/7694, 2007.
  20. Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto. Modern information retrieval. Vol. 463. New York: ACM press, 1999.
  21. Fu, Xiping, et al. "NOKMeans: Non-Orthogonal K-means Hashing." Computer Vision—ACCV 2014. Springer International Publishing, 2014. 162–177.
  22. Heitz . Geremy . et al . 2009 . Shape-based object localization for descriptive classification . International Journal of Computer Vision . 84 . 1. 40–62 . 10.1007/s11263-009-0228-y. 10.1.1.142.280 . 646320 .
  23. Everingham . Mark . et al . 2010 . The pascal visual object classes (voc) challenge . International Journal of Computer Vision . 88 . 2. 303–338 . 10.1007/s11263-009-0275-4. 20.500.11820/88a29de3-6220-442b-ab2d-284210cf72d6 . 4246903 . free .
  24. Felzenszwalb . Pedro F. . et al . 2010 . Object detection with discriminatively trained part-based models . IEEE Transactions on Pattern Analysis and Machine Intelligence . 32 . 9. 1627–1645 . 10.1109/tpami.2009.167. 20634557 . 10.1.1.153.2745 . 3198903 .
  25. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
  26. Gong, Yunchao, and Svetlana Lazebnik. "Iterative quantization: A procrustean approach to learning binary codes." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
  27. Web site: CINIC-10 dataset. Darlow, Luke N.; Crowley, Elliot J.; Antoniou, Antreas; Storkey, Amos J. (2018). "CINIC-10 is not ImageNet or CIFAR-10." 2018-10-09. Retrieved 2018-11-13.
  28. Web site: notMNIST dataset. Machine Learning, etc. 2017-10-13. 2011-09-08.
  29. Chaladze, G., Kalatozishvili, L. (2017). Linnaeus 5 dataset. Chaladze.com. Retrieved 13 November 2017, from http://chaladze.com/l5/
  30. Afifi. Mahmoud. 2017-11-12. Gender recognition and biometric identification using a large dataset of hand images. 1711.04322. cs.CV.
  31. Lomonaco. Vincenzo. Maltoni. Davide. 2017-10-18. CORe50: a New Dataset and Benchmark for Continuous Object Recognition. 1705.03550. cs.CV.
  32. She. Qi. Feng. Fan. Hao. Xinyue. Yang. Qihan. Lan. Chuanlin. Lomonaco. Vincenzo. Shi. Xuesong. Wang. Zhengwei. Guo. Yao. Zhang. Yimin. Qiao. Fei. Chan. Rosa H.M.. 2019-11-15. OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning. 1911.06487v2. cs.CV.
  33. Morozov, Alexei; Sushkova, Olga. Web site: THz and thermal video data set. Development of the multi-agent logic programming approach to a human behaviour analysis in a multi-channel video surveillance. IRE RAS, Moscow, 2019-06-13. Retrieved 2019-07-19.
  34. Morozov. Alexei. Sushkova. Olga. Kershner. Ivan. Polupanov. Alexander. 2019-07-09. Development of a method of terahertz intelligent video surveillance based on the semantic fusion of terahertz and 3D video images. CEUR. 2391. paper19. 2019-07-19.
  35. M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset." In CVPR Workshop on The Future of Datasets in Vision, 2015.
  36. Houben, Sebastian, et al. "Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.
  37. Mathias, Mayeul, et al. "Traffic sign recognition—How far are we from the solution?." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.
  38. Geiger, Andreas, Philip Lenz, and Raquel Urtasun. "Are we ready for autonomous driving? the kitti vision benchmark suite." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
  39. Sturm, Jürgen, et al. "A benchmark for the evaluation of RGB-D SLAM systems." Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. IEEE, 2012.
  40. Kragh . Mikkel F. . et al . 2017 . FieldSAFE – Dataset for Obstacle Detection in Agriculture . Sensors . 17 . 11 . 2579. 10.3390/s17112579 . 29120383 . 5713196 . 2017Senso..17.2579K. 1709.03526 . free .
  41. Web site: Papers with Code - Daimler Monocular Pedestrian Detection Dataset . paperswithcode.com . 5 May 2023 . en.
  42. Enzweiler . Markus . Gavrila . Dariu M. . Monocular Pedestrian Detection: Survey and Experiments . IEEE Transactions on Pattern Analysis and Machine Intelligence . December 2009 . 31 . 12 . 2179–2195 . 10.1109/TPAMI.2008.260 . 19834140 . 1192198 . 1939-3539.
  43. Yin . Guojun . Liu . Bin . Zhu . Huihui . Gong . Tao . Yu . Nenghai . A Large Scale Urban Surveillance Video Dataset for Multiple-Object Tracking and Behavior Analysis . 28 July 2020 . cs.CV . 1904.11784 .
  44. Web site: Object Recognition in Video Dataset . mi.eng.cam.ac.uk . 5 May 2023.
  45. Book: Brostow . Gabriel J. . Shotton . Jamie . Fauqueur . Julien . Cipolla . Roberto . Computer Vision – ECCV 2008 . Segmentation and Recognition Using Structure from Motion Point Clouds . Lecture Notes in Computer Science . 2008 . 5302 . 44–57 . 10.1007/978-3-540-88682-2_5 . https://link.springer.com/chapter/10.1007/978-3-540-88682-2_5 . Springer . 978-3-540-88681-5 . en.
  46. Brostow . Gabriel J. . Fauqueur . Julien . Cipolla . Roberto . Semantic object classes in video: A high-definition ground truth database . Pattern Recognition Letters . 15 January 2009 . 30 . 2 . 88–97 . 10.1016/j.patrec.2008.04.005 . 2009PaReL..30...88B . en . 0167-8655.
  47. Web site: WildDash 2 Benchmark . wilddash.cc . 5 May 2023.
  48. Book: Zendel . Oliver . Murschitz . Markus . Zeilinger . Marcel . Steininger . Daniel . Abbasi . Sara . Beleznai . Csaba . 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) . RailSem19: A Dataset for Semantic Rail Scene Understanding . June 2019 . 1221–1229 . 10.1109/CVPRW.2019.00161 . 978-1-7281-2506-0 . 198166233 . https://ieeexplore.ieee.org/document/9025646.
  49. Web site: The Boreas Dataset . www.boreas.utias.utoronto.ca . 5 May 2023.
  50. Burnett . Keenan . Yoon . David J. . Wu . Yuchen . Li . Andrew Zou . Zhang . Haowei . Lu . Shichen . Qian . Jingxing . Tseng . Wei-Kang . Lambert . Andrew . Leung . Keith Y. K. . Schoellig . Angela P.. Angela Schoellig . Barfoot . Timothy D. . Boreas: A Multi-Season Autonomous Driving Dataset . 26 January 2023 . cs.RO . 2203.10168 .
  51. Web site: Bosch Small Traffic Lights Dataset . hci.iwr.uni-heidelberg.de . 5 May 2023 . en . 1 March 2017.
  52. Book: Behrendt . Karsten . Novak . Libor . Botros . Rami . 2017 IEEE International Conference on Robotics and Automation (ICRA) . A deep learning approach to traffic lights: Detection, tracking, and classification . May 2017 . 1370–1377 . 10.1109/ICRA.2017.7989163 . 978-1-5090-4633-1 . 6257133 . https://ieeexplore.ieee.org/document/7989163.
  53. Web site: FRSign Dataset . frsign.irt-systemx.fr . 5 May 2023.
  54. Harb . Jeanine . Rébéna . Nicolas . Chosidow . Raphaël . Roblin . Grégoire . Potarusov . Roman . Hajri . Hatem . FRSign: A Large-Scale Traffic Light Dataset for Autonomous Trains . 5 February 2020 . cs.CY . 2002.05665 .
  55. Web site: ifs-rwth-aachen/GERALD . Chair and Institute for Rail Vehicles and Transport Systems . 5 May 2023 . 30 April 2023.
  56. Leibner . Philipp . Hampel . Fabian . Schindler . Christian . GERALD: A novel dataset for the detection of German mainline railway signals . Proceedings of the Institution of Mechanical Engineers, Part F: Journal of Rail and Rapid Transit . 3 April 2023 . 237 . 10 . 1332–1342 . 10.1177/09544097231166472 . 257939937 . en . 0954-4097.
  57. Book: Wojek . Christian . Walk . Stefan . Schiele . Bernt . 2009 IEEE Conference on Computer Vision and Pattern Recognition . Multi-cue onboard pedestrian detection . June 2009 . 794–801 . 10.1109/CVPR.2009.5206638 . 978-1-4244-3992-8 . 18000078 . https://ieeexplore.ieee.org/document/5206638.
  58. Toprak, Tuğçe; Aydın, Burak; Belenlioğlu, Burak; Güzeliş, Cüneyt; Selver, M. Alper. "Conditional Weighted Ensemble of Transferred Models for Camera Based Onboard Pedestrian Detection in Railway Driver Support Systems." IEEE Transactions on Vehicular Technology (2020). doi:10.1109/TVT.2020.2983825.
  59. Toprak . Tugce . Belenlioglu . Burak . Aydın . Burak . Guzelis . Cuneyt . Selver . M. Alper . Conditional Weighted Ensemble of Transferred Models for Camera Based Onboard Pedestrian Detection in Railway Driver Support Systems . IEEE Transactions on Vehicular Technology . May 2020 . 69 . 5 . 5041–5054 . 10.1109/TVT.2020.2983825 . 216510283 . 1939-9359.
  60. Tilly . Roman . Neumaier . Philipp . Schwalbe . Karsten . Klasek . Pavel . Tagiew . Rustam . Denzler . Patrick . Klockau . Tobias . Boekhoff . Martin . Köppel . Martin . Open Sensor Data for Rail 2023 . 2023 . 10.57806/9mv146r0 . de.
  61. Book: Tagiew . Rustam . Köppel . Martin . Schwalbe . Karsten . Denzler . Patrick . Neumaier . Philipp . Klockau . Tobias . Boekhoff . Martin . Klasek . Pavel . Tilly . Roman . 2023 8th International Conference on Robotics and Automation Engineering (ICRAE) . OSDaR23: Open Sensor Data for Rail 2023 . 4 May 2023 . 270–276 . 10.1109/ICRAE59816.2023.10458449 . 2305.03001 . 979-8-3503-2765-6 .
  62. Web site: Home . Argoverse . 5 May 2023.
  63. Chang . Ming-Fang . Lambert . John . Sangkloy . Patsorn . Singh . Jagjeet . Bak . Slawomir . Hartnett . Andrew . Wang . De . Carr . Peter . Lucey . Simon . Ramanan . Deva . Hays . James . Argoverse: 3D Tracking and Forecasting with Rich Maps . 6 November 2019 . cs.CV . 1911.02620 .
  64. Book: Zafeiriou. S.. Kollias. D.. Nicolaou. M.A.. Papaioannou. A.. Zhao. G.. Kotsia. I.. 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) . Aff-Wild: Valence and Arousal 'In-the-Wild' Challenge . 2017. https://eprints.mdx.ac.uk/22045/1/aff_wild_kotsia.pdf. 1980–1987. 10.1109/CVPRW.2017.248. 978-1-5386-0733-6. 3107614.
  65. Kollias. D.. Tzirakis. P.. Nicolaou. M.A.. Papaioannou. A.. Zhao. G.. Schuller. B.. Kotsia. I.. Zafeiriou. S.. 2019. Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond. International Journal of Computer Vision . 127. 6–7. 907–929. 10.1007/s11263-019-01158-4. 13679040. free. 1804.10938.
  66. Kollias. D.. Zafeiriou. S.. 2019. Expression, affect, action unit recognition: Aff-wild2, multi-task learning and arcface. British Machine Vision Conference (BMVC), 2019. 1910.04855.
  67. Book: Kollias. D.. Schulc. A.. Hajiyev. E.. Zafeiriou. S.. 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020) . Analysing Affective Behavior in the First ABAW 2020 Competition . 2020. https://www.computer.org/csdl/proceedings-article/fg/2020/307900a794/1kecIYu9wL6. 637–643. 10.1109/FG47880.2020.00126. 2001.11409. 978-1-7281-3079-8. 210966051.
  68. Phillips . P. Jonathon . et al . 1998 . The FERET database and evaluation procedure for face-recognition algorithms . Image and Vision Computing . 16 . 5. 295–306 . 10.1016/s0262-8856(97)00070-x.
  69. Wiskott . Laurenz . et al . 1997 . Face recognition by elastic bunch graph matching . IEEE Transactions on Pattern Analysis and Machine Intelligence . 19 . 7. 775–779 . 10.1109/34.598235. 10.1.1.44.2321 . 30523165 .
  70. Livingstone, Steven R.; Russo, Frank A. "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English." PLOS ONE 13.5 (2018): e0196391. doi:10.1371/journal.pone.0196391.
  71. Livingstone, Steven R.; Russo, Frank A. "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)." Emotion. 2018. doi:10.5281/zenodo.1188976.
  72. Grgic . Mislav . Delac . Kresimir . Grgic . Sonja . 2011 . SCface–surveillance cameras face database . Multimedia Tools and Applications . 51 . 3. 863–879 . 10.1007/s11042-009-0417-2 . 207218990 .
  73. Wallace, Roy, et al. "Inter-session variability modelling and joint factor analysis for face authentication." Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011.
  74. Georghiades, A. "Yale face database." Center for Computational Vision and Control at Yale University, 1997.
  75. Nguyen . Duy . et al . 2006 . Real-time face detection and lip feature extraction using field-programmable gate arrays . IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics. 36 . 4. 902–912 . 10.1109/tsmcb.2005.862728. 16903373 . 10.1.1.156.9848 . 7334355 .
  76. Kanade, Takeo.
  77. Zeng . Zhihong . et al . 2009 . A survey of affect recognition methods: Audio, visual, and spontaneous expressions . IEEE Transactions on Pattern Analysis and Machine Intelligence . 31 . 1. 39–58 . 10.1109/tpami.2008.52. 19029545 . 10.1.1.144.217 .
  78. Lyons, Michael; Kamachi, Miyuki; Gyoba, Jiro. "The Japanese Female Facial Expression (JAFFE) Database." Facial expression images. 1998. doi:10.5281/zenodo.3451524.
  79. Lyons, Michael; Akamatsu, Shigeru; Kamachi, Miyuki; Gyoba, Jiro "Coding facial expressions with Gabor wavelets." Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on. IEEE, 1998.
  80. Ng, Hong-Wei, and Stefan Winkler. "A data-driven approach to cleaning large face datasets." Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014.
  81. RoyChowdhury, Aruni; Lin, Tsung-Yu; Maji, Subhransu; Learned-Miller, Erik. "One-to-many face recognition with bilinear CNNs." 2015. arXiv:1506.01342 [cs.CV].
  82. Jesorsky, Oliver, Klaus J. Kirchberg, and Robert W. Frischholz. "Robust face detection using the hausdorff distance." Audio-and video-based biometric person authentication. Springer Berlin Heidelberg, 2001.
  83. Huang, Gary B., et al. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Vol. 1. No. 2. Technical Report 07-49, University of Massachusetts, Amherst, 2007.
  84. Bhatt, Rajen B., et al. "Efficient skin region segmentation using low complexity fuzzy decision tree model." India Conference (INDICON), 2009 Annual IEEE. IEEE, 2009.
  85. Lingala . Mounika . et al . 2014 . Fuzzy logic color detection: Blue areas in melanoma dermoscopy images . Computerized Medical Imaging and Graphics . 38 . 5. 403–410 . 10.1016/j.compmedimag.2014.03.007. 24786720 . 4287461 .
  86. Maes, Chris, et al. "Feature detection on 3D face surfaces for pose normalisation and recognition." Biometrics: Theory Applications and Systems (BTAS), 2010 Fourth IEEE International Conference on. IEEE, 2010.
  87. Savran, Arman, et al. "Bosphorus database for 3D face analysis." Biometrics and Identity Management. Springer Berlin Heidelberg, 2008. 47–56.
  88. Heseltine, Thomas, Nick Pears, and Jim Austin. "Three-dimensional face recognition: An eigensurface approach." Image Processing, 2004. ICIP'04. 2004 International Conference on. Vol. 2. IEEE, 2004.
  89. Ge . Yun . et al . 2011 . 3D Novel Face Sample Modeling for Face Recognition . Journal of Multimedia . 6 . 5. 467–475 . 10.4304/jmm.6.5.467-475. 10.1.1.461.9710 .
  90. Wang . Yueming . Liu . Jianzhuang . Tang . Xiaoou . 2010 . Robust 3D face recognition by local shape difference boosting . IEEE Transactions on Pattern Analysis and Machine Intelligence . 32 . 10. 1858–1870 . 10.1109/tpami.2009.200. 20724762 . 10.1.1.471.2424 . 15263913 .
  91. Zhong, Cheng, Zhenan Sun, and Tieniu Tan. "Robust 3D face recognition using learned visual codebook." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007.
  92. Zhao . G. . Huang . X. . Taini . M. . Li . S. Z. . Pietikäinen . M. . 2011 . Facial expression recognition from near-infrared videos . Image and Vision Computing . 29 . 9. 607–619 . 10.1016/j.imavis.2011.07.002 .
  93. Soyel, Hamit, and Hasan Demirel. "Facial expression recognition using 3D facial feature distances." Image Analysis and Recognition. Springer Berlin Heidelberg, 2007. 831–838.
  94. Bowyer . Kevin W. . Chang . Kyong . Flynn . Patrick . 2006 . A survey of approaches and challenges in 3D and multi-modal 3D+ 2D face recognition . Computer Vision and Image Understanding . 101 . 1. 1–15 . 10.1016/j.cviu.2005.05.005. 10.1.1.134.8784 .
  95. Tan . Xiaoyang . Triggs . Bill . 2010 . Enhanced local texture feature sets for face recognition under difficult lighting conditions . IEEE Transactions on Image Processing. 19 . 6. 1635–1650 . 10.1109/tip.2010.2042645. 20172829 . 2010ITIP...19.1635T . 10.1.1.105.3355 . 4943234 .
  96. Mousavi, Mir Hashem; Faez, Karim; Asghari, Amin. "Three Dimensional Face Recognition Using SVM Classifier." Seventh IEEE/ACIS International Conference on Computer and Information Science (ICIS 2008). 2008. 208–213. doi:10.1109/ICIS.2008.77.
  97. Amberg, Brian; Knothe, Reinhard; Vetter, Thomas. "Expression invariant 3D face recognition with a Morphable Model." 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition. 2008. 1–6. doi:10.1109/AFGR.2008.4813376. Archived at https://web.archive.org/web/20180728233944/http://gravis.dmi.unibas.ch/publications/2008/FG08_Amberg.pdf.
  98. Irfanoglu, M.O.; Gokberk, B.; Akarun, L. "3D shape-based face recognition using automatically registered facial surfaces." Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004). 2004. 183–186 Vol. 4. doi:10.1109/ICPR.2004.1333734.
  99. Beumier . Charles . Acheroy . Marc . 2001 . Face verification from 3D and grey level clues . Pattern Recognition Letters . 22 . 12. 1321–1329 . 10.1016/s0167-8655(01)00077-0. 2001PaReL..22.1321B .
  100. Afifi. Mahmoud. Abdelhamed. Abdelrahman. 2017-06-13. AFIF4: Deep Gender Classification based on AdaBoost-based Fusion of Isolated Facial Features and Foggy Faces. 1706.04277. cs.CV.
  101. Web site: SoF dataset. sites.google.com. en-US. 2017-11-18.
  102. Web site: IMDb-WIKI. data.vision.ee.ethz.ch. en-US. 2018-03-13.
  103. Patron-Perez . A. . Marszalek . M. . Reid . I. . Zisserman . A. . 2012 . Structured learning of human interactions in TV shows . IEEE Transactions on Pattern Analysis and Machine Intelligence . 34 . 12. 2441–2453 . 10.1109/tpami.2012.24. 23079467 . 6060568 .
  104. Ofli, F., Chaudhry, R., Kurillo, G., Vidal, R., & Bajcsy, R. (January 2013). Berkeley MHAD: A comprehensive multimodal human action database. In Applications of Computer Vision (WACV), 2013 IEEE Workshop on (pp. 53–60). IEEE.
  105. Jiang, Y. G., et al. "THUMOS challenge: Action recognition with a large number of classes." ICCV Workshop on Action Recognition with a Large Number of Classes, http://crcv.ucf.edu/ICCV13-Action-Workshop. 2013.
  106. Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in videos." Advances in Neural Information Processing Systems. 2014.
  107. Stoian, Andrei; Ferecatu, Marin; Benois-Pineau, Jenny; Crucianu, Michel. "Fast Action Localization in Large-Scale Video Archives." IEEE Transactions on Circuits and Systems for Video Technology 26.10 (2016): 1917–1930. doi:10.1109/TCSVT.2015.2475835.
  108. Botta, M., A. Giordana, and L. Saitta. "Learning fuzzy concept definitions." Fuzzy Systems, 1993., Second IEEE International Conference on. IEEE, 1993.
  109. Frey . Peter W. . Slate . David J. . 1991 . Letter recognition using Holland-style adaptive classifiers . Machine Learning . 6 . 2. 161–182 . 10.1007/bf00114162. free .
  110. Peltonen . Jaakko . Klami . Arto . Kaski . Samuel . 2004 . Improved learning of Riemannian metrics for exploratory analysis . Neural Networks . 17 . 8. 1087–1100 . 10.1016/j.neunet.2004.06.008. 15555853 . 10.1.1.59.4865 .
  111. Liu, Cheng-Lin; Yin, Fei; Wang, Da-Han; Wang, Qiu-Feng. "Online and offline handwritten Chinese character recognition: Benchmarking on new databases." Pattern Recognition 46.1 (January 2013): 155–162. doi:10.1016/j.patcog.2012.06.021.
  112. Book: Wang . D. . C. . Liu . J. . Yu . X. . Zhou . 2009 10th International Conference on Document Analysis and Recognition . CASIA-OLHWDB1: A Database of Online Handwritten Chinese Characters . 2009 . 1206–1210. 10.1109/ICDAR.2009.163 . 978-1-4244-4500-4 . 5705532 .
  113. Williams, Ben H., Marc Toussaint, and Amos J. Storkey. Extracting motion primitives from natural handwriting data. Springer Berlin Heidelberg, 2006.
  114. Meier, Franziska, et al. "Movement segmentation using a primitive library."Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on. IEEE, 2011.
  115. T. E. de Campos, B. R. Babu and M. Varma. Character recognition in natural images. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, February 2009
  116. Cohen, Gregory; Afshar, Saeed; Tapson, Jonathan; van Schaik, André. "EMNIST: An extension of MNIST to handwritten letters." 2017. arXiv:1702.05373v1 [cs.CV].
  117. "The EMNIST Dataset." NIST. 4 April 2017.
  118. Cohen, Gregory; Afshar, Saeed; Tapson, Jonathan; van Schaik, André. "EMNIST: An extension of MNIST to handwritten letters." 2017. arXiv:1702.05373 [cs.CV].
  119. Llorens, David, et al. "The UJIpenchars Database: a Pen-Based Database of Isolated Handwritten Characters." LREC. 2008.
  120. Calderara . Simone . Prati . Andrea . Cucchiara . Rita . 2011 . Mixtures of von mises distributions for people trajectory shape analysis . IEEE Transactions on Circuits and Systems for Video Technology. 21 . 4. 457–471 . 10.1109/tcsvt.2011.2125550. 1427766 .
  121. Guyon, Isabelle, et al. "Result analysis of the nips 2003 feature selection challenge." Advances in neural information processing systems. 2004.
  122. Lake. B. M.. Salakhutdinov. R.. Tenenbaum. J. B.. 2015-12-11. Human-level concept learning through probabilistic program induction. Science. en. 350. 6266. 1332–1338. 10.1126/science.aab3050. 0036-8075. 26659050. 2015Sci...350.1332L. free.
  123. LeCun . Yann . et al . 1998 . Gradient-based learning applied to document recognition . Proceedings of the IEEE . 86 . 11. 2278–2324 . 10.1109/5.726791. 10.1.1.32.9552 . 14542261 .
  124. Kussul . Ernst . Baidyk . Tatiana . Tetyana Baydyk. 2004 . Improved method of handwritten digit recognition tested on MNIST database . Image and Vision Computing . 22 . 12. 971–981 . 10.1016/j.imavis.2004.03.008 .
  125. Xu . Lei . Krzyżak . Adam . Suen . Ching Y. . 1992 . Methods of combining multiple classifiers and their applications to handwriting recognition . IEEE Transactions on Systems, Man, and Cybernetics. 22 . 3. 418–435 . 10.1109/21.155943. 10338.dmlcz/135217 .
  126. Alimoglu, Fevzi, et al. "Combining multiple classifiers for pen-based handwritten digit recognition." (1996).
  127. Tang . E. Ke . et al . 2005 . Linear dimensionality reduction using relevance weighted LDA . Pattern Recognition . 38 . 4. 485–493 . 10.1016/j.patcog.2004.09.005. 2005PatRe..38..485T . 10580110 .
  128. Hong, Yi, et al. "Learning a mixture of sparse distance metrics for classification and dimensionality reduction." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
  129. Thoma, Martin. "The HASYv2 dataset." 2017. arXiv:1701.08380 [cs.CV].
  130. Karki. Manohar. Liu. Qun. DiBiano. Robert. Basu. Saikat. Mukhopadhyay. Supratik. 2018-06-20. Pixel-level Reconstruction and Classification for Noisy Handwritten Bangla Characters. 1806.08037. cs.CV.
  131. Web site: iSAID. 2021-11-30. captain-whu.github.io.
  132. Zamir, Syed; Arora, Aditya; Gupta, Akshita; Khan, Salman; Sun, Guolei; Khan, Fahad; Zhu, Fan; Shao, Ling; Xia, Gui-Song; Bai, Xiang (2019). "iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images."
  133. Yuan . Jiangye . Gleason . Shaun S. . Cheriyadat . Anil M. . 2013 . Systematic benchmarking of aerial image segmentation . IEEE Geoscience and Remote Sensing Letters. 10 . 6. 1527–1531 . 10.1109/lgrs.2013.2261453. 2013IGRSL..10.1527Y . 629629 .
  134. Vatsavai, Ranga Raju. "Object based image classification: state of the art and computational challenges." Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data. ACM, 2013.
  135. Butenuth, Matthias, et al. "Integrating pedestrian simulation, tracking and event detection for crowd analysis." Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on. IEEE, 2011.
  136. Fradi, Hajer, and Jean-Luc Dugelay. "Low level crowd analysis using frame-wise normalized feature for people counting." Information Forensics and Security (WIFS), 2012 IEEE International Workshop on. IEEE, 2012.
  137. Johnson, Brian Alan, Ryutaro Tateishi, and Nguyen Thanh Hoan. "A hybrid pansharpening approach and multiscale object-based image analysis for mapping diseased pine and oak trees." International journal of remote sensing34.20 (2013): 6969–6982.
  138. Mohd Pozi, Muhammad Syafiq; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran. "A new classification model for a class imbalanced data set using genetic programming and support vector machines: Case study for wilt disease classification." Remote Sensing Letters 6.7 (2015): 568–577. doi:10.1080/2150704X.2015.1062159.
  139. Gallego, A.-J.; Pertusa, A.; Gil, P. "Automatic Ship Classification from Optical Aerial Images with Convolutional Neural Networks." Remote Sensing. 2018; 10(4):511.
  140. Gallego, A.-J.; Pertusa, A.; Gil, P. "MAritime SATellite Imagery dataset". Available: https://www.iuii.ua.es/datasets/masati/, 2018.
  141. Johnson . Brian . Tateishi . Ryutaro . Xie . Zhixiao . 2012 . Using geographically weighted variables for image classification . Remote Sensing Letters . 3 . 6. 491–499 . 10.1080/01431161.2011.629637. 2012RSL.....3..491J . 122543681 .
  142. Chatterjee, Sankhadeep, et al. "Forest Type Classification: A Hybrid NN-GA Model Based Approach." Information Systems Design and Intelligent Applications. Springer India, 2016. 227–236.
  143. Diegert, Carl. "A combinatorial method for tracing objects using semantics of their shape." Applied Imagery Pattern Recognition Workshop (AIPR), 2010 IEEE 39th. IEEE, 2010.
  144. Razakarivony, Sebastien, and Frédéric Jurie. "Small target detection combining foreground and background manifolds." IAPR International Conference on Machine Vision Applications. 2013.
  145. "SpaceNet." explore.digitalglobe.com. Archived from the original (https://web.archive.org/web/20180313092809/http://explore.digitalglobe.com/spacenet) on 13 March 2018. Retrieved 2018-03-13.
  146. Van Etten, Adam. "Getting Started With SpaceNet Data." The DownLinQ, 2017-01-05. Retrieved 2018-03-13.
  147. Vakalopoulou, M., N. Bus, K. Karantzalosa, and N. Paragios. "Integrating edge/boundary priors with classification scores for building detection in very high resolution data." 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). IEEE, July 2017. 3309–3312. doi:10.1109/IGARSS.2017.8127705.
  148. Yang, Yi, and Shawn Newsam. "Bag-of-visual-words and spatial extensions for land-use classification." Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 2010. 270–279. doi:10.1145/1869790.1869829.
  149. Basu, Saikat, Sangram Ganguly, Supratik Mukhopadhyay, Robert DiBiano, Manohar Karki, and Ramakrishna Nemani. "DeepSat: A learning framework for satellite imagery." Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 2015. 1–10. doi:10.1145/2820783.2820816.
  150. Liu, Qun, Saikat Basu, Sangram Ganguly, Supratik Mukhopadhyay, Robert DiBiano, Manohar Karki, and Ramakrishna Nemani. "DeepSat V2: feature augmented convolutional neural nets for satellite image classification." Remote Sensing Letters 11.2 (2019): 156–165. doi:10.1080/2150704x.2019.1693071. arXiv:1911.07747.
  151. Md Jahidul Islam, et al. "Semantic Segmentation of Underwater Imagery: Dataset and Benchmark." 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020.
  152. Waszak et al. "Semantic Segmentation in Underwater Ship Inspections: Benchmark and Data Set." IEEE Journal of Oceanic Engineering. IEEE, 2022.
  153. Ebadi, Ashkan, Patrick Paul, Sofia Auer, and Stéphane Tremblay. "NRC-GAMMA: Introducing a Novel Large Gas Meter Image Dataset." arXiv:2111.06827 [cs.CV], 2021.
  154. Government of Canada National Research Council. "The gas meter image dataset (NRC-GAMMA)." NRC Digital Repository, 2021. doi:10.4224/3c8s-z290. Retrieved 2021-12-02.
  155. Rabah, Chaima Ben, Gouenou Coatrieux, and Riadh Abdelfattah. "The Supatlantique Scanned Documents Database for Digital Image Forensics Purposes." 2020 IEEE International Conference on Image Processing (ICIP). IEEE, October 2020. 2096–2100. doi:10.1109/icip40778.2020.9190665.
  156. Mills, Kyle, Michael Spanner, and Isaac Tamblyn. "Quantum simulations of an electron in a two dimensional potential well." Quantum simulation. National Research Council of Canada, 2018. doi:10.4224/PhysRevA.96.042113.data.
  157. Rohrbach, M., S. Amin, M. Andriluka, and B. Schiele. "A database for fine grained activity detection of cooking activities." 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012. 1194–1201. doi:10.1109/cvpr.2012.6247801.
  158. Kuehne, Hilde, Ali Arslan, and Thomas Serre. "The language of actions: Recovering the syntax and semantics of goal-directed human activities." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
  159. Voloshynovskiy, Sviatoslav, et al. "Towards reproducible results in authentication based on physical non-cloneable functions: The Forensic Authentication Microstructure Optical Set (FAMOS)." Proceedings of the IEEE International Workshop on Information Forensics and Security. 2012.
  160. Taran, Olga, Shideh Rezaeifar, et al. "PharmaPack: mobile fine-grained recognition of pharma packages." Proceedings of the European Signal Processing Conference (EUSIPCO). 2017.
  161. Khosla, Aditya, et al. "Novel dataset for fine-grained image categorization: Stanford dogs." Proceedings of the CVPR Workshop on Fine-Grained Visual Categorization (FGVC). 2011.
  162. Parkhi, Omkar M., et al. "Cats and dogs." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
  163. Biggs, Benjamin, Oliver Boyne, James Charles, Andrew Fitzgibbon, and Roberto Cipolla. Computer Vision – ECCV 2020. Lecture Notes in Computer Science 12356 (2020). doi:10.1007/978-3-030-58621-8. arXiv:2007.11110.
  164. Razavian, Ali, et al. "CNN features off-the-shelf: an astounding baseline for recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014.
  165. Ortega, Michael, et al. "Supporting ranked boolean similarity queries in MARS." IEEE Transactions on Knowledge and Data Engineering 10.6 (1998): 905–925. doi:10.1109/69.738357.
  166. He, Xuming, Richard S. Zemel, and Miguel Á. Carreira-Perpiñán. "Multiscale conditional random fields for image labeling" (ftp://www-vhost.cs.toronto.edu/public_html/public_html/dist/zemel/Papers/cvpr04.pdf). Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on. Vol. 2. IEEE, 2004.
  167. Deneke, Tewodros, et al. "Video transcoding time prediction for proactive load balancing." Multimedia and Expo (ICME), 2014 IEEE International Conference on. IEEE, 2014.
  168. Huang, Ting-Hao (Kenneth), Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, and Margaret Mitchell. "Visual Storytelling." arXiv:1604.03968 [cs.CL], 13 April 2016.
  169. Wah, Catherine, et al. "The caltech-ucsd birds-200-2011 dataset." (2011).
  170. Duan, Kun, et al. "Discovering localized attributes for fine-grained recognition." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
  171. "YouTube-8M Dataset." research.google.com. Retrieved 1 October 2016.
  172. Abu-El-Haija, Sami, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan. "YouTube-8M: A Large-Scale Video Classification Benchmark." arXiv:1609.08675 [cs.CV], 27 September 2016.
  173. "YFCC100M Dataset." mmcommons.org. Yahoo-ICSI-LLNL. Retrieved 1 June 2017.
  174. Thomee, Bart, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and Li-Jia Li. "YFCC100M: The new data in multimedia research." Communications of the ACM 59.2 (2016): 64–73. doi:10.1145/2812802. arXiv:1503.01817.
  175. Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "LIRIS-ACCEDE: A Video Database for Affective Content Analysis," in IEEE Transactions on Affective Computing, 2015.
  176. Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "Deep Learning vs. Kernel Methods: Performance for Emotion Prediction in Videos," in 2015 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), 2015.
  177. M. Sjöberg, Y. Baveye, H. Wang, V. L. Quang, B. Ionescu, E. Dellandréa, M. Schedl, C.-H. Demarty, and L. Chen, "The mediaeval 2015 affective impact of movies task," in MediaEval 2015 Workshop, 2015.
  178. S. Johnson and M. Everingham, "Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation," in Proceedings of the 21st British Machine Vision Conference (BMVC2010).
  179. S. Johnson and M. Everingham, "Learning Effective Human Pose Estimation from Inaccurate Annotation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR2011).
  180. Afifi, Mahmoud, and Khaled F. Hussain. "The Achievement of Higher Flexibility in Multiple Choice-based Tests Using Image Classification Techniques." arXiv:1711.00972 [cs.CV], 2017.
  181. "MCQ Dataset." sites.google.com. Retrieved 2017-11-18.
  182. Taj-Eddin, I. A. T. F., M. Afifi, M. Korashy, D. Hamdy, M. Nasser, and S. Derbaz. "A new compression technique for surveillance videos: Evaluation using new dataset." 2016 Sixth International Conference on Digital Information and Communication Technology and its Applications (DICTAP). July 2016. 159–164. doi:10.1109/DICTAP.2016.7544020.
  183. Tabak, Michael A., Mohammad S. Norouzzadeh, David W. Wolfson, Steven J. Sweeney, Kurt C. Vercauteren, Nathan P. Snow, Joseph M. Halseth, Paul A. Di Salvo, Jesse S. Lewis, Michael D. White, Ben Teton, James C. Beasley, Peter E. Schlichting, Raoul K. Boughton, Bethany Wight, Eric S. Newkirk, Jacob S. Ivan, Eric A. Odell, Ryan K. Brook, Paul M. Lukacs, Anna K. Moeller, Elizabeth G. Mandeville, Jeff Clune, and Ryan S. Miller. "Machine learning to classify animal species in camera trap images: Applications in ecology." Methods in Ecology and Evolution 10.4 (2018): 585–590. doi:10.1111/2041-210X.13120.
  184. Taj-Eddin, Islam A. T. F., Mahmoud Afifi, Mostafa Korashy, Ali H. Ahmed, Yoke Cheng Ng, Evelyng Hernandez, and Salma M. Abdel-Latif. "Can we see photosynthesis? Magnifying the tiny color changes of plant green leaves using Eulerian video magnification." Journal of Electronic Imaging 26.6 (November 2017): 060501. doi:10.1117/1.jei.26.6.060501. arXiv:1706.03867.
  185. "Mathematical Mathematics Memes."
  186. Karras, Tero, Samuli Laine, and Timo Aila. "A Style-Based Generator Architecture for Generative Adversarial Networks." 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, June 2019. 4396–4405. doi:10.1109/cvpr.2019.00453. arXiv:1812.04948.
  187. Oltean, Mihai. "Fruits-360 dataset." 2017.