Jan 13, 2021
by
Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Li Fei-Fei
data
eye 4
favorite 0
comment 0
ImageNet is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, the majority of which are nouns (80,000+). In ImageNet, we aim to provide on average 1000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated. In its completion, we hope ImageNet will...
Topic: imagenet
Source: https://www.image-net.org
2020
by
Chen, Huan-Zhi; Yang, Hong-Yi; Zhong, Kai; Li, Jia-Li
texts
eye 13
favorite 0
comment 0
Apr 12, 2017
by
Zhou Ren; Xiaoyu Wang; Ning Zhang; Xutao Lv; Li-Jia Li
texts
eye 4
favorite 0
comment 0
Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize...
Topics: Artificial Intelligence, Computing Research Repository, Computer Vision and Pattern Recognition
Source: http://arxiv.org/abs/1704.03899
Mar 7, 2017
by
Yuncheng Li; Jianchao Yang; Yale Song; Liangliang Cao; Jiebo Luo; Li-Jia Li
texts
eye 22
favorite 0
comment 0
The ability to learn from noisy labels is very useful in many visual recognition tasks, as a vast amount of data with noisy labels is relatively easy to obtain. Traditionally, the label noises have been treated as statistical outliers, and approaches such as importance re-weighting and bootstrap have been proposed to alleviate the problem. According to our observation, the real-world noisy labels exhibit multi-mode characteristics, as the true labels do, rather than behaving like independent...
Topics: Learning, Machine Learning, Statistics, Computing Research Repository, Computer Vision and Pattern...
Source: http://arxiv.org/abs/1703.02391
2017
by
Wei, Shu; Hua, Hai-Rong; Chen, Qian-Quan; Zhang, Ying; Chen, Fei; Li, Shu-Qing; Li, Fan; Li, Jia-Li
texts
eye 22
favorite 0
comment 0
Feb 23, 2016
by
Ranjay Krishna; Yuke Zhu; Oliver Groth; Justin Johnson; Kenji Hata; Joshua Kravitz; Stephanie Chen; Yannis Kalantidis; Li-Jia Li; David A. Shamma; Michael S. Bernstein; Fei-Fei Li
texts
eye 4
favorite 0
comment 0
Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering. Cognition is core to tasks that involve not just recognizing, but reasoning about our visual world. However, models used to tackle the rich content in images for cognitive tasks are still being trained using the same datasets designed for perceptual tasks. To achieve success at cognitive tasks, models need to understand the...
Topics: Computer Vision and Pattern Recognition, Artificial Intelligence, Computing Research Repository
Source: http://arxiv.org/abs/1602.07332
2016
by
Li, Jia-Li; Zheng, Yong-Tang; Zhao, Xu-Dong; Hu, Xin-Tian
texts
eye 9
favorite 0
comment 0
Mar 5, 2015
by
Bart Thomee; David A. Shamma; Gerald Friedland; Benjamin Elizalde; Karl Ni; Douglas Poland; Damian Borth; Li-Jia Li
texts
eye 16
favorite 0
comment 0
We present the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), the largest public multimedia collection that has ever been released. The dataset contains a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license. Each media object in the dataset is represented by several pieces of metadata, e.g. Flickr identifier, owner name, camera, title, tags, geo, media source. The collection...
Topics: Computing Research Repository, Computers and Society, Multimedia
Source: http://arxiv.org/abs/1503.01817
Feb 9, 2015
by
Sachin Sudhakar Farfade; Mohammad Saberian; Li-Jia Li
texts
eye 34
favorite 0
comment 0
In this paper we consider the problem of multi-view face detection. While there has been significant research on this problem, current state-of-the-art approaches for this task require annotation of facial landmarks, e.g. TSM [25], or annotation of face poses [28, 22]. They also require training dozens of models to fully capture faces in all orientations, e.g. 22 models in the HeadHunter method [22]. In this paper we propose the Deep Dense Face Detector (DDFD), a method that does not require...
Topics: Computer Vision and Pattern Recognition, Computing Research Repository
Source: http://arxiv.org/abs/1502.02766
Nov 20, 2014
by
Can Xu; Suleyman Cetintas; Kuang-Chih Lee; Li-Jia Li
texts
eye 5
favorite 0
comment 0
Images have become one of the most popular types of media through which users convey their emotions within online social networks. Although a vast amount of research is devoted to sentiment analysis of textual data, there has been very limited work that focuses on analyzing the sentiment of image data. In this work, we propose a novel visual sentiment prediction framework that performs image understanding with Deep Convolutional Neural Networks (CNN). Specifically, the proposed sentiment prediction...
Topics: Neural and Evolutionary Computing, Machine Learning, Computing Research Repository, Computer Vision...
Source: http://arxiv.org/abs/1411.5731
Jul 6, 2014
by
Xiangnan Kong; Zhaoming Wu; Li-Jia Li; Ruofei Zhang; Philip S. Yu; Hang Wu; Wei Fan
texts
eye 5
favorite 0
comment 0
Multi-label learning deals with classification problems where each instance can be assigned multiple labels simultaneously. Conventional multi-label learning approaches mainly focus on exploiting label correlations. It is usually assumed, explicitly or implicitly, that the label sets for training instances are fully labeled without any missing labels. However, in many real-world multi-label datasets, the label assignments for training instances can be incomplete. Some ground-truth...
Topics: Computing Research Repository, Learning
Source: http://arxiv.org/abs/1407.1538
by
Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Li Fei-Fei
data
eye 5,234
favorite 1
comment 0
ImageNet is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, the majority of which are nouns (80,000+). In ImageNet, we aim to provide on average 1000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated. In its completion, we hope ImageNet will...
Topics: imagenet, deep learning
Source: http://academictorrents.com/details/564a77c1e1119da199ff32622a1609431b9f1c47
by
Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David Ayman Shamma, Michael Bernstein, Li Fei-Fei
data
eye 6
favorite 0
comment 0
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David Ayman Shamma, Michael Bernstein, Li Fei-Fei
Files: image meta data (16.92 MB); region descriptions (988.18 MB); question answers (201.09 MB); objects (99.14 MB); attributes (174.97 MB); relationships (406.70 MB)
Source: http://academictorrents.com/details/ca98efc75a80278b795ce056fd4229c1bc6f229f