"We want something more than theory and preaching now."

- Sherlock Holmes, A Study in Scarlet

Jie Feng 冯捷

Ph.D. Candidate
Digital Video | Multimedia Lab
Department of Computer Science
Columbia University


About Me

2012.9 - present: Ph.D. Candidate, Columbia University
Starting 2016.6: Software Engineer Intern, Google Research
2015.6 - 2015.9: Computer Vision Intern, Adobe Research
2013.6 - 2013.8: Software Engineer Intern, Amazon A9.com
2010.7 - 2011.3: Research Intern, Microsoft Research

I conduct research on Computer Vision and Machine Learning, with emphasis on:
Vision: object discovery, 2D/3D object retrieval, 3D object recognition
Learning: large-scale retrieval, deep learning
My advisor is Prof. Shih-Fu Chang.

My passion lies in building intelligent perceptual machines to make us better decision makers.



RGBD Object Segmentation

Object Selection with cue interpretation from RGBD images
Jie Feng, Brian Price, Scott Cohen, Shih-Fu Chang.
Interactive Segmentation on RGBD Images via Cue Selection.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. (Spotlight)
PDF Video Bibtex
Patent filed

We propose a novel interactive segmentation algorithm that incorporates multiple feature cues, such as color, depth, and normals, in a unified graph cut framework. It automatically selects a single cue to use at each pixel, based on the intuition that only one cue is necessary to determine the segmentation label locally. It thus produces not only the segmentation mask but also a cue label map that indicates how each cue contributes to the final result. Our algorithm performs significantly better than other color-based and RGBD-based algorithms, both in reducing the amount of user input and in increasing segmentation accuracy.
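As a rough illustration of the per-pixel cue selection idea, the Python sketch below picks, for every pixel, the (cue, label) pair with the lowest unary cost and returns both a mask and a cue label map. It is not the algorithm from the paper: the pairwise smoothness term of the graph cut is left out entirely, and the function name `select_cues`, the cost layout, and the toy numbers are all assumptions made for this example.

```python
def select_cues(costs):
    """Pick one cue and one label per pixel from per-cue unary costs.

    costs maps a cue name to a list with one (cost_background, cost_foreground)
    pair per pixel. Hypothetical layout; smoothness between pixels is ignored.
    """
    cue_names = list(costs)
    n_pixels = len(costs[cue_names[0]])
    mask, cue_map = [], []
    for p in range(n_pixels):
        # Cheapest (cost, label, cue) triple over all cues and both labels.
        _, label, cue = min(
            (costs[c][p][lab], lab, c) for c in cue_names for lab in (0, 1)
        )
        mask.append(label)    # 0 = background, 1 = foreground
        cue_map.append(cue)   # which cue decided this pixel
    return mask, cue_map


# Toy input: three pixels, two cues.
toy_costs = {
    "color": [(0.2, 0.9), (0.6, 0.5), (0.5, 0.5)],
    "depth": [(0.7, 0.4), (0.9, 0.1), (0.3, 0.8)],
}
mask, cue_map = select_cues(toy_costs)
```

In the toy input, color decides the first pixel and depth decides the other two, which is the kind of information a cue label map is meant to expose.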

Cross-Domain 3D Shape Retrieval

Depth Image to 3D CAD Models
Jie Feng, Yan Wang, Shih-Fu Chang.
3D Shape Retrieval using a Single Depth Image from Low-cost Sensors.
IEEE Winter Conference on Applications of Computer Vision (WACV), 2016.
PDF Bibtex Video WACV Talk

Content-based 3D shape retrieval is an important problem in computer vision. Traditional retrieval interfaces require a 2D sketch or a manually designed 3D model as the query, which is difficult to specify and thus not practical in real applications. With recent advances in low-cost 3D sensors such as Microsoft Kinect and Intel RealSense, capturing depth images that carry 3D information is fairly simple, making shape retrieval more practical and user-friendly. In this paper, we study the problem of cross-domain 3D shape retrieval using a single depth image from low-cost sensors as the query to search for similar human-designed CAD models. We propose a novel method using an ensemble of autoencoders in which each autoencoder is trained to learn a compressed representation of depth views synthesized from one database object. By viewing each autoencoder as a probabilistic model, a likelihood score can be derived as a similarity measure. A domain adaptation layer is built on top of the autoencoder outputs to explicitly address the cross-domain issue (between noisy sensor data and clean 3D models) by incorporating training data of sensor depth images and their category labels in a weakly supervised learning formulation. Experiments using real-world depth images and a large-scale CAD dataset demonstrate the effectiveness of our approach, which offers significant improvements over state-of-the-art 3D shape retrieval methods.
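The "one autoencoder per database object, scored by likelihood" idea can be sketched in a few lines of Python. This is only a toy under strong assumptions: each autoencoder is replaced by a one-dimensional tied-weight linear autoencoder (equivalent to the first principal component), the likelihood is an unnormalized Gaussian on the reconstruction error, and the domain adaptation layer is omitted. The function names, the three-dimensional "depth view" vectors, and the object names are hypothetical.

```python
import math


def fit_linear_autoencoder(views, iters=50):
    """Fit a 1-D tied-weight linear autoencoder to one object's views.

    Uses power iteration on the scatter matrix, so the learned direction
    is the first principal component. Stand-in for a real autoencoder.
    """
    dim = len(views[0])
    mu = [sum(v[i] for v in views) / len(views) for i in range(dim)]
    centered = [[v[i] - mu[i] for i in range(dim)] for v in views]
    w = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        new_w = [0.0] * dim
        for x in centered:
            proj = sum(a * b for a, b in zip(x, w))
            for i in range(dim):
                new_w[i] += proj * x[i]
        norm = math.sqrt(sum(a * a for a in new_w))
        if norm == 0.0:
            break  # degenerate data: keep the current direction
        w = [a / norm for a in new_w]
    return mu, w


def log_likelihood(model, query, sigma=1.0):
    """Unnormalized Gaussian log-likelihood from the reconstruction error."""
    mu, w = model
    x = [q - m for q, m in zip(query, mu)]
    z = sum(a * b for a, b in zip(x, w))                 # encode
    err = sum((a - z * b) ** 2 for a, b in zip(x, w))    # decode and compare
    return -err / (2.0 * sigma ** 2)


def rank(models, query):
    """Return object names sorted from most to least likely."""
    scores = {name: log_likelihood(m, query) for name, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy database: each object is described by three synthesized "views".
models = {
    "chair": fit_linear_autoencoder([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]),
    "table": fit_linear_autoencoder([[0.0, 1.0, 0.0], [0.0, 2.0, 0.0], [0.0, 3.0, 0.0]]),
}
ranking = rank(models, [2.5, 0.1, 0.0])
```

A query that lies close to the subspace learned for one object reconstructs well under that object's model and poorly under the others, so the per-model score can be used directly for ranking.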

Yan Wang, Jie Feng, Zhixiang Wu, Jun Wang, Shih-Fu Chang.
From Low-Cost Depth Sensors to CAD: Cross-Domain 3D Shape Retrieval via Regression Tree Fields.
European Conference on Computer Vision (ECCV), 2014.
PDF Bibtex Video

Recent advances in low-cost and mobile depth sensors dramatically extend the potential of 3D shape retrieval and analysis. While traditional research on 3D retrieval has mainly focused on searching with a rough 2D sketch or a high-quality CAD model, we tackle a novel and challenging problem of cross-domain 3D shape retrieval, in which users can use 3D scans from low-cost depth sensors like Kinect as queries to search CAD models in the database. To cope with the imperfections of user-captured models, such as noise and occlusion, we propose a cross-domain shape retrieval framework that minimizes the potential function of a Conditional Random Field to efficiently generate the retrieval scores. In particular, the potential function consists of two critical components: a unary potential term that provides robust cross-domain partial matching, and a pairwise potential term that embeds spatial structures to alleviate the instability caused by model noise. Both potential components are efficiently estimated using random forests with 3D local features, forming a Regression Tree Field framework. We conduct extensive experiments on two recently released user-captured 3D shape datasets and compare with several state-of-the-art approaches on the cross-domain shape retrieval task. The experimental results demonstrate that our proposed method outperforms the competing methods by a significant margin.

Salient Object Detection

Salient Object Detection by Composition
Jie Feng, Yichen Wei, Litian Tao, Chao Zhang, Jian Sun.
Salient Object Detection by Composition.
IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 2011.
PDF Bibtex Executable Patent

Conventional saliency analysis methods measure the saliency of individual pixels. The resulting saliency map inevitably loses information from the original image, and finding salient objects in it is difficult. We propose to detect salient objects by directly measuring the saliency of an image window in the original image, adopting the well-established sliding-window object detection paradigm. We present a simple definition of window saliency, i.e., the cost of composing the window using the remaining parts of the image. The definition uses the entire image as context and agrees with human intuition. It no longer relies on the idealistic assumptions commonly used before (e.g., "background is homogeneous") and generalizes well to complex objects and backgrounds in real-world images. To realize the definition, we illustrate how to incorporate different cues such as appearance, position, and size. Based on a segment-based representation, the window composition cost function can be efficiently evaluated by a greedy optimization algorithm. Extensive evaluation on challenging object detection datasets verifies the better efficacy and efficiency of the proposed method compared to the state of the art, making it a good pre-processing tool for subsequent applications. Moreover, we hope to stimulate further work on the challenging yet important problem of generic salient object detection.
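To make "the cost of composing the window using the remaining parts of the image" more concrete, here is a simplified greedy sketch in Python. It assumes each segment is summarized by a single appearance value in [0, 1] and an area, and it ignores the position and size cues mentioned above. The function name `composition_cost`, the segment representation, and the penalty of 1.0 for area that cannot be composed are assumptions for illustration, not the formulation from the paper.

```python
def composition_cost(inside, outside):
    """Greedy cost of composing window segments from outside segments.

    inside / outside: lists of (appearance, area) tuples, appearance in [0, 1].
    Each unit of outside area can be used only once. Returns the cost per
    unit of window area; a higher value means a more salient window.
    """
    remaining = [area for _, area in outside]
    total_cost, total_area = 0.0, 0.0
    for app_in, area_in in inside:
        need = area_in
        # Visit outside segments from most to least similar in appearance.
        order = sorted(range(len(outside)),
                       key=lambda j: abs(outside[j][0] - app_in))
        for j in order:
            if need <= 0:
                break
            used = min(need, remaining[j])
            total_cost += used * abs(outside[j][0] - app_in)
            remaining[j] -= used
            need -= used
        total_cost += need * 1.0  # area left uncomposed gets the maximum cost
        total_area += area_in
    return total_cost / total_area if total_area else 0.0


# Toy image: two dark background segments outside the window.
background = [(0.1, 10.0), (0.2, 10.0)]
object_window = [(0.9, 5.0)]         # bright object, unlike the background
background_window = [(0.15, 5.0)]    # looks like the rest of the image

cost_object = composition_cost(object_window, background)
cost_background = composition_cost(background_window, background)
```

The window covering the bright object is expensive to compose from the dark background, while the background-like window is cheap, so the first one would be ranked as more salient.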

Peng Wang, Jingdong Wang, Gang Zeng, Jie Feng, Hongbin Zha, Shipeng Li.
Salient Object Detection for Searched Web Images via Global Saliency.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, 2012.
PDF Bibtex

In this paper, we deal with the problem of detecting the existence and location of salient objects in thumbnail images, on which most search engines perform visual analysis in order to handle web-scale image collections. Unlike previous techniques, such as sliding window-based or segmentation-based schemes for detecting salient objects, we use a learning approach based on random forests. Our algorithm exploits global features from multiple sources of saliency information to directly predict the existence and position of the salient object. To validate our algorithm, we constructed a large image database collected from Bing image search that contains hundreds of thousands of manually labeled web images. Experimental results on this new database and the resized MSRA database demonstrate that our algorithm outperforms previous state-of-the-art methods.

Anomaly Detection in Videos

Online Learning for Anomaly Detection
Jie Feng, Chao Zhang, Pengwei Hao.
Online Learning with Self-Organizing Maps for Anomaly Detection in Crowd Scenes.
IEEE International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 2010.
PDF Bibtex

Detecting abnormal behaviors in crowd scenes is important for public security and has attracted increasing attention. Most previous methods use an offline-trained model to perform detection, which cannot handle the constantly changing crowd environment. In this paper, we propose a novel unsupervised algorithm to detect abnormal behavior patterns in crowd scenes with online learning. The crowd behavior pattern is extracted from a local spatio-temporal volume, which consists of multiple motion patterns in temporal order. An online self-organizing map (SOM) is used to model the large number of behavior patterns in the crowd. Each neuron is updated by incrementally learning from new observations. To demonstrate the effectiveness of the proposed method, we performed experiments on real-world crowd scenes. Online learning effectively reduces false alarms while still detecting most of the anomalies.
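The online SOM update can be sketched as follows. This is a minimal illustration under stated assumptions: neurons sit on a 1-D grid, an observation is flagged when its distance to the best matching unit exceeds a fixed threshold, and the map is updated after every observation. The class name `OnlineSOM`, the learning rate, neighborhood radius, and threshold are hypothetical choices, and plain 2-D vectors stand in for the spatio-temporal behavior features.

```python
import math


class OnlineSOM:
    """Tiny 1-D self-organizing map that learns from each new observation."""

    def __init__(self, weights, lr=0.2, radius=1.0, threshold=1.0):
        self.weights = [list(w) for w in weights]
        self.lr = lr                # step size toward each observation
        self.radius = radius        # width of the grid neighborhood
        self.threshold = threshold  # distance above which we flag an anomaly

    def observe(self, x):
        """Flag x as anomalous or not, then update the map with it."""
        dists = [math.dist(w, x) for w in self.weights]
        bmu = min(range(len(dists)), key=dists.__getitem__)
        is_anomaly = dists[bmu] > self.threshold
        for i, w in enumerate(self.weights):
            # Neurons near the best matching unit on the grid move more.
            h = math.exp(-((i - bmu) ** 2) / (2 * self.radius ** 2))
            for k in range(len(w)):
                w[k] += self.lr * h * (x[k] - w[k])
        return is_anomaly


som = OnlineSOM([[0.0, 0.0], [1.0, 1.0]])
normal_flag = som.observe([0.1, 0.0])   # close to an existing neuron
novel_flag = som.observe([5.0, 5.0])    # far from every neuron
for _ in range(20):                     # the new pattern keeps occurring
    som.observe([5.0, 5.0])
adapted_flag = som.observe([5.0, 5.0])  # the map has absorbed the pattern
```

The last three lines show the effect of online learning in this toy setting: a pattern that is first flagged stops raising alarms once it has been observed often enough for a neuron to move toward it.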

Jie Feng, Chao Zhang, Pengwei Hao.
Online Anomaly Detection by Clustering Dynamic Exemplars.
IEEE International Conference on Image Processing (ICIP), Orlando, Florida, 2012.
PDF Bibtex Video

We propose a non-parametric hierarchical event model to perform online anomaly detection in videos. A dynamic exemplar set is first used to represent observed event samples; it updates itself every time a new sample arrives. From this set, clusters are extracted to summarize the exemplars, offering a compact yet informative data structure for past event samples. Abnormal events are detected by considering both their dissimilarity to the model and their low frequency. Experiments on real-world crowd surveillance videos demonstrate the effectiveness and robustness of the proposed algorithm, which shows reliable detection rates and low false alarm rates.
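A dynamic exemplar set with the two anomaly criteria (dissimilarity and low frequency) might look roughly like the Python sketch below. It is a simplification under stated assumptions: the clustering layer that summarizes the exemplars is omitted, and the class name `ExemplarModel`, the matching radius, the minimum hit count, and the oldest-first eviction rule are all hypothetical.

```python
import math


class ExemplarModel:
    """Exemplar set that grows and updates with every new event sample."""

    def __init__(self, radius=1.0, min_count=3, capacity=100):
        self.radius = radius        # max distance to count as a match
        self.min_count = min_count  # matches needed before an event is "normal"
        self.capacity = capacity    # max number of exemplars kept
        self.exemplars = []         # each entry: [feature_vector, hit_count]

    def observe(self, x):
        """Return True if x looks abnormal, then update the exemplar set."""
        nearest = min(self.exemplars,
                      key=lambda e: math.dist(e[0], x), default=None)
        if nearest is None or math.dist(nearest[0], x) > self.radius:
            # Dissimilar to everything seen so far: store as a new exemplar.
            self.exemplars.append([list(x), 1])
            if len(self.exemplars) > self.capacity:
                self.exemplars.pop(0)  # forget the oldest exemplar
            return True
        nearest[1] += 1
        # Similar to a known exemplar, but still abnormal while it is rare.
        return nearest[1] < self.min_count


model = ExemplarModel(radius=1.0, min_count=3)
flags = [model.observe([0.0, 0.0]) for _ in range(4)]
outlier_flag = model.observe([10.0, 10.0])
```

In this toy run the repeated event is flagged twice and then accepted as normal, while the single distant event is flagged immediately.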

Side Projects

EyeStyle


A fashion discovery service that turns any image into similar looking fashion products. Anything you see is shoppable!

Find more details here.

Fitronix


A fitness app that helps users improve performance and avoid injuries. We use a 3D sensor to analyze the user's body movements, compare them against a trainer's, and provide real-time feedback to ensure proper form.

Unreal AI Programming Packt

Book on Game AI Programming

  • Understand the fundamental components of Game AI within Unreal Engine 4
  • Create, debug, and analyze Game AI behavior
  • Design responsive Game AI using the Behavior Tree methodology
  • Create smart objects designed to interact with AI
  • Many more