Dynamic Path Prediction
A Semi-Lazy Learning System for Probabilistic Path Prediction in Dynamic Environments
Path prediction has a broad range of applications, including navigation, traffic management, personal positioning, and actionable advertising. However, in dynamic environments (such as urban spaces), the movement of objects is affected by many factors, making path prediction a difficult and challenging problem, and one of interest to SeSaMe and its researchers.
LEADER is a prototype that supports probabilistic path prediction in dynamic environments. The core of our system is a "semi-lazy" approach to probabilistic path prediction, which builds prediction models on the fly using historical trajectories that are selected dynamically based on the trajectories of the target objects.
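To make the semi-lazy idea concrete, the following Python sketch works on a toy grid world: the k historical trajectories whose prefixes best match the target's recent movement are retrieved on the fly, and a distribution over next cells is read off their continuations. The function names, distance measure, and parameters are illustrative assumptions, not LEADER's actual implementation.

    # A minimal sketch of the "semi-lazy" idea, assuming trajectories are
    # sequences of discrete grid cells (x, y); all names are illustrative.
    from collections import Counter
    import numpy as np

    def prefix_distance(query, candidate):
        """Euclidean distance between the query and an equal-length prefix
        of a candidate trajectory (a simple stand-in similarity measure)."""
        n = len(query)
        return float(np.linalg.norm(np.asarray(query) - np.asarray(candidate[:n])))

    def predict_next_cell(query, history, k=5):
        """Lazily select the k historical trajectories that best match the
        target's recent movement, then return a probability distribution
        over the cell each of them visits next."""
        n = len(query)
        scored = [(prefix_distance(query, t), t) for t in history if len(t) > n]
        scored.sort(key=lambda pair: pair[0])
        next_cells = Counter(t[n] for _, t in scored[:k])
        total = sum(next_cells.values())
        return {cell: c / total for cell, c in next_cells.items()}

    # Toy usage: three historical trajectories over grid cells.
    history = [
        [(0, 0), (1, 0), (2, 0), (3, 0)],
        [(0, 0), (1, 0), (2, 0), (2, 1)],
        [(0, 1), (1, 1), (2, 1), (3, 1)],
    ]
    print(predict_next_cell([(0, 0), (1, 0), (2, 0)], history, k=2))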
CamTweet
Cameras are among the most widely deployed physical sensors for monitoring our world, but they are also a main contributor to the phenomenon of Big Data. Furthermore, the level of detail provided by cameras often raises privacy concerns. CamTweet is a novel approach that introduces smart cameras that map visual data into higher-level concepts using customized classifiers. This avoids the bandwidth-hungry transmission of raw camera feeds and intrinsically enables a much higher level of privacy preservation. Using additional lightweight processing on a camera, it outputs information only when user-defined events occur. We make this low-volume, high-level information available by letting cameras directly tweet their outputs. With that, users or applications can "follow" cameras alongside other tweeting objects or human users, creating a novel type of social cyber-physical ecosystem for monitoring the world.
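The following Python sketch illustrates the event-triggered, high-level output idea: a placeholder on-camera classifier maps each frame to a concept label, and a short message is published only on transitions into a user-defined event, so the camera stays silent the rest of the time. The classify and publish functions are toy stand-ins, not CamTweet's actual components.

    # A minimal sketch of event-triggered, high-level camera output.
    import time

    def classify(frame):
        # Placeholder for a customized on-camera classifier that maps raw
        # pixels to a high-level concept; here faked from a toy frame dict.
        return "crowded" if frame["people"] > 10 else "quiet"

    def publish(message):
        # Stand-in for posting a tweet; a real deployment would call a
        # social-media or pub/sub API here instead of printing.
        print(f"[{time.strftime('%H:%M:%S')}] camera: {message}")

    def monitor(frames, event_label="crowded"):
        """Emit a message only on transitions into the user-defined event,
        instead of streaming raw frames."""
        previous = None
        for frame in frames:
            label = classify(frame)
            if label == event_label and previous != event_label:
                publish(f"event detected: {label}")
            previous = label

    monitor([{"people": 3}, {"people": 14}, {"people": 15}, {"people": 2}])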
F-formation Social Interaction
In the literature, social interaction analysis is regarded as one type of complex human activity analysis problem. In the general scenario, specific definitions must be provided in advance to detect a given social interaction type. Considering the unconstrained nature of social interactions, it is not feasible to enumerate all possible types of ad-hoc social interactions. In this work, we propose an extended F-formation system for robust interaction and interactant detection. Differing from existing works on human activity analysis, it utilizes the F-formation model from the sociology literature, which considers the spatial aspect of social interactions and is easier to detect in generic social interaction settings. In addition, we also model the temporal aspect of interactions. Our novel extended F-formation system employs a heat-map-based feature representation for each unique individual, namely the Interaction Space (IS), to model their respective location, orientation, and temporal information. In our work, each individual's spatial location and orientation are detected with multiple Kinect depth sensors. Given the interaction spaces of all individuals in a given frame, we detect the interaction centers (i.e., o-spaces) and the respective interactants, as well as the location of the best-view camera. The proposed temporal-encoded interaction-space-based approach is evaluated on both synthetic data and a real-world environment.
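As a rough illustration of the spatial part of the model, the Python sketch below builds a toy Interaction Space for each person as a Gaussian bump in front of their position and orientation, accumulates the maps, and takes the peak as an o-space candidate. The grid resolution, reach, and sigma values are illustrative assumptions, not the system's actual parameters.

    # A toy Interaction Space (IS) accumulation over a discretised floor.
    import numpy as np

    def interaction_space(pos, theta, grid, reach=1.2, sigma=0.5):
        """Gaussian bump centred 'reach' metres in front of a person at
        'pos' facing direction 'theta' (radians)."""
        cx = pos[0] + reach * np.cos(theta)
        cy = pos[1] + reach * np.sin(theta)
        xs, ys = grid
        return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

    # A 6m x 6m floor at 10cm resolution; two people facing each other.
    xs, ys = np.meshgrid(np.linspace(0, 6, 60), np.linspace(0, 6, 60))
    people = [((2.0, 3.0), 0.0), ((4.0, 3.0), np.pi)]  # (position, orientation)
    heat = sum(interaction_space(p, th, (xs, ys)) for p, th in people)

    # The o-space candidate is the peak of the accumulated heat map.
    iy, ix = np.unravel_index(np.argmax(heat), heat.shape)
    print("o-space centre ~", (xs[iy, ix], ys[iy, ix]))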
Saliency in Context (SALICON)
Saliency in Context (SALICON) is an ongoing effort that aims at understanding and predicting visual attention. We created a new psychophysical paradigm to collect large-scale human attentional data during natural exploration of images. With this paradigm, we built the SALICON dataset of 20,000 natural images, crowdsourcing the data collection with Amazon Mechanical Turk (AMT). The SALICON dataset is by far the largest of its kind in both scale and context variability.
Crowd Level Estimation
A crowd level estimation technique has been developed to determine the level of crowdedness at a particular location. It is based on a foreground occupancy algorithm. It has been applied to a mobile app called NUS Foodie, which provides information about food outlets in NUS, enables users to rate the outlets, allows food stall owners to carry out promotions, and, most interestingly, informs users about the level of crowdedness at specific food outlets on campus.
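A minimal sketch of such a foreground-occupancy estimator is shown below, using OpenCV's MOG2 background subtractor: the crowd level is derived from the fraction of pixels the background model marks as foreground. The occupancy thresholds and level names are illustrative assumptions, not the deployed algorithm.

    # A minimal foreground-occupancy crowd estimator for a fixed camera.
    import cv2
    import numpy as np

    subtractor = cv2.createBackgroundSubtractorMOG2(history=100)

    def crowd_level(frame):
        """Return a coarse crowd level from the fraction of pixels the
        background model flags as foreground (i.e. people)."""
        mask = subtractor.apply(frame)
        occupancy = np.count_nonzero(mask) / mask.size
        if occupancy > 0.40:
            return "high"
        if occupancy > 0.15:
            return "medium"
        return "low"

    # Toy usage on synthetic frames: an empty scene, then a 'busy' one.
    empty = np.zeros((120, 160, 3), np.uint8)
    for _ in range(30):            # let the model learn the background
        subtractor.apply(empty)
    busy = empty.copy()
    busy[20:100, 30:130] = 255     # a large blob standing in for a crowd
    print(crowd_level(busy))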
Controllable Face Privacy
We present the novel concept of Controllable Face Privacy. Existing methods that alter face images to conceal identity inadvertently destroy other facial attributes such as gender, race, or age. This all-or-nothing approach is too harsh. Instead, we propose a flexible method that can independently control the amount of identity alteration while keeping other facial attributes unchanged. Our method is useful for nuanced face de-identification, in which only facial identity is altered, while other attributes, such as gender, race, and age, are retained. These altered face images protect identity privacy, yet allow other computer vision analyses, such as gender detection, to proceed unimpeded. Controllable Face Privacy is therefore useful for reaping the benefits of surveillance cameras while preventing privacy abuse. Our proposal also permits privacy to be applied not just to identity, but to other facial attributes as well. Furthermore, privacy-protection mechanisms, such as k-anonymity, L-diversity, and t-closeness, may be readily incorporated into our method.
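Conceptually, the controllable alteration can be pictured as rescaling only the identity component of a face representation while leaving an attribute component untouched. The Python sketch below illustrates this with random stand-in subspaces; it is a simplified conceptual illustration, not the paper's actual face model or algorithm.

    # Conceptual sketch: split a face feature vector into an 'identity'
    # subspace and an 'attribute' subspace, and rescale only the former.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 16
    # Stand-in for a learned orthogonal basis: the first 8 directions are
    # assumed to encode identity, the rest attributes (gender, race, age).
    basis, _ = np.linalg.qr(rng.normal(size=(d, d)))
    P_id, P_attr = basis[:, :8], basis[:, 8:]

    def alter_identity(face, alpha):
        """alpha=1 keeps the face unchanged; alpha=0 removes the identity
        component entirely while leaving the attribute component intact."""
        id_part = P_id @ (P_id.T @ face)
        attr_part = P_attr @ (P_attr.T @ face)
        return alpha * id_part + attr_part

    face = rng.normal(size=d)
    private = alter_identity(face, alpha=0.2)
    # Attributes are preserved: projections onto the attribute subspace match.
    print(np.allclose(P_attr.T @ face, P_attr.T @ private))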
Automatic Person Analytics with Egocentric Vision
In this work, we built a prototype to analyze video footage and generate the unique set of persons who appear in the video. We first perform face detection and tracking over the entire video footage, where each candidate tracking output (denoted a face-track) consists of the faces of the same individual. Each face-track is geometrically normalized to an intermediate format using detected facial feature points, after which a generic face-based quality assessment is employed to reject faces or select a subset of facial images for identification purposes. The size-normalized and quality-checked face-tracks are then represented with the Locally Sparse Encoded Descriptor (LSED), where one face descriptor is generated per face-track. Finally, a graph-based clustering technique is applied over the face-track descriptors to generate the unique person set. The upper-body color attribute of each person is also utilized to enhance the quality of the unique person set. Sensor data, such as localization, is used to geo-tag each detected face-track; this non-visual sensor data allows our application to improve the accuracy of the vision algorithms as well as to geo-tag each detected individual. Given a list of persons in the personal contact list, the system can automatically name the known individuals in the video, or generate suggestions for each detected individual. Unseen individuals are automatically listed (in map view or list view) for the user to annotate.
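As an illustration of the final step, the Python sketch below clusters face-track descriptors into unique persons by thresholding cosine similarity and taking connected components of the resulting graph. The threshold and the union-find grouping are illustrative choices, not the prototype's exact clustering algorithm.

    # A minimal sketch of graph-based clustering over face-track
    # descriptors, assuming each face-track is already one vector (LSED).
    import numpy as np

    def cluster_face_tracks(descriptors, threshold=0.8):
        """Group face-tracks into unique persons: build a similarity graph
        by thresholding cosine similarity, then take connected components."""
        X = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
        sim = X @ X.T
        n = len(X)
        parent = list(range(n))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i in range(n):
            for j in range(i + 1, n):
                if sim[i, j] >= threshold:
                    parent[find(i)] = find(j)
        return [find(i) for i in range(n)]  # same label = same person

    # Toy usage: three tracks, the first two from the same person.
    tracks = np.array([[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]])
    print(cluster_face_tracks(tracks))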
Automatic Self-Quantification of Presentation with Egocentric Vision
In this work, we showcase a multi-sensor based self-quantification framework for individual presentation analysis using First-Person-View (FPV) devices. Given a video captured with an FPV device, such as Google Glass, we analyse the presenter's performance in four categories, i.e., vocal behaviour, body language, engagement with the audience, and presentation state. For this work, we analyse the raw data from the audio sensor, visual sensor, and motion sensors. In addition, we extend the framework to incorporate data from ambient sensors (such as RGB-D sensors or CCTV cameras) and the audience's FPV devices. A visualized analytic feedback report is automatically generated. Experimental results on a newly collected dataset and a user study show promising results for this work. Overall, this application aims to provide effective analysis of individual behaviour, and the initial prototype provides a good feasibility study for a specific application. Future research directions include extension to unseen locations, as well as other potential application scenarios.
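As a rough illustration of the feedback step, the Python sketch below maps synthetic audio, gyroscope, and gaze streams to coarse scores for three of the four categories. The features, thresholds, and stream formats are illustrative assumptions, not the framework's actual behaviour cues.

    # A minimal sketch of turning synchronised FPV sensor streams into
    # coarse per-category presentation scores.
    import numpy as np

    def quantify_presentation(audio, gyro, gaze_on_audience):
        """Map raw sensor streams to scores for vocal behaviour,
        body language, and audience engagement."""
        # Vocal behaviour: fraction of time the speaker is actually talking.
        speaking_ratio = float(np.mean(np.abs(audio) > 0.05))
        # Body language: movement energy from the device's gyroscope.
        movement = float(np.mean(np.linalg.norm(gyro, axis=1)))
        # Engagement: how often the wearer's gaze rests on the audience.
        engagement = float(np.mean(gaze_on_audience))
        return {
            "vocal_behaviour": speaking_ratio,
            "body_language": movement,
            "engagement": engagement,
        }

    # Toy usage with synthetic streams.
    rng = np.random.default_rng(1)
    report = quantify_presentation(
        audio=rng.normal(0, 0.1, 160_000),           # mono waveform
        gyro=rng.normal(0, 0.2, (1_000, 3)),         # (t, xyz) angular rates
        gaze_on_audience=rng.integers(0, 2, 300),    # 1 = looking at audience
    )
    print(report)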