## Selected Papers

H.H. Bui,

Plan recognition is the problem of inferring an actor’s plan by watching the actor’s actions and their effects. Our work was the first to model the hierarchical structure of plans and to make inferences at different levels of abstraction in the plan hierarchy. We introduced the Abstract Hidden Markov Model (AHMM), a novel type of stochastic process, provided its dynamic Bayesian network (DBN) structure, and analysed the properties of this network.

H. Bui, D. Phung, and

This work was the first to tackle algorithmic intractability in dealing with complex graphical models. It provided efficient inference through an exact Asymmetric Inside-Outside inference algorithm for hierarchical hidden Markov models. This algorithm was computationally efficient as it scaled linearly in the depth of the hierarchy. This class of inference algorithms also generalises inference in probabilistic context-free grammars in natural language processing. It was later extended to derive the first exact and tractable inference algorithm for hierarchical conditional random fields.

T. V. Duong, H. H. Bui, D. Q. Phung,

In 2005, hidden semi-Markov models provided a rudimentary form of duration modelling. But they were computationally inefficient because, without knowing how long a state can last, every possible duration has to be accounted for. Our work proposed an innovative solution to this problem, drawing upon the theoretical work in phase-type modelling. We incorporated the discrete Coxian distribution into the semi-Markov model and constructed an efficient inference scheme. This model was versatile and could approximate any duration distribution. It required minimal prior information: only the number of phases had to be specified. Our Coxian hidden semi-Markov model was as fast as a conventional hidden Markov model, and could additionally provide richer modelling of explicit duration distributions.

S. K. Gupta, D. Phung, B. Adams, T. Tran,

Matrix factorization is a popular method in the machine learning toolbox. It has remained a popular method for many practical applications since the work of Daniel Lee and Sebastian Seung (Learning the parts of objects by non-negative matrix factorization, Nature, 401, 788-791, 1999). However, prior to our work, matrix factorization methods were developed only for a single data source. Real-world applications, in particular with the rise of big data, involve many data sources. Such sources are often correlated and interact. This work was the first to provide a principled matrix factorization and algebra to jointly factorize data matrices. The critical property of our method is that it discovers the information shared among multiple data sources, whilst preserving the individual information from each data domain.

V. Nguyen, D. Phung, X. Nguyen,

Bayesian nonparametric methods tackle the automatic model-selection problem in statistics and machine learning. Such automatic model selection is critical because, without it, users face the daunting task of manually finding the best algorithmic parameters. This work examined the joint modelling of both the context (say, location) and the content (say, the text of webpages) of data in a Bayesian nonparametric setting. If this coupling can be performed in a principled way, we can effectively borrow the statistical strength provided by context to improve the inference of observations, or to extrapolate to new settings.

B. Adams, C. Dorai, and

We pioneered a new way of analysing multimedia, motivated and directed by cinematic conventions used by film directors. We called the field “computational media aesthetics”.
This was the first work to apply this new way of thinking to meaningfully segment video, and to index and extract abstractions. It directly addressed the challenge of bridging the semantic gap between the simplicity of the features that automated content indexing systems can compute and the richness of the semantics in user queries posed for media search and retrieval. It proposed a unique computational approach to extracting the expressive elements of motion pictures in order to derive high-level semantics of the stories portrayed, thus enabling rich video annotation and interpretation. It used film grammar as a first step toward demonstrating its effectiveness, and used the attributes of motion and shot length to define and compute a novel measure of the “tempo” of a movie. Tempo flow plots were derived for a number of full-length movies, and edge analysis was performed, leading to the extraction of dramatic story sections and events signalled by their unique tempo. The results confirmed tempo as a useful high-level semantic construct in its own right and a promising component of others such as the rhythm, tone or mood of a film.

B. T. Truong,

The work provided the field of multimedia with the earliest systematic review of video abstraction. It was pioneering in its manner of classifying the available methods for video abstraction, a fundamental problem in video indexing and retrieval. This technology is core to many commonplace tasks such as video search in Google or YouTube. At that time, it was an emerging field, and many methods were developing quickly. It was difficult to know what the benchmark for new techniques should be. We solved this problem by proposing the first systematic framework and the most comprehensive review of the field. It focused strongly on identifying the critical aspects of video abstraction, ranging from problem formulation to result evaluation, and on analysing and classifying how these are addressed in various works.
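The tempo measure mentioned earlier in this section, which combines motion and shot length and is then examined for sharp changes, can be illustrated with a small sketch. This is not the published formula: the weights `alpha` and `beta`, the z-score style normalisation, the moving-average smoothing and the threshold-based edge test are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
from statistics import mean, median, pstdev


def tempo(shot_lengths, motions, alpha=1.0, beta=1.0):
    """Per-shot tempo score: shorter shots and more motion give higher tempo.

    alpha and beta are illustrative weights, not values from the paper.
    """
    med_len = median(shot_lengths)
    sd_len = pstdev(shot_lengths) or 1.0  # avoid division by zero
    mu_mot = mean(motions)
    sd_mot = pstdev(motions) or 1.0
    return [
        alpha * (med_len - s) / sd_len + beta * (m - mu_mot) / sd_mot
        for s, m in zip(shot_lengths, motions)
    ]


def smooth(values, window=3):
    """Centred moving average, truncated at the ends of the sequence."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def edges(values, threshold):
    """Shot indices where tempo changes by at least `threshold` from the previous shot."""
    return [
        i for i in range(1, len(values))
        if abs(values[i] - values[i - 1]) >= threshold
    ]


if __name__ == "__main__":
    # Toy data: three long, calm shots followed by three short, busy ones.
    lengths = [10, 10, 10, 2, 2, 2]   # seconds per shot
    motion = [1, 1, 1, 5, 5, 5]       # arbitrary motion magnitude per shot
    flow = smooth(tempo(lengths, motion))
    print(flow)
    print(edges(flow, threshold=1.0))
```

On the toy data the smoothed tempo rises across the boundary between the calm and the busy shots, and the edge test flags the shots around that boundary as candidate changes of pace.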

This work is the first of its kind and is still considered ground-breaking and pioneering in early intervention in autism using technology. The work was published at SIGCHI, the top-rated computer science conference on human-computer interfaces, with an h5-index of 84, which is testimony to its novel academic content. It was the first to formulate the computational problem of early intervention and then to translate it into rigorous algorithms and methods for delivering such programs to children as young as 2. The work encompassed all aspects of early teaching: a) a syllabus, ordered by complexity, in 5 early skill development areas: visual/audio sensory matching, social skills, gross motor skills, expressive and receptive language; b) a library of reusable multimedia resources to provide content for this syllabus; and c) solutions to adapt to each child’s skills and to teach parents unfamiliar concepts core to behavioural therapy, such as prompting and fading.

T. Tran, W. Luo, D. Phung, R. Harvey, M. Berk, R. L. Kennedy,

Rather than predicting suicide risk, we decided to stratify risk. Our contention was that identifying the riskiest patients would be valuable. We departed from using routinely collected risk assessment data and used electronic medical records instead. We solved the problem using novel feature engineering and ordinal classification. Using a large cohort of patients, we were able to show that our predictions were at least twice as good as those of clinicians. We also presented this work at the top data mining conference, the 19th ACM SIGKDD (The Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining) International Conference in 2013.

S. Greenhill,

Conventional wide-area video surveillance systems use a network of fixed cameras positioned close to locations of interest. We proposed a radically different approach to wide-area surveillance based on observation streams collected from mobile cameras mounted on buses.
We allowed a “virtual observer” to be placed anywhere within the space covered by the sensor network, and to reconstruct the scene at these arbitrary points. Use of such imagery is challenging because mobile cameras have variable position and orientation, and sample a large spatial area but at low temporal resolution. Additionally, the views of any particular place are distributed across many different video streams. Addressing this problem, we presented a system in which views from an arbitrary perspective can be constructed by indexing, organising, and transforming images collected from multiple streams acquired from a network of mobile cameras. Our system supported retrieval of raw images based on constraints of space, time, and geometry (e.g. visibility of landmarks). It also allowed the synthesis of wide-angle panoramic views in situations where the camera motion produced suitable sampling of the scene, and offered metaphors for query and presentation that overcome the complexity of the data.
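
The retrieval step described above (selecting raw images by constraints of space, time, and geometry) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system's actual design: the `Frame` record, its fields, the flat 2D coordinates, the fixed field of view and the `query` function are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Frame:
    """One image from a mobile camera (hypothetical record layout)."""
    stream_id: str
    t: float            # capture time in seconds
    x: float            # camera position in a flat 2D map frame
    y: float
    heading_deg: float  # viewing direction, degrees counter-clockwise from +x
    fov_deg: float = 60.0


def sees(frame, landmark):
    """True if the landmark's bearing lies within the camera's field of view."""
    lx, ly = landmark
    bearing = math.degrees(math.atan2(ly - frame.y, lx - frame.x))
    # Signed angular difference wrapped into [-180, 180).
    diff = (bearing - frame.heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= frame.fov_deg / 2.0


def query(frames, centre, radius, t_start, t_end, landmark=None):
    """Frames taken near `centre`, inside the time window, optionally facing a landmark."""
    cx, cy = centre
    hits = [
        f for f in frames
        if math.hypot(f.x - cx, f.y - cy) <= radius
        and t_start <= f.t <= t_end
        and (landmark is None or sees(f, landmark))
    ]
    return sorted(hits, key=lambda f: f.t)


if __name__ == "__main__":
    frames = [
        Frame("bus1", 10.0, 0.0, 0.0, 0.0),
        Frame("bus2", 20.0, 1.0, 0.0, 180.0),   # nearby but facing away
        Frame("bus1", 500.0, 0.0, 0.0, 0.0),    # outside the time window
        Frame("bus3", 30.0, 100.0, 100.0, 0.0), # too far away
    ]
    for f in query(frames, (0.0, 0.0), 5.0, 0.0, 100.0, landmark=(10.0, 0.0)):
        print(f.stream_id, f.t)
```

A virtual observer placed at `centre` would draw on frames from several streams at once, which is why the query filters on position, time and viewing geometry rather than on a single camera's stream. A real system would also need occlusion handling and a spatial index, both omitted here.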