Tutorial 1: Natural Language Processing for Health Informatics

Authors: Sandya Mannarswamy, Saravanan Chidambaram

Text data in health informatics originate from a wide variety of heterogeneous sources, including medical literature, Electronic Medical Records (EMRs), insurance claims data, mobile health application data and social media, to name a few. Mining these diverse textual data sources for actionable insights is the holy grail of health informatics systems. The extremely fragmented nature of the healthcare ecosystem and its diverse participants has given rise to widely different vocabularies across the different textual artefacts.
Given the vast differences in vocabulary and semantics among these sources of medical text, different NLP techniques are needed to analyse and understand the information in them. Because any healthcare informatics solution needs to understand and analyse these different artefacts, the diverse textual sources need to be unified and analysed for concepts and information. Hence there has been considerable research in applying state-of-the-art Natural Language Processing (NLP) techniques to the mining of diverse health textual data. The proposed tutorial is intended to provide an overview of state-of-the-art NLP techniques for healthcare text mining.
The tutorial is intended to highlight the vast opportunities for NLP researchers in this area, which has not only academic research impact but also considerable social impact and relevance. We will review challenging NLP problems in healthcare informatics and discuss state-of-the-art techniques and important applications, as well as datasets, medical resources, and practical issues.
The first part of the tutorial will focus on techniques and applications of healthcare text mining, including text mining of biomedical research literature, clinical notes and social media healthcare data. The second part will focus on case studies to enable the audience to appreciate the applicability of healthcare text mining in real-life settings. We will also cover existing tools that aid healthcare text mining and datasets that researchers can use.
Prerequisite Knowledge: the tutorial does not require any prior knowledge of healthcare. The ultimate goal is to lower the entry barrier for NLP/ML researchers to contribute to this exciting area.


Sandya Mannarswamy is a Research Scientist at Conduent Labs India (formerly known as Xerox Research Centre) in the Text and Graph Analytics research group. She is currently working on healthcare text analytics. Her research interests span natural language processing, text mining, and high-performance computing. Over the years, her research has resulted in a number of patent disclosures and publications in premier ACM/IEEE conferences. She has organized the Health Data Management and Mining Workshop (HDMM) at the International Conference on Data Engineering for the last two years.
Saravanan Chidambaram leads the research and development section in the Mission Critical Systems group of Hewlett Packard Enterprise (HPE), focusing on platform development and operating systems. His research interests span natural language processing, high-performance computing and software stacks for enabling newer scale-up supercomputing platforms. Over a career spanning 16 years at various R&D labs, including Hewlett Packard Ltd, Microsoft Inc and Oracle, he has led many research and development projects in the areas of virtualization, compilers and programming languages, and big data.

Tutorial 2: Deep Learning for Biomedical Discovery and Data Mining

Author: Truyen Tran

The goals of this tutorial are to provide the general PAKDD audience with knowledge and materials about a great venture for KDD research, the intersection between deep learning and biomedicine, and to provide the deep learning community with relatively new, high-impact research problems within biomedicine.
The tutorial introduces the state of the field for deep learning and argues that biomedicine is an ideal data-intensive domain. It gives a brief review of deep learning, covering classic neural architectures, including feedforward, recurrent and convolutional nets, and more advanced topics, including CapsNet, powerful memory-augmented neural nets (MANN), and models for graph data.
Two major subtopics of genomics are covered: nanopore sequencing (which is about converting electrical signals into DNA character sequences) and genomics modeling (which is about making sense of the DNA sequences for multiple biological processes).
The biomedical imaging section briefly covers imaging modalities, including vision-based, sound-based and EEG/ECG-based technologies. We then show how deep learning technologies are being adapted to them, sometimes achieving human-level accuracy.
For healthcare, the coverage is on data mining of Electronic Medical Records. Two main problems are considered: the first is modeling time series of physiological measurements, and the second is predicting mid-term health trajectories.
The generative biomedicine section presents recent advances in few-shot learning and deep generative models (DBN/DBM, VAE and GAN). It describes how to apply these advances to drug design and gives an outlook, over a five-year horizon and beyond, on the joint venture of deep learning and biomedicine.
Prerequisite Knowledge: the tutorial does not require detailed prior knowledge of biomedicine or deep learning, but basic familiarity with machine learning is assumed.


Truyen Tran is a Senior Lecturer at Deakin University, where he leads a research team on deep learning and its applications to accelerating sciences, biomedicine and software analytics at the Centre for Pattern Recognition and Data Analytics. He publishes regularly at top AI/ML/KDD venues such as CVPR, NIPS, UAI, AAAI, KDD and ICML. Tran has received multiple recognitions, awards and prizes, including Best Paper Runner Up at UAI (2009), Geelong Tech Award (2013), CRESP Best Paper of the Year (2014), Third Prize on Kaggle Galaxy-Zoo Challenge (2014), Title of Kaggle Master (2014), Best Student Papers Runner Up at PAKDD (2015) and ADMA (2016), Distinguished Paper at ACM SIGSOFT (2015), and Deakin Thought Leader (2016). He obtained a Bachelor of Science from the University of Melbourne and a PhD in Computer Science from Curtin University in 2001 and 2008, respectively.

Tutorial 3: Non-IID Recommender Systems in Practice with Modern AI Techniques

Authors: Liang Hu, Longbing Cao, Songlei Jian

The renaissance of artificial intelligence (AI) has attracted huge attention from every corner of the world. Machine learning approaches have become deeply involved in AI research in almost all areas, e.g., natural language processing (NLP) and computer vision (CV). In particular, recommender systems (RSs), probably among the most widely used AI systems, have been integrated into every part of our daily life. In this AI age, state-of-the-art machine learning approaches, e.g., deep learning, have become the primary choice for modeling advanced RSs.
Classic RSs are built on the assumption that the relevant data, e.g., ratings, contents and/or social relations, are independent and identically distributed (IID). Intuitively, this is inconsistent with real-life data characteristics and cannot represent the heterogeneity and coupling relationships in the relevant data. Therefore, we employ modern machine learning approaches to enhance RSs with Comprehensive, Complementary, and Contextual (3C) information by coupling relevant heterogeneous data.
This tutorial will analyze data, challenges, and business needs in advanced recommendation problems, and take a non-IID perspective to introduce recent advances in machine learning for modeling 3C-based RSs. This includes an overview of RS evolution and non-IIDness in recommendation; advanced machine learning for cross-domain RS, social RS, multimodal RS, multi-criteria RS, context-aware RS, and group-based RS; and their integration in building real-life RSs.
The goal of this tutorial is to equip both academic and practitioner audiences with a comprehensive understanding of, and the relevant techniques for, applying state-of-the-art machine learning approaches to build more sensible next-generation RSs in contexts with various heterogeneous data and complex relations. In this tutorial, we will present a systematic review and applications of recent advanced machine learning techniques for building real-life intelligent RSs.
Prerequisite Knowledge: a rudimentary knowledge of the following will be helpful: (1) recommender systems; (2) latent factor models; and (3) deep learning models.


Liang Hu received his first Ph.D. degree in computer application technology from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China, in 2015. The title of his dissertation is "Research on Modeling Approaches to Recommender Systems by Exploiting Multi-Information". Currently, he is a Ph.D. candidate majoring in Analytics at the Advanced Analytics Institute, University of Technology Sydney, Australia. His research interests include recommender systems, data mining, machine learning and general artificial intelligence. He has published a number of papers in top-ranked international conferences and journals in the area of recommender systems, including WWW, IJCAI, AAAI, ICDM, ICWS, TOIS and JWSR. He serves as a program committee member for more than 10 top conferences, including IJCAI, AAAI, ICDM and CIKM.
Longbing Cao is a professor of information technology at the University of Technology Sydney (UTS), Australia. He is the Founding Director of the Advanced Analytics Institute at UTS. He is the Chair of the ACM SIGKDD Australia and New Zealand Chapter, the IEEE Task Force on Data Science and Advanced Analytics, and the IEEE Task Force on Behavioral, Economic and Socio-cultural Computing. He has served as conference co-chair of KDD2015, PAKDD13 and ADMA13, as program co-chair or vice-chair of PAKDD17, PAKDD11, ICDM10 etc., and as area chair or (senior) program committee member for around 100 conferences, including KDD, AAAI, IJCAI, ICDM and AAMAS. His primary research interests include data science and mining, machine learning, behavior informatics, agent mining, multi-agent systems, and open complex intelligent systems. He is currently dedicated to research on non-IID learning in big data and behavior informatics, which involves a wide range of enterprise applications. He has delivered 11 tutorials, including at IJCAI and CIKM, and dozens of invited talks at major conferences and workshops and public seminars for industry and government.
Songlei Jian is a joint Ph.D. student with the Advanced Analytics Institute, University of Technology Sydney (UTS) and the National University of Defense Technology (NUDT). Her research interests include machine learning, recommender systems, network modeling, and representation learning. She has published a number of papers in top-ranked international conferences and journals in the areas of data mining, machine learning, and recommender systems, including IJCAI, AAAI and TKDE. She has served the community as a program committee member or reviewer for AAAI, KDD, ICDM, and TKDE.