Nan Li

Hey.

Welcome to Nan's homepage! I am currently a Machine Learning Engineer on the Relevance & Personalization team at Airbnb, building deep neural models to improve Airbnb's homes search ranking. Prior to Airbnb, I was a Research Scientist on multiple ML teams at Facebook, where my work spanned conversational automation using NLP and statistical modeling, optimizing human labeling workflows using generative models and model-assisted sampling, and various other predictive modeling projects. Before Facebook, I was a data scientist in the Applied Machine Learning group at Apple and on the Data Products and Research team at Upwork.
I received my Ph.D. in Computer Science from the University of California, Santa Barbara, where I did research in data mining and applied ML. While studying in beautiful Santa Barbara, I was blessed with internship opportunities at Microsoft Research (Cambridge, UK), IBM Research and Microsoft Bing.
My latest CV is here.

Research Interests

My research interests lie in the general topics of machine learning and data mining. I build a variety of machine learning models to solve interesting data problems, such as learning embeddings, search and ranking, information retrieval and recommender systems, and natural language processing. In my spare time, I enjoy learning about neural networks and deep learning. Some side projects on neural networks I contributed to can be found here.

My projects at Facebook have been in the realm of general applied machine learning, including ranking systems, embedding models, text classification, video feature extraction and sequencing, spatial clustering and image contour detection.

In the past, I worked on various regression/classification algorithms along with feature engineering and regularization mechanisms. I also have experience in topic modeling, regularized mixture models, graphical models, inference, Bayesian online learning and reinforcement learning through function approximation.

When I was a Ph.D. student, I designed probabilistic models and combinatorial algorithms for solving graph problems, ranging from graph indexing and querying to graph anomaly detection. More specifically, I worked on label-based subgraph matching via density indexing, attribute proximity computation using personalized PageRank aggregation, and vertex classification through graph augmentation and random walks. My research focus later shifted to applying statistical modeling and machine learning to graph problems, such as using a regularized mixture model to uncover anomalous regions of a vertex-attributed graph in a principled manner.

Please visit my Google Scholar profile for all my publications.

Ph.D. Dissertation

Nan Li, "Uncovering interesting attributed anomalies in large graphs", [pdf].

Conference Papers

Nan Li, Huan Sun, Kyle Chipman, Jemin George, Xifeng Yan, "A Probabilistic Approach to Uncovering Attributed Graph Anomalies", Proc. of the 2014 SIAM International Conference on Data Mining (SDM'14), Philadelphia, PA, Apr. 24-26, 2014, pp. 82-90, [pdf].

Nan Li, Ziyu Guan, Lijie Ren, Jian Wu, Jiawei Han, Xifeng Yan, "gIceberg: towards iceberg analysis in large graphs", Proc. of the 2013 IEEE International Conference on Data Engineering (ICDE'13), Brisbane, Australia, Apr. 8-12, 2013, pp. 1021-1032, [pdf].

Nan Li, Xifeng Yan, Zhen Wen, Arijit Khan, "Density index and proximity search in large graphs", Proc. of the 2012 ACM International Conference on Information and Knowledge Management (CIKM'12), Maui, HI, USA, Oct. 29-Nov. 2, 2012, pp. 235-244, [pdf].

Arijit Khan, Nan Li, Xifeng Yan, Ziyu Guan, Supriyo Chakraborty and Shu Tao, "Neighborhood based fast graph search in large networks", Proc. of the 2011 International Conference on Management of Data (SIGMOD'11), Athens, Greece, Jun. 12-16, 2011, pp. 901-912.

Charu Aggarwal, Nan Li, "On node classification in dynamic content-based networks", Proc. of the 2011 SIAM International Conference on Data Mining (SDM'11), Phoenix, AZ, USA, Apr. 28-30, 2011, pp. 355-366 [pdf].

Nan Li, Naoki Abe, "Temporal cross-sell optimization using action proxy-driven reinforcement learning", ICDM 2011 Workshop on Optimization Based Methods for Emerging Data Mining Problems (ICDMW'11), Vancouver, Canada, Dec. 2011, pp. 259-266 [pdf].

Nan Li, Yinghui Yang, Xifeng Yan, "Cross-selling optimization for customized product promotion", Proc. of the 2010 SIAM International Conference on Data Mining (SDM'10), Columbus, OH, USA, Apr. 29-May 1, 2010, pp. 918-929 [pdf].

Journal Papers

Charu Aggarwal, Nan Li, "On supervised mining of dynamic content-based networks", Statistical Analysis and Data Mining, Vol. 5, No. 1, 2012, pp. 16-34 [pdf].

Nan Li, Desheng Dash Wu, "Using text mining and sentiment analysis for online forums hotspot detection and forecast", Decision Support Systems, Vol. 48, No. 2, 2010, pp. 354-368 [pdf].

Nan Li, Xun Liang, Xinli Li, Chao Wang, Desheng Dash Wu, "Network environment and financial risk using machine learning and sentiment analysis", Hum. Ecol. Risk Assess., Vol. 15, No. 2, 2009, pp. 227-252 [pdf].

Some Cool Projects I Worked on at UCSB

My past projects include creating probabilistic models and algorithms to solve various graph mining problems, business optimization and customer lifetime value modeling, Bayesian online learning for user ranking, and so on. Below are the ones I find particularly interesting:

Uncovering subgraphs with abnormal distributions of attributes reveals important insight into network behaviors. In this paper, we introduce a regularized mixture model to identify anomalous subgraphs exhibiting significantly different distributions of a certain vertex attribute, compared to the rest of the graph. Our framework, gAnomaly, models the generative process of vertex attributes and divides the graph into regions that are governed by the background and anomaly distributions.

In this paper, we introduce the concept of graph icebergs that refer to vertices for which the concentration (aggregation) of an attribute in their vicinities is abnormally high. Intuitively, these vertices are “close” to the attribute of interest with respect to the network structure. We propose a novel framework, called gIceberg, which performs aggregation over personalized PageRank vectors. gIceberg then ranks the vertices by their aggregate scores.
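To make the idea of aggregating over personalized PageRank vectors concrete, here is a minimal Python sketch. It is an illustration only, not the gIceberg implementation: the function names, the adjacency-dict graph representation, the restart probability of 0.15, and the fixed iteration count are all assumptions made for this example. Each vertex's score is the total personalized PageRank mass that a walk restarting at that vertex places on vertices carrying the attribute.

```python
def personalized_pagerank(adj, source, restart=0.15, iters=50):
    """Random walk with restart at `source`, computed by power iteration.

    `adj` maps each vertex to a list of its neighbors (hypothetical format).
    """
    scores = {n: 0.0 for n in adj}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for n, nbrs in adj.items():
            walk_mass = (1.0 - restart) * scores[n]
            if not nbrs:
                # Dangling vertex: send its walking mass back to the source.
                nxt[source] += walk_mass
                continue
            share = walk_mass / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        # Total mass is always 1, so the restart step adds `restart` to the source.
        nxt[source] += restart
        scores = nxt
    return scores


def rank_by_attribute_proximity(adj, attributed):
    """Rank vertices by the PPR mass they place on attributed vertices."""
    aggregate = {}
    for v in adj:
        ppr = personalized_pagerank(adj, v)
        aggregate[v] = sum(ppr[u] for u in attributed)
    return sorted(aggregate.items(), key=lambda kv: kv[1], reverse=True)


# Toy path graph a - b - c - d - e where only a and b carry the attribute.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
ranking = rank_by_attribute_proximity(graph, attributed={"a", "b"})
```

This brute-force version runs one power iteration per vertex, which is fine for a toy graph but would not scale to the large graphs the paper targets.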

Given a large graph where vertices are associated with labels, how do we quickly find interesting vertex sets according to a given query? In this paper, we study label-based proximity search, which finds the top-k query-covering vertex sets with the smallest diameters. We propose a novel framework, called gDensity, which uses density index and likelihood ranking to find vertex sets in an efficient and accurate manner.

User skill ratings have important purposes in real-world competitive games. Previous authors have explored various statistical approaches to assess user skills. In this work, we propose a probabilistic online inference framework to iteratively rate user skills in a crowdsourcing environment. Our framework achieves the following goals: 1) estimate user skills iteratively with certain confidence; 2) match users with similar-level competitors; 3) predict competitive outcomes.
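As a rough illustration of iterative skill rating with confidence, the sketch below uses a simplified two-player, no-draw Gaussian update in the style of TrueSkill. This is a swapped-in, well-known technique chosen for illustration and is not the framework from this project; the function names, the `(mean, std)` tuple representation, and the performance noise `beta=4.0` are assumptions. It touches two of the goals listed above: updating a skill estimate together with its uncertainty, and predicting the outcome of a match.

```python
import math


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(player_a, player_b, beta=4.0):
    """Probability that player_a beats player_b; each is a (mean, std) pair."""
    (mu_a, sigma_a), (mu_b, sigma_b) = player_a, player_b
    c = math.sqrt(2.0 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    return _cdf((mu_a - mu_b) / c)


def update(winner, loser, beta=4.0):
    """Return updated (mean, std) pairs after one observed match."""
    (mu_w, sigma_w), (mu_l, sigma_l) = winner, loser
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # how surprising the result was
    w = v * (v + t)         # variance shrink factor, between 0 and 1
    new_mu_w = mu_w + (sigma_w ** 2 / c) * v
    new_mu_l = mu_l - (sigma_l ** 2 / c) * v
    new_sigma_w = sigma_w * math.sqrt(1.0 - (sigma_w ** 2 / c ** 2) * w)
    new_sigma_l = sigma_l * math.sqrt(1.0 - (sigma_l ** 2 / c ** 2) * w)
    return (new_mu_w, new_sigma_w), (new_mu_l, new_sigma_l)


# Two new users with identical priors; the first one wins a match.
prior = (25.0, 8.0)
winner_after, loser_after = update(prior, prior)
```

After the update, the winner's mean rises, the loser's mean falls, and both standard deviations shrink, so the confidence in each estimate grows as more matches are observed.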

Customer lifetime value modeling and cross-selling pattern mining are two important areas of data mining applications in marketing sciences. In this paper, we propose a novel approach that addresses both problems in a unified manner. We propose a variant of reinforcement learning, enhanced with the notion of "action proxy", which is applicable to cross-selling pattern discovery even in the absence of actions. The motivation is to optimize the target values of immediate rewards to maximize the expected overall long-term reward.

Say Hello.

Send me an email by clicking the button below, or add me on LinkedIn!
