Research

My current research focuses on the e-commerce domain, in particular on information discovery in e-commerce.

Information Discovery in E-commerce at JD.com Jun. 2016 – Present
To improve user experience and engagement in E-commerce portals, I am investigating the following topics:

1) Recommender systems in e-commerce. My investigation has two threads: (a) knowledge graphs in e-commerce: relations among entities such as products, brands, and stores are built by mining content and user behaviors, and we leverage these relations to improve recommendation quality; (b) exploiting user behaviors in e-commerce: most traditional recommender systems focus on the macro interactions between users and items, e.g., a customer's purchase history. However, within each macro interaction with an item, a user actually performs a sequence of micro behaviors, which indicate how the user locates the item, what the user does with it (e.g., reading comments, adding it to the cart, ordering it), and how long the user stays with it. Such micro behaviors offer a fine-grained, deeper understanding of users and provide substantial opportunities to advance e-commerce recommender systems.

  • Meizi Zhou, Zhuoye Ding, Jiliang Tang, Dawei Yin, Micro Behaviors: A New Perspective in E-commerce Recommender Systems, In Proceedings of the 11th ACM Conference on Web Search and Data Mining (WSDM 2018), Los Angeles, California, USA, 2018.
  • Zihan Wang, Ziheng Jiang, Zhaochun Ren, Jiliang Tang, Dawei Yin, A Path-constrained Framework for Discriminating Substitutable and Complementary Products in E-commerce, In Proceedings of the 11th ACM Conference on Web Search and Data Mining (WSDM 2018), Los Angeles, California, USA, 2018. [Best Student Paper Award]
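
To make the notion of micro behaviors concrete, here is a minimal Python sketch that aggregates a session log into per-item features covering the three aspects above: how the user reached the item, what the user did with it, and how long the user stayed. The event schema (`MicroEvent`), the behavior labels, and the feature set are hypothetical illustrations, not the model from the WSDM 2018 paper.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroEvent:
    item: str      # item the user interacts with
    behavior: str  # hypothetical label, e.g. "search", "read_comments", "cart", "order"
    dwell: float   # seconds spent on this step


def item_features(session):
    """Aggregate one session's micro behaviors into per-item features."""
    feats = defaultdict(lambda: {"entry": None, "counts": Counter(), "dwell": 0.0})
    for ev in session:  # events are assumed to be in time order
        f = feats[ev.item]
        if f["entry"] is None:
            f["entry"] = ev.behavior   # how the user located the item
        f["counts"][ev.behavior] += 1  # what the user did with the item
        f["dwell"] += ev.dwell         # how long the user stayed with the item
    return dict(feats)


session = [
    MicroEvent("phone_a", "search", 4.0),
    MicroEvent("phone_a", "read_comments", 30.0),
    MicroEvent("phone_b", "home_page", 2.0),
    MicroEvent("phone_a", "cart", 1.5),
    MicroEvent("phone_a", "order", 3.0),
]
feats = item_features(session)
print(feats["phone_a"]["entry"], feats["phone_a"]["dwell"], dict(feats["phone_a"]["counts"]))
```

A macro-level view of the same session would record only that `phone_a` was ordered; the entry point, action counts, and dwell time are the extra signal that micro behaviors provide.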

2) Reinforcement learning to rank/recommend. Recommendation and search are typically interactive processes: when a user makes a request, the system returns a list of results; the user then provides feedback (e.g., clicks, orders); the system updates and returns another list of results. Moreover, existing supervised learning based methods mainly optimize users' instant positive feedback, such as clicks, but neglect delayed metrics (e.g., dwell time with the service, revisits to the service), because such delayed metrics are hard to optimize with supervised learning. Reinforcement learning, by contrast, is designed to maximize long-term rewards, so we formulate this setting within a reinforcement learning framework. Some results have been published as follows.

  • Xiangyu Zhao, Long Xia, Liang Zhang, Zhuoye Ding, Dawei Yin, and Jiliang Tang. Deep Reinforcement Learning for Page-wise Recommendations. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys 2018), Vancouver, Canada, Oct. 2018.
  • Xiangyu Zhao, Liang Zhang, Zhuoye Ding, Long Xia, Jiliang Tang and Dawei Yin. Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. In Proceedings of the 24th annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2018), London, United Kingdom, Aug. 2018.
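
The gap between instant and delayed feedback can be illustrated with a small Python sketch. It assumes a hypothetical per-step reward trace for one user session and computes the discounted return that reinforcement learning maximizes; it also shows a generic tabular Q-learning update, in which the value of an action absorbs discounted future rewards. This is a textbook illustration under those assumptions, not the deep architectures used in the RecSys 2018 or KDD 2018 papers.

```python
def discounted_return(rewards, gamma=0.9):
    """G_0 = r_0 + gamma * r_1 + gamma**2 * r_2 + ..., accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Hypothetical per-step rewards for one session under two ranking policies:
# policy A wins the instant click, then the user leaves;
# policy B gets no first click but keeps the user engaged and ordering later.
policy_a = [1.0, 0.0, 0.0, 0.0]
policy_b = [0.0, 1.0, 1.0, 1.0]

print("instant reward:", policy_a[0], policy_b[0])
print("discounted return:", discounted_return(policy_a), discounted_return(policy_b))

# A delayed reward known in state "s1" propagates back to the action taken in "s0".
q = {("s1", "show_list_b"): 1.0}
q_update(q, "s0", "show_list_b", 0.0, "s1", ["show_list_a", "show_list_b"])
print(q[("s0", "show_list_b")])
```

An objective built only on instant clicks prefers policy A because its first-step reward is higher, while the discounted return favors policy B.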

3) Dialogue systems. In the e-commerce domain, a pre-purchase chatbot lets users find the products they are interested in through interaction. Unlike keyword queries in product search, a chatbot tries to understand users' needs expressed in natural language and supports free-form interaction.

  • Hongshen Chen, Xiaorui Liu, Dawei Yin and Jiliang Tang. A Survey on Dialogue Systems: Recent Advances and New Frontiers. SIGKDD Explorations, 2018.
  • Hongshen Chen, Zhaochun Ren, Jiliang Tang, Yihong Eric Zhao and Dawei Yin. Hierarchical Variational Memory Network for Dialogue Generation. In Proceedings of the 27th International World Wide Web Conference (WWW 2018), Lyon, France, 2018.
  • Xisen Jin, Wenqiang Lei, Hongshen Chen, Shangsong Liang, Zhaochun Ren, Yihong Eric Zhao and Dawei Yin. Explicit State Tracking with Semi-Supervision for Neural Dialogue Generation. In Proceedings of the 27th ACM Conference on Information and Knowledge Management (CIKM 2018), Lingotto, Turin, Italy, October 2018.
  • Wenqiang Lei, Xisen Jin, Min-Yen Kan, Zhaochun Ren, Xiangnan He and Dawei Yin. Sequicity: Simplifying Task-oriented Dialogue Systems with Single Sequence-to-Sequence Architectures. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, Australia, 2018.
  • Shuman Liu, Hongshen Chen, Zhaochun Ren, Yang Feng, Qun Liu and Dawei Yin. Knowledge Diffusion for Neural Dialogue Generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, Australia, 2018.

Previously, I investigated the following topics:

Yahoo Web Search Relevance Aug. 2013 – Jun. 2016

Led the science efforts on Yahoo Search Core Relevance. Scope: query rewriting, ranking functions, ranking features, etc.

  • Yunlong He, Jiliang Tang, Hua Ouyang, Changsung Kang, Dawei Yin and Yi Chang, Learning to Rewrite Queries. In Proceedings of the 25th ACM Conference on Information and Knowledge Management (CIKM 2016), Indianapolis, IN, USA, Oct. 2016. [Full paper]
  • Dawei Yin, Yuening Hu, Jiliang Tang, Tim Daly Jr., Mianwei Zhou, Hua Ouyang, Jianhui Chen, Changsung Kang, Hongbo Deng, Chikashi Nobata, Jean-Marc Langlois, Yi Chang. Ranking Relevance in Yahoo Search. In Proceedings of the 22nd annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2016), San Francisco, CA, Aug. 2016. [Best Paper Award]
  • Yue Wang, Dawei Yin, Jie Luo, Pengyuan Wang, Makoto Yamada, Yi Chang, Qiaozhu Mei, Beyond Ranking: Optimizing Whole-Page Presentation. In Proceedings of the 9th ACM Conference on Web Search and Data Mining (WSDM 2016), San Francisco, US, Feb 2016. [Best Paper Award]
  • Changsung Kang, Dawei Yin, Ruiqiang Zhang, Nicolas Torzec, Jianzhang He, Yi Chang, Learning to rank related entities in web search. To appear in Neurocomputing.
  • Shan Jiang, Yuening Hu, Changsung Kang, Tim Daly, Dawei Yin, Yi Chang. Learning Query and Document Relevance from Web-scale Click Graph. In Proceedings of the 39th Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016), Pisa, Tuscany, Italy, July 2016.
  • Neil O'Hare, Paloma De Juan, Rossano Schifanella, Yunlong He, Dawei Yin, Yi Chang. Leveraging User Interaction Signals for Web Image Search. In Proceedings of the 39th Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016), Pisa, Tuscany, Italy, July 2016.

CTR Prediction and Click Yield Optimization for Sponsored Search   May 2012 – Sep. 2013
Sponsored search has become a major business for today's commercial search engines. A critical problem in sponsored search is to understand and predict users' browsing and click behaviors. Properties specific to sponsored search, which do not apply to organic search, have rarely been explored. We analyze several contextual factors that influence CTR, including the number of displayed ads, the content of the ads, the relationship between the query and the ads, and the mutual influence among ads. Based on this analysis, we propose a novel Context-Aware Click Model for sponsored search. In addition, the pay-per-click (PPC) advertising model is widely used in sponsored search, so search engines try to deliver ads that produce greater click yields (the total number of clicks on the list of ads per impression). The current ad-delivery strategy is a two-step approach: it first predicts each individual ad's CTR for the given query and then selects the ads with the top predicted CTRs. We challenge this traditional strategy and propose a novel framework that directly optimizes click yields for lists of ads.

  • Dawei Yin*, Shike Mei*, Bin Cao, Jian-Tao Sun, Brian D. Davison, Exploiting Contextual Factors for Click Modeling in Sponsored Search. In Proceedings of the 7th ACM Conference on Web Search and Data Mining (WSDM 2014), New York, USA, Feb 2014.
  • Dawei Yin, Bin Cao, Jian-Tao Sun, Brian D. Davison, Estimating Ad Group Performance in Sponsored Search. In Proceedings of the 7th ACM Conference on Web Search and Data Mining (WSDM 2014), New York, USA, Feb 2014.
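
The difference between the two-step strategy and direct optimization of click yield can be shown with a toy Python example. It assumes a hypothetical list-level click-yield estimate (the sum of individual CTRs minus a penalty for each pair of ads that compete for the same clicks) and compares top-k selection by individual CTR with an exhaustive search over lists. The CTR values and the penalty are made up for illustration; this is not the framework proposed in the WSDM 2014 papers.

```python
from itertools import combinations


def click_yield(ads, ctr, redundancy):
    """Hypothetical list-level estimate: sum of individual CTRs minus a
    penalty for every pair of ads that compete for the same clicks."""
    base = sum(ctr[a] for a in ads)
    penalty = sum(redundancy.get(frozenset(p), 0.0) for p in combinations(ads, 2))
    return base - penalty


def two_step(ctr, k):
    # Step 1: per-ad CTRs (given here); step 2: keep the k highest.
    return tuple(sorted(ctr, key=ctr.get, reverse=True)[:k])


def direct(ctr, redundancy, k):
    # Maximize list-level click yield by exhaustive search (tiny inputs only).
    return max(combinations(sorted(ctr), k),
               key=lambda ads: click_yield(ads, ctr, redundancy))


ctr = {"ad1": 0.10, "ad2": 0.09, "ad3": 0.06}
redundancy = {frozenset({"ad1", "ad2"}): 0.05}  # ad1 and ad2 are near-duplicates

top = two_step(ctr, 2)
best = direct(ctr, redundancy, 2)
print(top, round(click_yield(top, ctr, redundancy), 3))
print(best, round(click_yield(best, ctr, redundancy), 3))
```

With these numbers, top-k by CTR picks the two near-duplicate ads, while the list-level search prefers `ad1` with `ad3`. Exhaustive search is only feasible for tiny candidate sets; it stands in here for a real list optimizer.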

Generalized Latent Factor Model   May 2011 – Sep. 2013
Standard approaches to modeling relational data focus on a single type of relation. Matrix factorization for binary relations and tensor factorization for higher-order interactions are simple models with competitive generalization performance. In this project, we propose a joint probabilistic model that handles multiple types of relations; it can be viewed as a direct generalization of existing co-factorization approaches. Compared with applying standard factor analysis to each relation separately, it offers a novel way to regularize sparse relational data by estimating the latent factors jointly across multiple types of relations. We propose a simple, generic, and scalable algorithm based on stochastic gradient descent to estimate the model parameters. One motivating example is social tagging, where joint factorization provides a simple way to recommend customized tags based on three data sources: previous tags, image features, and the social network.

  • Dawei Yin, Shengbo Guo, Boris Chidlovskii, Brian Davison, Cedric Archambeau and Guillaume Bouchard. Bayesian Multi-Relational Data Analysis. [Under review, draft available upon request]
  • Guillaume Bouchard, Dawei Yin and Shengbo Guo. Convex Collective Matrix Factorization. In Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2013), Scottsdale, AZ, USA.
  • Dawei Yin, S. Guo, B. Chidlovskii, B. Davison, C. Archambeau and G. Bouchard, Connecting Comments and Tags: Improved Modeling of Social Tagging Systems. In Proceedings of the 6th ACM Conference on Web Search and Data Mining (WSDM 2013), Rome, Italy, Feb 2013.
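
The shared-factor idea can be sketched in Python with NumPy: each entity type (here users, items, and features) has a single latent matrix that is reused by every relation it participates in, and observations from all relations are visited in one stochastic gradient descent loop. This sketch assumes squared loss with L2 regularization (a MAP-style point estimate) and tiny made-up data; it is not the Bayesian or convex formulation cited above.

```python
import numpy as np


def cofactorize(relations, sizes, rank=3, lr=0.05, reg=0.01, epochs=300, seed=0):
    """Collective matrix factorization fitted by SGD.

    relations: list of (type_a, type_b, triples), triples = [(i, j, value), ...]
    sizes: dict mapping entity type -> number of entities of that type
    Each entity type owns ONE latent matrix, shared by every relation it appears in.
    """
    rng = np.random.default_rng(seed)
    factors = {t: 0.3 * rng.standard_normal((n, rank)) for t, n in sizes.items()}
    obs = [(a, b, i, j, v) for a, b, triples in relations for i, j, v in triples]
    for _ in range(epochs):
        for k in rng.permutation(len(obs)):  # visit observations in random order
            a, b, i, j, v = obs[k]
            fa, fb = factors[a][i].copy(), factors[b][j].copy()
            err = v - fa @ fb
            # Gradient step on squared error plus an L2 penalty.
            factors[a][i] += lr * (err * fb - reg * fa)
            factors[b][j] += lr * (err * fa - reg * fb)
    return factors


def rmse(factors, relations):
    errs = [v - factors[a][i] @ factors[b][j]
            for a, b, triples in relations for i, j, v in triples]
    return float(np.sqrt(np.mean(np.square(errs))))


# Toy data: user-item tagging and item-feature relations share the item factors.
relations = [
    ("user", "item", [(0, 0, 1.0), (0, 1, 1.0), (1, 1, 1.0), (1, 2, 1.0),
                      (2, 3, 1.0), (2, 0, 0.0), (0, 3, 0.0)]),
    ("item", "feature", [(0, 0, 1.0), (1, 0, 1.0), (2, 1, 1.0), (3, 1, 1.0),
                         (0, 1, 0.0), (3, 0, 0.0)]),
]
sizes = {"user": 3, "item": 4, "feature": 2}

factors = cofactorize(relations, sizes)
print("training RMSE:", round(rmse(factors, relations), 3))
# Score an unobserved user-item pair with the jointly learned factors.
print("user 1, item 0:", round(float(factors["user"][1] @ factors["item"][0]), 3))
```

Because the item factors are updated by both relations, items with few tags still receive signal from their features, which is the regularization effect the paragraph describes.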

Understanding and Prediction in Microblog Media   Feb. 2010 – Sep. 2013
Unlike a traditional social network service, a microblogging network such as Twitter is a hybrid network, combining aspects of both social networks and information networks. Understanding the structure of such hybrid networks and predicting new links are important for many tasks, such as friend recommendation, community detection, and network growth modeling. We compare the most popular and recent methods and principles for link prediction and observe that link prediction in a hybrid network differs from link prediction in previously studied networks, such as co-authorship networks and traditional online social networks. In this project, we propose novel structural methods that calculate the probability of a link being created by examining the current user's local network structure.

  • Dawei Yin, Liangjie Hong and Brian D. Davison, Structural Link Analysis and Prediction in MicroBlogs. In Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM 2011), Glasgow, Scotland, UK, October 2011.
  • Dawei Yin, Liangjie Hong, Xiong Xiong and Brian D. Davison, Link Formation Analysis in MicroBlogs. In Proceedings of the 34th Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2011), Beijing, China, July, 2011.
  • Zhenzhen Xue, Dawei Yin and Brian D. Davison. Normalizing Microtext. In Proceedings of the AAAI-11 Workshop on Analyzing Microtext, San Francisco, USA, 2011.
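
Link prediction from local network structure can be illustrated with classic neighborhood heuristics adapted to a directed follow graph. The Python sketch below scores a candidate link `u -> v` by the number of two-step paths `u -> w -> v`, by a directed Adamic-Adar variant that down-weights intermediaries who follow many accounts, and by reciprocity (whether `v` already follows `u`). These are standard baseline heuristics chosen for illustration, not the structural methods proposed in the CIKM 2011 and SIGIR 2011 papers; the account names are hypothetical.

```python
import math
from collections import defaultdict


def build_graph(edges):
    """edges: iterable of (u, v) meaning u follows v."""
    out_n, in_n = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_n[u].add(v)
        in_n[v].add(u)
    return out_n, in_n


def link_scores(u, v, out_n, in_n):
    """Local-structure scores for the candidate link u -> v."""
    mid = out_n[u] & in_n[v]  # accounts w with u -> w -> v
    # Directed Adamic-Adar variant: intermediaries that follow many accounts
    # count less; log(1 + degree) avoids dividing by log(1) = 0.
    aa = sum(1.0 / math.log(1 + len(out_n[w])) for w in mid)
    return {"paths_len2": len(mid), "adamic_adar": aa, "reciprocal": v in in_n[u]}


def rank_candidates(u, out_n, in_n):
    """Rank accounts u does not follow yet, best candidate first."""
    nodes = set(out_n) | set(in_n)
    cands = [v for v in sorted(nodes) if v != u and v not in out_n[u]]

    def key(v):
        s = link_scores(u, v, out_n, in_n)
        return (s["reciprocal"], s["adamic_adar"])

    return sorted(cands, key=key, reverse=True)


edges = [("alice", "bob"), ("alice", "carol"), ("bob", "dave"),
         ("carol", "dave"), ("dave", "alice"), ("carol", "erin")]
out_n, in_n = build_graph(edges)
print(link_scores("alice", "dave", out_n, in_n))
print(rank_candidates("alice", out_n, in_n))
```

Putting reciprocity ahead of the path-based score in the ranking key is an arbitrary choice made for the example; in practice such signals would be combined by a learned model.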

Social Annotation and Bookmarking Systems   Feb. 2010 – Sep. 2013
In social bookmarking systems, it is critical to predict or recommend the tags users might want to use, helping them organize their resources and build a knowledge base. Existing tag prediction methods have shown that prediction performance can be significantly improved by modeling users' preferences. However, these preferences are usually treated as constant over time, neglecting the temporal dynamics of users' interests. We systematically investigate the temporal dynamics of user interests in tagging systems and propose a user-tag specific temporal interest model for tracking users' interests over time. We associate each user-tag pair with a kernel function that characterizes its temporal changes, and present an effective estimation procedure that embeds this idea into state-of-the-art tag prediction algorithms.

  • Dawei Yin, Liangjie Hong, Zhenzhen Xue and Brian D. Davison, Temporal Dynamics of User Interests in Tagging Systems. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2011), San Francisco, USA, 2011.
  • Dawei Yin, Liangjie Hong and Brian D. Davison. Exploiting Session-like Behaviors in Tag Prediction. In Proceedings of the 20th International World Wide Web Conference (WWW 2011), Hyderabad, India, March 2011.
  • Dawei Yin, Zhenzhen Xue, Liangjie Hong and Brian D. Davison. A Probabilistic Model for Personalized Tag Prediction. In Proceedings of the 16th annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2010), Washington, DC, July 2010.
  • Xiaoguang Qi, Dawei Yin, Zhenzhen Xue and Brian D. Davison. Choosing Your Own Adventure: Automatic Taxonomy Generation to Permit Many Paths. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM 2010), Toronto, Canada, October 2010.
  • Xiaoguang Qi, Dawei Yin, Zhenzhen Xue and Brian D. Davison, Enhancing Taxonomies by Providing Many Paths. Technical Report LU-CSE-10-005, Dept. of Computer Science and Engineering, Lehigh University, 2010.
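
A user-tag specific temporal kernel can be sketched in Python as follows, assuming an exponential decay kernel with one time scale per (user, tag) pair: recent uses of a tag count more, and a pair with a long time scale keeps its weight longer. The kernel form, the time scales, and the toy history are assumptions for illustration; the estimation procedure from the AAAI 2011 paper is not reproduced here.

```python
import math
from collections import defaultdict


def temporal_interest(history, now, tau, default_tau=30.0):
    """Sum of exponentially decayed past tag uses, one kernel per (user, tag).

    history: iterable of (user, tag, day) events.
    tau: dict (user, tag) -> decay time scale in days; other pairs use default_tau.
    """
    score = defaultdict(float)
    for user, tag, day in history:
        if day > now:
            continue  # ignore events after the prediction time
        scale = tau.get((user, tag), default_tau)
        score[(user, tag)] += math.exp(-(now - day) / scale)
    return dict(score)


def predict_tags(user, scores, k=2):
    """Top-k tags for a user by decayed interest."""
    ranked = sorted(((s, t) for (u, t), s in scores.items() if u == user), reverse=True)
    return [t for _, t in ranked[:k]]


history = [("u1", "python", 0), ("u1", "python", 5), ("u1", "java", 90), ("u1", "travel", 95)]
per_pair_tau = {("u1", "python"): 200.0}  # hypothetical: a long-lived interest

print(predict_tags("u1", temporal_interest(history, now=100, tau=per_pair_tau)))
print(predict_tags("u1", temporal_interest(history, now=100, tau={})))
```

With a long time scale for (`u1`, `python`), the long-standing interest still ranks first; with one default time scale for every pair, only the recent tags survive. This is why a kernel per user-tag pair, rather than a single global decay, matters.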

Fast Approximate Nearest-Neighbor Classifiers   Feb. 2010 – Sep. 2011
Scaling up document-image classifiers to handle an unlimited variety of document and image types poses serious challenges to conventional trainable classifier technologies. Highly versatile classifiers demand representative training sets, which can be dauntingly large. In this project, we propose an algorithm, which we call online bin-decimation, for coping with training sets that are too big to fit in main memory. The key idea of bin-decimation is to approximately enforce an upper bound on the number of training samples stored in each K-d hash bin; an adaptive statistical technique allows this to be accomplished online and in linear time, while reading the training data exactly once. This project is part of Lehigh Document Analysis and Exploitation.

  • Dawei Yin, Chang An, Henry S. Baird. Imbalance and Concentration in k-NN Classification. In Proceedings of IAPR 20th International Conference on Pattern Recognition (ICPR 2010), Istanbul, Turkey, August 2010.
  • Chang An, Dawei Yin, Henry S. Baird. Document Segmentation using Pixel-Accurate Ground Truth. In Proceedings of IAPR 20th International Conference on Pattern Recognition (ICPR 2010), Istanbul, Turkey, August 2010.
  • Dawei Yin, Chang An and Henry S. Baird. Safely Selecting Subsets of Training Data. In Proceedings of IAPR 9th International Workshop on Document Analysis Systems (DAS 2010), Boston, MA, June 2010.
  • Dawei Yin, Henry S. Baird and Chang An. Time and Space Optimization of Document Content Classifiers. In Proceedings of IS&T/SPIE Document Recognition and Retrieval Conf. (DR&R XVII), San Jose, CA, January 17-21, 2010.
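
The capping idea behind bin-decimation can be sketched in Python under two explicit assumptions: a hypothetical grid quantization stands in for the K-d hash, and per-bin reservoir sampling stands in for the adaptive statistical technique described above. Like the original, the sketch reads the training stream exactly once, does a bounded amount of work per sample, and keeps at most `cap` samples per bin, so dense regions are thinned while sparse regions are kept intact.

```python
import random
from collections import defaultdict
from itertools import chain


def bin_key(x, cell=1.0):
    """Hypothetical hash bin: quantize every coordinate to a grid cell."""
    return tuple(int(v // cell) for v in x)


def decimate(stream, cap, cell=1.0, seed=0):
    """Read (features, label) pairs once and keep at most `cap` per bin.

    Per-bin reservoir sampling keeps a uniform random subset of the samples
    that fell into each bin; total work is linear in the stream length.
    """
    rng = random.Random(seed)
    kept = defaultdict(list)
    seen = defaultdict(int)
    for x, y in stream:
        key = bin_key(x, cell)
        seen[key] += 1
        if len(kept[key]) < cap:
            kept[key].append((x, y))
        else:
            j = rng.randrange(seen[key])  # uniform in [0, seen[key])
            if j < cap:
                kept[key][j] = (x, y)
    return dict(kept)


def nearest_label(query, kept, cell=1.0):
    """Approximate 1-NN: search only the decimated samples in the query's bin."""
    candidates = kept.get(bin_key(query, cell), [])
    if not candidates:
        return None
    _, label = min(candidates, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query)))
    return label


gen = random.Random(1)
dense = (((gen.random(), gen.random()), "dense") for _ in range(1000))  # one crowded bin
rare = [((5.5, 5.5), "rare"), ((5.2, 5.9), "rare")]                     # one sparse bin
kept = decimate(chain(dense, rare), cap=50)
print({k: len(v) for k, v in kept.items()})
print(nearest_label((0.3, 0.3), kept), nearest_label((5.4, 5.4), kept))
```

Restricting the nearest-neighbor search to the query's own bin is what makes the classifier approximate; a fuller version would also probe neighboring bins.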