Research

So far, I have investigated the following topics:

Yahoo Web Search Relevance   Aug. 2013 – Present
Lead the science efforts on Yahoo Search Core Relevance. Scope includes query rewriting, the ranking function, and ranking features.

  • Yunlong He, Jiliang Tang, Hua Ouyang, Changsung Kang, Dawei Yin and Yi Chang. Learning to Rewrite Queries. In Proceedings of the 25th ACM Conference on Information and Knowledge Management (CIKM 2016), Indianapolis, IN, USA, Oct. 2016. [Full paper]
  • Dawei Yin, Yuening Hu, Jiliang Tang, Tim Daly Jr., Mianwei Zhou, Hua Ouyang, Jianhui Chen, Changsung Kang, Hongbo Deng, Chikashi Nobata, Jean-Marc Langlois, Yi Chang. Ranking Relevance in Yahoo Search. In Proceedings of the 22nd annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2016), San Francisco, CA, Aug. 2016. [Best Paper Award]
  • Yue Wang, Dawei Yin, Jie Luo, Pengyuan Wang, Makoto Yamada, Yi Chang, Qiaozhu Mei. Beyond Ranking: Optimizing Whole-Page Presentation. To appear in Proceedings of the 9th ACM Conference on Web Search and Data Mining (WSDM 2016), San Francisco, CA, USA, Feb 2016. [Best Paper Award]
  • Changsung Kang, Dawei Yin, Ruiqiang Zhang, Nicolas Torzec, Jianzhang He, Yi Chang. Learning to Rank Related Entities in Web Search. To appear in Neurocomputing.
  • Shan Jiang, Yuening Hu, Changsung Kang, Tim Daly, Dawei Yin, Yi Chang. Learning Query and Document Relevance from Web-scale Click Graph. In Proceedings of the 39th Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016), Pisa, Tuscany, Italy, July 2016.
  • Neil O’Hare, Paloma De Juan, Rossano Schifanella, Yunlong He, Dawei Yin, Yi Chang. Leveraging User Interaction Signals for Web Image Search. In Proceedings of the 39th Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016), Pisa, Tuscany, Italy, July 2016.

CTR Prediction and Click Yields Optimization for Sponsored Search   May 2012 – Sep. 2013
Sponsored search has become a major business for today’s commercial search engines. A critical problem in sponsored search is understanding and predicting how users browse and click on ads. Properties specific to sponsored search, which do not arise in organic search, have rarely been explored. We analyze several contextual factors that influence CTR, including the number of displayed ads, the content of the ads, the relationship between the query and the ads, and the mutual influence among ads. Based on this analysis, we propose a novel Context-Aware Click Model for sponsored search. In addition, the pay-per-click (PPC) advertising model is widely used in sponsored search, and search engines try to deliver ads that produce greater click yields (the total number of clicks on the list of ads per impression). The current ad-delivery strategy is a two-step approach: it first predicts the CTR of each individual ad for the given query and then selects the ads with the top predicted CTRs. We challenge this traditional strategy and propose a novel framework that directly optimizes click yields for lists of ads.
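
As a rough illustration of why list-level optimization can differ from the two-step approach, here is a minimal Python sketch. It assumes a toy context model in which each ad's CTR is discounted by a hypothetical `overlap` score with every other ad shown alongside it, and it finds the best list by brute force. The functions `click_yield`, `two_step_selection`, and `list_selection` are illustrative names; this is not the Context-Aware Click Model or the optimization framework from the papers below.

```python
from itertools import combinations


def click_yield(ads, base_ctr, overlap):
    """Expected clicks per impression for one displayed list of ads.

    Toy context model (an assumption for illustration): each ad's CTR is its
    standalone CTR, discounted by its overlap with every co-displayed ad.
    """
    total = 0.0
    for a in ads:
        ctr = base_ctr[a]
        for b in ads:
            if b != a:
                ctr *= 1.0 - overlap.get(frozenset((a, b)), 0.0)
        total += ctr
    return total


def two_step_selection(base_ctr, k):
    """Traditional strategy: rank ads by individual predicted CTR, keep the top k."""
    return sorted(base_ctr, key=base_ctr.get, reverse=True)[:k]


def list_selection(base_ctr, overlap, k):
    """List-level strategy: choose the k-ad set with the highest click yield."""
    return max(combinations(base_ctr, k),
               key=lambda ads: click_yield(ads, base_ctr, overlap))


if __name__ == "__main__":
    base_ctr = {"a": 0.10, "b": 0.09, "c": 0.06}
    overlap = {frozenset(("a", "b")): 0.5}  # a and b compete for the same clicks
    top = two_step_selection(base_ctr, 2)
    best = list_selection(base_ctr, overlap, 2)
    print(top, round(click_yield(top, base_ctr, overlap), 3))
    print(best, round(click_yield(best, base_ctr, overlap), 3))
```

With these toy numbers, the two-step strategy picks `["a", "b"]` for a click yield of about 0.095, while the list-level search picks `("a", "c")` for about 0.16, because the two highest-CTR ads compete for the same clicks.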

  • Dawei Yin*, Shike Mei*, Bin Cao, Jian-Tao Sun, Brian D. Davison, Exploiting Contextual Factors for Click Modeling in Sponsored Search. To appear in Proceedings of the 7th ACM Conference on Web Search and Data Mining (WSDM 2014), New York, USA, Feb 2014.
  • Dawei Yin, Bin Cao, Jian-Tao Sun, Brian D. Davison, Estimating Ad Group Performance in Sponsored Search. To appear in Proceedings of the 7th ACM Conference on Web Search and Data Mining (WSDM 2014), New York, USA, Feb 2014.

Generalized Latent Factor Model   May 2011 – Sep. 2013
Standard approaches to modeling relational data focus on a single type of relation. Matrix factorization for binary relations and tensor factorization for higher-order interactions are simple models with competitive generalization performance. In this project, we propose a joint probabilistic model that handles multiple types of relations; it can be viewed as a direct generalization of existing co-factorization approaches. Compared to standard factor analysis applied to each relation separately, it provides a novel way of regularizing sparse relational data, because the latent factors are estimated jointly across multiple types of relations. We estimate the model parameters with a simple, generic, and scalable algorithm based on stochastic gradient descent. One motivating example is social tagging, where joint factorization provides a simple way to recommend customized tags based on three data sources: previous tags, image features, and the social network.
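
The shared-factor idea can be sketched in a few lines of Python. This is a minimal sketch, assuming squared loss with L2 regularization and plain SGD over observed entries; the Bayesian model in the papers below is not reproduced. Each entity type (for example items, tags, or image features) gets one latent matrix, and every relation involving that type reads and updates the same matrix. The names `cofactorize` and `predict` and their parameters are hypothetical.

```python
import random


def cofactorize(relations, n_entities, rank=2, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Jointly factorize several relations via SGD on shared latent factors.

    relations:  list of (row_type, col_type, triples), triples = [(i, j, value), ...]
    n_entities: dict mapping entity type -> number of entities of that type
    Returns a dict mapping each entity type to its latent factor matrix.
    """
    rng = random.Random(seed)
    factors = {t: [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
               for t, n in n_entities.items()}
    # Pool observations from all relations so one SGD pass visits every relation.
    observations = [(r, c, i, j, v) for r, c, triples in relations for i, j, v in triples]
    for _ in range(epochs):
        rng.shuffle(observations)
        for r, c, i, j, v in observations:
            u, w = factors[r][i], factors[c][j]
            err = sum(a * b for a, b in zip(u, w)) - v
            for k in range(rank):
                uk, wk = u[k], w[k]
                # Gradient step on 0.5 * err**2 + 0.5 * reg * (|u|^2 + |w|^2).
                u[k] -= lr * (err * wk + reg * uk)
                w[k] -= lr * (err * uk + reg * wk)
    return factors


def predict(factors, row_type, i, col_type, j):
    """Dot product of the latent vectors of entity i (row type) and entity j (column type)."""
    return sum(a * b for a, b in zip(factors[row_type][i], factors[col_type][j]))
```

For social tagging, one relation could be items × tags and another items × image features. Because both relations update the same `item` matrix, an item with few tags still receives factors informed by its image features.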

  • Dawei Yin, Shengbo Guo, Boris Chidlovskii, Brian Davison, Cedric Archambeau and Guillaume Bouchard. Bayesian Multi-Relational Data Analysis. [Under review, draft available upon request]
  • Guillaume Bouchard, Dawei Yin and Shengbo Guo. Convex Collective Matrix Factorization. To appear in Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2013), Scottsdale, AZ, USA.
  • Dawei Yin, S. Guo, B. Chidlovskii, B. Davison, C. Archambeau and G. Bouchard, Connecting Comments and Tags: Improved Modeling of Social Tagging Systems. In Proceedings of the 6th ACM Conference on Web Search and Data Mining (WSDM 2013), Rome, Italy, Feb 2013.

Understanding and Prediction in Microblog Media   Feb. 2010 – Sep. 2013
Unlike a traditional social network service, a microblogging network like Twitter is a hybrid network that combines aspects of both social networks and information networks. Understanding the structure of such hybrid networks and predicting new links are important for many tasks, such as friend recommendation, community detection, and network growth models. We compare the most popular and recent methods and principles for link prediction, and observe that link prediction in a hybrid network differs from link prediction in previously studied networks, such as co-authorship networks and traditional online social networks. In this project, we propose novel structural methods that calculate the probability of a link being created by examining the current user’s local network structure.
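
The sketch below illustrates link scoring from a user's local neighborhood in a directed follower graph. It is a simple heuristic chosen for illustration, assuming two signals: directed two-hop paths u → w → v (u follows someone who follows v) and reciprocation (v already follows u). It is not the structural method from the papers below, and `build_graph`, `local_link_score`, and `recommend` are hypothetical names.

```python
from collections import defaultdict


def build_graph(edges):
    """Index a directed follow graph by out-neighbors and in-neighbors."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)
    return out_nbrs, in_nbrs


def local_link_score(u, v, out_nbrs, in_nbrs):
    """Score a candidate follow edge u -> v using only u's local neighborhood.

    Counts directed two-hop paths u -> w -> v, and adds 1 if v already
    follows u (a reciprocation opportunity). Existing edges score 0.
    """
    if u == v or v in out_nbrs[u]:
        return 0.0
    two_hop = len(out_nbrs[u] & in_nbrs[v])
    reciprocal = 1.0 if u in out_nbrs[v] else 0.0
    return two_hop + reciprocal


def recommend(u, out_nbrs, in_nbrs, k=3):
    """Top-k candidates among friends-of-friends and existing followers of u."""
    candidates = {x for w in out_nbrs[u] for x in out_nbrs[w]} | in_nbrs[u]
    scored = [(local_link_score(u, v, out_nbrs, in_nbrs), v) for v in candidates]
    scored = [(s, v) for s, v in scored if s > 0]
    return [v for _, v in sorted(scored, key=lambda t: (-t[0], t[1]))[:k]]


if __name__ == "__main__":
    edges = [("alice", "bob"), ("alice", "carol"), ("bob", "dave"),
             ("carol", "dave"), ("erin", "alice"), ("bob", "frank")]
    out_nbrs, in_nbrs = build_graph(edges)
    print(recommend("alice", out_nbrs, in_nbrs))
```

On this toy graph, `dave` ranks first for `alice` (two two-hop paths), followed by `erin` (who already follows alice) and `frank` (one path), which tie at score 1 and are ordered by name.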

  • Dawei Yin, Liangjie Hong and Brian D. Davison, Structural Link Analysis and Prediction in MicroBlogs. In Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM 2011), Glasgow, Scotland, UK, October 2011.
  • Dawei Yin, Liangjie Hong, Xiong Xiong and Brian D. Davison, Link Formation Analysis in MicroBlogs. In Proceedings of the 34th Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2011), Beijing, China, July, 2011.
  • Zhenzhen Xue, Dawei Yin and Brian D. Davison. Normalizing Microtext. In Proceedings of the AAAI-11 Workshop on Analyzing Microtext, San Francisco, USA, 2011.

Social Annotation and Bookmarking Systems   Feb. 2010 – Sep. 2013
In social bookmarking systems, it is critical to predict or recommend the tags users are likely to use, helping them organize their resources and build a knowledge base. Existing tag prediction methods have shown that prediction performance can be significantly improved by modeling users’ preferences. However, these preferences are usually treated as constant over time, neglecting the temporal dynamics of users’ interests. We systematically investigate the temporal dynamics of user interests in tagging systems and propose a user-tag specific temporal interest model that tracks users’ interests over time. We associate each user-tag pair with a kernel function that characterizes its temporal changes, and present an effective estimation process that embeds this idea into state-of-the-art tag prediction algorithms.
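
To make the kernel idea concrete, here is a minimal Python sketch, assuming an exponential-decay kernel whose time constant can be set separately for each (user, tag) pair. The kernel form, the `decay` dictionary, the 30-day default, and the names `temporal_tag_scores` and `top_tags` are all illustrative assumptions; the kernels and estimation procedure from the papers below are not reproduced.

```python
import math
from collections import defaultdict


def temporal_tag_scores(history, now, decay, default_decay=30.0):
    """Kernel-weighted preference of each (user, tag) pair at time `now`.

    history: iterable of (user, tag, timestamp_in_days)
    decay:   dict mapping (user, tag) -> decay time constant in days
             (a hypothetical per-pair kernel width; missing pairs use default_decay)
    Each past use contributes exp(-(now - t) / tau), so recent uses weigh more.
    """
    scores = defaultdict(float)
    for user, tag, t in history:
        if t > now:
            continue  # ignore events after the prediction time
        tau = decay.get((user, tag), default_decay)
        scores[(user, tag)] += math.exp(-(now - t) / tau)
    return dict(scores)


def top_tags(scores, user, k=2):
    """Rank one user's tags by kernel score (ties broken by tag name)."""
    ranked = sorted(((s, tag) for (u, tag), s in scores.items() if u == user),
                    key=lambda x: (-x[0], x[1]))
    return [tag for _, tag in ranked[:k]]
```

If a user applied `python` three times around day 0 and `travel` twice around day 99, a plain count would favor `python`, but at day 100 the default kernel favors `travel`. Giving the `(user, "python")` pair a very slow decay restores `python` to the top, which is the kind of per-pair behavior a user-tag specific kernel is meant to capture.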

  • Dawei Yin, Liangjie Hong, Zhenzhen Xue and Brian D. Davison, Temporal Dynamics of User Interests in Tagging Systems. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2011), San Francisco, USA, 2011.
  • Dawei Yin, Liangjie Hong and Brian D. Davison. Exploiting Session-like Behaviors in Tag Prediction. In Proceedings of the 20th International World Wide Web Conference (WWW 2011), Hyderabad, India, March 2011.
  • Dawei Yin, Zhenzhen Xue, Liangjie Hong and Brian D. Davison. A Probabilistic Model for Personalized Tag Prediction. In Proceedings of the 16th annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2010), Washington, DC, July 2010.
  • Xiaoguang Qi, Dawei Yin, Zhenzhen Xue and Brian D. Davison. Choosing Your Own Adventure: Automatic Taxonomy Generation to Permit Many Paths. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM 2010), Toronto, Canada, October 2010.
  • Xiaoguang Qi, Dawei Yin, Zhenzhen Xue and Brian D. Davison. Enhancing Taxonomies by Providing Many Paths. Technical Report LU-CSE-10-005, Dept. of Computer Science and Engineering, Lehigh University, 2010.

Fast Approximate Nearest-Neighbor Classifiers   Feb. 2010 – Sep. 2011
Scaling up document-image classifiers to handle an unlimited variety of document and image types poses serious challenges to conventional trainable classifier technologies. Highly versatile classifiers demand representative training sets, which can be dauntingly large. In this project, we propose an algorithm, which we call online bin-decimation, for coping with training sets that are too big to fit in main memory. The key idea of bin-decimation is to enforce an approximate upper bound on the number of training samples stored in each K-d hash bin; an adaptive statistical technique accomplishes this online and in linear time, while reading the training data exactly once. This project is part of Lehigh Document Analysis and Exploitation.
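
Below is a minimal Python sketch of the one-pass, per-bin cap, under two stated substitutions: a uniform grid of width `cell_size` stands in for the K-d hash bins, and per-bin reservoir sampling stands in for the adaptive statistical technique, so the cap is enforced exactly rather than approximately. The names `bin_key`, `decimate_stream`, and `nearest_label` are hypothetical.

```python
import random
from collections import defaultdict


def bin_key(x, cell_size):
    """Map a feature vector to a grid cell (a stand-in for a K-d hash bin)."""
    return tuple(int(v // cell_size) for v in x)


def decimate_stream(stream, cap, cell_size=1.0, seed=0):
    """Single pass over (features, label) samples, keeping at most `cap` per bin.

    Per-bin reservoir sampling gives every sample seen in a bin an equal chance
    of being retained, and total work is linear in the length of the stream.
    """
    rng = random.Random(seed)
    kept = defaultdict(list)  # bin -> retained samples
    seen = defaultdict(int)   # bin -> number of samples read so far
    for sample in stream:
        key = bin_key(sample[0], cell_size)
        seen[key] += 1
        if len(kept[key]) < cap:
            kept[key].append(sample)
        else:
            j = rng.randrange(seen[key])  # uniform over [0, seen)
            if j < cap:
                kept[key][j] = sample
    return kept


def nearest_label(kept, x):
    """Brute-force 1-NN over the decimated training set, for illustration."""
    best = min((s for bucket in kept.values() for s in bucket),
               key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return best[1]
```

Feeding 1,000 samples from one dense region and 10 from a sparse region with `cap=20` keeps 20 and 10 samples respectively, so the retained set is far more balanced while still covering both regions.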

  • Dawei Yin, Chang An, Henry S. Baird. Imbalance and Concentration in k-NN Classification. In Proceedings of IAPR 20th International Conference on Pattern Recognition (ICPR 2010), Istanbul, Turkey, August 2010.
  • Chang An, Dawei Yin, Henry S. Baird. Document Segmentation using Pixel-Accurate Ground Truth. In Proceedings of IAPR 20th International Conference on Pattern Recognition (ICPR 2010), Istanbul, Turkey, August 2010.
  • Dawei Yin, Chang An and Henry S. Baird. Safely Selecting Subsets of Training Data. In Proceedings of IAPR 9th International Workshop on Document Analysis Systems (DAS 2010), Boston, MA, June 2010.
  • Dawei Yin, Henry S. Baird and Chang An. Time and Space Optimization of Document Content Classifiers. In Proceedings of IS&T/SPIE Document Recognition and Retrieval Conf. (DR&R XVII), San Jose, CA, January 17-21, 2010.