Who am I


2023.5 at Xiamen

About Me

  • Hello, my name is Wei Liu. Here are my email, GitHub, and Google Scholar.
  • I earned a Bachelor's degree in Communication Engineering from BUPT in 2018.
  • I completed my Master's degree in Computer Engineering at CIST@BUPT (Center of Intelligence Science and Technology) in 2021.
  • Currently, I am working at Tencent as an Applied Researcher, specializing in NLP for Recommendation and Advertising.

Experience

  • CIST@BUPT, 2018.9-2021.7
    • Under the supervision of Dr. Lei Li, I focused my research on developing a "Highly Summarised Abstractive Summarization System".
    • I identified the "text degeneration problem" in seq2seq-based summarization models, where the text generated by abstractive summarization models is exactly the same as sentences extracted from the article. To develop a truly "Highly Summarised" summarization model capable of summarizing, reorganizing, and compressing text, I investigated this problem from three perspectives:
      • model[1]: I was the first to introduce Determinantal Point Processes into abstractive summarization, improving the attention distribution in deep seq2seq models.
      • data[2]: I discovered that public summarization datasets contain subjective bias from manual labeling, which leads to the text degeneration problem under supervised settings.
      • task definition: I explored the influence of extract/generate proportion within the summarization task on the degeneration problem.
    • Additionally, I participated in various workshops addressing real-world challenges in summarization, including Scientific Paper Summarization[3], Multi-lingual and Low-Resource Summarization[4], and Extreme Long Text Summarization[5].

  • Application Research Intern, Sina Weibo, 2019.6-2019.9
    • I deployed BERT for blog classification in a real-world production environment.
    • I improved two-tower models for keyword ranking.

  • Application Researcher (including internship), Tencent AIPD, 2020.5-2022.5
    • Mainly focused on strengthening NLU capabilities to enhance Tencent's news feed recommendations.
    • I reformulated the Keyphrase Ranking task as a Machine Reading Comprehension task, which better captures document context and leverages pre-trained language models to improve performance.
    • I developed a Keyphrase Prediction method based on a unified extract-and-generate architecture[6]. This approach combines extractive and generative tasks within a unified autoencoding and autoregressive language model, and is the first to model the interaction between present and absent keyphrases.
    • I explored Controllable Keyphrase Generation as an early attempt in prompt engineering[7]. This system employs specified topic keywords and automatically constructs prompts for LLMs to generate keyphrases, adhering to safety and customization requirements.
    • Based on contrastive learning and information theory, I designed a fine-tuning scheme that minimizes hallucination in LLM-generated texts[8].

  • Application Researcher, Tencent MLPD, 2022.5-present
    • Concentrated on improving the accuracy of Tencent Advertisement Recommendation from both model-centric and data-centric perspectives.
    • Model-Centric: I developed various models to mine the relationship between user interests in the advertising field and non-advertising fields (such as reading blogs, playing games, and watching videos). These models include graph embeddings, sequence modeling, and diffusion models. Additionally, I explored interaction-enhanced two-tower models to directly improve CTR/CVR prediction accuracy in Tencent Ads.
    • Data-Centric: I developed model-based (rather than heuristic) feature quantization techniques to enhance the stability and performance of features used in large-scale advertising recommendation systems. I am also exploring feature AutoML techniques to automate feature engineering for Tencent Ads.

Publications

  • My Google Scholar

  • [1] In Conclusion Not Repetition: Comprehensive Abstractive Summarization with Diversified Attention Based on Determinantal Point Processes
  • CoNLL 2019 Long Paper
  • Lei Li*, Wei Liu*, Marina Litvak, Natalia Vanetik, Zuying Huang
  • [code][paper]


  • [2] Subjective Bias in Abstractive Summarization
  • arXiv Preprint
  • Lei Li*, Wei Liu*, Marina Litvak, Natalia Vanetik, Jiacheng Pei, Yinan Liu, Siya Qi
  • [code][paper]


  • [3] CIST@CLSciSumm-19: Automatic Scientific Paper Summarization with Citances and Facets
  • SIGIR 2019 Workshop
  • Lei Li, Yingqi Zhu, Yang Xie, Zuying Huang, Wei Liu, Xingyuan Li, Yinan Liu
  • [paper]


  • [4] Multi-lingual Wikipedia Summarization and Title Generation On Low Resource Corpus
  • RANLP 2019 Workshop
  • Wei Liu, Lei Li, Zuying Huang, Yinan Liu
  • [paper]


  • [5] CIST@CL-SciSumm 2020, LongSumm 2020: Automatic Scientific Document Summarization
  • EMNLP 2020 Workshop
  • Lei Li, Yang Xie, Wei Liu, Yinan Liu, Yafei Jiang, Siya Qi, Xingyuan Li
  • [paper]


  • [6] UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction
  • ACL 2021 Findings Long Paper
  • Huanqin Wu*, Wei Liu*, Lei Li, Dan Nie, Tao Chen, Feng Zhang, Di Wang
  • [code][paper]


  • [7] Fast and Constrained Absent Keyphrase Generation by Prompt-Based Learning
  • AAAI 2022 Long Paper
  • Huanqin Wu, Baijiaxin Ma, Wei Liu, Tao Chen, Dan Nie
  • [paper]


  • [8] CO2Sum: Contrastive Learning for Factual-Consistent Abstractive Summarization
  • arXiv Preprint
  • Wei Liu, Huanqin Wu, Wenjing Mu, Zhen Li, Tao Chen, Dan Nie
  • [paper]


  • [9] A Multi-View Abstractive Summarization Model Jointly Considering Semantics and Sentiment
  • CCIS 2018 Long Paper
  • Moye Chen, Lei Li, Wei Liu
  • [paper]


  • [10] In Submission
  • Anonymous
  • Anonymous
  • Anonymous


  • [11] In Submission
  • Anonymous
  • Anonymous
  • Anonymous