About Me
- Hello, my name is Wei Liu. Here are my Email, GitHub, and Google Scholar.
- I earned a Bachelor's degree in Communication Engineering from BUPT in 2018.
- I completed my Master's degree in Computer Engineering at CIST@BUPT (Center of Intelligence Science and Technology) in 2021.
- Currently, I am working at Tencent as an Applied Researcher, specializing in NLP for Recommendation and Advertising.
Experience
- CIST@BUPT, 2018.9-2021.7
- Under the supervision of Dr. Lei Li, I focused my research on developing a "Highly Summarized" abstractive summarization system.
- I identified the "text degeneration problem" in seq2seq-based summarization models, where the text generated by abstractive summarization models is exactly the same as sentences extracted from the source articles. To build a truly "Highly Summarized" model capable of summarizing, reorganizing, and compressing text, I investigated this problem from three perspectives:
- Model[1]: I introduced Determinantal Point Processes (DPPs) into abstractive summarization for the first time, using them to diversify the attention distribution in deep seq2seq models (see the illustrative sketch at the end of this entry).
- Data[2]: I discovered that public summarization datasets contain subjective bias introduced by manual labeling, which leads to the text degeneration problem in supervised settings.
- Task definition: I explored how the proportion of extraction versus generation required by the summarization task affects the degeneration problem.
- Additionally, I participated in several workshops addressing real-world challenges in summarization, including Scientific Paper Summarization[3], Multi-lingual and Low-Resource Summarization[4], and Extreme Long Text Summarization[5].
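A minimal, illustrative sketch of the DPP machinery behind [1] (not the paper's implementation, which folds the DPP-based diversity term into the attention mechanism itself): attention weights serve as per-sentence quality scores, sentence embeddings supply pairwise similarity, and greedy MAP inference over the resulting kernel selects source content that is both salient and non-redundant. The toy inputs and the selection routine are my own illustration.

```python
import numpy as np

def dpp_kernel(quality, embeddings):
    """L_ij = q_i * cos(e_i, e_j) * q_j: salience (quality) times pairwise similarity."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return np.outer(quality, quality) * (e @ e.T)

def greedy_dpp_select(L, k):
    """Greedy MAP inference: repeatedly add the item giving the largest log det(L_S)."""
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_score:
                best, best_score = i, logdet
        if best is None:
            break
        selected.append(best)
    return selected

# Toy usage: 4 source sentences, the first two of which are near-duplicates.
attention = np.array([0.9, 0.85, 0.5, 0.4])  # hypothetical attention mass per sentence
emb = np.array([[1.0, 0.0], [0.99, 0.14], [0.0, 1.0], [0.5, 0.5]])
print(greedy_dpp_select(dpp_kernel(attention, emb), k=2))  # -> [0, 2]: salient *and* dissimilar
```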
- Application Research Intern, Sina Weibo, 2019.6-2019.9
- I implemented BERT for blog classification in real-world production environments.
- I improved the two-tower models for keyword ranking.
- Application Researcher (including internship), Tencent AIPD, 2020.5-2022.5
- Mainly focused on strengthening NLU capabilities to enhance Tencent's news feed recommendations.
- I reformulated the Keyphrase Ranking task as a Machine Reading Comprehension task, which takes better account of document context and leverages pre-trained language models to improve ranking performance.
- I developed a Keyphrase Prediction method based on a unified extract-and-generate architecture[6]. The approach combines the extractive and generative tasks within a unified autoencoding and autoregressive language model, and it is the first to model the interaction between present and absent keyphrases.
- I explored Controllable Keyphrase Generation as an early attempt at prompt engineering[7]. The system takes specified topic keywords and automatically constructs prompts for LLMs to generate keyphrases that adhere to safety and customization requirements.
- Based on contrastive learning and information theory, I designed a fine-tuning schema that minimizes hallucination in LLM-generated text[8] (a minimal loss sketch follows this list).
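As a rough illustration of the contrastive fine-tuning idea in [8] (a sketch, not the actual CO2Sum training code), the snippet below assumes a HuggingFace-style seq2seq model whose forward pass returns the token-averaged cross-entropy when labels are provided; it pushes the faithful reference to score a lower NLL than a fact-corrupted negative by a margin. The batch keys and the margin value are placeholders of my own.

```python
import torch.nn.functional as F

def seq_nll(model, input_ids, attention_mask, labels):
    # Mean negative log-likelihood of a target sequence under the seq2seq model
    # (HuggingFace-style models expose this as `.loss` when `labels` is passed).
    return model(input_ids=input_ids, attention_mask=attention_mask, labels=labels).loss

def factual_contrastive_loss(model, batch, margin=1.0):
    """MLE on the faithful summary plus a margin ranking term against a corrupted negative."""
    nll_pos = seq_nll(model, batch["input_ids"], batch["attention_mask"], batch["faithful_labels"])
    nll_neg = seq_nll(model, batch["input_ids"], batch["attention_mask"], batch["corrupted_labels"])
    # Penalize the model whenever the hallucinated negative is not at least
    # `margin` nats worse (higher NLL) than the faithful reference.
    return nll_pos + F.relu(margin + nll_pos - nll_neg)
```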
- Application Researcher, Tencent MLPD, 2022.5-present
- Concentrated on improving the accuracy of Tencent's advertising recommendation from both model-centric and data-centric perspectives.
- Model-Centric: I developed various models, including graph embeddings, sequence modeling, and diffusion models, to mine the relationship between user interests in the advertising domain and non-advertising domains (such as reading blogs, playing games, and watching videos). Additionally, I explored interaction-enhanced two-tower models to directly improve CTR/CVR prediction accuracy in Tencent Ads (see the sketch after this list).
- Data-Centric: I developed model-based (rather than heuristic) feature quantization techniques to enhance the stability and performance of features used in large-scale advertising recommendation systems, and I am exploring feature AutoML techniques to automate feature engineering for Tencent Ads.
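For context on the two-tower models mentioned above, here is a generic sketch trained with an in-batch softmax loss; the layer sizes, temperature, and feature dimensions are illustrative placeholders rather than Tencent's production setup, and the interaction-enhanced variants add cross-tower signals on top of this basic structure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTower(nn.Module):
    """Generic two-tower retrieval model trained with an in-batch softmax loss."""
    def __init__(self, user_dim, ad_dim, hidden=128, out=64):
        super().__init__()
        self.user_tower = nn.Sequential(nn.Linear(user_dim, hidden), nn.ReLU(), nn.Linear(hidden, out))
        self.ad_tower = nn.Sequential(nn.Linear(ad_dim, hidden), nn.ReLU(), nn.Linear(hidden, out))

    def forward(self, user_feats, ad_feats, temperature=0.05):
        u = F.normalize(self.user_tower(user_feats), dim=-1)   # user embedding
        a = F.normalize(self.ad_tower(ad_feats), dim=-1)       # ad embedding
        logits = u @ a.t() / temperature                        # every user vs. every ad in the batch
        labels = torch.arange(u.size(0), device=u.device)       # clicked pairs sit on the diagonal
        return F.cross_entropy(logits, labels)                  # other in-batch ads act as negatives

# Toy usage with random features for a batch of 8 (user, ad) click pairs.
model = TwoTower(user_dim=32, ad_dim=16)
loss = model(torch.randn(8, 32), torch.randn(8, 16))
```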
Publications
- My Google Scholar
- [1] In Conclusion Not Repetition: Comprehensive Abstractive Summarization with Diversified Attention Based on Determinantal Point Processes
- CoNLL 2019 Long Paper
- Lei Li*, Wei Liu*, Marina Litvak, Natalia Vanetik, Zuying Huang
- [code][paper]
- [2] Subjective Bias in Abstractive Summarization
- arXiv Preprint
- Lei Li*, Wei Liu*, Marina Litvak, Natalia Vanetik, Jiacheng Pei, Yinan Liu, Siya Qi
- [code][paper]
- [3] CIST@CLSciSumm-19: Automatic Scientific Paper Summarization with Citances and Facets
- SIGIR 2019 Workshop
- Lei Li, Yingqi Zhu, Yang Xie, Zuying Huang, Wei Liu, Xingyuan Li, Yinan Liu
- [paper]
- [4] Multi-lingual Wikipedia Summarization and Title Generation On Low Resource Corpus
- RANLP 2019 Workshop
- Wei Liu, Lei Li, Zuying Huang, Yinan Liu
- [paper]
- [5] CIST@CL-SciSumm 2020, LongSumm 2020: Automatic Scientific Document Summarization
- EMNLP 2020 Workshop
- Lei Li, Yang Xie, Wei Liu, Yinan Liu, Yafei Jiang, Siya Qi, Xingyuan Li
- [paper]
- [6] UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction
- ACL 2021 Findings Long Paper
- Huanqin Wu*, Wei Liu*, Lei Li, Dan Nie, Tao Chen, Feng Zhang, Di Wang
- [code][paper]
- [7] Fast and Constrained Absent Keyphrase Generation by Prompt-Based Learning
- AAAI 2022 Long Paper
- Huanqin Wu, Baijiaxin Ma, Wei Liu, Tao Chen, Dan Nie
- [paper]
- [8] CO2Sum: Contrastive Learning for Factual-Consistent Abstractive Summarization
- arXiv Preprint
- Wei Liu, Huanqin Wu, Wenjing Mu, Zhen Li, Tao Chen, Dan Nie
- [paper]
- [9] A Multi-View Abstractive Summarization Model Jointly Considering Semantics and Sentiment
- CCIS 2018 Long Paper
- Moye Chen, Lei Li, Wei Liu
- [paper]
- [10] In Submission
- Anonymous
- Anonymous
- Anonymous
- [11] In Submission
- Anonymous
- Anonymous
- Anonymous