Some very personal questions, assumptions, and predictions about the future after the large-model era. I hope to make writing such a forward-looking post every six months or year a habit, to keep myself thinking about the "next token" of the AI era. Still in draft.
Multi-agent Reinforcement Learning Notes
A brief note on the RL methods used in single-agent and multi-agent settings.
Debates between GPTs
- A webpage adapted from ChatGPT-Shortcut that showcases some interesting debates GPT held with itself.
- The demo site is here
Prompt - Task Reformulation in NLP
- Notes on recent methods that reformulate tasks with templates, an interesting direction, especially since the arrival of GPT-3. These methods design a prompt for the task, converting the sample and the task together into a natural-language template that is fed directly into a pretrained language model; the model predicts text and thereby completes the task indirectly. Building prompts unifies the form of the downstream task with that of the pretraining task (language modeling), which yields good results in few-shot learning. Nine papers are covered (a minimal prompting sketch follows the list):
- Early work that converts the problem into natural language and answers it with a pretrained language model:
- (Harvard)Commonsense Knowledge Mining from Pretrained Models
- (Heidelberg)Argumentative Relation Classification as Plausibility Ranking
- (NVIDIA)Zero-shot Text Classification With Generative Language Models
- The PET line, Pattern-Exploiting Training:
- (LMU)Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
- (LMU)It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
- (UNC)Improving and Simplifying Pattern Exploiting Training
- Automatic prompt construction, Automatically Searching Prompts:
- (UCI,UCB)AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts
- (Princeton, MIT)Making Pre-trained Language Models Better Few-shot Learners
- (THU)GPT Understands, Too
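To make the template idea above concrete, here is a minimal, hypothetical sketch of PET-style cloze prompting for sentiment classification. The template, the verbalizer words, and the choice of bert-base-uncased are illustrative assumptions, not the exact setup of any paper listed:

```python
# Hypothetical sketch: reformulating sentiment classification as a cloze task.
# Template and verbalizer are illustrative assumptions, not from the papers above.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def classify(sentence: str) -> str:
    # Wrap the sample in a natural-language template with one [MASK] slot.
    prompt = f"{sentence} It was {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    # Verbalizer: label words stand in for the classes; compare their scores.
    verbalizer = {"great": "positive", "terrible": "negative"}
    word_ids = {w: tokenizer.convert_tokens_to_ids(w) for w in verbalizer}
    best = max(word_ids, key=lambda w: logits[word_ids[w]].item())
    return verbalizer[best]

print(classify("The movie was a waste of two hours."))  # expected: negative
```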
Edit-based Text Generation
- Notes on recent edit-based seq2seq methods. For tasks where input and output are in the same language and differ only slightly (error correction, simplification, summarization), these methods are highly efficient (partially autoregressive or non-autoregressive decoding) and less data-hungry (small output vocabulary).
- Five papers are covered, ordered by their arXiv publication dates (a minimal edit-tagging sketch follows the list):
- (LevT, Facebook) Levenshtein Transformer
- (Huawei) EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing
- (LaserTagger, Google) Encode, Tag, Realize: High-Precision Text Editing
- (PIE, IIT) Parallel Iterative Edit Models for Local Sequence Transduction
- (Google) Felix: Flexible Text Editing Through Tagging and Insertion
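A minimal sketch of the shared edit-tagging idea (closest in flavor to LaserTagger): instead of generating the output token by token from a large vocabulary, predict one edit tag per source token and realize the output from the tags. The tag format and the hand-written tags below are illustrative assumptions, not any paper's exact scheme:

```python
# Illustrative edit-based generation: one edit tag per source token.
# "KEEP" / "DELETE" keep or drop the token; an attached "|phrase"
# inserts that phrase at the position, so "DELETE|went" acts as a
# replacement. Tags are hand-written here, standing in for what a
# trained tagger would predict.

def realize(tokens, tags):
    """Apply per-token edit tags to produce the target sentence."""
    out = []
    for tok, tag in zip(tokens, tags):
        base, _, phrase = tag.partition("|")
        if base == "KEEP":
            out.append(tok)
        if phrase:
            out.append(phrase)
    return " ".join(out)

source = "he go to school yesterday".split()
tags = ["KEEP", "DELETE|went", "KEEP", "KEEP", "KEEP"]
print(realize(source, tags))  # -> "he went to school yesterday"
```

Because the "vocabulary" is just the tag set plus a few frequent phrases, and tags for different positions can be predicted in parallel, this is where the efficiency and small-output-vocabulary advantages come from.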
Note for VC Dimension
A brief walkthrough of the VC dimension; all discussion starts from the simple case of binary classification. The definition it builds on is reproduced below.
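For reference, the standard definition in the binary case: a hypothesis class \(\mathcal{H}\) shatters a point set if it can realize every labeling of it, and

\[
\mathrm{VC}(\mathcal{H}) = \max \left\{ m : \exists\, x_1, \dots, x_m \text{ shattered by } \mathcal{H} \right\}.
\]

For example, linear classifiers in \(\mathbb{R}^2\) shatter some set of 3 points but no set of 4, so their VC dimension is 3.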
Notes for NLP with Graph-Structured Representations
Notes from reading Dr. Bang Liu's (University of Alberta) thesis, Natural Language Processing and Text Mining with Graph-Structured Representations.
Study Notes for CS224w
Study notes for Jure Leskovec's Stanford CS224W: Machine Learning with Graphs. To be continued.
CLSciSumm summary
A summary of our lab's participation in the CLSciSumm Workshop, focusing on the methods; the experimental analysis is covered in detail in the paper. Paper:
Incremental Decoding
Notes on how Fairseq handles incremental decoding at inference time for parallel-decoding models such as the CNN seq2seq and the Transformer; a minimal caching sketch follows.
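A minimal sketch of the caching idea (not Fairseq's actual API; the `incremental_state` dict below only mimics the role of its per-layer cache): at each step the decoder computes attention with the newest token's query only, while keys and values from earlier steps are read from the cache, so a step costs O(n) rather than re-encoding the whole prefix:

```python
# Illustrative incremental decoder self-attention with a key/value cache.
# Names and shapes are assumptions for the sketch, not Fairseq's API.
import torch

def step(q, k, v, incremental_state):
    """One decoding step; q, k, v are the new token's (1, d) projections."""
    if "k" in incremental_state:
        # Append this step's key/value to everything cached so far.
        k = torch.cat([incremental_state["k"], k], dim=0)
        v = torch.cat([incremental_state["v"], v], dim=0)
    incremental_state["k"], incremental_state["v"] = k, v
    # One query attends over all cached positions: O(n) per step.
    attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)  # (1, t)
    return attn @ v  # (1, d)

d, state = 8, {}
for t in range(5):  # generate 5 tokens, reusing the cache each step
    x = torch.randn(1, d)  # stand-in for the new token's q/k/v projections
    out = step(x, x, x, state)
print(state["k"].shape)  # torch.Size([5, 8]): keys for all 5 steps
```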
BERTology
A translation of A Primer in BERTology: What We Know About How BERT Works, a systematic survey of recent research on BERT's interpretability and extensions. Original paper: arXiv PDF
Structured Neural Summarization, Paper Reading
Reading notes for STRUCTURED NEURAL SUMMARIZATION.
SVM
Key steps of the derivations.
Reformer - Paper Reading
An annotated read-through of the Reformer paper.
Paper Reading 4
Note for Hierarchical Latent Dirichlet Allocation
Study notes on Hierarchical Latent Dirichlet Allocation, the hierarchical topic model. Again draws heavily on Prof. Yida Xu's tutorials.
Paper reading on Knowledge Graphs
A collection on knowledge graphs
- Entity alignment in cross-lingual knowledge graphs
- Knowledge Graph Language Model
- Dialogue generation over dynamic knowledge graphs
- Graph2Seq
- Graph Matching Network
- Dynamically updated knowledge graphs
- Attention-based Embeddings for Relation Prediction
Note for Heterogeneous Information Network
Notes on recent approaches to handling heterogeneous information networks
- PathSim
- HGNN
- HGAN
- HGAN for text classification
- With attributes: Attributed Multiplex Heterogeneous Network
- Meta-graph Guided Random Walks
- TBD
Note for Graph-based Summarization
Selected papers on graph-based automatic summarization
- AMR-based abstractive summarization
- Two papers on AMR-based multi-document summarization
- PageRank in encoder attention
- Graphs built via topic modeling, with ILP for extractive summarization
- GCN-based multi-document extractive summarization
- STRUCTURED NEURAL SUMMARIZATION
Easy Reinforcement Learning Notes
Minimalist style; a tabular Q-learning sketch follows the list.
- Q-learning
- Sarsa
- Sarsa(\(\lambda\))
- DQN
- Double DQN
- DQN with Prioritized Experience replay
- Dueling DQN
- Policy Gradient
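In the same minimalist spirit, a tabular Q-learning sketch on a hypothetical toy chain environment (states 0..4, reward only at the right end; the environment and hyperparameters are illustrative assumptions):

```python
# Tabular Q-learning on a toy 1-D chain; purely illustrative.
import random
from collections import defaultdict

N_STATES, ACTIONS = 5, (-1, +1)          # move left / right
alpha, gamma, epsilon = 0.1, 0.9, 0.1    # step size, discount, exploration
Q = defaultdict(float)                   # Q[(state, action)], defaults to 0

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy behavior policy
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Off-policy target: bootstrap on the max over next actions
        target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s_next

print([round(max(Q[(s, a)] for a in ACTIONS), 2) for s in range(N_STATES)])
```

Swapping the max in the target for the action actually taken next turns this update into Sarsa.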
Summarization-Related Papers Reading (ACL/NAACL 2019)
Selected summarization-related papers from ACL/NAACL 2019
- DPPs with an improved similarity measure
- STRASS: backpropagation for extractive summarization
- Translate first, then summarize
- Reading comprehension + automatic summarization
- BiSET: Retrieve + Fast Rerank + Selective Encoding + Template Based
Study Notes for Cognitive Graph
Today I read a machine reading comprehension paper from a Tsinghua and Alibaba team, Cognitive Graph for Multi-Hop Reading Comprehension at Scale. It was also accepted at ACL 2019, but I did not include it in the ACL 2019 reading post because I felt it deserved its own discussion: although it won no best- or outstanding-paper award, not even a nomination, its ideas and methodology are excellent, realizing connectionism plus knowledge reasoning in the simplest possible way.
Interview Summary for NLP
A summary of my interview experience from June.
Study Notes for Correlation Explanation
Outstanding Papers Reading (ACL 2019)
Selected award-winning papers from ACL 2019.
- Using an oracle for sentence-level teacher forcing
- Speaker commitment
- An evaluation framework for summarization that combines multiple metrics
- Zero-Shot Entity Linking
Note for Variational Auto-Encoder
Study notes on the variational auto-encoder.
References:
The original paper and the two blog posts above already explain the VAE very clearly; my writing here is just a restatement to work through it myself. If anyone comes across this post, I recommend reading those three references first.
Glove Embedding - Mathematical Derivation
- Notes on the mathematical derivation of the GloVe word vectors. The original paper does not obtain its model by drawing an architecture; the objective function is derived through purely mathematical manipulation, which is a very interesting design approach. The paper also writes out the mathematical essence of word2vec for comparison. The resulting objective is reproduced after the reference below.
- Original paper: GloVe: Global Vectors for Word Representation
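For reference, the weighted least-squares objective the derivation arrives at in the paper, where \(X_{ij}\) counts co-occurrences of words \(i\) and \(j\):

\[
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
(x / x_{\max})^{\alpha} & x < x_{\max} \\
1 & \text{otherwise},
\end{cases}
\]

with the weighting \(f\) damping very rare and very frequent pairs (the paper uses \(\alpha = 3/4\) and \(x_{\max} = 100\)).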
Paper Reading 3
- Convolutional sequence-to-sequence
- Robust unsupervised cross-lingual word-embedding mappings
Notes for Computational Linguistics
Notes for a computational linguistics course. Reference textbook: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Some formulas remain to be revised.
Logistic Regression and Maximum Entropy
A translation of John Mount's The Equivalence of Logistic Regression and Maximum Entropy Models, with a note that this proof is a special case of the general derivation of the maximum entropy model given in the book Statistical Learning Methods.
Conclusions (the derived form is sketched after this list):
- The maximum entropy model is exactly softmax classification.
- Under the balance conditions of a generalized linear model, the mapping function satisfying the maximum entropy condition is the softmax function.
- The book Statistical Learning Methods presents the maximum entropy model defined via feature functions; both it and softmax regression are log-linear models.
- When the feature functions are extended from binary indicator functions to the feature values themselves, the maximum entropy model reduces to softmax regression.
- Maximum entropy maximizes the conditional entropy, not the entropy of the conditional probability, nor the entropy of the joint probability.
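As a compact sketch of the last two points: maximizing the conditional entropy \(H(P) = -\sum_{x,y} \tilde{P}(x)\, P(y \mid x) \log P(y \mid x)\) subject to the feature-expectation constraints gives, via Lagrange multipliers, the log-linear solution

\[
P_{\lambda}(y \mid x) = \frac{\exp\!\left( \sum_i \lambda_i f_i(x, y) \right)}{\sum_{y'} \exp\!\left( \sum_i \lambda_i f_i(x, y') \right)},
\]

which is exactly softmax regression once \(f_i(x, y)\) is taken to be the feature value itself for class \(y\) rather than a binary indicator.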