[Some Questions asking Myself 2025.5]
The second post in my "some very personal questions to myself" series. It has been over a year since the last post, and much progress on LLMs has come out of academia and industry that partially answers my questions. I will introduce these works and ask myself some new questions. This post is about the Pretrain Ceiling, the Second Half, and Scaling the Environment.
[Some Questions asking Myself 2024.4]
Some very personal questions, assumptions, and predictions about the future after the large-model era. I hope to make it a habit to write such a forward-looking post every half year, to keep myself thinking about the "next token" of the AI era. This post is about Compression, World Model, Agent, and Alignment.
Debates between GPTs
- A webpage based on ChatGPT-Shortcut that shows some interesting debates between GPTs.
- The demo site is available here.
Prompt - Task Reformulation in NLP
- Notes on recent template-based task reformulation methods, a particularly interesting direction since the appearance of GPT-3. These methods design prompts for each task, converting samples and tasks into natural-language templates that are fed directly into a pre-trained language model to generate text, thereby completing the task indirectly. Prompt construction unifies the form of the downstream task with the pre-training task (language modeling) and achieves good results in few-shot learning; a minimal cloze-prompt sketch follows the paper list. The key papers to read are the following nine:
- Early work that converts questions into natural language and uses
pre-trained language models for answers:
- (Harvard) Commonsense Knowledge Mining from Pretrained Models
- (Heidelberg) Argumentative Relation Classification as Plausibility Ranking
- (NVIDIA) Zero-shot Text Classification With Generative Language Models
- The PET approach, Pattern Exploiting Training:
- (LMU) Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
- (LMU) It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
- (UNC) Improving and Simplifying Pattern Exploiting Training
- Automatically constructing prompts, Automatically Searching Prompts:
- (UCI, UCB) AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts
- (Princeton, MIT) Making Pre-trained Language Models Better Few-shot Learners
- (THU) GPT Understands, Too
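To make the template idea concrete, here is a minimal sketch of a PET-style cloze prompt for binary sentiment classification. The model, template, and verbalizer words ("great"/"terrible") are illustrative assumptions, not taken from any of the papers above.

```python
# Minimal cloze-prompt sketch: reformulate sentiment classification as
# masked-language-model prediction. Model, template, and verbalizer are
# illustrative choices, not from the papers listed above.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def classify(review: str) -> str:
    # Wrap the sample in a natural-language template with a [MASK] slot.
    prompt = f"{review} All in all, it was a [MASK] movie."
    # Score only the verbalizer tokens and map the winner back to a label.
    preds = fill_mask(prompt, targets=["great", "terrible"])
    best = max(preds, key=lambda p: p["score"])["token_str"].strip()
    return "positive" if best == "great" else "negative"

print(classify("The plot was thin, but the acting carried it."))  # likely "negative"
```

The point is that no task-specific classification head is trained: the downstream task is rewritten so that it looks exactly like the pre-training objective.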
Edit-based Text Generation
- Notes on edit-based seq2seq methods from recent years. For tasks where the input and output are in the same language and differ only slightly (error correction, simplification, summarization), these methods are highly efficient (partially autoregressive or non-autoregressive decoding) and less data-hungry (small output vocabulary); a toy tagging sketch follows the paper list.
- Mainly covers five papers, sorted by arXiv publication date:
- (LevT, Facebook) Levenshtein Transformer
- (Huawei) EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing
- (LaserTagger, Google) Encode, Tag, Realize: High-Precision Text Editing
- (PIE) Parallel Iterative Edit Models for Local Sequence Transduction
- (Google) Felix: Flexible Text Editing Through Tagging and Insertion
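As a toy illustration of the tag-then-realize idea behind LaserTagger-style models, the sketch below derives KEEP/DELETE/ADD edit operations from a source/target pair and shows that applying them reconstructs the target. The tag set and the alignment via difflib are simplifications, not any of the papers' actual procedures.

```python
# Toy edit-based generation: represent the target as edit operations on the
# source instead of generating it token by token. KEEP/DELETE/ADD is a
# simplified tag set; real models restrict ADD to a small phrase vocabulary.
from difflib import SequenceMatcher

def edit_tags(source, target):
    ops = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops += [("KEEP", tok) for tok in source[i1:i2]]
        else:
            ops += [("DELETE", tok) for tok in source[i1:i2]]
            ops += [("ADD", tok) for tok in target[j1:j2]]
    return ops

def realize(ops):
    # Realization: keep the KEEP tokens and splice in the ADD tokens.
    return " ".join(tok for tag, tok in ops if tag != "DELETE")

src = "she go to school yesterday".split()
tgt = "she went to school yesterday".split()
ops = edit_tags(src, tgt)
print(ops)                              # mostly KEEP, one DELETE/ADD pair
print(realize(ops) == " ".join(tgt))    # True
```

Because most tokens are simply kept, the model's output space is much smaller than a full vocabulary, which is where the efficiency and data advantages come from.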
Note for VC Dimension
A brief review of the VC dimension. All discussions are based on the simple case of binary classification.
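As a quick refresher (standard definitions, stated here for convenience rather than quoted from the note), for a binary hypothesis class \(\mathcal{H}\):

\[
\mathcal{H} \text{ shatters } S=\{x_1,\dots,x_m\}
\;\iff\;
\bigl|\{(h(x_1),\dots,h(x_m)) : h\in\mathcal{H}\}\bigr| = 2^m,
\]
\[
\mathrm{VCdim}(\mathcal{H}) = \max\{\, m : \text{some set of size } m \text{ is shattered by } \mathcal{H} \,\}.
\]

For example, linear classifiers with bias in \(\mathbb{R}^2\) shatter some set of 3 points but no set of 4, so their VC dimension is 3.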
Notes for NLP with Graph-Structured Representations
Notes from reading Natural Language Processing and Text Mining with Graph-Structured Representations by Dr. Bang Liu of the University of Alberta.
CLSciSumm summary
A brief note on the CLSciSumm Workshop that the CIST lab participated in; the focus here is on methods, while the experiments are analyzed in detail in the papers.
Incremental Decoding
Notes on how Fairseq performs incremental decoding at inference time for parallel-decoding models such as the convolutional seq2seq model and the Transformer.
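The gist, in a generic sketch that is not the actual Fairseq interface (the `decode_step` method and the `incremental_state` dict below are hypothetical): at inference the decoder runs token by token, so each layer caches its per-position state (attention keys/values, convolution inputs) and only processes the newest token instead of re-running the whole prefix at every step.

```python
# Generic sketch of incremental decoding; NOT the Fairseq API.
# `model.decode_step` and the `incremental_state` cache are hypothetical.
def greedy_decode(model, encoder_out, bos_id, eos_id, max_len=128):
    incremental_state = {}   # per-layer cache: keys/values (or conv inputs) seen so far
    tokens = [bos_id]
    for _ in range(max_len):
        # Feed only the last token; the cached layer states stand in for the
        # prefix, so each step only computes the new position.
        logits = model.decode_step(tokens[-1], encoder_out, incremental_state)
        next_tok = int(logits.argmax())
        tokens.append(next_tok)
        if next_tok == eos_id:
            break
    return tokens
```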
Note for Hierarchical Latent Dirichlet Allocation
Paper reading on Knowledge Graphs
- Knowledge Graph Special Collection
- Entity Alignment in Cross-lingual Knowledge Graphs
- Knowledge Graph Language Model
- Dynamic Knowledge Graph Dialogue Generation
- Graph2Seq
- Graph Matching Network
- Dynamic Knowledge Graph Update
- Attention-based Embeddings for Relation Prediction
Note for Heterogeneous Information Network
Notes on some recent methods for heterogeneous information networks:
- PathSim
- HGNN
- HGAN
- HGAN for text classification
- Attributed Multiplex Heterogeneous Network
- Meta-graph Guided Random Walks
Note for Graph-based Summarization
Selected readings of papers on graph-based automatic summarization:
- Abstractive summarization with AMR
- Two papers on AMR-based multi-document summarization
- PageRank in encoder attention
- Building a graph from topic modeling and using ILP for extractive summarization
- GCN-based multi-document extractive summarization
- Structured Neural Summarization
Easy Reinforcement Learning Notes
RL study notes, minimalist style (a minimal Q-learning sketch follows the list):
- Q-learning
- Sarsa
- Sarsa(\(\lambda\))
- DQN
- Double DQN
- DQN with Prioritized Experience Replay
- Dueling DQN
- Policy Gradient
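As a minimal example of the tabular methods at the top of the list, here is Q-learning on a toy 1-D chain; the environment and hyperparameters are made up purely for illustration.

```python
# Minimal tabular Q-learning on a toy chain: states 0..5, reward 1 at state 5.
# Environment and hyperparameters are illustrative only.
import random
from collections import defaultdict

GOAL = 5
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
ACTIONS = (-1, +1)                      # move left / move right
Q = defaultdict(float)                  # (state, action) -> estimated value

def step(s, a):
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

for _ in range(500):                    # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy behavior policy
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # Q-learning (off-policy): bootstrap from the max over next actions.
        target = r if done else r + GAMMA * max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

print(max(ACTIONS, key=lambda a: Q[(0, a)]))   # should print 1 (move right)
```

Sarsa differs only in the bootstrap term: it uses the value of the action actually taken next (on-policy) instead of the max.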
Summarization-Related Papers Reading (ACL/NAACL 2019)
Selected readings of ACL/NAACL 2019 automatic summarization papers:
- Improving the similarity measure in DPPs
- STRASS: Backpropagation for Extractive Summarization
- Translate first, then generate the summary
- Reading comprehension + automatic summarization
- BiSET: Retrieve + Fast Rerank + Selective Encoding + Template Based
Study Notes for Correlation Explanation
Notes on CorEx (Correlation Explanation).
Outstanding Papers Reading (ACL 2019)
Selected readings from ACL 2019 award-winning papers.
- Using an oracle for sentence-level teacher forcing
- Speaker commitment
- An evaluation framework for summaries that combines multiple metrics
- Zero-Shot Entity Linking
Note for Variational Auto-Encoder
Variational Autoencoder Learning Notes
Reference articles:
Regarding VAE, the original paper and the two blogs above have already explained it very clearly; I am just restating and paraphrasing to work through it myself. If anyone reads this post, I recommend reading those three reference sources first.
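For context, the objective the notes work through is the standard ELBO (written here in its usual form, not quoted from the references):

\[
\log p_\theta(x) \;\ge\; \mathcal{L}(\theta,\phi;x)
= \mathbb{E}_{q_\phi(z\mid x)}\bigl[\log p_\theta(x\mid z)\bigr]
- \mathrm{KL}\bigl(q_\phi(z\mid x)\,\Vert\,p(z)\bigr),
\]

optimized with the reparameterization trick \(z=\mu_\phi(x)+\sigma_\phi(x)\odot\epsilon\), \(\epsilon\sim\mathcal{N}(0,I)\), so that the expectation can be differentiated with respect to \(\phi\).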
GloVe Embedding - Mathematical Derivation
- Notes on the mathematical derivation of GloVe word vectors. The original paper does not build the model from an architecture diagram; instead it derives the objective function through purely mathematical reasoning, which is a very interesting design approach. The paper also writes out the mathematical essence of word2vec and compares the two; the resulting objective is reproduced below.
- GloVe: Global Vectors for Word Representation
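For reference, the weighted least-squares objective the derivation arrives at, as given in the paper:

\[
J=\sum_{i,j=1}^{V} f(X_{ij})\bigl(w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\bigr)^2,
\qquad
f(x)=\begin{cases}(x/x_{\max})^{\alpha} & x<x_{\max}\\ 1 & \text{otherwise,}\end{cases}
\]

where \(X_{ij}\) is the number of times word \(j\) occurs in the context of word \(i\), and the paper uses \(x_{\max}=100\), \(\alpha=3/4\).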
Paper Reading 3
Convolutional Sequence to Sequence
Robust Unsupervised Cross-Lingual Word Embedding Mapping