Thinkwee's Blog

Too Stupid to Give Up Learning

The second post in my "some very-personal questions to myself" series. It has been over a year since the last post, and much progress on LLMs has been made in academia and industry, which partially answers my questions. I will introduce these works and ask myself some new questions. See the last post here[previous_post]. This post is about the Pretrain Ceiling, the Second Half, and Scaling the Environment.

Read more »

Some very personal questions, assumptions, and predictions about the future after the large model era. I hope to make it a habit to write such a future-asking post every half year, to keep myself thinking about the "next token" of the AI era. This post is about Compression, World Models, Agents, and Alignment.

Read more »


  • Notes on recent template-based task reformulation methods, a particularly interesting direction since the appearance of GPT-3. These methods design prompts that recast samples and tasks as natural-language templates, which are fed directly into a pre-trained language model to generate text, thereby completing the task indirectly. Prompt construction unifies the form of downstream tasks with the pre-training task (language modeling) and achieves good results in few-shot learning; a minimal cloze-prompting sketch follows the paper list. The nine key papers are:
    • Early work that converts the task into natural language and uses a pre-trained language model to answer it:
      • (Harvard) Commonsense Knowledge Mining from Pretrained Models
      • (Heidelberg) Argumentative Relation Classification as Plausibility Ranking
      • (NVIDIA) Zero-shot Text Classification With Generative Language Models
    • The PET approach, Pattern Exploiting Training:
      • (LMU) Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
      • (LMU) It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
      • (UNC) Improving and Simplifying Pattern Exploiting Training
    • Automatically constructing prompts, Automatically Searching Prompts:
      • (UCI, UCB) AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts
      • (Princeton, MIT) Making Pre-trained Language Models Better Few-shot Learners
      • (THU) GPT Understands, Too
        Read more »
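As a concrete illustration of the template idea, here is a minimal sketch of zero-shot sentiment classification with a cloze-style prompt and a masked language model. The template, the verbalizer words, and the model name are my own illustrative choices, not settings taken from any of the papers above.

```python
# Minimal sketch of cloze-style prompting; illustrative only, not PET's actual code.
# Assumes the Hugging Face `transformers` library is installed.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def classify_sentiment(sentence: str) -> str:
    # Wrap the sample in a natural-language template with a [MASK] slot.
    prompt = f"{sentence} It was [MASK]."
    # The "verbalizer" maps label words back to task labels.
    verbalizer = {"great": "positive", "terrible": "negative"}
    # Score only the verbalizer words and pick the higher-scored one.
    preds = fill(prompt, targets=list(verbalizer))
    best = max(preds, key=lambda p: p["score"])
    return verbalizer[best["token_str"]]

print(classify_sentiment("The movie was a waste of two hours."))  # -> "negative"
```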


  • Notes on edit-based seq2seq methods from recent years, which offer high efficiency (partially autoregressive or non-autoregressive decoding) and lower data hunger (a small output vocabulary) for tasks whose input and output are in the same language and differ only slightly (error correction, simplification, summarization); a minimal tag-and-realize sketch follows the paper list.
  • Mainly read five papers, sorted by their publication date on arXiv:
    • (LevT, Facebook) Levenshtein Transformer
    • (Huawei) EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing
    • (LaserTagger, Google) Encode, Tag, Realize: High-Precision Text Editing
    • (PIE) Parallel Iterative Edit Models for Local Sequence Transduction
    • (Google) Felix: Flexible Text Editing Through Tagging and Insertion
Read more »
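To make the tagging idea concrete, below is a toy sketch in the spirit of the encode-tag-realize pipeline: a tagger (assumed to exist, not shown) labels each source token with KEEP, DELETE, or KEEP|phrase, and a small realizer applies the tags. The tag format and function name are my own illustrative choices, not LaserTagger's actual implementation.

```python
# Toy "realize" step for tag-based text editing; illustrative only.
def realize(tokens, tags):
    output = []
    for token, tag in zip(tokens, tags):
        op, _, phrase = tag.partition("|")
        if phrase:                       # insert the attached phrase before this token
            output.extend(phrase.split("_"))
        if op == "KEEP":                 # keep the source token as-is
            output.append(token)
        # op == "DELETE": drop the source token
    return " ".join(output)

# Example: sentence fusion with a tiny edit-tag sequence.
tokens = "Turing was born in 1912 . He died in 1954 .".split()
tags = ["KEEP", "KEEP", "KEEP", "KEEP", "KEEP",
        "DELETE", "DELETE", "KEEP|and_he", "KEEP", "KEEP", "KEEP"]
print(realize(tokens, tags))  # -> "Turing was born in 1912 and he died in 1954 ."
```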

A brief review of the VC dimension. All discussions are based on the simple case of binary classification.

Read more »

Notes from reading Dr. Bang Liu's doctoral thesis, Natural Language Processing and Text Mining with Graph-Structured Representations, from the University of Alberta.

Read more »

Study notes for Stanford CS224W: Machine Learning with Graphs by Jure Leskovec.

Read more »

A brief note on the CLSciSumm Workshop that the CIST lab participated in; the main focus is on methods. The experiments are analyzed in detail in the papers. Papers:

Read more »

Notes on how Fairseq handles incremental decoding at inference time for parallel-decoding models such as CNN seq2seq and the Transformer; a toy caching sketch follows.

Read more »
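The core idea, shown below as a toy sketch rather than Fairseq's actual incremental_state API, is that each decoder layer keeps a per-step cache of previously computed keys/values, so at step t only the newest token is processed instead of re-running the whole prefix. All names here are illustrative.

```python
# Toy sketch of incremental decoding with a per-layer cache; not Fairseq's API.
import torch

class ToyDecoderLayer(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x_t, cache):
        # x_t: (batch, 1, dim) -- only the newest token's hidden state.
        # Append the new key/value to the cache from earlier steps, so
        # self-attention sees the whole prefix without recomputing it.
        kv = self.proj(x_t)
        cache["kv"] = kv if "kv" not in cache else torch.cat([cache["kv"], kv], dim=1)
        attn = torch.softmax(x_t @ cache["kv"].transpose(1, 2), dim=-1)
        return attn @ cache["kv"]

layer = ToyDecoderLayer(dim=8)
cache = {}
x = torch.randn(2, 5, 8)                                   # 5 decoding steps
outputs = [layer(x[:, t:t + 1], cache) for t in range(5)]  # one token per call
```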

Long time no see, SVM.

Read more »

Paper reading notes on:

  • GNN Pooling
  • Discourse-Aware Summarization
  • Siamese BERT
  • Large Chatbot
Read more »

  • Knowledge Graph Special Collection
    • Entity Alignment in Cross-lingual Knowledge Graphs
    • Knowledge Graph Language Model
    • Dynamic Knowledge Graph Dialogue Generation
    • Graph2Seq
    • Graph Matching Network
    • Dynamic Knowledge Graph Update
    • Attention-based Embeddings for Relation Prediction
Read more »

Notes on some recent work on heterogeneous information networks; the PathSim definition is recalled after the list.

  • PathSim
  • HGNN
  • HGAN
  • HGAN for text classification
  • Attributed Multiplex Heterogeneous Network
  • Meta-graph Guided Random Walks
Read more »
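For reference, the PathSim similarity between objects \(x\) and \(y\) under a symmetric meta-path \(\mathcal{P}\), as defined in the original PathSim paper:

$$s(x, y) = \frac{2 \times \left|\{p_{x \rightsquigarrow y} : p \in \mathcal{P}\}\right|}{\left|\{p_{x \rightsquigarrow x} : p \in \mathcal{P}\}\right| + \left|\{p_{y \rightsquigarrow y} : p \in \mathcal{P}\}\right|}$$

where \(p_{x \rightsquigarrow y}\) is a path instance of \(\mathcal{P}\) between \(x\) and \(y\); the self-path counts in the denominator normalize for how "visible" each node is.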

Selected readings of papers on graph-based automatic summarization

  • AMR-based abstractive summarization
  • Two papers on AMR multi-document summarization
  • PageRank in encoder attention
  • Building a graph from topic modeling and using ILP for extractive summarization
  • GCN-based multi-document extractive summarization
  • STRUCTURED NEURAL SUMMARIZATION
Read more »

RL study notes, minimalist style; a minimal Q-learning sketch follows the list.

  • Q-learning
  • Sarsa
  • Sarsa(\(\lambda\))
  • DQN
  • Double DQN
  • DQN with Prioritized Experience Replay
  • Dueling DQN
  • Policy Gradient
Read more »
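For reference, a minimal tabular Q-learning sketch for the first item in the list. The environment interface is assumed to follow the classic Gym convention (reset() returns a state, step(a) returns state, reward, done, info), and the hyperparameters are arbitrary.

```python
# Minimal tabular Q-learning; the env interface and hyperparameters are assumptions.
import random
from collections import defaultdict

def q_learning(env, n_actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    Q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = random.randrange(n_actions) if random.random() < eps \
                else max(range(n_actions), key=lambda i: Q[s][i])
            s2, r, done, _ = env.step(a)
            # Q-learning update: bootstrap from the greedy value of the next state
            target = r + gamma * max(Q[s2]) * (not done)
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```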

Selected Reading of ACL/NAACL 2019 Automatic Summarization Papers

  • DPPs Similarity Measurement Improvement

  • STRASS: Backpropagation for Extractive Summarization

  • Translate first, then generate the summary

  • Reading Comprehension + Automatic Summarization

  • BiSET: Retrieve + Fast Rerank + Selective Encoding + Template Based

Read more »

Selected readings from ACL 2019 award-winning papers.

  • Using Oracle for sentence-level teacher forcing
  • Speaker commitment
  • An evaluation framework for summaries that combines multiple metrics
  • Zero-Shot Entity Linking
Read more »



  • Notes on the mathematical derivation of GloVe word vectors: the original paper does not derive the model from a network architecture but instead constructs the objective function through pure mathematical reasoning. This design approach is very interesting, and the post also writes out and compares the mathematical essence of word2vec; the resulting objective is reproduced after the reference below.
  • GloVe: Global Vectors for Word Representation
Read more »
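For reference, the weighted least-squares objective that the derivation arrives at, as given in the GloVe paper:

$$J = \sum_{i,j=1}^{V} f(X_{ij})\left(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2, \qquad f(x) = \begin{cases} (x/x_{\max})^{\alpha} & x < x_{\max} \\ 1 & \text{otherwise} \end{cases}$$

where \(X_{ij}\) is the word–word co-occurrence count, \(w_i\) and \(\tilde{w}_j\) are the word and context vectors, and \(b_i\), \(\tilde{b}_j\) are their biases.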

  • Convolutional Sequence to Sequence

  • Robust Unsupervised Cross-Lingual Word Embedding Mapping

Read more »

Course notes on Computational Linguistics; reference textbook: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.

Read more »