Logistic Regression and Maximum Entropy
A note on John Mount's "The Equivalence of Logistic Regression and Maximum Entropy Models", explaining that Mount's proof is a special case of the general derivation of the maximum entropy model given in *Statistical Learning Methods*.
Conclusion
- The maximum entropy model is equivalent to softmax classification
- Under the balance (expectation-matching) constraints of a generalized linear model, the mapping function that satisfies the maximum entropy condition is the softmax function
- In *Statistical Learning Methods*, the maximum entropy model is defined in terms of feature functions; both it and softmax regression belong to the family of log-linear models
- When the feature functions are extended from binary indicator functions to the feature values themselves, the maximum entropy model reduces to a softmax regression model
- The maximum entropy model maximizes the conditional entropy, not the entropy of the conditional probability distribution for a single input, nor the entropy of the joint distribution.
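To make the last point concrete, the objective being maximized is the conditional entropy weighted by the empirical marginal distribution $\tilde{P}(x)$ (following the standard definition used in *Statistical Learning Methods*):

```latex
H(P) = -\sum_{x,y} \tilde{P}(x)\, P(y \mid x) \log P(y \mid x)
```

This differs from the joint entropy, which would weight by $P(x, y)$, and from the per-input entropy $-\sum_y P(y \mid x) \log P(y \mid x)$, which ignores how often each $x$ occurs.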
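The equivalence in the two-class case can be sketched numerically: the softmax of two linear scores equals the logistic sigmoid of their difference, which is exactly logistic regression. The scores below are hypothetical values, not taken from the note.

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; softmax is invariant
    # to adding a constant to every score.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical scores w1·x and w2·x for a two-class problem.
z = np.array([0.7, -1.2])
p = softmax(z)

# For two classes, softmax(z)[0] equals sigmoid(z[0] - z[1]):
# the maximum entropy / softmax model collapses to logistic regression.
assert np.isclose(p[0], sigmoid(z[0] - z[1]))
print(p)
```

This is why logistic regression is the special case: only the difference of the two linear scores matters, so one class's weights can be fixed at zero without loss of generality.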