An LP-based hyperparameter optimization model for language modeling

Amir Hossein Akhavan Rahnama; Mehdi Toloo; Nezer Jacob Zaidenberg

doi:10.1007/s11227-018-2236-6

Corresponding author: Mehdi Toloo

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

To find hyperparameters for a machine learning model, algorithms such as grid search or random search are run over the space of possible hyperparameter values. These search algorithms select the solution that minimizes a specific cost function. In language models, perplexity is one of the most popular cost functions. In this study, we propose a fractional nonlinear programming model that finds the optimal perplexity value. The special structure of the model allows us to approximate it by a linear programming model that can be solved using the well-known simplex algorithm. To the best of our knowledge, this is the first attempt to use optimization techniques to find perplexity values in the language modeling literature. We apply our model to find hyperparameters of a language model and compare it to the grid search algorithm, and we show that it yields lower perplexity values. We perform this experiment on a real-world dataset from SwiftKey to validate our proposed approach.
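As a rough illustration of the grid search baseline mentioned in the abstract, the sketch below tunes a single hyperparameter by minimizing perplexity. It does not reproduce the paper's fractional or linear programming model, which is not given on this page. The choice of hyperparameter (an interpolation weight `lam` between unigram and bigram probabilities), the function names, and the toy probabilities are all assumptions made for the example.

```python
import math

# Hypothetical per-token probabilities for a 4-token held-out sequence.
# These numbers are invented for illustration, not taken from the paper.
UNIGRAM = [0.10, 0.20, 0.05, 0.10]
BIGRAM = [0.40, 0.05, 0.30, 0.02]


def perplexity(probs):
    """Perplexity = exp of the negative mean log-probability per token."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))


def interpolated_probs(unigram, bigram, lam):
    """Linear interpolation: lam * bigram + (1 - lam) * unigram, per token."""
    return [lam * b + (1.0 - lam) * u for u, b in zip(unigram, bigram)]


def grid_search(unigram, bigram, steps=11):
    """Try evenly spaced weights in [0, 1]; keep the lowest-perplexity one."""
    grid = [i / (steps - 1) for i in range(steps)]
    best_lam = min(
        grid,
        key=lambda lam: perplexity(interpolated_probs(unigram, bigram, lam)),
    )
    best_ppl = perplexity(interpolated_probs(unigram, bigram, best_lam))
    return best_lam, best_ppl


if __name__ == "__main__":
    lam, ppl = grid_search(UNIGRAM, BIGRAM)
    print(f"best weight: {lam:.1f}, perplexity: {ppl:.3f}")
```

Grid search only evaluates the candidate weights on the grid, so its result depends on the grid resolution (`steps`). The abstract's claim is that an optimization model can reach lower perplexity than this kind of search.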

Original language	English
Pages (from-to)	2151-2160
Number of pages	10
Journal	Journal of Supercomputing
Volume	74
Issue number	5
DOIs	https://doi.org/10.1007/s11227-018-2236-6
Publication status	Published - May 1 2018
Externally published	Yes

Keywords

Hyperparameter optimization
Language model
Linear programming
Machine learning
Optimization
n-Grams

ASJC Scopus subject areas

Software
Theoretical Computer Science
Information Systems
Hardware and Architecture

Cite this

@article{fbfd36bb94364121815cc773d20f8448,

title = "An LP-based hyperparameter optimization model for language modeling",

abstract = "In order to find hyperparameters for a machine learning model, algorithms such as grid search or random search are used over the space of possible values of the models{\textquoteright} hyperparameters. These search algorithms opt the solution that minimizes a specific cost function. In language models, perplexity is one of the most popular cost functions. In this study, we propose a fractional nonlinear programming model that finds the optimal perplexity value. The special structure of the model allows us to approximate it by a linear programming model that can be solved using the well-known simplex algorithm. To the best of our knowledge, this is the first attempt to use optimization techniques to find perplexity values in the language modeling literature. We apply our model to find hyperparameters of a language model and compare it to the grid search algorithm. Furthermore, we illustrate that it results in lower perplexity values. We perform this experiment on a real-world dataset from SwiftKey to validate our proposed approach.",

keywords = "Hyperparameter optimization, Language model, Linear programming, Machine learning, Optimization, n-Grams",

author = "Rahnama, {Amir Hossein Akhavan} and Toloo, Mehdi and Zaidenberg, {Nezer Jacob}",

note = "Publisher Copyright: {\textcopyright} 2018, Springer Science+Business Media, LLC, part of Springer Nature.",

year = "2018",

month = may,

day = "1",

doi = "10.1007/s11227-018-2236-6",

language = "English",

volume = "74",

pages = "2151--2160",

journal = "Journal of Supercomputing",

issn = "0920-8542",

publisher = "Springer Netherlands",

number = "5",

}

TY - JOUR

T1 - An LP-based hyperparameter optimization model for language modeling

AU - Rahnama, Amir Hossein Akhavan

AU - Toloo, Mehdi

AU - Zaidenberg, Nezer Jacob

PY - 2018/5/1

Y1 - 2018/5/1

N2 - In order to find hyperparameters for a machine learning model, algorithms such as grid search or random search are used over the space of possible values of the models’ hyperparameters. These search algorithms opt the solution that minimizes a specific cost function. In language models, perplexity is one of the most popular cost functions. In this study, we propose a fractional nonlinear programming model that finds the optimal perplexity value. The special structure of the model allows us to approximate it by a linear programming model that can be solved using the well-known simplex algorithm. To the best of our knowledge, this is the first attempt to use optimization techniques to find perplexity values in the language modeling literature. We apply our model to find hyperparameters of a language model and compare it to the grid search algorithm. Furthermore, we illustrate that it results in lower perplexity values. We perform this experiment on a real-world dataset from SwiftKey to validate our proposed approach.

KW - Hyperparameter optimization

KW - Language model

KW - Linear programming

KW - Machine learning

KW - Optimization

KW - n-Grams

UR - http://www.scopus.com/inward/record.url?scp=85040232951&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85040232951&partnerID=8YFLogxK

U2 - 10.1007/s11227-018-2236-6

DO - 10.1007/s11227-018-2236-6

M3 - Article

AN - SCOPUS:85040232951

SN - 0920-8542

VL - 74

SP - 2151

EP - 2160

JO - Journal of Supercomputing

JF - Journal of Supercomputing

IS - 5

ER -
