# Yunseong Hwang

### Brief Bio

Yunseong Hwang has been an M.S. student at Ulsan National Institute of Science and Technology (UNIST) since 2014. He received his B.S. degree in computer engineering from UNIST in Feb. 2014. He has experience with several machine learning tools and platforms, such as R, scikit-learn, and MATLAB. Building on this knowledge, he placed in the top 7% (43rd out of 691 teams) in a machine learning competition on predicting Walmart sales. https://www.kaggle.com/yunseong

### Research Interests

His research interests are learning and inference algorithms for Gaussian processes. He is currently working on improving the Automatic Statistician, which automatically produces human-readable reports from continuous time-series data.

#### 1. Overview of Gaussian Process and ABCD

A Gaussian process $\mathcal{GP}\left(\mu(x),k(x,x')\right)$ is a stochastic process for which any finite set of function evaluations $[f(x_1),\dotsc,f(x_n)]$ has a joint Gaussian distribution $\mathcal{N}(m,K)$, where $m_i = \mu(x_i)$ and $K_{ij} = k(x_i,x_j)$. In most applications we have no prior knowledge about the mean of $f(x)$, so by symmetry we take it to be zero; the model is then fully specified by the kernel function $k(x,x')$. Alternatively, a Gaussian process can be viewed as a distribution over functions, $f \sim \mathcal{GP}$: although it is defined only through finite sets of function evaluations, it specifies the distribution of function evaluations at any input $x$ in a possibly infinite input space $\mathcal{X}$. Mathematically,

\begin{equation}

\text{If } \mathbf{y} = [f(\mathbf{x}_1),\dotsc,f(\mathbf{x}_n)]^\mathrm{T}, \mathbf{X} = [\mathbf{x}_1,\dotsc,\mathbf{x}_n]^\mathrm{T}, \text{ and } f \sim \mathcal{GP}\left(\mu(x),k(x,x')\right) \\

\text{then} \\

P(\mathbf{y}|\mathbf{X}) = \frac{1}{\sqrt{(2\pi)^n|K|}}\exp{\left(-\frac{1}{2}(\mathbf{y}-\mathbf{m})^\mathrm{T}K^{-1}(\mathbf{y}-\mathbf{m})\right)} \\

\text{where} \\

\mathbf{m}_i = \mu(\mathbf{x}_i) \text{ and } K_{ij} = k(\mathbf{x}_i,\mathbf{x}_j)

\end{equation}
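The density above can be evaluated numerically. The following is a minimal sketch, assuming a zero mean function and a squared exponential kernel; the function names `se_kernel` and `gp_log_density`, the hyperparameter values, and the small `jitter` term added for numerical stability are illustrative choices, not part of the original system.

```python
import numpy as np

def se_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared exponential kernel k(x, x') on 1-D inputs (assumed kernel choice)."""
    diff = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def gp_log_density(y, x, kernel, jitter=1e-6):
    """log N(y | 0, K) with K_ij = kernel(x_i, x_j); zero mean is assumed."""
    n = len(x)
    K = kernel(x, x) + jitter * np.eye(n)       # jitter keeps K positive definite
    L = np.linalg.cholesky(K)                   # K = L L^T
    alpha = np.linalg.solve(L, y)               # alpha = L^{-1} y, so alpha.alpha = y^T K^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # log |K|
    return -0.5 * (alpha @ alpha) - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)

# Draw one function from the prior at 20 inputs and evaluate its log density.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 20)
K = se_kernel(x, x) + 1e-6 * np.eye(len(x))
y = np.linalg.cholesky(K) @ rng.standard_normal(len(x))
log_p = gp_log_density(y, x, se_kernel)
```

The Cholesky factorization is used here instead of an explicit matrix inverse because it gives both $\mathbf{y}^\mathrm{T}K^{-1}\mathbf{y}$ and $\log|K|$ from one decomposition.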

Automatic Bayesian Covariance Discovery (ABCD, by Lloyd et al.) is a system that searches for a covariance function that properly models the covariance pattern among the function evaluations. ABCD rests mainly on two key facts about kernel composition:

\begin{equation}

\text{If } f_1(x) \sim \mathcal{GP}(0,k_1) \text{ and independently } f_2(x) \sim \mathcal{GP}(0,k_2) \\

\text{then} \\

f_1(x) + f_2(x) \sim \mathcal{GP}(0,k_1 + k_2) \\

f_1(x) \times f_2(x) \text{ has covariance } k_1 \times k_2 \text{, which is again a valid kernel, so } \mathcal{GP}(0,k_1 \times k_2) \text{ is well defined} \\

\text{where} \\

(k_1 + k_2)(x,x') = k_1(x,x') + k_2(x,x') \\

(k_1 \times k_2)(x,x') = k_1(x,x') \times k_2(x,x')

\end{equation}
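As a small illustration of these two compositions, the sketch below builds a sum and a product of two base kernels and checks numerically that the resulting Gram matrices are symmetric and positive semidefinite. The base kernels (`se`, `periodic`), their fixed hyperparameters, and the helper names `add` and `mul` are assumptions made for this example.

```python
import numpy as np

def se(x1, x2, lengthscale=1.0):
    """Squared exponential base kernel on 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def periodic(x1, x2, period=1.0, lengthscale=1.0):
    """Periodic base kernel on 1-D inputs."""
    d = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def add(k1, k2):
    """(k1 + k2)(x, x') = k1(x, x') + k2(x, x')."""
    return lambda a, b: k1(a, b) + k2(a, b)

def mul(k1, k2):
    """(k1 * k2)(x, x') = k1(x, x') * k2(x, x')."""
    return lambda a, b: k1(a, b) * k2(a, b)

x = np.linspace(0.0, 4.0, 30)
K_sum = add(se, periodic)(x, x)    # smooth trend plus periodic component
K_prod = mul(se, periodic)(x, x)   # periodic pattern whose shape changes slowly

# Valid kernels give symmetric Gram matrices with no (numerically) negative eigenvalues.
min_eig_sum = np.linalg.eigvalsh(K_sum).min()
min_eig_prod = np.linalg.eigvalsh(K_prod).min()
```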

ABCD uses these compositions to search for a suitable kernel function. Starting from a given kernel expression, it greedily and iteratively builds a composite kernel: at each step it expands the best kernel found so far by applying these operations with each of the base kernels as the other operand, and then selects the best of the expanded kernels. Because any such composite kernel can be written in sum-of-products form, one benefit of this system is that the data (signal) can be interpreted separately for each additive component. In addition, ABCD can describe each additive component in natural language based on the characteristics of its base kernels. For example, a squared exponential kernel corresponds to a smooth function whose smoothness is governed by the kernel's lengthscale parameter.
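The greedy search described above can be sketched roughly as follows. This is a simplified illustration, not the ABCD implementation: hyperparameters are fixed rather than optimized, the noise level is fixed, the per-kernel parameter counts in `BASE` are illustrative, and candidates are scored by BIC computed from the Gaussian log marginal likelihood. All function and variable names are hypothetical.

```python
import numpy as np

def se(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

def per(a, b):
    return np.exp(-2.0 * np.sin(np.pi * np.abs(a[:, None] - b[None, :])) ** 2)

def lin(a, b):
    return a[:, None] * b[None, :]

# Base kernels: name -> (function, illustrative number of hyperparameters).
BASE = {"SE": (se, 2), "PER": (per, 3), "LIN": (lin, 1)}

def log_marginal(y, x, kern, noise=0.1):
    """Zero-mean GP log marginal likelihood with fixed observation noise."""
    n = len(x)
    L = np.linalg.cholesky(kern(x, x) + noise ** 2 * np.eye(n))
    a = np.linalg.solve(L, y)
    return -0.5 * (a @ a) - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2.0 * np.pi)

def bic(y, x, model):
    """BIC = -2 log-likelihood + P log N (lower is better)."""
    _, kern, p = model
    return -2.0 * log_marginal(y, x, kern) + p * np.log(len(x))

def expand(model):
    """Combine the current kernel with every base kernel via + and *."""
    name, kern, p = model
    out = []
    for bname, (bk, bp) in BASE.items():
        out.append((f"({name} + {bname})", lambda a, b, k=kern, bk=bk: k(a, b) + bk(a, b), p + bp))
        out.append((f"({name} * {bname})", lambda a, b, k=kern, bk=bk: k(a, b) * bk(a, b), p + bp))
    return out

def greedy_search(y, x, start, depth=3):
    """Keep the best expansion at each step; stop when BIC no longer improves."""
    best = start
    for _ in range(depth):
        cand = min(expand(best), key=lambda m: bic(y, x, m))
        if bic(y, x, cand) >= bic(y, x, best):
            break
        best = cand
    return best

# Toy data: a periodic signal with a linear trend and small noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 40)
y = np.sin(2.0 * np.pi * x) + 0.5 * x + 0.1 * rng.standard_normal(len(x))
start = ("SE", se, 2)
best = greedy_search(y, x, start)
print(best[0], bic(y, x, best))
```

The returned expression name (for example, a sum of products of base kernels) is what makes the result readable: each additive term can be inspected on its own.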

#### 2. Motivational Example

The original ABCD system is discussed only for a single time series. My work is motivated by the question: what if there were a similar system for multiple time series or multidimensional data? The following figure compares the original ABCD system (left) with the new system applied to multiple time series (right). The first row shows the raw data, and the following rows depict the separate additive components of those signals.

The notable part of this figure is that, by using multiple time series, the new system can tell that there is a sudden drop in value after Sep. 11 across the data, whereas the original system cannot isolate that drop as an additive component and explains it only with a smooth function.

#### 3. Datasets and Results

Here are some results on how well the new system fits the data in comparison with the original ABCD system. Two kinds of datasets are used in the experiments: US stock data and US house price index data. Here BIC denotes the Bayesian Information Criterion (lower is better), N is the number of data points, and P is the number of parameters.
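For reference, and assuming the standard definition of BIC is the one used here, the criterion combines the fit of the model with a penalty on its size. The symbol $\hat{\theta}$, denoting the fitted kernel hyperparameters, is introduced only for this illustration:

\begin{equation}
\mathrm{BIC} = -2\log p(\mathbf{y}|\mathbf{X},\hat{\theta}) + P\log N
\end{equation}

Under this definition, a model with more parameters is preferred only if its likelihood improves enough to offset the $P\log N$ penalty.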

\begin{array}{|l|r|r|r|r|r|}

\hline

& & ABCD & ABCD & NEW & NEW \\

\hline

SET & N & P & BIC & P & BIC \\

\hline

Top 3 stocks & 387 & \textbf{13} & \textbf{686.05} & 22 & 750.65 \\

Top 6 stocks & 774 & \textbf{21} & \textbf{2141.76} & 49 & 2219.71 \\

Top 9 stocks & 1161 & \textbf{38} & 4167.40 & 73 & \textbf{3985.03} \\

\hline

\end{array}

BIC of both systems on the stock data set. The 'Top 3', 'Top 6', and 'Top 9' stocks are selected by their market capitalization ranks in 2011. In this table the NEW system uses more parameters than ABCD in every setting. In terms of BIC (lower is better), the individually optimized ABCD models score better with 3 and 6 stocks, while the NEW system scores better with 9 stocks.

\begin{array}{|l|r|r|r|r|r|}

\hline

& & ABCD & ABCD & NEW & NEW \\

\hline

SET & N & P & BIC & P & BIC \\

\hline

Top 2 cities & 240 & \textbf{12} & 663.54 & 20 & \textbf{634.00} \\

Top 4 cities & 480 & \textbf{14} & \textbf{1260.05} & 38 & 1424.18 \\

Top 6 cities & 720 & \textbf{23} & \textbf{1972.58} & 61 & 2100.62 \\

\hline

\end{array}

BIC of both systems on the housing market data set. The 'Top 2', 'Top 4', and 'Top 6' US cities are selected by city population rank. The NEW system has a lower (better) BIC for the top 2 cities, while the individually trained ABCD models have a lower BIC for the top 4 and top 6 cities.

### Contact

School of Electrical and Computer Engineering, UNIST

50 UNIST-gil, EB2 502, Ulsan, 689-798, Korea

Phone: 010-6518-1260

Email: yunseong@unist.ac.kr