%\VignetteEngine{R.rsp::tex}
%\VignetteIndexEntry{mdsOpt package, published in M. Walesiak and A. Dudek, "Searching for an Optimal MDS Procedure for Metric and Interval-Valued Data using mdsOpt R package", in: Education Excellence and Innovation Management: A 2025 Vision to Sustain Economic Development during Global Challenges, 2020, pp. 307-324.}
\documentclass[article, nojss]{jss}
\usepackage{underscore} 
\usepackage[ascii]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[english]{babel}
\usepackage{amsmath}
\usepackage{amssymb,amsfonts,textcomp}
\usepackage{array}
\usepackage{supertabular}
\usepackage{hhline}
\usepackage{booktabs}
\usepackage{multirow}
\usepackage{graphicx}
\makeatletter
\newcommand\arraybslash{\let\\\@arraycr}
\makeatother
\setlength\tabcolsep{1mm}
\renewcommand\arraystretch{1.3}
\newcommand\normalsubformula[1]{\text{\mathversion{normal}$#1$}}
\newcounter{saveenum}


\author{Marek Walesiak  \\Wroc{\l}aw University of Economics  \And 
Andrzej Dudek \\Wroc{\l}aw University of Economics}
\title{mdsOpt -- Searching for Optimal MDS Procedure \newline for Metric and Interval-Valued Data}


\Plainauthor{Marek Walesiak, Andrzej Dudek} %% comma-separated
\Plaintitle{mdsOpt -- Searching for Optimal MDS Procedure for Metric and Interval-Valued Data} %% without formatting
\Shorttitle{mdsOpt -- Searching for Optimal MDS Procedure } %% a short title (if necessary)

\Abstract{
In multidimensional scaling (MDS) carried out on the basis of a metric data matrix (interval, ratio)
or interval-valued data table three approaches can be distinguished: classic-to-classic -- for metric data,
symbolic-to-classic and symbolic-to-symbolic -- for interval-valued data. The article
presents the \pkg{mdsOpt} package which helps to solve the problem of choosing the
optimal multidimensional scaling procedure. It uses two criteria for selecting the optimal multidimensional scaling
procedure: Kruskal's $\mathit{Stress}\text{{}-}1$ fit measure ($I\text{{}-}\mathit{Stress}$ in the case of the
symbolic-to-symbolic approach) and the Hirschman-Herfindahl $\mathit{HHI}$ index calculated from Stress per point
(Interval stress per box in the case of the symbolic-to-symbolic approach) values. In the first part, the three possible approaches are
characterised, together with the theoretical background of the methods used and the relationships between the \pkg{mdsOpt} package and existing R
packages. The second part explains the procedure and the criteria for selecting the optimal multidimensional scaling procedure
for metric and interval-valued data. The last part describes in detail the usage of the package, with examples (R scripts) on real
data sets related to the tourist attractiveness of Polish voivodships (provinces) and Lower-Silesian counties.}

\Address{ 
Marek Walesiak\\
Wroclaw University of Economics\\
Department of Econometrics and Computer Science\\
Komandorska 118/120\\
53-345 Wroc\l{}aw, Poland\\
E-mail: \email{marek.walesiak@ue.wroc.pl}\\


Andrzej Dudek\\
Wroclaw University of Economics\\
Department of Econometrics and Computer Science\\
Komandorska 118/120\\
53-345 Wroc\l{}aw, Poland\\
E-mail: \email{andrzej.dudek@ue.wroc.pl}
}

\Keywords{ multidimensional scaling, metric and interval-valued data, tourist attractiveness, $HHI$ index, R}
\Plainkeywords{ multidimensional scaling, metric and interval-valued data, tourist attractiveness, HHI index, R}
\begin{document}
\section[Introduction]{Introduction}
\subsection[The aim of multidimensional scaling]{The aim of multidimensional scaling}

Classical multidimensional scaling (MDS) is a method that represents (dis)similarity data as distances in a low-dimensional space (typically 2- or 3-dimensional)
in order to make these data accessible to visual inspection and exploration (\cite{Borg+Groenen:2005}, p. 3).
Classical MDS requires that each entry of the dissimilarity matrix be a single numerical value.
The dissimilarity between object \textit{i} and object \textit{k} can, however, be fuzzy (\cite{Groenen+Winsberg+Rodriguez+Diday:2006}, p. 361). A fuzzy dissimilarity
is represented by an interval, and each entry of the $n\times n$ dissimilarity matrix is then an interval of values $\left[\delta_{\mathit{ik}}^l,\delta_{\mathit{ik}}^u\right]$, where
$\delta_{\mathit{ik}}^l$ ($\delta_{\mathit{ik}}^u$) denotes the lower (upper) bound of the dissimilarity of objects \textit{i} and \textit{k} in \textit{m}-dimensional space.
Multidimensional scaling of interval dissimilarities represents the lower and upper bounds of the dissimilarities as distances between hypercubes (rectangles in two-dimensional space
and cubes in three-dimensional space).
The dimensions are not directly observable; they have the nature of latent variables. MDS allows the similarities and differences between the analyzed objects to be explained.


Multidimensional scaling is a widely used technique in many areas, including psychology (\cite{Takane:2007}), sociology (\cite{Pinkley:2005}), linguistics (\cite{Embleton:2013}),
marketing research (\cite{Cooper:1983}), tourism (\cite{Marcussen:2014}), and musicology (\cite{Adams:1995}).


\subsection[The approaches in multidimensional scaling via mdsOpt package]{The approaches in multidimensional scaling via \pkg{mdsOpt} package}

In MDS carried out on the basis of a metric data matrix (interval, ratio) or interval-valued
data table, via \pkg{mdsOpt} package, three approaches can be distinguished:

\begin{enumerate}
\item {
Classic-to-classic -- for metric data:}
\end{enumerate}

\vspace*{-0.5cm}
\begin{equation} \label{eq:1} \mathbf{X}=[x_{\mathit{ij}}]_{n\times m}\rightarrow \mathbf{Z}=[z_{\mathit{ij}}]_{n\times m}\rightarrow \left[\delta
_{\mathit{ik}}(\mathbf{Z})\right]_{n\times n}\rightarrow f:\left[\delta _{\mathit{ik}}\left(\mathbf{Z}\right)\rightarrow
d_{\mathit{ik}}(\mathbf{V})\right]\rightarrow \mathbf{V}=[v_{\mathit{ij}}]_{n\times q},\end{equation}

\noindent where: $x_{\mathit{ij}}$ -- the value of the \textit{j}-th variable for the \textit{i}-th object,
$z_{\mathit{ij}}$ -- the normalized value of the \textit{j}-th variable for the \textit{i}-th object,
$i,k=1,{\dots},n$ -- object numbers,
$j=1,{\dots},m$ -- the variable number,
$\left[\delta_{\mathit{ik}}(\mathbf{Z})\right]_{n\times n}$ -- a distance (dissimilarity) matrix between objects in
\textit{m}-dimensional space (distances between objects are calculated via, e.g., the city-block, Euclidean, Chebyshev, or squared Euclidean distance),
$[d_{\mathit{ik}}(\mathbf{V})]$ -- a distance matrix between objects in \textit{q}-dimensional space ($q<m$),
\textit{f} -- a function mapping the distances $\delta_{\mathit{ik}}\left(\mathbf{Z}\right)$ in \textit{m}-dimensional space
into the corresponding distances $d_{\mathit{ik}}(\mathbf{V})$ in \textit{q}-dimensional space,
$\mathbf{V}=[v_{\mathit{ij}}]_{n\times q}$ -- the data matrix in \textit{q}-dimensional space.


The starting point of multidimensional scaling in the classic-to-classic approach is a metric data matrix
$\mathbf{X}=[x_{\mathit{ij}}]_{n\times m}$, for which observations are obtained from secondary data sources.
This is a typical situation in socio-economic research. Methods of determining
the distance matrix $\left[\delta_{\mathit{ik}}\right]$ can be divided into direct methods (the distances typically result from
similarity ratings on object pairs, from rankings, or from card-sorting tasks) and indirect
methods (see (\ref{eq:1})) (see e.g. \cite{Borg+Groenen:2005}, pp. 111-133).
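As a numerical illustration of the indirect route in (\ref{eq:1}), the sketch below (written in Python rather than R, with invented data) performs standardization (method n1) and then computes the Euclidean distance matrix $[\delta_{\mathit{ik}}(\mathbf{Z})]$ that an MDS routine would take as input:

```python
import math

# Invented data matrix X (n = 4 objects, m = 3 variables).
X = [[1.0, 10.0, 100.0],
     [2.0, 20.0, 110.0],
     [3.0, 15.0, 120.0],
     [4.0, 25.0, 130.0]]
n, m = len(X), len(X[0])

# Step 1: normalization (here standardization, method n1 in Table 2).
means = [sum(row[j] for row in X) / n for j in range(m)]
sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
       for j in range(m)]
Z = [[(X[i][j] - means[j]) / sds[j] for j in range(m)] for i in range(n)]

# Step 2: distance matrix [delta_ik(Z)] in m-dimensional space (Euclidean).
delta = [[math.sqrt(sum((Z[i][j] - Z[k][j]) ** 2 for j in range(m)))
          for k in range(n)] for i in range(n)]

# delta is symmetric with a zero diagonal; an MDS routine (step 3 of the
# scheme) would map it into distances d_ik(V) in q-dimensional space, q < m.
print(delta[0][0], delta[0][1] == delta[1][0])  # 0.0 True
```

In the vignette itself, these two steps correspond to data.Normalization (\pkg{clusterSim}) followed by dist (\pkg{stats}).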

\setcounter{saveenum}{\value{enumi}}
\begin{enumerate}
\setcounter{enumi}{\value{saveenum}}
\item {
Symbolic-to-classic -- for interval-valued data:}
\end{enumerate}

\vspace*{-0.5cm}
\begin{equation}
\begin{split}
\mathbf{X} &=[x_{\mathit{ij}}^l,x_{\mathit{ij}}^u]_{n\times m}\rightarrow
\mathbf{Z}=[z_{\mathit{ij}}^l,z_{\mathit{ij}}^u]_{n\times m}\rightarrow \left[\delta
_{\mathit{ik}}(\mathbf{Z})\right]_{n\times n}\rightarrow \\
& f:\left[\delta _{\mathit{ik}}\left(\mathbf{Z}\right)\rightarrow
d_{\mathit{ik}}(\mathbf{V})\right]\rightarrow \mathbf{V}=[v_{\mathit{ij}}]_{n\times q},
\end{split}
\end{equation}


\noindent where: $\mathbf{X}=[x_{\mathit{ij}}^l,x_{\mathit{ij}}^u]_{n\times m}$ -- an interval-valued data table in \textit{m}-dimensional space ($x_{\mathit{ij}}^l{\leq}x_{\mathit{ij}}^u$),
$x_{\mathit{ij}}^l$ ($x_{\mathit{ij}}^u$) -- the lower (upper) bound of the interval,
$\mathbf{Z}=[z_{\mathit{ij}}^l,z_{\mathit{ij}}^u]_{n\times m}$ -- the normalized interval-valued data table in \textit{m}-dimensional space,
$\delta_{\mathit{ik}}\left(\mathbf{Z}\right)$ -- a distance (dissimilarity) matrix between objects in
\textit{m}-dimensional space (distances between objects are calculated via distance measures for interval-valued data -- see Table \ref{tab:3}).

\setcounter{saveenum}{\value{enumi}}
\begin{enumerate}
\setcounter{enumi}{\value{saveenum}}
\item {
Symbolic-to-symbolic -- for interval-valued data:}
\end{enumerate}

\vspace*{-0.5cm}
\begin{equation}
\begin{split} 
\mathbf{X} &=[x_{\mathit{ij}}^l,x_{\mathit{ij}}^u]_{n\times m}\rightarrow
\mathbf{Z}=[z_{\mathit{ij}}^l,z_{\mathit{ij}}^u]_{n\times m}\rightarrow \left[\delta _{\mathit{ik}}^l,\delta
_{\mathit{ik}}^u\right]\rightarrow \\
& f:\left(\left[\delta _{\mathit{ik}}^l,\delta _{\mathit{ik}}^u\right]\rightarrow
\left[d_{\mathit{ik}}^l,d_{\mathit{ik}}^u\right]\right)\rightarrow
\mathbf{V}=[v_{\mathit{ij}}^l,v_{\mathit{ij}}^u]_{n\times q},
\end{split}
\end{equation} 

\noindent where: $\delta_{\mathit{ik}}^l$ ($\delta_{\mathit{ik}}^u$) -- the lower (upper) bound of the dissimilarity of objects \textit{i} and \textit{k} in
\textit{m}-dimensional space,
$d_{\mathit{ik}}^l$ ($d_{\mathit{ik}}^u$) -- the lower (upper) bound of the dissimilarity of objects \textit{i} and \textit{k} in
\textit{q}-dimensional space,
\textit{f} -- a function which represents the lower and upper bounds of the dissimilarities by the minimum and maximum
distances between rectangles (cubes in three-dimensional space) as well as possible in the least-squares sense (\cite{Groenen+Winsberg+Rodriguez+Diday:2006}, p. 363),
$\mathbf{V}=[v_{\mathit{ij}}^l,v_{\mathit{ij}}^u]_{n\times q}$ -- an interval-valued data table in \textit{q}-dimensional space.

In the symbolic-to-classic and symbolic-to-symbolic approaches we assume that the starting point of multidimensional scaling is a data table
$\mathbf{X}=[x_{\mathit{ij}}^l,x_{\mathit{ij}}^u]_{n\times m}$ ($x_{\mathit{ij}}^l{\leq}x_{\mathit{ij}}^u$). In (\cite{Gioia+Lauro:2006}, p. 344) we can find different kinds of data that in real life are of interval type:

\begin{itemize}
\item[\textbullet]
financial data (e.g. opening value and closing value in a session),
\item[\textbullet]
customer satisfaction data (expected or perceived characteristic of the quality of a product), 
\item[\textbullet]
tolerance limits in quality control, 
\item[\textbullet]
confidence intervals of estimates from sample surveys,
\item[\textbullet]
query on a database. 
\end{itemize}

Additional examples of real-life interval-valued data can be found in \cite{Brito+Noirhomme-Fraiture+Arroyo:2015}:

\begin{itemize}
\item[\textbullet] 
high--low intervals of financial prices,
\item[\textbullet]
some questions in questionnaire surveys (e.g. age, income, time spent).
\end{itemize}

Interval-valued data can also be obtained by generalizing classical single-valued variables into interval-valued variables
(see e.g. \cite{Bock:2000}, pp. 43-44). For example, 380 Polish counties are described by 9 metric
variables (see the second demonstration example). The counties are parts of 16 Polish voivodships. After aggregating the data from counties to voivodships,
interval-valued data are obtained. The interval of a given variable for a voivodship covers all or selected (e.g. from the
first to the ninth decile, or from the first to the third quartile) observations from its counties.
\subsection[The main idea of the mdsOpt package]{The main idea of the \pkg{mdsOpt} package}

The authors of the monograph (\cite{Borg+Groenen+Mair:2013}; \cite{Borg+Groenen+Mair:2018}, chapter 7) pointed out typical mistakes made by users of
multidimensional scaling. A frequent mistake on the part of users of MDS results is to evaluate Stress mechanically (rejecting an MDS solution because its Stress seems ``too high''). In
their opinion (\cite{Borg+Groenen+Mair:2018}, pp. 85-86): ``The Stress value is, however, merely a technical index, a target criterion for an optimization algorithm. An MDS
solution can be robust and replicable, even if its Stress value is high'' and ``Stress is a
\textit{summative} index for \textit{all} proximities. It does not inform the user how well a \textit{particular}
proximity value is represented in the given MDS space (...) The least one can do is to take a look at the Stress-per-point values''. Considering this, we should
take into account Stress per point values (\cite{Borg+Mair:2017}) and the Shepard diagram (\cite{Mair+Borg+Rusch:2016}; \cite{Leeuw+Mair:2015}) for the classic-to-classic and symbolic-to-classic approaches, or the $I\text{{}-}\mathit{Stress}$ per box index
(\textit{ispb}) and the $I\text{{}-}\mathit{dist}$ diagram for the symbolic-to-symbolic approach.

\subsection{Criteria for selection of the optimal MDS procedure}

To solve the problem of choosing the optimal multidimensional scaling procedure:

\begin{itemize}
\item[\textbullet]
in the classic-to-classic and symbolic-to-classic approaches, two criteria are applied in the \pkg{mdsOpt} package:
Kruskal's $\mathit{Stress}\text{{}-}1$ (standardized residual sum of squares) fit measure and the Hirschman-Herfindahl $\mathit{HHI}$ index, calculated based on Stress per point values (\textit{spp});
\item[\textbullet]
in the symbolic-to-symbolic approach, two criteria are applied in the \pkg{mdsOpt} package: the $I\text{{}-}\mathit{Stress}$ fit measure and the Hirschman-Herfindahl $\mathit{HHI}$ index, calculated based on $I\text{{}-}\mathit{Stress}$ per box index values (\textit{ispb}).
\end{itemize}

\subsection[Package mdsOpt versus other packages]{Package \pkg{mdsOpt} versus other packages}

The algorithms implemented in the \pkg{mdsOpt} package have not been available in other R packages so far, so it can be
treated as a complement to the well-known libraries \pkg{smacof} (\cite{Mair+Leeuw+Borg+Groenen:2019}; \cite{Leeuw+Mair:2009}) and
\pkg{smds} (\cite{Terada+Groenen:2015}), extending their possibilities. Table \ref{tab:1} presents the relationships
between \pkg{mdsOpt} and other R packages.

\begin{table}[!ht]
\centering
\begin{tabular}{m{4.5cm}m{4.5cm}m{5cm}}
\toprule
\multicolumn{3}{m{14cm}}{\centering {MDS approach}}\\
\multicolumn{1}{m{4.5cm}}{Classic-to-classic} &
\multicolumn{1}{m{4.5cm}}{Symbolic-to-classic} &
\arraybslash {Symbolic-to-symbolic}\\\toprule
\multicolumn{3}{m{14cm}}{\centering {Type of data}}\\
\multicolumn{1}{m{4.5cm}}{metric (ratio, interval)} &
\multicolumn{1}{m{4.5cm}}{interval-valued} &
\arraybslash interval-valued\\
\multicolumn{3}{m{14cm}}{\centering {Functions of \pkg{mdsOpt}
package}}\\
\multicolumn{1}{m{4.5cm}}{{optSmacofSym\_mMDS}} &
\multicolumn{1}{m{4.5cm}}{{optSmacofSymInterval}} &
\arraybslash {optIscalInterval}\\

\multicolumn{3}{m{14cm}}{\centering Decision problem 1: normalization method}\\
\multicolumn{1}{m{4.5cm}}{{{\pkg{clusterSim}}\par}
{{(data.Normalization)}\par}
\pkg{base} (\cite{R:2019}) (scale)} &
\multicolumn{1}{m{4.5cm}}{{{\pkg{clusterSim}}\par}
{(interval\_normalization)}} &
\arraybslash {{{\pkg{clusterSim}}\par}
{(interval\_normalization)}}\\

\multicolumn{3}{m{14cm}}{\centering Decision problem 2: distance measure}\\
\multicolumn{1}{m{4.5cm}}{{Manhattan, Euclidean,\newline
Chebyshev, Squared\newline
Euclidean, GDM1} \newline
{{\pkg{stats} (\cite{R:2019}) (dist)\par}
{\pkg{clusterSim} (dist.GDM)}}} &
\multicolumn{1}{m{4.5cm}}{{Ichino-Yaguchi, Euclidean\newline
Ichino-Yaguchi, Hausdorff, \newline
Euclidean Hausdorff} \newline
{{\pkg{clusterSim} (dist.Symbolic)}}} &
\arraybslash {{}--}\\
\multicolumn{3}{m{14cm}}{\centering Decision problem 3: MDS model / optimization
method}\\
\multicolumn{1}{m{4.5cm}}{{{ratio, interval, polynomial}} \newline
{{\pkg{smacof} (smacofSym)}}} &
\multicolumn{1}{m{4.5cm}}{{{ratio, interval, polynomial}} \newline
{{\pkg{smacof} (smacofSym)}}} &

\arraybslash {majorization-minimization (MM), quasi-Newton (BFGS) \newline
\arraybslash {\pkg{smds} (IMDS)}}\\\bottomrule
\end{tabular}

\caption{Relationships between \pkg{mdsOpt} and other R packages}\label{tab:1}
\end{table}


Additionally, \pkg{mdsOpt} contains functions for calculating the $I\text{{}-}\mathit{Stress}$ per box index (\textit{ispb}) and for drawing the
$I\text{{}-}\mathit{dist}$ diagram for interval-valued data.


\section{Selection of the optimal multidimensional scaling procedure}

The article proposes a solution that allows the optimal multidimensional scaling procedure to be chosen for metric and
interval-valued data.


\subsection{Basic decision problems}

For the classic-to-classic and symbolic-to-classic approaches the study uses the
smacofSym function of the \pkg{smacof} package of R. In the smacofSym function the basic decision problems involve the
following selections:

{}--\ \ normalization method (the analysis includes 18 normalization methods -- see Table
\ref{tab:2}),

{}--\ \ distance measure: 5 for metric data (Manhattan, Euclidean, Chebyshev, squared Euclidean,
GDM1\footnote{Cf. \cite{Jajuga+Walesiak+Bak:2003}.} -- see e.g. \cite{Everitt+Landau+Leese+Stahl:2011}, pp. 49-50) and 4 for interval-valued data (see Table \ref{tab:3}),

{}--\ \ MDS model (the analysis includes 3 MDS models: ratio, interval, polynomial).

For the symbolic-to-symbolic approach the study uses the IMDS function of the \pkg{smds} package.
In the IMDS function the basic decision problems involve the following selections:

{}--\ \ normalization method -- the analysis includes 18 normalization methods,

{}--\ \ optimization method -- the analysis includes 2 methods: the majorization-minimization
algorithm ``MM'' (\cite{Groenen+Winsberg+Rodriguez+Diday:2006}, p.
366) and the quasi-Newton method ``BFGS'' (\cite{Nash:1990}, chapter 15).


Table \ref{tab:2} presents normalization methods, given by linear formula \eqref{eq:4}, which were used in the selection of the optimal MDS
procedure (see \cite{Jajuga+Walesiak:2000}, pp. 106-107):

 \begin{equation}\label{eq:4}  z_{\mathit{ij}}=b_jx_{\mathit{ij}}+a_j=\frac{x_{\mathit{ij}}-A_j}{B_j}=\frac{1}{B_j}x_{\mathit{ij}}-\frac{A_j}{B_j}\quad (b_j>0), \end{equation}

\noindent where: $x_{\mathit{ij}}$ -- the value of \textit{j}{}-th variable for the \textit{i}{}-th object,
$z_{\mathit{ij}}$ -- the normalized value of \textit{j}{}-th variable for the \textit{i}{}-th object,
$A_j$ -- shift parameter to arbitrary zero for the \textit{j}{}-th variable,
$B_j$ -- scale parameter for the \textit{j}{}-th variable.
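For illustration, the sketch below (Python, invented data) instantiates formula (\ref{eq:4}) for two of the methods listed in Table \ref{tab:2}: n1 (standardization, $A_j=\overline x_j$, $B_j=s_j$) and n4 (unitization with zero minimum, $A_j=\underset i{\mathit{min}}\{x_{\mathit{ij}}\}$, $B_j=r_j$):

```python
import statistics

def normalize(x, A, B):
    """Linear normalization z_ij = (x_ij - A_j) / B_j with B_j > 0."""
    return [(xi - A) / B for xi in x]

x = [2.0, 4.0, 6.0, 8.0]  # invented observations of one variable

# n1 (standardization): A_j = mean, B_j = sample standard deviation.
n1 = normalize(x, statistics.mean(x), statistics.stdev(x))

# n4 (unitization with zero minimum): A_j = minimum, B_j = range.
n4 = normalize(x, min(x), max(x) - min(x))

print([round(z, 3) for z in n4])  # [0.0, 0.333, 0.667, 1.0]
```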

\begin{table}[!ht]
\begin{tabular}{m{1cm}m{8.5cm}m{3.5cm}m{1.5cm}}
\toprule
\multirow{2}{*}{\parbox{1cm}{\centering Type}} &
\multirow{2}{*}{\parbox{8.5cm}{\centering Method}} &
\multicolumn{2}{m{5cm}}{\centering Parameter}\\
 &
 &
\centering $B_j$ &
\centering\arraybslash  $A_j$\\
\toprule
\centering n1 &
 Standardization &
\centering  $s_j$ &
\centering\arraybslash  $\overline x_j$\\
\centering n2 &
 Positional standardization &
\centering  $\mathit{mad}_j$ &
\centering\arraybslash  $\mathit{med}_j$\\
\centering n3 &
 Unitization &
\centering  $r_j$ &
\centering\arraybslash  $\overline x_j$\\
\centering n3a &
 Positional unitization &
\centering  $r_j$ &
\centering\arraybslash  $\mathit{med}_j$\\
\centering n4 &
 Unitization with zero minimum &
\centering  $r_j$ &
\centering\arraybslash  $\underset i{\mathit{min}}\left\{x_{\mathit{ij}}\right\}$\\
\centering n5 &
 Normalization in range [--1; 1] &
\centering  $\underset i{\mathit{max}}\left|x_{\mathit{ij}}-\overline x_j\right|$ &
\centering\arraybslash  $\overline x_j$\\
\centering n5a &
Positional normalization in range [--1; 1] &
\centering  $\underset i{\mathit{max}}\left|x_{\mathit{ij}}-\mathit{med}_j\right|$ & 
\centering\arraybslash  $\mathit{med}_j$\\

\centering n6 &
\multirow{8}{*}{\parbox{7.5cm}{Quotient \newline transformations}} &
\centering $s_j$ &
\centering\arraybslash 0\\

\centering n6a &
 &
\centering $\mathit{mad}_j$ &
\centering\arraybslash 0\\
\centering n7 &
 &
\centering $r_j$ &
\centering\arraybslash 0\\
\centering n8 &
 &
\centering $\underset i{\mathit{max}}\left\{x_{\mathit{ij}}\right\}$ &
\centering\arraybslash 0\\
\centering n9 &
 &
\centering $\overline x_j$ &
\centering\arraybslash 0\\
\centering n9a &
 &
\centering $\mathit{med}_j$ &
\centering\arraybslash 0\\
\centering n10 &
 &
\centering $\sum _{i=1}^nx_{\mathit{ij}}$ &
\centering\arraybslash 0\\
\centering n11 &
 &
\centering $\sqrt{\sum _{i=1}^nx_{\mathit{ij}}^2}$ &
\centering\arraybslash 0\\

\centering n12 &
 Normalization &
\centering  $\sqrt{\sum _{i=1}^n\left(x_{\mathit{ij}}-\overline x_j\right)^2}$ &
\centering\arraybslash  $\overline x_j$\\
\centering n12a &
 Positional normalization &
 \centering $\sqrt{\sum _{i=1}^n\left(x_{\mathit{ij}}-\mathit{med}_j\right)^2}$ &
\centering\arraybslash  $\mathit{med}_j$\\
\centering n13 &
 Normalization with zero being the central point &
\centering  $r_j/2$ &
\centering\arraybslash  $m_j$\\\bottomrule
\end{tabular}

 $\overline x_j$ -- mean for the \textit{j}{}-th variable,  $s_j$ -- standard deviation for the \textit{j}{}-th variable, 
$r_j$ -- range for the \textit{j}{}-th variable,  $m_j=\left(\underset
i{\mathit{max}}\left\{x_{\mathit{ij}}\right\}+\underset i{\mathit{min}}\left\{x_{\mathit{ij}}\right\}\right)/2$ --
mid-range for the \textit{j}{}-th variable,  $\mathit{med}_j=\underset i{\mathit{med}}\left(x_{\mathit{ij}}\right)$ --
median for the \textit{j}{}-th variable,  $\mathit{mad}_j=\underset i{\mathit{mad}}\left(x_{\mathit{ij}}\right)$ --
median absolute deviation for the \textit{j}{}-th variable.

\caption{Normalization methods (based on \cite{Jajuga+Walesiak:2000}; \cite{Walesiak:2018})}\label{tab:2}
\end{table}

The normalization of variables is carried out when the variables describing the analyzed objects are metric or 
interval-valued. The purpose of normalization is to achieve the comparability of variables (\cite{Milligan+Cooper:1988}).

For classical metric data, an observation on the \textit{j}-th variable for the \textit{i}-th object in a data matrix
$\mathbf{X}=[x_{\mathit{ij}}]_{n\times m}$ is expressed as one real number. Column 1 in Table \ref{tab:2} presents the type labels of the normalization
methods adopted in the data.Normalization function of the \pkg{clusterSim} package (\cite{Walesiak+Dudek:2019}). A similar procedure for
data normalization is available as the scale function of the \pkg{base} package; in this function the researcher defines the
parameters $A_j$ and $B_j$.

For interval-valued variables, each cell $x_{\mathit{ij}}$ in a data table represents the interval
$x_{\mathit{ij}}=[x_{\mathit{ij}}^l,x_{\mathit{ij}}^u]$ ($x_{\mathit{ij}}^l{\leq}x_{\mathit{ij}}^u$). Interval-valued
data require a special normalization approach. The lower and upper bounds of the interval of the \textit{j}-th variable
for the \textit{n} objects are combined into one vector containing $2n$ observations. This approach makes it
possible to apply the normalization methods used for classical metric data. Other approaches to the
normalization of interval-valued data are presented in \cite{Mlodak:2014}. After the normalization,
observations 1 to \textit{n} of each variable form the lower bounds of the intervals, while observations $n+1$ to
$2n$ form the upper bounds. The data were normalized using the interval\_normalization
function from the \pkg{clusterSim} package.
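The combined-vector idea can be sketched as follows (Python, invented bounds; the vignette itself relies on interval\_normalization from \pkg{clusterSim}). Unitization with zero minimum (n4) is applied to the stacked $2n$-element vector:

```python
# Invented lower and upper bounds of one interval-valued variable, n = 3.
lower = [1.0, 2.0, 4.0]
upper = [3.0, 5.0, 6.0]
n = len(lower)

combined = lower + upper                  # one vector with 2n observations

# Classical normalization on the combined vector (n4: z = (x - min)/range).
lo, rng = min(combined), max(combined) - min(combined)
z = [(x - lo) / rng for x in combined]

# Observations 1..n form the lower bounds, n+1..2n the upper bounds.
z_lower, z_upper = z[:n], z[n:]
print(z_lower, z_upper)  # [0.0, 0.2, 0.6] [0.4, 0.8, 1.0]
```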

Table \ref{tab:3} presents selected distance measures for interval-valued data that have been used in the selection of the optimal
multidimensional scaling procedure. The methods for calculating these distances are available in the
dist.Symbolic function of the \pkg{clusterSim} package.

\begin{table}[!ht]
\begin{tabular}{m{1.5cm}m{4.5cm}m{8cm}}
\toprule
Symbol &
Name &
\arraybslash Distance measure  $\delta _{\mathit{ik}}\left(\mathbf{Z}\right)$\\\toprule
U\_2\_q1 &
{Ichino-Yaguchi\par}

$q=1,\gamma =0.5$ &
 $\sum _{j=1}^m\varphi \left(z_{\mathit{ij}},z_{\mathit{kj}}\right)$\\
U\_2\_q2 &
{Euclidean Ichino-Yaguchi\par}

$q=2,\gamma =0.5$ &
 $\sqrt{\sum _{j=1}^m\varphi \left(z_{\mathit{ij}},z_{\mathit{kj}}\right)^2}$\\
H\_q1 &
Hausdorff\par
$q=1$ &
 $\sum
_{j=1}^m\left[\mathit{max}\left(\left|z_{\mathit{ij}}^l-z_{\mathit{kj}}^l\right|,\left|z_{\mathit{ij}}^u-z_{\mathit{kj}}^u\right|\right)\right]$\\
H\_q2 &
{Euclidean Hausdorff\par}

 $q=2$ &
 $\left\{\sum
_{j=1}^m\left[\mathit{max}\left(\left|z_{\mathit{ij}}^l-z_{\mathit{kj}}^l\right|,\left|z_{\mathit{ij}}^u-z_{\mathit{kj}}^u\right|\right)\right]^2\right\}^{1/2}$\\\bottomrule
\end{tabular}

 $\varphi
\left(z_{\mathit{ij}},z_{\mathit{kj}}\right)=\left|z_{\mathit{ij}}\oplus z_{\mathit{kj}}\right|-\left|z_{\mathit{ij}}\otimes z_{\mathit{kj}}\right|+\gamma
\left(2{\cdot}\left|z_{\mathit{ij}}\otimes z_{\mathit{kj}}\right|-\left|z_{\mathit{ij}}\right|-\left|z_{\mathit{kj}}\right|\right)$;
 $\left|\cdot\right|$ -- the length of an interval,
$z_{\mathit{ij}}\oplus z_{\mathit{kj}}=z_{\mathit{ij}}{\cup}z_{\mathit{kj}}$, 
$z_{\mathit{ij}}\otimes z_{\mathit{kj}}=z_{\mathit{ij}}{\cap}z_{\mathit{kj}}.$

\caption{Distance measures for interval-valued data (based on \cite{Billard+Diday:2006}, pp. 244-246; 
\cite{Esposito+Malerba+Tamma:2000}, pp. 165-185; \cite{Ichino+Yaguchi:1994})}\label{tab:3}

\end{table}
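The two Hausdorff-type measures from Table \ref{tab:3} translate directly into code. Below is a minimal Python sketch with invented normalized intervals (the package itself uses dist.Symbolic from \pkg{clusterSim} for this):

```python
def hausdorff(zi, zk, q=1):
    """Hausdorff-type distance between two objects described by intervals.

    zi, zk: lists of (lower, upper) pairs over the m variables;
    q = 1 gives H_q1 (Hausdorff), q = 2 gives H_q2 (Euclidean Hausdorff).
    """
    s = sum(max(abs(al - bl), abs(au - bu)) ** q
            for (al, au), (bl, bu) in zip(zi, zk))
    return s ** (1.0 / q)

zi = [(0.0, 1.0), (2.0, 4.0)]   # object i: intervals on m = 2 variables
zk = [(1.0, 1.5), (2.0, 5.0)]   # object k

print(hausdorff(zi, zk, q=1))   # 2.0  (= max(1, 0.5) + max(0, 1))
```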

\subsection{Stages in selecting the optimal procedure for MDS}

The starting point of the application of the smacofSym function is to determine, e.g., the following values of the arguments
(all parameters can be changed by the user):

{}--\ \ initial configuration (``torgerson'' -- the classical scaling starting solution),

{}--\ \ convergence criterion (eps=1e-06),

{}--\ \ maximum number of iterations (itmax=1000).

The starting point of the application of the IMDS function of the \pkg{smds} package is to determine, e.g., the
following values of the arguments (all parameters can be changed by the user):

{}--\ \ initial configuration (hyper-rectangles with centres assigned as the result of classical multidimensional scaling of
the primary-space interval centres and with vertices at a distance of one from the centres),

{}--\ \ convergence criterion (eps=1e-5),

{}--\ \ maximum number of iterations (maxit=1000).

Selecting the optimal procedure for multidimensional scaling takes place in several stages:

\begin{enumerate}
\item {
Set the number of dimensions in MDS to two (ndim=2).}
\end{enumerate}

\setcounter{saveenum}{\value{enumi}}
\begin{enumerate}
\setcounter{enumi}{\value{saveenum}}
\item {
The following options are taken into account in the analysis:}
\end{enumerate}

\setcounter{saveenum}{\value{enumi}}
\begin{enumerate}
\setcounter{enumi}{\value{saveenum}}
\item[\textbullet]
In the classic-to-classic approach, with 10 normalization methods, 5 distance measures and 4 MDS models (the mspline model taken as a polynomial function of the second and the third degree), there are 200
multidimensional scaling procedures.
\end{enumerate}

Because the groups A, B, C and D of normalization methods (see Table \ref{tab:4}) give identical multidimensional
scaling results, the further analysis covers the first method of each identified group (n1, n2, n3, n9), as well as the
other methods (n5, n5a, n8, n9a, n11, n12a).

\setcounter{saveenum}{\value{enumi}}
\begin{enumerate}
\setcounter{enumi}{\value{saveenum}}
\item[\textbullet]
In the symbolic-to-classic approach, with 18 normalization methods, 4 distance measures for interval-valued data and 4 MDS models,
there are 288 multidimensional scaling procedures.
\item[\textbullet]
In the symbolic-to-symbolic approach, with 18 normalization methods and 2 optimization methods, there are 36
multidimensional scaling procedures.
\end{enumerate}

\begin{table}[!ht]
\begin{tabular}{m{1.5cm}m{4cm}m{8.5cm}}
\toprule
\multirow{2}{*}{\parbox{1.5cm}{Groups}} &
\multicolumn{2}{m{11cm}}{\centering Normalization methods}\\
 &
GDM1 distance &
\arraybslash {Minkowski distances, squared Euclidean
distance}*\\\toprule
{A} &
 {n1, n6, n12} &
 {n1, n6, n12}\\
{B} &
 {n2, n6a} &
 {n2, n6a}\\
{C} &
 {n3, n3a, n4, n7, n13} &
 {n3, n3a, n4, n7, n13}\\
{D} &
 {n9, n10} &
 {n9, n10}\\\bottomrule
\end{tabular}

* after dividing distances in each distance matrix by the maximum value.

\caption{The groups of normalization methods resulting in identical distance matrices
 (\cite{Walesiak+Dudek:2017})}\label{tab:4}
\end{table}

\setcounter{saveenum}{\value{enumi}}
\begin{enumerate}
\setcounter{enumi}{\value{saveenum}}
\item {
Multidimensional scaling is performed for each procedure separately. The procedures are then ordered by increasing:}
\end{enumerate}

\setcounter{saveenum}{\value{enumi}}
\begin{enumerate}
\setcounter{enumi}{\value{saveenum}}
\item[\textbullet] 
 $\mathit{Stress}\text{{}-}1$ fit measure in the classic-to-classic and symbolic-to-classic approaches (see
e.g. \cite{Borg+Groenen+Mair:2018}, p. 32):
\end{enumerate}

\vspace*{-0.5cm}
\begin{equation}\mathit{Stress}\text{{}-}1_p=\sqrt{\sum _{i<k}\left[d_{\mathit{ik}}\left(\mathbf{V}\right)-\widehat
d_{\mathit{ik}}\right]^2/\sum _{i<k}d_{\mathit{ik}}^2\left(\mathbf{V}\right)},\end{equation}

\noindent where: $p$ -- the multidimensional scaling procedure number,
$\widehat d_{\mathit{ik}}$ -- d-hats (disparities, target distances or pseudo-distances; see \cite{Borg+Groenen:2005}, p. 199),
$\widehat d_{\mathit{ik}} = f(\delta_{\mathit{ik}})$, with $f$ defined in different ways (ratio, interval, polynomial MDS).
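A minimal numerical sketch of the $\mathit{Stress}\text{{}-}1$ formula above (Python; the configuration distances and disparities are invented):

```python
import math

def stress1(d, dhat):
    """Kruskal's Stress-1 over the pairs i < k.

    d:    configuration distances d_ik(V), dict keyed by (i, k)
    dhat: disparities f(delta_ik), same keys
    """
    num = sum((d[ik] - dhat[ik]) ** 2 for ik in d)
    den = sum(d[ik] ** 2 for ik in d)
    return math.sqrt(num / den)

d = {(0, 1): 1.0, (0, 2): 2.0, (1, 2): 2.0}      # invented d_ik(V)
dhat = {(0, 1): 1.2, (0, 2): 1.8, (1, 2): 2.0}   # invented disparities

print(round(stress1(d, dhat), 4))  # 0.0943
```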

\setcounter{saveenum}{\value{enumi}}
\begin{enumerate}
\setcounter{enumi}{\value{saveenum}}
\item[\textbullet] 
 $I\text{{}-}\mathit{Stress}$ fit measure in the symbolic-to-symbolic approach
(\cite{Groenen+Winsberg+Rodriguez+Diday:2006}, p. 363):
\end{enumerate}

\vspace*{-0.5cm}
\begin{equation}I\text{{}-}\mathit{Stress}_p=\frac{\sum _{i<k}^nw_{\mathit{ik}}\left[\delta
_{\mathit{ik}}^u-d_{\mathit{ik}}^u\right]^2+\sum _{i<k}^nw_{\mathit{ik}}\left[\delta
_{\mathit{ik}}^l-d_{\mathit{ik}}^l\right]^2}{\sum _{i<k}^nw_{\mathit{ik}}\left[\delta _{\mathit{ik}}^u\right]^2+\sum
_{i<k}^nw_{\mathit{ik}}\left[\delta _{\mathit{ik}}^l\right]^2},\end{equation}

\noindent where: $\delta_{\mathit{ik}}^l,\delta_{\mathit{ik}}^u$ ($d_{\mathit{ik}}^l,d_{\mathit{ik}}^u$) --
the lower and upper bound of the dissimilarity in
\textit{m}-dimensional space (\textit{q}-dimensional space), $w_{\mathit{ik}}$ -- a nonnegative weight (in general
$w_{\mathit{ik}}=1$).
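The $I\text{{}-}\mathit{Stress}$ formula can be sketched in the same spirit (Python; the interval bounds are invented and the weights are set to 1):

```python
def istress(delta_l, delta_u, d_l, d_u, w=None):
    """I-Stress for interval dissimilarities; dicts keyed by (i, k), i < k.

    delta_l/delta_u: lower/upper dissimilarity bounds in m-space
    d_l/d_u:         lower/upper fitted distance bounds in q-space
    w:               optional nonnegative weights (default: all 1).
    """
    wt = w or {ik: 1.0 for ik in delta_l}
    num = sum(wt[ik] * ((delta_u[ik] - d_u[ik]) ** 2 +
                        (delta_l[ik] - d_l[ik]) ** 2) for ik in delta_l)
    den = sum(wt[ik] * (delta_u[ik] ** 2 + delta_l[ik] ** 2)
              for ik in delta_l)
    return num / den

delta_l = {(0, 1): 1.0}; delta_u = {(0, 1): 2.0}   # invented dissimilarities
d_l = {(0, 1): 0.5}; d_u = {(0, 1): 2.5}           # invented fitted distances

print(istress(delta_l, delta_u, d_l, d_u))  # 0.1
```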


\setcounter{saveenum}{\value{enumi}}
\begin{enumerate}
\setcounter{enumi}{\value{saveenum}}
\item{Based on Stress per point (\textit{spp}) values (Stress
contributions in percentages), the Hirschman-Herfindahl index (\cite{Herfindahl:1950}; \cite{Hirschman:1964}) is calculated in
the classic-to-classic and symbolic-to-classic approaches:}
\end{enumerate}

\vspace*{-0.5cm}
\begin{equation}\mathit{HHI}_p=\sum _{i=1}^n\mathit{spp}_{\mathit{pi}}^2,\end{equation}

\noindent where: $i=1,{\dots},n$ -- the object number.


Based on Interval stress per box (\textit{ispb}) values
(Interval Stress contributions in percentages), the Hirschman-Herfindahl index is calculated in the symbolic-to-symbolic
approach:

\begin{equation}\mathit{HHI}_p=\sum _{i=1}^n\mathit{ispb}_{\mathit{pi}}^2,\end{equation}


\noindent where: \begin{equation}\nonumber\mathit{ispb}_i=\frac{\left(\sum _{k=1}^nw_{\mathit{ik}}\left[\delta
_{\mathit{ik}}^u-d_{\mathit{ik}}^u\right]^2+\sum _{k=1}^nw_{\mathit{ik}}\left[\delta
_{\mathit{ik}}^l-d_{\mathit{ik}}^l\right]^2\right)/n}{\sum _{i=1}^n\left[\left(\sum _{k=1}^nw_{\mathit{ik}}\left[\delta
_{\mathit{ik}}^u-d_{\mathit{ik}}^u\right]^2+\sum _{k=1}^nw_{\mathit{ik}}\left[\delta
_{\mathit{ik}}^l-d_{\mathit{ik}}^l\right]^2\right)/n\right]}{\cdot}100.\end{equation}


The $\mathit{HHI}_p$ index takes values in the interval $\left[\frac{10{,}000}{n};10{,}000\right]$. The value
$\frac{10{,}000}{n}$ means that the distribution of
errors across individual objects is uniform. The maximal value appears when the summary fit measure ($\mathit{Stress}\text{{}-}1$, $I\text{{}-}\mathit{Stress}$) is the result of loss assigned to only one object. For
all other objects, the loss function equals zero. The optimal situation for a multidimensional
scaling procedure is the minimal value of the $\mathit{HHI}_p$ index.
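These bounds are easy to verify numerically (hypothetical \textit{spp} vectors for $n=10$ objects):

```r
# Bounds of HHI_p for n objects, spp in percentages (sum = 100)
n <- 10
spp_uniform <- rep(100 / n, n)         # error spread evenly over objects
spp_extreme <- c(100, rep(0, n - 1))   # all error assigned to one object
sum(spp_uniform^2)   # minimum: 10,000 / n = 1000
sum(spp_extreme^2)   # maximum: 10,000
```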

\setcounter{saveenum}{\value{enumi}}
\begin{enumerate}
\setcounter{enumi}{\value{saveenum}}
\item
The chart with the $\mathit{Stress}\text{{}-}1_p$ ($I\text{{}-}\mathit{Stress}_p$) fit measure value on the
\textit{x}-axis and the $\mathit{HHI}_p$ index on the
\textit{y}-axis for the \textit{p} procedures
of multidimensional scaling is drawn.
\item
The maximal acceptable value of $\mathit{Stress}\text{{}-}1$ ($I\text{{}-}\mathit{Stress}$) is assumed as \textit{cs}
(it may be calculated as the mid-range or median of the $\mathit{Stress}\text{{}-}1$ ($I\text{{}-}\mathit{Stress}$) values). Among all
multidimensional scaling procedures for which $\mathit{Stress}\text{{}-}1_p{\leq}cs$ ($I\text{{}-}\mathit{Stress}_p{\leq}cs$), we choose the one for which $\underset
p{\mathit{min}}\left\{\mathit{HHI}_p\right\}$ occurs.
\item
Multidimensional scaling for the selected procedure is performed, together with a check that the results
are acceptable in terms of interpretation. Based on the Shepard diagram ($I\text{{}-}\mathit{dist}$
diagram) and the $\mathit{Stress}$ ($I\text{{}-}\mathit{Stress}$)
plot, the correctness of the scaling model is evaluated. If the results are acceptable the procedure ends;
otherwise it returns to step 1 and multidimensional scaling in three dimensions is performed (ndim=3).
\end{enumerate}
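The selection in steps 6-7 amounts to a filter-then-minimize rule, sketched here with hypothetical fit values (the package functions used in the examples below encapsulate this selection):

```r
# Hypothetical Stress-1 and HHI values for six MDS procedures
stress <- c(0.03, 0.05, 0.09, 0.12, 0.20, 0.26)
hhi    <- c(820, 790, 450, 420, 380, 405)

cs   <- (min(stress) + max(stress)) / 2  # mid-range threshold: 0.145
ok   <- which(stress <= cs)              # acceptable procedures: 1..4
best <- ok[which.min(hhi[ok])]           # min HHI among them: procedure 4
```

Procedure 5 has the smallest HHI overall but is rejected because its Stress-1 exceeds \textit{cs}.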

\section{Using the package -- examples}

\subsection{Metric data (classic-to-classic approach)}

In the first example we will find the optimal solution for the classic-to-classic MDS approach. The package \pkg{mdsOpt} contains a
dataset called data\_lower\_silesian referring to the attractiveness level of 31 objects (29 Lower
Silesian counties, Pattern and Anti-pattern objects) described by 16 metric variables:


\noindent x1 -- beds in hotels per 1 km\textsuperscript{2} of a county area,

\noindent x2 -- number of nights spent daily by resident tourists per 1,000 inhabitants of a county,

\noindent x3 -- number of nights spent daily by foreign tourists per 1,000 inhabitants of a county,

\noindent x4 -- gas pollution emission in tons per 1 km\textsuperscript{2} of a county area,

\noindent x5 -- number of criminal offences and crimes against life and health per 1,000 inhabitants of a county,

\noindent x6 -- number of property crimes per 1,000 inhabitants of a county,

\noindent x7 -- number of historical buildings per 100 km\textsuperscript{2} of a county area,

\noindent x8 -- \% of a county forest cover,

\noindent x9 -- \% share of legally protected areas within a county area,

\noindent x10 -- number of events as well as cultural and tourist ventures in a county,

\noindent x11 -- number of natural monuments calculated per 1 km\textsuperscript{2} of a county area,

\noindent x12 -- number of tourist economy entities per 1,000 inhabitants of a county (natural and legal persons),

\noindent x13 -- expenditure of municipalities and counties on tourism, culture and national heritage protection as well as
physical culture per 1 inhabitant of a county in Polish zloty amounts (PLN),

\noindent x14 -- cinema attendance per 1,000 inhabitants of a county,

\noindent x15 -- museum visitors per 1,000 inhabitants of a county,

\noindent x16 -- number of construction permits (hotels and accommodation buildings, commercial and service buildings, transport
and communication buildings, civil and water engineering constructions) issued in the county in the years 2011-2012,
per 1 km\textsuperscript{2} of the county area.


The statistical data were collected in 2012 and come from the Local Data Bank of Statistics Poland (GUS); only the data for variable x7 were obtained from the regional conservation officer. Variables x1-x3, x7, x8, x10-x16 represent stimulants (where
higher values are more preferred), variables
x4, x5 and x6 take the form of destimulants (where lower values are more preferred), and x9 is a nominant (the nominal
value is the best value and lies within the range of the variable -- the 50\% level was adopted as the optimal one). Variable x9
was transformed into a stimulant. The coordinates of the Pattern object cover the most preferred
values of the preference variables (maximum for a stimulant, minimum for a destimulant). The coordinates of the Anti-pattern object cover the least
preferred values (minimum for a stimulant, maximum for a destimulant).
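The Pattern and Anti-pattern coordinates can be constructed mechanically once each variable's preference direction is known. A minimal sketch with a hypothetical two-variable data frame (x1 a stimulant, x4 a destimulant); the objects in data\_lower\_silesian were built analogously over all 16 variables:

```r
# Hypothetical data: x1 is a stimulant, x4 a destimulant
dat <- data.frame(x1 = c(2, 5, 3), x4 = c(10, 4, 7))
is_stimulant <- c(x1 = TRUE, x4 = FALSE)

colmax <- sapply(dat, max); colmin <- sapply(dat, min)
pattern     <- ifelse(is_stimulant, colmax, colmin)  # c(5, 4)
antipattern <- ifelse(is_stimulant, colmin, colmax)  # c(2, 10)
rbind(dat, Pattern = pattern, `Anti-pattern` = antipattern)
```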

First we load the package and the dataset.

\begin{CodeChunk}
\begin{CodeInput}
R> library(mdsOpt)
R> data(data_lower_silesian)
\end{CodeInput}
\end{CodeChunk}


Then we set the normalization methods, distance measures and MDS models used in the selection of the optimal MDS procedure.

\begin{CodeChunk}
\begin{CodeInput}
R> metnor<-c("n1","n2","n3","n5","n5a","n8","n9","n9a","n11","n12a")
R> metscale<-c("ratio","interval","mspline")
R> metdist<-c("euclidean","manhattan","seuclidean","maximum","GDM1")
\end{CodeInput}
\end{CodeChunk}


The normalization methods, distance measures and MDS models are used in the next step (note that the mspline
model is used twice, with the spline.degree parameter equal to 2 or 3) for selecting the optimal multidimensional scaling
procedure.

\begin{CodeChunk}
\begin{CodeInput}
R> res<-optSmacofSym_mMDS(data_lower_silesian,normalizations=metnor,
+ distances=metdist,mdsmodels=metscale,spline.degrees=c(2:3),outDec=".",
+ stressDigits=6,HHIDigits=2)
\end{CodeInput}
\end{CodeChunk}

The results contain 200 rows (10 normalization methods x 5 distance measures x 4 MDS models), each describing one
procedure with six columns: Normalization method, MDS model, Spline degree, Distance measure, STRESS 1, HHI spp.
The rows are ordered by the STRESS 1 value.

Before displaying the result we need to change the max.print system option to a value greater than or equal to 1200 (10
normalization methods x 5 distance measures x 4 MDS models x 6 columns).



\begin{CodeChunk}
\begin{CodeInput}
R> options(max.print=1200)
R> print(res)
\end{CodeInput}
\begin{CodeOutput}
       Normalization MDS model  Spline   Distance      STRESS 1   HHI spp
       method                   degree   measure
  [1,] "n9a"         "mspline"  "3"      "euclidean"   "0.026339" " 821.90"
  [2,] "n9a"         "mspline"  "2"      "euclidean"   "0.026451" " 856.47"
  [3,] "n9a"         "mspline"  "2"      "seuclidean"  "0.026967" " 791.68"
  ...
[198,] "n8"          "ratio"    ""       "maximum"     "0.261772" " 414.10"
[199,] "n3"          "ratio"    ""       "maximum"     "0.265246" " 414.13"
[200,] "n5"          "ratio"    ""       "maximum"     "0.266663" " 404.71"
\end{CodeOutput}
\end{CodeChunk}

Then we convert the $\mathit{Stress}\text{{}-}1$ and $\mathit{HHI}$ values to numeric vectors.

\begin{CodeChunk}
\begin{CodeInput}
R> stress<-as.numeric(res[,"STRESS 1"])
R> hhi<-as.numeric(res[,"HHI spp"])
\end{CodeInput}
\end{CodeChunk}

The maximal acceptable \textit{cs} value is calculated as the mid-range of the $\mathit{Stress}\text{{}-}1$ values.

\begin{CodeChunk}
\begin{CodeInput}
R> cs<-(min(stress)+max(stress))/2 
R> print(cs)
[1] 0.146501
\end{CodeInput}
\end{CodeChunk}

Then the best MDS procedure from all combinations is chosen.

\begin{CodeChunk}
\begin{CodeInput}
# Elements of optimal MDS procedure
R> t<-findOptimalSmacofSym(res,cs)
R> print(t)
\end{CodeInput}
\begin{CodeOutput}
$`Nr`
[1] 117
$Normalization_method
[1] "n12a"
$MDS_model
[1] "interval"
$Spline_degree
[1] ""
$Distance_measure
[1] "euclidean"
$STRESS_1
[1] 0.132176
$HHI_spp
[1] 420.74
\end{CodeOutput}
\end{CodeChunk}

In the next step we can plot the dependency between $\mathit{Stress}\text{{}-}1$ and the $\mathit{HHI}$ index (see Figure \ref{figure:1}), with the
best solution marked by a red circle, and finally we choose the MDS solution that satisfies the condition
$\mathit{Stress}\text{{}-}1{\leq}\mathit{cs}$ and minimizes $\mathit{HHI}$.

\begin{CodeChunk}
\begin{CodeInput}
# Plot dependency between Stress-1 and HHI index
R> plot(stress[-t$Nr],hhi[-t$Nr],xlab="Stress-1",ylab="HHI",
+ type="n",font.lab=3)
R> text(stress[-t$Nr],hhi[-t$Nr],labels=(1:nrow(res))[-t$Nr])
R> abline(v=cs,col="red")
R> points(stress[t$Nr],hhi[t$Nr],cex=5,col="red")
R> text(stress[t$Nr],hhi[t$Nr],labels=(1:nrow(res))[t$Nr],col="red")
\end{CodeInput}
\end{CodeChunk}


\begin{figure}[htb]
  \centering
  \includegraphics[width=9.791cm,height=8.662cm]{Fig1.pdf}
  \caption{The values of the $\mathit{Stress}\text{{}-}1$ fit measure and the $\mathit{HHI}$ index\newline for \textit{p} multidimensional scaling procedures (with the best solution marked by a red circle)}
  \label{figure:1}
\end{figure}

The results of the optimal multidimensional scaling procedure (117), obtained with the script below, for 31 objects (29 Lower Silesian
counties, Pattern and Anti-pattern objects) according to the level of tourist attractiveness are presented in Figure \ref{figure:2}.

\begin{CodeChunk}
\begin{CodeInput}
R> library(mdsOpt)
R> data(data_lower_silesian)
R> z<-data.Normalization(data_lower_silesian,type="n12a")
R> d<-dist(z,method="euclidean")
R> res<-smacofSym(delta=d,ndim=2,type="interval")
R> par(mfrow=c(2,2),pty="s")
# Shepard Diagram
R> plot(res,plot.type="Shepard",cex.main=0.8,cex.lab=0.8,cex.axis=0.8,cex=0.2)
# Stress plot
R> spp<-sort(res$spp,decreasing=TRUE)
R> names(spp)<-order(res$spp,decreasing=TRUE)
R> plot(spp,main="Stress plot",ylab="Stress contribution in percents",
+ xlab="Objects",ylim=c(-2,30),cex=0.4,cex.main=0.8,cex.lab=0.8,cex.axis=0.8)
R> text(spp,pos=3,names(spp),cex=0.4)
# Configuration plot with bubble
R> bubsize=res$spp/length(spp)*4
R> plot(res$conf,main="Configuration plot with bubble",xlab="Dimension 1",
+ ylab="Dimension 2",cex=bubsize,cex.main=0.8,cex.lab=0.8,cex.axis=0.8,asp=1)
R> text(res$conf[,1],res$conf[,2],pos=3,1:nrow(res$conf),cex=0.7)
R> arrows(res$conf[nrow(z),1],res$conf[nrow(z),2],res$conf[nrow(z)-1,1],
+ res$conf[nrow(z)-1,2],length=0.05,col="black")
R> plot.new()
R> legend("center",paste(1:nrow(res$conf),rownames(res$conf)),
+ bty="n",cex=0.7,ncol=2,title="Legend")
\end{CodeInput}
\end{CodeChunk}

\begin{figure}[htb]
  \centering
    \includegraphics[width=12.7cm,height=12.7cm]{Fig2.pdf}
  \caption{The results of multidimensional scaling (procedure 117) of 31 objects\newline
(29 Lower Silesian counties, Pattern and Anti-pattern) according to the level of tourist attractiveness}
  \label{figure:2}
\end{figure}

In Figure \ref{figure:2} (Configuration plot with bubble), the contribution of each object to the total error is additionally shown by the radius of the circle around the object. The Shepard diagram and Stress plot confirm the
correctness of the chosen scaling model. In Figure \ref{figure:2} (Configuration plot with bubble), the axis of
the set, i.e. the shortest connection between the Pattern and Anti-pattern objects, is drawn. It indicates the
level of tourist attractiveness of the counties. Objects that are closer to the Pattern object have a higher
level of tourist attractiveness.


In contrast to the best MDS procedure (117), we show the results for one of the worst procedures (13): the n9a
normalization method, the mspline MDS model of the third degree, and the maximum (Chebyshev) distance. In
relation to the previous script, changes in lines 3-5 and in the Shepard diagram are required.

\begin{CodeChunk}
\begin{CodeInput}
R> z<-data.Normalization(data_lower_silesian,type="n9a")
R> d<-dist(z,method="maximum")
R> res<-smacofSym(delta=d,ndim=2,type="mspline",spline.degree=3)
...
# Shepard Diagram
R> plot(res,plot.type="Shepard",cex.main=0.8,cex.lab=0.8,cex.axis=0.8,cex=0.2)
R> t1<-as.matrix(res$delta)
R> t2<-as.matrix(res$confdist)
R> text(t1[7,3],t2[7,3],pos=4,"7,3",cex=0.6)
R> text(t1[31,3],t2[31,3],pos=1,"31,3",cex=0.6)
\end{CodeInput}
\end{CodeChunk}

The results of multidimensional scaling for procedure 13 according to the level of tourist attractiveness are presented on Figure
\ref{figure:3}.

\begin{figure}[htb]
  \centering
  \includegraphics[width=12.7cm,height=12.7cm]{Fig3.pdf}
  \caption{The results of multidimensional scaling (procedure 13) of 31 objects\newline
(29 Lower Silesian counties, Pattern and Anti-pattern) according to the level of tourist attractiveness
  }
  \label{figure:3}
\end{figure}

The overall Stress for procedure 13 (0.0381) is much better than for procedure 117 (0.1322).
Figure \ref{figure:3} (Stress plot) shows that the objects Jeleniogorski (3), Anti-pattern (31) and Zgorzelecki (7) contribute most to
the overall Stress (56.62\%). It also shows (see the Shepard diagram -- in the lower left-hand corner) that two points
(the distance between Jeleniogorski county (3) and the Anti-pattern object (31), and between Jeleniogorski county (3) and Zgorzelecki
county (7)) are outliers. These outliers contribute over-proportionally
to the total Stress. The MDS configuration (Figure \ref{figure:3} -- Configuration plot with bubble) does not represent all proximities
equally well. Jeleniogorski county (3) is one of the best Lower Silesian counties according to the level of tourist
attractiveness, yet in the Configuration plot with bubble this county lies near the Anti-pattern object (the worst object). The
greater the value of the $\mathit{HHI}_p$ index, the worse the effect of multidimensional scaling in terms of
representing the real relationships between objects.


\subsection{Interval-valued data (symbolic-to-symbolic approach)}

In the second example we will find the optimal solution for the symbolic-to-symbolic MDS approach. The dataset
data\_symbolic\_interval\_polish\_voivodships comes from the \pkg{clusterSim} package. For the evaluation of the
tourist attractiveness of Polish voivodships (provinces) in the year 2016, a two-stage data collection was carried
out.

Step 1. Data on tourist attractiveness were collected for 380 Polish counties for the
following metric variables:

\noindent {x1 -- beds in hotels per 1,000 inhabitants of a county,}

\noindent {x2 -- number of nights spent daily by resident tourists per 1,000 inhabitants of a county,}

\noindent {x3 -- number of nights spent daily by foreign tourists per 1,000 inhabitants of a county,}

\noindent {x4 -- dust pollution emission in tons per 10 km\textsuperscript{2} of a county area,}

\noindent {x5 -- gas pollution emission in tons per 1 km\textsuperscript{2} of a county area,}

\noindent {x6 -- number of criminal offences, crimes against life and health and property crimes per 1,000
inhabitants of a county,}

\noindent {x7 -- forest cover of the county in \%,}

\noindent {x8 -- participants of mass events per 1,000 inhabitants of a county,}

\noindent {x9 -- number of tourist economy entities (sections: I, N79) registered in the system REGON per 1,000
inhabitants of a county.}

Three variables, x4, x5 and x6, can be treated as destimulants. All other variables are stimulants.

Step 2. The data table was aggregated to the voivodship level, with interval-valued data as
the result. The lower bound of the interval for each variable was obtained by calculating the first quartile based on
the data from the counties; in turn, the upper bound of the interval was obtained by calculating the third quartile. As a
result, the dataset contains data on 18 objects (16 voivodships, Pattern and Anti-pattern) described by 9 interval-valued
variables.
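The aggregation in Step 2 can be sketched with hypothetical county-level values of one variable for two regions; the actual dataset was built in the same way from the 380 counties:

```r
# Hypothetical county-level values of one variable, grouped by region
county <- data.frame(region = rep(c("A", "B"), each = 4),
                     x1 = c(2, 4, 6, 8, 1, 3, 5, 7))

# Interval bounds per region: first and third quartile of county values
lower <- tapply(county$x1, county$region, quantile, probs = 0.25)
upper <- tapply(county$x1, county$region, quantile, probs = 0.75)
cbind(lower, upper)   # region A: [3.5, 6.5], region B: [2.5, 5.5]
```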

First we load the \pkg{mdsOpt} package and the dataset (note that there is no need to load the \pkg{clusterSim} package -- it is
loaded automatically by \pkg{mdsOpt}).

\begin{CodeChunk}
\begin{CodeInput}
R> library(mdsOpt)
R> data("data_symbolic_interval_polish_voivodships")
R> data<-data_symbolic_interval_polish_voivodships
\end{CodeInput}
\end{CodeChunk}

Then we set the normalization methods and optimization methods used in the selection of the optimal MDS procedure.


\begin{CodeChunk}
\begin{CodeInput}
R> metnor<-c("n1","n2","n3","n3a","n4","n5","n5a","n6","n6a","n7","n8","n9",
+ "n9a","n10","n11","n12","n12a","n13")
R> methods<-c("MM","BFGS")
\end{CodeInput}
\end{CodeChunk}

In the next step we run the $I\text{{}-}\mathit{Scal}$ algorithm for all combinations of normalization methods and
optimization methods with default parameters.

\begin{CodeChunk}
\begin{CodeInput}
R> res<-optIscalInterval(x=data,dataType="simple",normalizations=metnor,
+ optMethods=methods,outDec=".",stressDigits=6,HHIDigits=2)
\end{CodeInput}
\begin{CodeOutput}
initial value 568.744280 
iter 100 stress = 33.568175
....
final (iter 1000) stress = 9.392537 
stopped after 1000 iterations 
initial  value 217.126687 
final  value 3.943757 
converged
\end{CodeOutput}
\begin{CodeInput}
R> print(res)
\end{CodeInput}
\begin{CodeOutput}
      Normalization method Opt method I-STRESS   HHI spb  
 [1,] "n9a"                "MM"       "0.000087" " 746.31"
 [2,] "n9a"                "BFGS"     "0.000108" "1156.36"
 [3,] "n2"                 "BFGS"     "0.000200" " 863.13"
 ...
[34,] "n4"                 "MM"       "0.007690" "1316.30"
[35,] "n12"                "MM"       "0.008430" "1148.12"
[36,] "n12a"               "MM"       "0.009668" "1058.55"
\end{CodeOutput}
\end{CodeChunk}


The rows are ordered by the $I\text{{}-}\mathit{Stress}$ value.
Then we convert the $I\text{{}-}\mathit{Stress}$ and $\mathit{HHI}$ values to numeric vectors.

\begin{CodeChunk}
\begin{CodeInput}
R> Istress<-as.numeric(res[,"I-STRESS"])
R> hhi<-as.numeric(res[,"HHI spb"])
\end{CodeInput}
\end{CodeChunk}

The maximal acceptable \textit{cs} value is calculated as the median of the $I\text{{}-}\mathit{Stress}$ values.

\begin{CodeChunk}
\begin{CodeInput}
R> cs<-median(Istress)
R> print(cs)
\end{CodeInput}
\begin{CodeOutput}
[1] 0.003215
\end{CodeOutput}
\end{CodeChunk}

Then the best MDS procedure from all combinations is chosen.

\begin{CodeChunk}
\begin{CodeInput}
# Elements of optimal MDS procedure
R> t<-findOptimalIscalInterval(res,cs)
R> print(t)
\end{CodeInput}
\begin{CodeOutput}
$Nr
[1] 5
$Normalization_method
[1] "n2"
$Opt_method
[1] "MM"
$I_STRESS
[1] 0.000268
$HHI_spb
[1] 743.61
\end{CodeOutput}
\end{CodeChunk}

In the next step we can plot the dependency between $I\text{{}-}\mathit{Stress}$ and the $\mathit{HHI}$ index (see Figure \ref{figure:4}), with the
best solution marked by a red circle, and finally we choose the MDS solution that satisfies the condition
$I\text{{}-}\mathit{Stress}{\leq}\mathit{cs}$ and minimizes $\mathit{HHI}$.

\begin{CodeChunk}
\begin{CodeInput}
# Plot dependency between I-Stress and HHI index
R> plot(Istress[-t$Nr],hhi[-t$Nr], xlab="I-Stress",ylab="HHI",
+ type="n",font.lab=3)
R> text(Istress[-t$Nr],hhi[-t$Nr],labels=(1:nrow(res))[-t$Nr])
R> abline(v=cs,col="red")
R> points(Istress[t$Nr],hhi[t$Nr], cex=5,col="red")
R> text(Istress[t$Nr],hhi[t$Nr],labels=(1:nrow(res))[t$Nr],col="red")
\end{CodeInput}
\end{CodeChunk}

\begin{figure}[htb]
  \centering
    \includegraphics[width=9.791cm,height=8.662cm]{Fig4.pdf}
  \caption{The values of the $I\text{{}-}\mathit{Stress}$ fit measure and the
$\mathit{HHI}$ index\newline
for \textit{p} multidimensional scaling procedures (with the best solution marked by a red circle)
  }
  \label{figure:4}
\end{figure}

Now we will display the results of the best MDS procedure (5). First we need to load the \pkg{smds} package.

\begin{CodeChunk}
\begin{CodeInput}
R> library(smds)
\end{CodeInput}
\end{CodeChunk}

The results of the optimal multidimensional scaling procedure (5), obtained with the script below, for 18 objects (16 voivodships, Pattern
and Anti-pattern objects) according to the level of tourist attractiveness are presented in Figure \ref{figure:5}.

\begin{CodeChunk}
\begin{CodeInput}
R> library(mdsOpt)
R> data("data_symbolic_interval_polish_voivodships")
R> data<-data_symbolic_interval_polish_voivodships
R> normalized<-interval_normalization(x=data,dataType="simple",type="n2")
R> x<-normalized$simple[,,1];y<-normalized$simple[,,2]
R> my.idiss<-idistBox(X=(x+y)/2,R=(y-x)/2)
R> cmat<-(my.idiss[2,,]+my.idiss[1,,])/2
R> iniX<-cmdscale(as.dist(cmat),k=2)
R> n=dim(my.idiss)[2]
R> iniR<-matrix(rep(1,n*2),nrow=n,ncol=2)
R> res.box<-IMDS(IDM=my.idiss,p=2,model="box",opt.method="MM",
+ report=1001,ini=list(iniX,iniR))
R> x_l<-res.box$EIDM[1,,];x_u<-res.box$IDM[2,,]  
R> y_l<-res.box$IDM[1,,];y_u<-res.box$EIDM[2,,]
R> spb<-ispb(res.box$EIDM,my.idiss)
R> HHI<-sum(spb^2)
R> par(mfrow=c(2,2),pty="s")
# I-dist diagram
R> plot(x_u,y_u, main="I-dist diagram",
+ ylab="The lower (red) and upper (green)\n configuration distances",
+ xlab="The lower (red) and upper\n (green) dissimilarities",
+ col="green",cex.main=0.8,cex.lab=0.8,cex.axis=0.8,cex=0.5)
R> points(x_l,y_l,col="red",cex=0.5)
# I-Stress plot
R> w<-sort(spb,decreasing=TRUE)
R> names(w)<-order(spb,decreasing=TRUE)
R> plot(w,main="I-Stress plot",xlab="Object",ylab="ispb in percents",
+ ylim=c(-2,25),cex=0.4,cex.main=0.8,cex.lab=0.8,cex.axis=0.8)
R> text(w,pos=3,names(w),cex=0.6)
# Configuration plot
R> x<-(res.box$X-res.box$R);y<-(res.box$X+res.box$R)
R> plot(NULL,xlim=c(min(x[,1]),max(y[,1])),ylim=c(min(x[,2]),max(y[,2])),
+ pch=1,cex=0.4,main="Configuration plot",xlab="Dimension 1",
+ ylab="Dimension 2",cex.main=0.8,cex.lab=0.8,asp=1,cex.axis=0.8)
R> rect(x[,1],x[,2],y[,1],y[,2])
R> text(res.box$X[,1],res.box$X[,2],labels=1:18,cex=0.8)
R> plot.new()
R> legend("center",legend=paste(1:dim(data)[[1]],attr(data,"row.names")),
+ bty="n",ncol=2,cex=0.65,title="Legend")
\end{CodeInput}
\end{CodeChunk}

\begin{figure}[htb]
  \centering
  \includegraphics[width=12.7cm,height=12.7cm]{Fig5.pdf}
  \caption{ The results of multidimensional scaling (procedure 5) of 18 objects\newline
(16 voivodships, Pattern and Anti-pattern) according to the level of tourist attractiveness
  }
  \label{figure:5}
\end{figure}

Figure \ref{figure:5} ($I\text{{}-}\mathit{dist}$ diagram and $I\text{{}-}\mathit{Stress}$
plot) confirms the correctness of the MDS results (Configuration plot). Objects that are closer to the pattern of
development have a higher level of tourist attractiveness.


In contrast to the best MDS procedure (5), we show, via the script below, the results for one of the worst procedures
(12) according to the $\mathit{HHI}$ index. In relation to the previous script, changes in lines
5, 13-14 and in the $I\text{{}-}\mathit{dist}$ diagram are required.

\begin{CodeChunk}
\begin{CodeInput}
R> normalized<-interval_normalization(x=data,dataType="simple",type="n5a")
...
R> res.box<-IMDS(IDM=my.idiss,p=2,model="box",opt.method="BFGS",
+ report=1001,ini=list(iniX,iniR))
...
# I-dist diagram
R> plot(x_u,y_u, main="I-dist diagram",
+ ylab="The lower (red) and upper (green)\n configuration distances",
+ xlab="The lower (red) and upper\n (green) dissimilarities",col="green",
+ cex.main=0.8,cex.lab=0.8,cex.axis=0.8,cex=0.5)
R> points(x_l,y_l,col="red",cex=0.5)
R> text(x_u[17,16],y_u[17,16],pos=2,"17,16",cex=0.6)
R> text(x_u[17,4],y_u[17,4],pos=1,"17,4",cex=0.6)
R> text(x_l[16,9],y_l[16,9],pos=3,"16,9",cex=0.6)
\end{CodeInput}
\end{CodeChunk}

The results of multidimensional scaling for procedure 12 according to the level of tourist attractiveness are presented
on Figure \ref{figure:6}.

\begin{figure}[htb]
  \centering
  \includegraphics[width=12.7cm,height=12.7cm]{Fig6.pdf}
  \caption{ The results of multidimensional scaling (procedure 12) of 18 objects\newline
(16 voivodships, Pattern and Anti-pattern) according to the level of tourist attractiveness
  }
  \label{figure:6}
\end{figure}

Figure \ref{figure:6} ($I\text{{}-}\mathit{Stress}$ plot) shows that the objects Lubuskie (4), Pattern (17) and Zachodniopomorskie (16)
contribute most to the overall $I\text{{}-}\mathit{Stress}$ (57.68\%). It also shows (see Figure \ref{figure:6} --
$I\text{{}-}\mathit{dist}$ diagram) that some points (the upper distances between Zachodniopomorskie voivodship (16) and
the Pattern object (17) and between the Pattern object (17) and Lubuskie voivodship (4); the lower distance between Zachodniopomorskie
voivodship (16) and Podkarpackie voivodship (9)) are outliers. These
outliers contribute over-proportionally to the total
$I\text{{}-}\mathit{Stress}$. The MDS configuration (Figure \ref{figure:6} -- Configuration plot) does not represent all proximities equally well. Zachodniopomorskie (16) is the best of the Polish
voivodships according to the level of tourist attractiveness, yet in Figure \ref{figure:6} (Configuration plot) this voivodship lies further from the Pattern object than Lubuskie (4). The greater the value of the $\mathit{HHI}_p$ index, the worse the
effect of multidimensional scaling in terms of representing the real relationships between objects.

\section{Summary}

The article proposes a methodology that allows the selection of the optimal MDS procedure for classical metric and
interval-valued data. For the classic-to-classic approach we choose the best MDS procedure with respect to the applied methods of
normalization, distance measures and scaling models, carried out on the basis of the metric data matrix. On the basis of
this methodology, the research results are illustrated by the first example, which finds the optimal procedure for multidimensional
scaling of a set of objects representing 29 counties of Lower Silesia according to the level of tourist attractiveness.


For the symbolic-to-classic approach we choose the best MDS procedure with respect to the applied methods of normalization, distance
measures for interval-valued data and scaling models, carried out on the basis of the interval-valued data table.


For the symbolic-to-symbolic approach we choose the best MDS procedure with respect to the applied normalization and
optimization methods, carried out on the basis of the interval-valued data table. On the basis of this methodology,
the research results are illustrated by the second example, which finds the optimal procedure for multidimensional scaling of a set of
objects representing 16 Polish voivodships according to the level of tourist attractiveness.


To solve the problem of choosing the optimal multidimensional scaling procedure, two criteria were applied in the \pkg{mdsOpt}
package: Kruskal's $\mathit{Stress}\text{{}-}1$ fit measure and the Hirschman-Herfindahl $\mathit{HHI}$ index (in the
classic-to-classic and symbolic-to-classic approaches), and the $I\text{{}-}\mathit{Stress}$ fit measure and the
Hirschman-Herfindahl $\mathit{HHI}$ index (in the symbolic-to-symbolic approach).


In step 6 the maximal acceptable value of the fit measures $\mathit{Stress}\text{{}-}1$ and $I\text{{}-}\mathit{Stress}$
has been assumed arbitrarily. It is not determined how much the error distribution for each object may deviate from the
uniform distribution. Among the procedures of multidimensional scaling for which $\mathit{Stress}\text{{}-}1{\leq}cs$ ($I\text{{}-}\mathit{Stress}{\leq}cs$), the one for which $\underset p{\mathit{min}}\left\{\mathit{HHI}_p\right\}$ occurs is
selected. This constraint does not essentially limit the presented proposal, as additional acceptability criteria,
such as the Shepard diagram (\cite{Leeuw+Mair:2015}) and Stress plot, or the $I\text{{}-}\mathit{dist}$ diagram and
$I\text{{}-}\mathit{Stress}$ plot, confirm the correctness of the MDS results.
\addcontentsline{toc}{chapter}{References}
\bibliography{\jobname}
\end{document}


