Chapter Two: LITERATURE REVIEW
We employ cluster analysis to address our research question because it is a technique that allows users to identify the natural underlying group structure of complex datasets. In doing so, we can identify the type of firms and the degree of firm quality that a particular group of directors is associated with. Specifically, we expect that directors sitting on the boards of high-quality firms will be grouped together, and likewise directors sitting on the boards of low-quality firms. Figure 4.1 shows a flow chart illustrating the methodology employed in this study. After cluster analysis, we perform regression analysis. Using the relation found in the regression analysis, we compute the predicted number of directorships for all directors included in our analysis. Finally, we compare the actual and predicted number of directorships for directors in each group obtained from the earlier cluster analysis. This comparison allows us to directly address the hypotheses developed in Chapter Three.
1.1.1. Cluster Analysis
Cluster analysis is an unsupervised learning technique that allows users to explore complex datasets by identifying the natural group structures underlying the data (Everitt, 1993; Jain et al., 1999; Duda et al., 2001; Hastie et al., 2001). In essence, cluster analysis partitions objects into groups such that similarity between objects in the same group and dissimilarity between groups are maximized (Hand et al., 2001). These objects are represented by multidimensional variables, and the similarity or dissimilarity between two objects is measured as the distance between the multidimensional variable vectors that represent them.
There are many ways of performing cluster analysis, and it is highly unlikely that two different algorithms will lead to entirely identical partitions of the data. Further, each algorithm has potential flaws; no single algorithm can be considered the best. Hence, rather than choosing one particular algorithm, and following Shen et al. (2007), we perform cluster analysis using three conceptually different algorithms (Hierarchical Clustering, K-Means clustering and Expectation Maximization for the Gaussian Mixture Model) and subsequently employ consensus clustering to obtain a single consensus solution. This methodology allows us to avoid the potential flaws of the individual clustering methods, improving the reliability of our cluster solutions. Figure 4.2 shows a flow chart illustrating the steps we take to perform cluster analysis.
There are two main methods of clustering directors in our study: by individual director characteristics or by individual firm characteristics. Firm characteristics can differentiate between directors as they define the type of firm a director is associated with. For instance, two directors will appear different when one sits on the board of a firm with large market capitalisation while the other sits on the board of a firm with small market capitalisation. In our study, we cluster directors by the latter method, as it is reasonable to expect that individual director characteristics (e.g. previous experience) have already been considered by the companies themselves when they invite those directors to join their boards. In this way, our cluster analysis also takes individual director characteristics into consideration. In defining the clustering variables, we follow a deductive approach in which the suitability of variables is theory-based (Ketchen et al., 1993). This is often considered a better approach to guide variable choices, as irrelevant variables may affect the validity of a cluster solution (Punj and Stewart, 1983). We employ the seven firm characteristics previously discussed in Chapter Three as our clustering variables.
As cluster analysis groups objects by maximising the distance between groups along all clustering variables, variables with larger ranges tend to dominate (i.e. are given more weight in) the cluster solution compared with variables with smaller ranges (Hair et al., 1992). Considering that the range of firm size (measured by market capitalisation) is several times larger than the range of the ownership structure variables, the firm size variable is likely to dominate our cluster solution. Although some studies claim that standardisation of variables to equally scaled data is not necessary (Dolnicar, 2002) and may distort results, we choose to standardise the variables by Z-scores to avoid cluster solutions that are dominated by market capitalisation.
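As an illustration of the Z-score standardisation described above, the following minimal sketch rescales each clustering variable to mean 0 and standard deviation 1. The data values, the function name `zscore_columns` and the use of the population standard deviation are assumptions made for illustration only; they are not taken from the study.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardise each column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical firms: market capitalisation (in millions) and a block-ownership share.
firms = [[120.0, 0.10], [4500.0, 0.35], [900.0, 0.60], [15000.0, 0.20]]
standardised = zscore_columns(firms)
```

After standardisation both columns have the same scale, so a distance computed across them is no longer driven almost entirely by market capitalisation.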
1.1.1.1. Cluster Analysis Algorithms, Cluster Validation and Consensus Clustering
In the following sections, we describe the three algorithms employed to perform cluster analysis. Subsequently, we discuss the cluster validation measures taken and illustrate the use of consensus clustering to find a single consensus solution. We perform the following steps on three sets of data (all directors, outside directors and independent directors), as extant research has found that dissimilar behaviors between inside and outside directors can lead to contrasting results in studies (Mak et al., 2003; Fich and Shivdasani, 2006).
1.1.1.1.1. Cluster Analysis Algorithms
Hierarchical clustering
Hierarchical clustering can be either agglomerative or divisive. Divisive methods begin with every object in one cluster and end when each object is in its own cluster. Divisive methods are not readily available and are thus seldom cited. In this thesis, we employ the more commonly used agglomerative method and discuss the basic process below.
The basic process of agglomerative hierarchical clustering for N objects is (Johnson, 1967):
1. Assign each object to its own cluster, so that the method begins with N clusters.
2. Merge the closest (most similar) pair of clusters into a single cluster, so that there are now (N - 1) clusters.
3. Recompute the distances (similarities) between the new cluster and each of the old clusters.
4. Iterate through steps 2 and 3 until only a single cluster is left.
In step 2, several methods of computing distances between clusters exist; they are referred to as linkage functions. For instance, single linkage clustering computes the similarity of two clusters as the similarity of their most similar members. Complete linkage clustering measures the similarity of two clusters as the similarity of their most dissimilar members. In our analysis, we choose Ward's method (Ward, 1963), which is distinct from the aforementioned methods, as our linkage function. Ward's method chooses each successive merging step by the criterion of minimising the increase in the error sum of squares (ESS) at each step. The ESS of a set X of NX values is given by
$$ESS(X) = \sum_{i=1}^{N_X} \left| x_i - \frac{1}{N_X} \sum_{j=1}^{N_X} x_j \right|^2$$
where: |.| is the absolute value of a scalar or the norm (the "length") of a vector.
The linkage function giving the distance between clusters X and Y is
$$D(X, Y) = ESS(XY) - \left[ ESS(X) + ESS(Y) \right]$$
where: XY is the combined cluster resulting from merging clusters X and Y;
ESS(.) is the error sum of squares described above.
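The agglomerative procedure with Ward's criterion can be sketched as follows. This is a deliberately naive illustration rather than the software used in the study: the function names and sample points are hypothetical, every candidate merge is re-evaluated from scratch, and the loop stops once `k` clusters remain instead of continuing down to a single cluster.

```python
def ess(points):
    """Error sum of squares: squared distances of the points to their mean."""
    n = len(points)
    dim = len(points[0])
    centre = [sum(p[d] for p in points) / n for d in range(dim)]
    return sum(sum((p[d] - centre[d]) ** 2 for d in range(dim)) for p in points)

def ward_distance(x, y):
    """Increase in ESS caused by merging clusters x and y."""
    return ess(x + y) - (ess(x) + ess(y))

def agglomerate(points, k):
    """Merge clusters until k remain, always taking the cheapest Ward merge."""
    clusters = [[p] for p in points]  # step 1: one cluster per object
    while len(clusters) > k:
        # step 2: find the pair whose merge increases ESS the least;
        # ties are broken by position, i.e. by input order
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: ward_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        merged = clusters[i] + clusters[j]
        # step 3: distances to the merged cluster are recomputed on the next pass
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters

# Hypothetical two-dimensional objects forming two obvious groups.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1)]
two_clusters = agglomerate(points, 2)
```

Because `min` returns the first of several equally good pairs, the outcome of a tie depends on the order of the input objects.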
In addition to a linkage function, a metric for measuring the distance between two objects is required. In our study, the Squared Euclidean Distance (SED) is chosen as the distance metric for both Hierarchical Clustering and K-Means clustering. If two objects x1 and x2 in Euclidean n-space are given by x1 = (x11, x12, …, x1n) and x2 = (x21, x22, …, x2n), then the SED between these two objects is given by
$$d(x_1, x_2) = \sum_{i=1}^{n} (x_{1i} - x_{2i})^2$$
While Agglomerative Hierarchical Clustering (AHC) does not require the user to specify the number of clusters, K, a priori, a drawback of AHC is that it neglects the phenomenon of input order instability. In steps 2 and 3 of AHC, a problem arises when two pairs of clusters are both calculated to have the smallest distance value. “In such instances arbitrary [italics added] determinations must be made” (Sneath & Sokal, 1973) to choose the pair of clusters that will be merged. These arbitrary decisions extend to computer programs (Spaans & Van der Kloot, 2005) and, as a result, different input orders of objects in the proximity matrix can result in significantly different cluster solutions (Van der Kloot et al., 2005). To avoid this pitfall, we employ PermuCLUSTER for SPSS (Van der Kloot et al., 2005). This program repeats AHC a user-specified number of times, permuting the rows and columns of the proximity matrix each time. Thereafter, it evaluates the quality of each AHC solution using a goodness-of-fit measure (SSDIFN) given by,
where: dij is the distance between objects i and j in the original proximity matrix;
cij is the distance between objects i and j in the AHC tree.
In our analysis, we first employ PermuCLUSTER with the number of AHC repetitions set at 500, and evaluate the resulting optimal solutions. Thereafter, we validate the cluster solutions (k = 2 to 35) of the optimal solution.
K-Means clustering
In K-Means clustering, K refers to the number of clusters, which, though unknown a priori, has to be specified by the user. Each cluster has a centroid, usually computed as the mean of the variable vectors in that cluster. Cluster membership is decided by association with the nearest centroid.
The basic process of K-Means clustering (MacQueen, 1967) is:
1. Determine initial centroids.
2. Find the closest centroid to each object and assign the object to the cluster associated with this centroid.
3. Recalculate the centroid of each new cluster based on the new cluster memberships.
4. Iterate through steps 2 and 3 until convergence.
The algorithm converges when the cluster memberships of the data points remain unchanged. In this situation, other widely used criteria, such as the computed centroids and the sum of squared distances from data points to their centroids, also remain constant.
K-Means clustering iteratively shifts objects between clusters, seeking to minimise the sum of squared distances, denoted by J, between each object and its cluster centroid. The sum of squared distances, Ji, for the ith cluster, denoted Ci, is given by
$$J_i = \sum_{x \in C_i} d(x, \mu_i)$$
where: d(x, µi) is the Squared Euclidean Distance from object x in Ci to its centroid µi.
The sum of squared distances over all K clusters is given by
$$J = \sum_{i=1}^{K} J_i$$
In step 1, different sets of initial centroids can result in different local minima of J. However, we would like to find the cluster solution that attains the global minimum. The ideal methodology would apply all possible sets of initial centroids in the analysis, but this is computationally expensive and thus not feasible. As an alternative, we repeat K-Means clustering (for K = 2 to 35) 500 times with 500 random sets of initial centroids, to find cluster solutions that attain either the global minimum or at least the local minimum closest to the global minimum among the various local minima.
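The restart strategy can be illustrated with a small pure-Python sketch. It is not the implementation used in the study: the function names, the seed, the sample points and the reduced number of restarts (20 instead of 500) are assumptions chosen to keep the example short.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans_once(points, k, rng, max_iter=100):
    """One K-Means run from a random start; returns (J, labels)."""
    centroids = rng.sample(points, k)  # step 1: random initial centroids
    labels = None
    for _ in range(max_iter):
        # step 2: assign every object to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # memberships unchanged: converged
            break
        labels = new_labels
        # step 3: recompute each centroid as the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    j = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return j, labels

def kmeans_restarts(points, k, restarts, seed=0):
    """Repeat K-Means from random starts and keep the run with the lowest J."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(restarts)), key=lambda r: r[0])

# Hypothetical objects forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
best_j, best_labels = kmeans_restarts(points, k=2, restarts=20)
```

Keeping only the run with the smallest J does not guarantee the global minimum, but the chance of missing it shrinks as the number of restarts grows.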
Expectation Maximization for Gaussian Mixture Model
In the Gaussian Mixture Model (GMM), the Expectation Maximization (EM) algorithm seeks to find the maximum likelihood estimates for mixture models when the model depends on unobserved latent variables.
The main steps of the EM method are (Dempster et al., 1977):
1. Calculate initial parameters (mean and variance) for the K Gaussian distributions.
2. Using the probability density function of the Gaussian distribution, calculate the probability density of each feature vector in each of the K clusters.
3. With the probability densities calculated in step 2, recompute the parameters of each of the K Gaussian distributions.
4. Repeat steps 2 and 3 until convergence.
We perform EM clustering using MIXMOD for Matlab (Biernacki et al., 2006); the statistical background is as follows. Clustering using mixture models partitions the objects x into K clusters denoted by labels z = (z1, …, zn), with zi = (zi1, …, ziK) and zik = 1 or 0 depending on whether xi is assigned to the kth cluster or not. In a mixture model where the n independent vectors of a dataset are represented by x = {x1, …, xn}, each xi arises from a probability distribution with density
$$f(x_i; \theta) = \sum_{k=1}^{K} p_k \, h(x_i; \lambda_k)$$
where: pk are the mixing proportions (0 < pk < 1 for all k = 1, …, K and p1 + … + pK = 1);
h(.; λk) is the d-dimensional distribution parameterized by λk.
In a GMM, h(.; λk) is the d-dimensional Gaussian density with mean µk and variance matrix Σk, so that each xi arises from a probability distribution with density
$$f(x_i; \theta) = \sum_{k=1}^{K} p_k \, h(x_i; \mu_k, \Sigma_k), \qquad h(x_i; \mu_k, \Sigma_k) = (2\pi)^{-d/2} \, |\Sigma_k|^{-1/2} \exp\left( -\tfrac{1}{2} (x_i - \mu_k)^{\top} \Sigma_k^{-1} (x_i - \mu_k) \right)$$
where:
θ = (p1, …, pK, µ1, …, µK, Σ1, …, ΣK) is the vector of the mixture parameters.
Clusters can be derived from the maximum likelihood estimates of the mixture parameters θ, obtained using the Expectation Maximization (EM) algorithm. The log-likelihood of the GMM to be maximised is given by
$$L(\theta; x) = \sum_{i=1}^{n} \ln \left( \sum_{k=1}^{K} p_k \, h(x_i; \mu_k, \Sigma_k) \right)$$
Each xi is assigned to the cluster that provides the largest conditional probability that xi arises from it, using the Maximum A Posteriori (MAP) principle.
The MAP principle is given by
$$\hat{z}_{ik} = \begin{cases} 1 & \text{if } k = \arg\max_{l = 1, \dots, K} t_l(x_i; \hat{\theta}) \\ 0 & \text{otherwise} \end{cases}$$
where:
$$t_k(x_i; \hat{\theta}) = \frac{\hat{p}_k \, h(x_i; \hat{\mu}_k, \hat{\Sigma}_k)}{\sum_{l=1}^{K} \hat{p}_l \, h(x_i; \hat{\mu}_l, \hat{\Sigma}_l)}$$
The EM algorithm seeks to find the maximum likelihood estimates by iterating the expectation and maximization steps.
The mth iteration of the Expectation step is given by
$$t_{ik}^{m} = t_k(x_i; \theta^{m-1})$$
and the maximum likelihood estimate at the mth iteration (denoted by θm) is updated using the conditional probabilities as conditional mixing weights. This leads to the Maximization step, given by
$$p_k^{m} = \frac{n_k^{m}}{n}, \qquad \mu_k^{m} = \frac{1}{n_k^{m}} \sum_{i=1}^{n} t_{ik}^{m} \, x_i$$
where:
$$n_k^{m} = \sum_{i=1}^{n} t_{ik}^{m}$$
and the update of Σk depends on the covariance structure assumed for the model.
As in K-Means clustering, different initial values of the parameters may lead to different local maxima of the likelihood function. As such, to ensure that we obtain a maximum that is either the global maximum or a local maximum closest to the global maximum, for each K (where K = 2 to 35) we repeat the algorithm 500 times using 500 random sets of initial parameters. The optimal solutions are the cluster solutions that result in the highest maximised likelihood.
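To make the expectation and maximization steps concrete, the sketch below fits a one-dimensional Gaussian mixture with random restarts. It is an illustrative simplification and not MIXMOD: the data are univariate and hypothetical, each component has its own variance with a small floor (`min_var`) to avoid degenerate components, and only 20 restarts are used.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def e_step(data, weights, means, variances):
    """Conditional probabilities and log-likelihood under the current parameters."""
    resp, loglik = [], 0.0
    for x in data:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        loglik += math.log(total)
        resp.append([d / total for d in dens])
    return resp, loglik

def em_gmm_1d(data, k, rng, max_iter=200, tol=1e-8, min_var=1e-6):
    """One EM run from a random start; returns (loglik, means, labels)."""
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    means = rng.sample(data, k)      # random initial means drawn from the data
    variances = [grand_var] * k
    resp, loglik = e_step(data, weights, means, variances)
    for _ in range(max_iter):
        # Maximization step: conditional probabilities act as mixing weights
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            weights[j] = n_j / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j
            variances[j] = max(var_j, min_var)
        # Expectation step with the updated parameters
        resp, new_loglik = e_step(data, weights, means, variances)
        converged = new_loglik - loglik < tol
        loglik = new_loglik
        if converged:
            break
    # MAP rule: assign each object to the component with the largest probability
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return loglik, means, labels

# Hypothetical univariate data with two groups.
data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
rng = random.Random(0)
best_loglik, best_means, best_labels = max(
    (em_gmm_1d(data, 2, rng) for _ in range(20)), key=lambda run: run[0]
)
```

Selecting the run with the highest log-likelihood mirrors the restart procedure described above, at a much smaller scale.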
1.1.1.1.2. Cluster Validation
As cluster analysis is an unsupervised technique, cluster validation is a necessary step to assess the results of cluster analysis in an objective and quantitative manner.
The main validation objectives are:
1. Determination of clustering tendency;
2. Determination of the number of clusters;
3. Assessment of how well a clustering result represents the natural group structure underlying the data, based on information intrinsic to the data alone (i.e. internal validation) (Handl, Knowles, & Kell, 2005);
4. Evaluation of clustering results based on comparison with known class labels which correspond to the natural group structure underlying the data (i.e. external validation) (Handl, Knowles, & Kell, 2005).
As clustering techniques are known to find clusters even when there is no underlying cluster structure, objective 1 is fundamental for cluster analysis. Objective 2 is imperative because the number of clusters is an essential parameter in two of the clustering techniques that we employ. To the best of our knowledge, this is the first time a cluster analysis has been performed on the market for directors. Hence, there are no established class labels that correspond to the natural cluster structure. Thus we only carry out an evaluation based on internal validation measures.
Assessment of Clustering Tendency
Compared with the other validation steps, assessing clustering tendency is a step prior to the actual clustering of the data. In our study, we use self-organizing maps (SOMs) to assess the clustering tendency of our data. SOM Toolbox for Matlab (Vesanto et al., 1999) is employed to perform SOM training and visualization. An SOM consists of neurons organized on a regular low-dimensional grid. Each neuron is represented by a weight vector of the same dimension as the input vectors. Adjacent neurons are connected by a neighbourhood relation, which dictates the topology, or structure, of the map. The SOM training algorithm moves the weight vectors so that the map is organized in such a way that neurons with similar weight vectors are placed together. Visualization of the SOM is performed through the U-matrix. By visualizing the distances between neighbouring map units, the U-matrix allows the creation of a clustering tendency map. In a clustering tendency map, high values of the U-matrix (represented by dark-coloured hexagons) indicate possible cluster borders, while uniform areas of low values (represented by light-coloured hexagons) indicate possible clusters. Figure 4.3 illustrates a high clustering tendency map and a low clustering tendency map.
Internal Validation
We mainly use internal validation indices to assess the fitness of a cluster solution. Fitness measures are associated with the geometrical properties of clusters (i.e. compactness, separation and connectedness). These properties are used because most clustering methods optimize them to discover the underlying group structure in the data (Johnson, 1967; Dempster et al., 1977; Kaufman and Rousseeuw, 1990; Handl and Knowles, 2006). The use of internal validation indices also allows us to find the optimal number of clusters (K), indicated by the cluster solution with the highest quality. For Hierarchical Clustering and K-Means clustering, using the program CVAP (Wang et al., 2009), we validate our cluster solutions with two different indices, Average Silhouette Width and C Index, to ensure that our clustering results are robust to different validation measures.
Average Silhouette Width
Average Silhouette Width is a composite index which measures both the compactness and the separation of clusters (Kaufman and Rousseeuw, 1990). Silhouette width compares the similarity between an object and the other objects in the same cluster with the similarity between the same object and the objects in a neighbour cluster. The neighbour cluster N(Xi) of object Xi in cluster C(Xi) is defined as the cluster whose objects have the shortest average distance to Xi among all clusters other than C(Xi). The neighbour cluster N(Xi) is given by
$$N(X_i) = \arg\min_{C_k \neq C(X_i)} \frac{1}{|C_k|} \sum_{X_j \in C_k} d(X_i, X_j)$$
where: Xi are the objects in the dataset;
d(Xi, Xj) is the distance between two objects Xi and Xj.
The silhouette width for Xi, denoted by Si, is given by
$$S_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$
where: ai is the average distance between Xi and the objects in cluster C(Xi);
bi is the average distance between Xi and the objects in neighbour cluster N(Xi).
Silhouette width, Si, ranges from -1 to 1. When Si is close to 1, the clustering solution gives good clusters and Xi is likely to be assigned to the appropriate cluster. When Si is close to 0, Xi could equally be assigned to another cluster, and when Si is close to -1, Xi is likely to be assigned to the wrong cluster.
Average Silhouette Width (AS) over the n objects is given as
$$AS = \frac{1}{n} \sum_{i=1}^{n} S_i$$
Therefore, the best cluster solution, associated with the optimal number of clusters (K), is the one with the largest AS value.
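A minimal sketch of the silhouette computation follows. It is not the CVAP implementation: the function names and sample points are hypothetical, squared Euclidean distance is assumed as the dissimilarity, singleton clusters are given a width of 0 by convention, and at least two clusters are required.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def silhouette_widths(points, labels):
    """Silhouette width of every object (requires at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # convention for singleton clusters
            continue
        # a: average distance to the other members of the object's own cluster
        # (the object's distance to itself is 0, so dividing by len - 1 excludes it)
        a = sum(sq_dist(p, q) for q in own) / (len(own) - 1)
        # b: average distance to the members of the nearest other cluster
        b = min(
            sum(sq_dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return widths

def average_silhouette_width(points, labels):
    widths = silhouette_widths(points, labels)
    return sum(widths) / len(widths)

# Hypothetical objects: two tight pairs far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = average_silhouette_width(points, [0, 0, 1, 1])   # pairs kept together
bad = average_silhouette_width(points, [0, 1, 0, 1])    # pairs split apart
```

The natural partition yields a value close to 1, while the partition that splits each pair yields a negative value, in line with the interpretation of Si given above.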
C Index
C Index (Hubert and Levin, 1976) is a cluster similarity measure. The best cluster solution is identified as the one that results in the lowest value. C Index (C) is given by
$$C = \frac{S - S_{min}}{S_{max} - S_{min}}$$
where: S is the sum of pairwise dissimilarities between all pairs of objects in the same cluster;
if there are n such pairs, Smin is the sum of the n smallest pairwise dissimilarities among all pairs of objects;
similarly, Smax is the sum of the n largest pairwise dissimilarities among all pairs of objects.
In CVAP (Wang et al., 2009), however, the optimal K is given by the value which results in the steepest knee. The steepest knee refers to the greatest jump in index value between two values of K.
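The C Index can be computed directly from its definition, as in the sketch below. The function names and sample points are hypothetical, squared Euclidean distance is assumed as the dissimilarity, and the knee-based selection rule used by CVAP is not reproduced here.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def c_index(points, labels):
    """C Index of a partition; lower values indicate a better solution."""
    pairs = list(combinations(range(len(points)), 2))
    all_d = sorted(sq_dist(points[i], points[j]) for i, j in pairs)
    within = [sq_dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]]
    n = len(within)
    if n == 0 or n == len(all_d):
        raise ValueError("C Index needs both within-cluster and between-cluster pairs")
    s = sum(within)
    s_min = sum(all_d[:n])    # n smallest dissimilarities over all pairs
    s_max = sum(all_d[-n:])   # n largest dissimilarities over all pairs
    if s_max == s_min:
        raise ValueError("C Index is undefined when all dissimilarities are equal")
    return (s - s_min) / (s_max - s_min)

# Hypothetical objects: two tight pairs far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
c_good = c_index(points, [0, 0, 1, 1])   # pairs kept together
c_bad = c_index(points, [0, 1, 0, 1])    # pairs split apart
```

The natural partition attains the minimum value of 0 because its within-cluster pairs are exactly the smallest dissimilarities in the data.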
Bayesian Information Criterion
When using the EM algorithm to maximise the likelihood of the parameters, a phenomenon known as overfitting (Hand et al., 2001) may occur. In this situation, additional parameters are chosen to increase the likelihood obtained, resulting in an overly complex model which fits the data too closely.
As such, we validate our cluster solutions with the Bayesian Information Criterion (BIC), as it is formulated to avoid overfitting.
Typically, one can choose a model that maximizes the integrated likelihood, given by
$$(\hat{m}, \hat{K}) = \arg\max_{m, K} f(x \mid m, K)$$
where: the integrated likelihood is
$$f(x \mid m, K) = \int_{\Theta_{m,K}} f(x \mid m, K, \theta) \, \pi(\theta \mid m, K) \, d\theta$$
the likelihood is
$$f(x \mid m, K, \theta) = \prod_{i=1}^{n} f(x_i \mid m, K, \theta)$$
Θm,K is the parameter space of the model m with K clusters;
π(θ | m, K) is a non-informative or weakly informative prior distribution on θ for this model.
Following that, the asymptotic approximation of the integrated likelihood, valid under regularity conditions (Schwarz, 1978), is given as
$$\ln f(x \mid m, K) \approx \ln f(x \mid m, K, \hat{\theta}) - \frac{\nu_{m,K}}{2} \ln(n)$$
where: θ̂ is the maximum likelihood estimate of θ;
νm,K is the number of free parameters in model m with K clusters.
Finally, this results in the minimisation of the BIC criterion, given by
$$BIC_{m,K} = -2 L_{m,K} + \nu_{m,K} \ln(n)$$
where Lm,K is the maximum log-likelihood for m and K.
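The trade-off that BIC formalises can be illustrated numerically. In the sketch below the log-likelihood values, the sample size and the dimension are invented for illustration, and the parameter count assumes an unconstrained Gaussian mixture (free mixing proportions, means and full covariance matrices); other covariance models would have fewer free parameters.

```python
import math

def gmm_free_parameters(k, d):
    """Free parameters of an unconstrained K-component Gaussian mixture in d dimensions:
    (K - 1) mixing proportions, K*d means and K*d*(d + 1)/2 covariance terms."""
    return (k - 1) + k * d + k * d * (d + 1) // 2

def bic(max_loglik, n_free_params, n_obs):
    """BIC in the 'smaller is better' form: -2 * log-likelihood + penalty."""
    return -2.0 * max_loglik + n_free_params * math.log(n_obs)

# Hypothetical maximised log-likelihoods for K = 2, 3, 4 on 200 two-dimensional objects.
candidates = {2: -410.0, 3: -395.0, 4: -392.0}
scores = {k: bic(ll, gmm_free_parameters(k, 2), 200) for k, ll in candidates.items()}
best_k = min(scores, key=scores.get)
```

In this invented example the likelihood keeps rising with K, yet the penalty term outweighs the improvement, so the smallest model is retained.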
1.1.1.1.3. Consensus Clustering
Consensus clustering, as its name suggests, is employed to find a consensus solution that is in agreement with several cluster solutions obtained through multiple clustering algorithms. As individual clustering algorithms have associated flaws (for instance, in K-Means clustering one has to specify the number of clusters a priori), consensus clustering, by combining solutions, can help eliminate such flaws. Further, consensus cluster solutions are less sensitive to noise, outliers or sample variations (Nguyen and Caruana, 2008). Shen et al. (2007) also contend that combining clustering results from multiple methods is more likely to expose the underlying natural group structure and trends present in the dataset. As a result, consensus clustering helps to increase the quality and robustness of clustering results (Strehl and Ghosh, 2002).
Intuitively, a superior aggregate solution should share as much information as possible with the individual cluster solutions, as clusters which remain relatively similar across multiple algorithm runs are likely to reflect the actual group structure underlying the dataset. Consequently, in our analysis, we make use of the Meta-CLustering Algorithm (MCLA) (Strehl and Ghosh, 2002), which seeks to optimise the average mutual information shared between different pairs of clusterings. Before describing MCLA, we first explain the transformation of a cluster solution into a hypergraph representation, which is illustrated in Figure 4.4. Each cluster solution is transformed into a binary membership indicator matrix Hq with a column for each cluster (denoted as hyperedge hi). For each hi, 1 indicates that the vertex corresponding to the row belongs to that hyperedge and 0 indicates otherwise.
             Cluster solutions λi            H1          H2          H3
Directors    λ1     λ2     λ3                h1    h2    h3    h4    h5    h6
x1           1      2      1          v1     1     0     0     1     1     0
x2           1      2      2          v2     1     0     0     1     0     1
x3           2      1      2          v3     0     1     1     0     0     1
x4           2      1      2          v4     0     1     1     0     0     1
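The transformation into the hypergraph representation can be reproduced with a few lines of code. The sketch below uses the three cluster solutions for the four directors listed above; the function name `membership_matrix` is hypothetical.

```python
def membership_matrix(labelings):
    """Concatenate the binary membership indicator matrices of several cluster solutions.

    Each cluster of each solution becomes one column (hyperedge); each object is one row."""
    n = len(labelings[0])
    rows = [[] for _ in range(n)]
    for labels in labelings:
        for cluster in sorted(set(labels)):
            for obj in range(n):
                rows[obj].append(1 if labels[obj] == cluster else 0)
    return rows

# Cluster labels of directors x1..x4 under the three cluster solutions.
labelings = [[1, 1, 2, 2], [2, 2, 1, 1], [1, 2, 2, 2]]
H = membership_matrix(labelings)
```

Each row of `H` corresponds to one vertex v1 to v4 and each column to one hyperedge h1 to h6.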
MCLA is based on clustering clusters, by first constructing a meta-graph. After transformation into the hypergraph representation, MCLA first views all hyperedges as vertices of another regular, undirected graph, the meta-graph. Edges are weighted by the similarity between vertices, measured by the Jaccard measure. Next, the meta-graph is partitioned into k balanced meta-clusters using METIS (Karypis and Kumar, 1998). Thereafter, the hyperedges in each of the k meta-clusters are collapsed into a single meta-hyperedge. Finally, each object is assigned to the meta-cluster with which it is most strongly associated. The association is computed by averaging all hyperedges of a meta-cluster. We run MCLA for k = 2 to 10, as the optimal K for our clustering solutions did not go beyond 5. Intuitively, it is highly unlikely that the optimal K associated with the consensus solution will go beyond 10.
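The edge weights of the meta-graph can be illustrated with the Jaccard measure applied to binary hyperedge vectors. The sketch below covers only this weighting step; the METIS partitioning and the collapsing of hyperedges are not shown, and the hyperedge values are those of the four-director illustration (each vector lists membership of x1 to x4).

```python
def jaccard(h_a, h_b):
    """Jaccard similarity of two binary membership vectors (hyperedges)."""
    both = sum(1 for a, b in zip(h_a, h_b) if a and b)
    either = sum(1 for a, b in zip(h_a, h_b) if a or b)
    return both / either if either else 0.0

# Hyperedges h1..h6 from the illustration, as membership vectors over x1..x4.
hyperedges = {
    "h1": [1, 1, 0, 0],
    "h2": [0, 0, 1, 1],
    "h3": [0, 0, 1, 1],
    "h4": [1, 1, 0, 0],
    "h5": [1, 0, 0, 0],
    "h6": [0, 1, 1, 1],
}
w_h1_h4 = jaccard(hyperedges["h1"], hyperedges["h4"])
w_h1_h5 = jaccard(hyperedges["h1"], hyperedges["h5"])
w_h2_h6 = jaccard(hyperedges["h2"], hyperedges["h6"])
```

Hyperedges that contain exactly the same directors, such as h1 and h4, receive the maximum weight of 1 and are therefore natural candidates for the same meta-cluster.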
We also perform two other consensus clustering algorithms described in Strehl and Ghosh (2002). The Cluster-based Similarity Partitioning Algorithm (CSPA) finds the consensus solution by first creating an n x n binary similarity matrix, in which the entry for two objects is 1 if they are in the same cluster and 0 otherwise. An induced similarity measure is then calculated. Subsequently, the algorithm uses METIS (Karypis and Kumar, 1998) to recluster the objects based on the similarity measure, giving a consensus cluster solution. The HyperGraph Partitioning Algorithm (HGPA) repartitions the data by treating the original clusters as indications of strong bonds. To find clusters, the algorithm partitions the hypergraph by cutting the least number of hyperedges.
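The similarity used by CSPA can be sketched as the fraction of cluster solutions in which two objects share a cluster, which is how Strehl and Ghosh (2002) describe the induced similarity. The sketch below computes only this matrix for the four-director illustration; the subsequent METIS reclustering is not shown, and the function name is hypothetical.

```python
def co_association(labelings):
    """Entry (i, j) is the fraction of cluster solutions placing objects i and j together."""
    n = len(labelings[0])
    r = len(labelings)
    return [
        [sum(1 for labels in labelings if labels[i] == labels[j]) / r for j in range(n)]
        for i in range(n)
    ]

# Cluster labels of directors x1..x4 under the three cluster solutions.
labelings = [[1, 1, 2, 2], [2, 2, 1, 1], [1, 2, 2, 2]]
S = co_association(labelings)
```

Directors x3 and x4 are grouped together by all three solutions and therefore have similarity 1, whereas x1 and x3 are never grouped together and have similarity 0.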
1.1.2. Cross-Sectional Regressions
To investigate whether our sample supports our hypotheses, consistent with Ahn et al. (forthcoming), we model the number of directorships as a function of director characteristics and associated firm characteristics. Accordingly, we use age, education qualifications and prior or current employment as proxies for directors' quality. We also include tenure to measure a director's commitment to a firm. For firm quality variables, we include firm size, firm age, inclusion in the Straits Times Index and governance quality. We also investigate the relation between block ownership and the optimal number of directorships.
A total of 18 independent variables are investigated, half being director characteristics and the other half being firm characteristics. Director characteristics are: age, tenure, founder of a listed company, prior or current employment as civil servant, CEO or Chairman, CFO or COO, partner of tier 1 accounting or law firms or academic, gender and education qualifications. Firm characteristics are: governance quality, firm age, block ownership structure defined as family firms, GLCs or founder-managed firms, inclusion in the Straits Times Index, firm size and main country of business. We control for firms' financial and operating stability, measured by leverage and stock return volatility, and for industry growth opportunities and profitability, measured by the industry average market-to-book ratio and industry average return on assets (ROA). The dependent variable is the number of directorships held by a director as of Dec 31, 2008.
There are two possible ways of performing the regression: one by considering the total number of directorships and the other by considering the total number of directors. To ensure that our results are robust, we perform regressions both ways. Furthermore, to ensure that the results are robust to board appointments, both partial and full models are estimated on the full sample of all directors, a subsample comprising outside directors and a subsample comprising only independent directors.
The full model is given as,
Log(Number of Directorships held) =
α + β1 Age + β2 Tenure + β3 Founder-Director + β4 Civil Servant + β5 CEO/Chairman + β6 Other key positions + β7 Partners of Tier 1 accounting/law firm + β8 Academic + β9 Female + β10 Degree Holder + β11 GTI Score + β12 Firm Age + β13 Family Firms + β14 Government-Linked Firms + β15 Founder-Managed Firms + β16 STI Dummy + β17 Firm Size + β18 Main Country of Business + β19 Debt to Total Asset + β20 Return Volatility + β21 Industry Market Price to Book Value + β22 Industry ROA
As our study is concerned with modelling the number of directorships, Poisson regression is a natural choice, as it is a regression model for count data. Although negative binomial regression can also be used to model count data, it is typically used when there are signs of overdispersion in the Poisson regression. Our data show little sign of overdispersion and, hence, Poisson regression is judged to be a suitable regression model for our sample.
To address the hypotheses developed in Chapter Three, we compute the optimal number of directorships using the estimated coefficients obtained from the regression results and compare them with the actual number of directorships held by directors in the various clusters.
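The comparison of actual and predicted directorships can be sketched as follows. Because the model is specified on the log scale, the predicted count is the exponential of the linear predictor. All numbers below (intercept, coefficients, director records and cluster labels) are invented for illustration and are not estimates from the study; only two of the regressors are included to keep the example short.

```python
import math

def predicted_directorships(intercept, coefficients, features):
    """Predicted count under a log-linear (Poisson-type) model: exp(alpha + sum of beta * x)."""
    linear_predictor = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return math.exp(linear_predictor)

# Hypothetical estimates and director records.
intercept = 0.2
coefficients = {"age": 0.01, "sti_dummy": 0.3}
directors = [
    {"cluster": 1, "actual": 4, "features": {"age": 50, "sti_dummy": 1}},
    {"cluster": 1, "actual": 2, "features": {"age": 50, "sti_dummy": 1}},
    {"cluster": 2, "actual": 1, "features": {"age": 50, "sti_dummy": 0}},
]

# Average gap between actual and predicted directorships within each cluster.
gaps = {}
for director in directors:
    predicted = predicted_directorships(intercept, coefficients, director["features"])
    gaps.setdefault(director["cluster"], []).append(director["actual"] - predicted)
mean_gap = {cluster: sum(values) / len(values) for cluster, values in gaps.items()}
```

A positive average gap indicates that directors in that cluster hold more directorships than the model predicts, and a negative gap indicates fewer.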
Outside directors refer to non-independent and independent non-executive directors.