
KMeans

KMeans [/CAN /DEAD=method /DIST=mode /DSTC=dstCWave /DSTD=dstDWave /DSTM=dstMWave /DSTS=dstSWave /FREE /INIT=method /INW=iWave /NCLS=num /OUT=format /SEED=val /TER=method /TERN=num /Z] populationWave

KMeans analyzes the clustering of the data in populationWave using an iterative algorithm. populationWave is a 2D wave in which each column is one member of the population and each row holds one dimension (coordinate) of the data. The main result of KMeans is a description of the classes, saved by default in the wave M_KMClasses in the current data folder. Optional results include the class membership of each population member (W_KMMembers), the distances between class centers (M_KMCDistances), and the class dispersion (W_KMDispersion).
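Because members are stored as columns, a member's coordinates are read down a column, not along a row. As an illustration only, the following Python sketch (not Igor code; the names `population` and `members` are hypothetical) models populationWave as a list of rows and extracts one vector per member.

```python
# Hypothetical stand-in for populationWave: a row-major list of lists.
# Each row is one dimension; each column is one population member.
population = [
    [0.0, 1.0, 10.0, 11.0],   # row 0: first coordinate of each member
    [5.0, 6.0, 50.0, 51.0],   # row 1: second coordinate of each member
]

def members(pop):
    """Return one coordinate vector per member by reading the data column by column."""
    return [list(col) for col in zip(*pop)]

print(members(population))
# [[0.0, 5.0], [1.0, 6.0], [10.0, 50.0], [11.0, 51.0]]
```

With two rows and four columns, this population describes four members in two dimensions.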

Flags

/CAN
Analyzes the clustering by computing the distances between the means of the resulting classes. The distances are stored in the wave M_KMCDistances in the current data folder or in the wave specified by /DSTS. The wave contains an NxN square matrix, where N is the number of classes. Self-distances (along the diagonal) and distances involving classes that did not survive the iterations are set to NaN. The operation also saves the wave W_KMDispersion (or the wave specified by /DSTD), which contains, for each class, the sum of the distances between the class center and all of its members. Distances are evaluated using the method specified by /DIST.
/DEAD=method
Specifies how the algorithm handles "dead" classes, i.e., classes that lose all members during a given iteration.
method=1: Remove the class if it loses all members.
method=2: Keep the last value of the mean vector in case the class might get new members in a subsequent iteration. This is the default method.
method=3: Assign the class a random mean vector.
/DIST=mode
Specifies how distances are evaluated.
mode=1: Distance is evaluated as the sum of the absolute values of the coordinate differences (also known as the Manhattan distance).
mode=2: Distance is evaluated as the Euclidean distance. This is the default.
/DSTC=dstCWave
Specifies the output class information wave. If you do not use this flag, the output is saved in the wave M_KMClasses in the current data folder.
It is an error to specify the same wave as both destination and source wave.
When used in a function, the operation creates a real wave reference for the dstCWave. See Automatic Creation of Wave References for details.
This flag was added in Igor Pro 10.00.
/DSTD=dstDWave
Specifies the output dispersion wave. If you do not use this flag, the operation saves the dispersion data in the wave W_KMDispersion in the current data folder. See /CAN above for more details.
It is an error to specify the same wave as both destination and source wave.
When used in a function, the operation creates a real wave reference for the dstDWave. See Automatic Creation of Wave References for details.
This flag was added in Igor Pro 10.00.
/DSTM=dstMWave
Specifies the output class membership wave. If you do not use this flag, the operation saves this information in the wave W_KMMembers in the current data folder. See /OUT=2 below for more information.
It is an error to specify the same wave as both destination and source wave.
When used in a function, the operation creates a real wave reference for the dstMWave. See Automatic Creation of Wave References for details.
This flag was added in Igor Pro 10.00.
/DSTS=dstSWave
Specifies the wave containing the distance matrix described in /CAN above (which is saved by default as M_KMCDistances in the current data folder).
It is an error to specify the same wave as both destination and source wave.
When used in a function, the operation creates a real wave reference for the dstSWave. See Automatic Creation of Wave References for details.
This flag was added in Igor Pro 10.00.
/FREE
Creates all the specified destination waves as free waves.
/FREE is allowed only in functions and only if each destination wave is a simple name or wave reference structure field.
See Free Waves for more discussion.
The /FREE flag was added in Igor Pro 10.00.
/INIT=method
Specifies the initialization method.
method=1: Randomly assign members of the population to classes.
method=2: Use user-specified mean values (/INW).
method=3: Initialize the classes using the values of members selected at random from the population. This is the default initialization method.
/INW=iWave
Specifies the initial classes for /INIT=2. Each column of iWave holds the initial mean of one class, so the number of columns is the number of classes and the number of rows is the dimensionality of the problem, which must match the number of rows in populationWave. For example, to initialize 5 classes in a problem that involves position in two dimensions, iWave must have 2 rows and 5 columns.
/NCLS=num
Sets the number of classes in the data. If the initialization method uses specific means (/INIT=2), then the number of columns of iWave (see /INW) must match num. The default number of classes is 2.
/OUT=format
Specifies the format for the results.
format=1: Output only the specification of the classes in the 2D wave M_KMClasses (or dstCWave). This is the default. Each column of M_KMClasses represents one class. M_KMClasses has one more row than populationWave: the last row contains the number of members in the class, and the preceding rows contain the coordinates of the class center. For example, if populationWave has two rows, the problem is two-dimensional and M_KMClasses has 3 rows: the first row contains the first component of each class center, the second row contains the second component of each class center, and the third row contains the number of members in each class.
format=2: In addition to M_KMClasses, output the class membership in the wave W_KMMembers (or dstMWave). Row i of this 1D wave corresponds to member i (column i) of populationWave, and its value is the zero-based column number of that member's class in M_KMClasses.
/SEED=val
Sets the seed for a new sequence in the pseudo-random number generator used by the operation. val must be an integer greater than zero.
By changing the seed you may be able to find different solutions or change the rate at which the process converges.
/TER=method
Determines when the iterations stop.
method=1: Stop after a user-specified number of iterations (/TERN).
method=2: Continue iterating until no more than a fixed number of members (/TERN) change class in one iteration. This is the default termination method.
/TERN=num
Specifies the termination number, whose meaning is determined by /TER above. By default, /TER=2 is used and num is 5% of the number of members in the population, so the iterations stop when at most 5% of the members change class in one iteration.
/Z
No error reporting. If an error occurs, KMeans sets V_flag to -1 but does not halt function execution.
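The distance definitions behind /DIST and /CAN can be illustrated outside Igor. The Python sketch below is not Igor's implementation: the function names, the `alive` flags marking classes that survived, and the `labels` list of zero-based class indices are hypothetical stand-ins for the waves KMeans creates. The sketch reports a dispersion of 0.0 for a class with no members; this reference does not say what KMeans stores in that case.

```python
import math

def manhattan(a, b):
    # Like /DIST=1: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    # Like /DIST=2 (the default): square root of the sum of squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def class_distances(centers, alive, dist=euclidean):
    """N x N distances between class centers, laid out like M_KMCDistances:
    NaN on the diagonal and for any class that did not survive."""
    n = len(centers)
    out = [[math.nan] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and alive[i] and alive[j]:
                out[i][j] = dist(centers[i], centers[j])
    return out

def dispersion(centers, members, labels, dist=euclidean):
    """Per-class sum of center-to-member distances, like W_KMDispersion."""
    total = [0.0] * len(centers)
    for m, k in zip(members, labels):
        total[k] += dist(centers[k], m)
    return total

centers = [[0.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
alive = [True, True, False]          # class 2 lost all of its members
members = [[0.0, 1.0], [0.0, -1.0], [3.0, 5.0]]
labels = [0, 0, 1]                   # zero-based class index of each member

print(manhattan(centers[0], centers[1]))       # 7.0
print(class_distances(centers, alive)[0][1])   # 5.0
print(dispersion(centers, members, labels))    # [2.0, 1.0, 0.0]
```

The same pair of centers is 7 apart under the Manhattan distance but 5 apart under the Euclidean distance, which is why the choice of /DIST can change both the class assignments and the reported distances.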

Details

KMeans uses an iterative algorithm to analyze the clustering of data. The algorithm is not guaranteed to find a global optimum (maximum likelihood solution), so the operation provides flags that control when the iterations terminate: you can have it iterate a fixed number of times, or loop until at most a specified number of members change class membership in a single iteration (see /TER and /TERN).

If you are computing KMeans in more than one dimension, pay attention to the relative magnitudes of the data in each dimension. For example, if your data is distributed on the interval [0,1] in the first dimension and on the interval [0,1e7] in the second dimension, the distances, and therefore the class assignments, are dominated by the much larger values in the second dimension.
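This reference does not describe any built-in rescaling, so one option is to rescale each row of populationWave yourself before calling KMeans. The Python sketch below applies min-max scaling to each row of a list-of-rows stand-in for populationWave. Min-max scaling is only one choice (standardizing each row to zero mean and unit variance is another), and the function name `rescale_rows` is hypothetical. If you rescale, the class centers reported in M_KMClasses are in the rescaled units.

```python
def rescale_rows(pop):
    """Map each row (dimension) of a row-major 2D list linearly onto [0, 1].
    A constant row is mapped to all zeros to avoid dividing by zero."""
    out = []
    for row in pop:
        lo, hi = min(row), max(row)
        span = hi - lo
        out.append([(v - lo) / span if span else 0.0 for v in row])
    return out

pop = [[0.0, 0.5, 1.0],    # dimension distributed on [0, 1]
       [0.0, 5e6, 1e7]]    # dimension distributed on [0, 1e7]
print(rescale_rows(pop))   # [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
```

After rescaling, both dimensions contribute on the same scale to every distance computation.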

Examples

Create data with 3 classes:

Make/O/N=(1,128) jack=4+gnoise(1)
jack[0][15,50]+=10
jack[0][60,]+=20

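Perform KMeans looking for 5 classes. Because /DEAD=1 removes classes that lose all their members, only three classes survive. The exact values you get depend on the random numbers generated by gnoise and on the random initial assignment: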

KMeans/init=1/out=1/ter=1/dead=1/tern=1000/ncls=5 jack
Print M_KMClasses
M_KMClasses[0][0]= {24.1439,68}
M_KMClasses[0][1]= {14.1026,36}
M_KMClasses[0][2]= {4.01537,24}
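To make the iteration concrete, here is a compact Python sketch of a Lloyd-style k-means loop, the standard form of the algorithm. This reference does not say which variant KMeans implements, so treat the sketch as an illustration under stated assumptions. It mirrors a subset of the flags: initial centers laid out like iWave (/INIT=2 with /INW), Euclidean distance (/DIST=2), empty classes keeping their last mean (/DEAD=2), and termination when at most `tern` members change class (/TER=2). Truncating the 5% default to an integer and the `max_iter` safety cap are assumptions of the sketch. It runs on a noise-free version of the jack data with three classes, not the five classes, /INIT=1, /TER=1, and /DEAD=1 used in the Igor example above.

```python
import math

def kmeans(population, i_wave, tern=None, max_iter=100):
    """Lloyd-style k-means on row-major 2D lists (columns are members).
    i_wave holds one initial class center per column, like iWave.
    Returns rows laid out like M_KMClasses and per-member class indices
    like W_KMMembers."""
    members = [list(col) for col in zip(*population)]
    centers = [list(col) for col in zip(*i_wave)]
    if tern is None:
        tern = int(0.05 * len(members))  # assumed truncation of the 5% default
    labels = [-1] * len(members)
    for _ in range(max_iter):            # safety cap added by this sketch
        changed = 0
        for i, m in enumerate(members):  # assign each member to its nearest center
            best = min(range(len(centers)), key=lambda k: math.dist(m, centers[k]))
            if best != labels[i]:
                labels[i] = best
                changed += 1
        for k in range(len(centers)):    # recompute each class mean
            pts = [m for m, lab in zip(members, labels) if lab == k]
            if pts:  # a class with no members keeps its previous mean
                centers[k] = [sum(col) / len(pts) for col in zip(*pts)]
        if changed <= tern:
            break
    counts = [labels.count(k) for k in range(len(centers))]
    classes = [[c[d] for c in centers] for d in range(len(population))]
    return classes + [counts], labels

# Noise-free version of the jack data: one row (dimension), 128 members.
values = [4.0] * 128
for j in range(15, 51):
    values[j] += 10.0   # like jack[0][15,50] += 10
for j in range(60, 128):
    values[j] += 20.0   # like jack[0][60,] += 20

classes, labels = kmeans([values], i_wave=[[0.0, 10.0, 30.0]])
print(classes)  # [[4.0, 14.0, 24.0], [24, 36, 68]]
```

Because the three groups are well separated, the class sizes (24, 36, 68) agree with the member counts printed by the Igor example, and the centers land exactly on 4, 14, and 24 because no noise was added.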

See Also

FPClustering

Demos

Open Clustering Demo