PROperty FITting : PROFIT

Property-fitting is typically used to interpret a configuration by means of external information about the stimuli, mapping each property into the configuration as a direction or vector along which that property increases. It takes as input both a configuration of stimulus points and a set of rankings or ratings of the same set of stimuli. These rankings and ratings are usually estimates of different properties of the stimuli. The program locates each property as a vector through the configuration of points, so that it indicates the direction over the space in which the property is increasing. The fitting is accomplished by maximising the correlation between the original property values and the projections of the stimuli onto the vector. This correlation may be either linear or non-linear (continuity). PROFIT using the linear option is formally identical to Phase 4 (vector model) of the preference mapping program PREFMAP, also using the linear option.

An internal form of the point-vector model (i.e. where the input configuration is not fixed but is generated from the data) is available in MDPREF.

An option within PARAMAP allows a rectangular or row-conditional (two-way, two-mode) array of data to be input for internal analysis using a continuity (kappa) transformation between the data and the solution, but only the stimuli are represented in the solution.

There are two parts to the input data for PROFIT:

The configuration
The configuration consists of the coordinates for a set of objects (stimuli) on a number of dimensions. This may be an a priori configuration, or one resulting from another multidimensional scaling analysis, or, indeed, from a factor analysis. The configuration is input to the program by means of the READ CONFIG command, in free format (or under an associated INPUT FORMAT specification), and may be presented either stimuli (rows) by dimensions (columns) or dimensions (rows) by stimuli (columns). In the latter case the parameter MATFORM should be given the value 1. Since the configuration is not substantially altered by the PROFIT algorithm, analysis can only take place in a given dimensionality and attempts to specify more than one value in the DIMENSIONS command will cause an error.

The properties
Each of the "properties" which PROFIT will seek to represent as vectors in the configuration is a set of values which distinguish the stimuli on a particular criterion. These may be physical values or subjective evaluations of the stimuli on criteria other than those used to generate the original configuration. For instance, a simple use of the program might be to map information about the subjects' preferences for a set of stimuli into a MINISSA representation of the perceived similarities between the same stimuli.

Input of properties
Each property consists of a set of values, one for each stimulus in the configuration. Values may be input in free format. Alternatively, an INPUT FORMAT specification may precede the READ MATRIX command which reads the properties, in which case ALL properties must be in that same format. In either case, each property is preceded by a line containing a label, which appears in the output.

PROFIT seeks to represent the properties as vectors over the configuration of points. The analysis is external in as much as the configuration is regarded as being fixed: the stimulus points cannot be moved to make the fit of the vectors better.

The fitted vector is regarded as indicating the direction in which the given property is increasing. As a theory this implies that preference increases continually, never reaching a maximum (corresponding to the economic concept of insatiability).

The linear procedure:
1. The columns of the configuration are normalised.
2. The XMAT matrix is computed.
    For each property in turn:
3. The direction cosines of the vectors are computed.
4. The projections of the points onto the vectors are computed.
5. The correlation between the projections and the property
    values is computed.
6. The cosines corresponding to the angles between each pair
    of vectors are computed.
7. The configuration and vector-ends are plotted using both
    normalised and original coordinates.
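
The linear steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's own code: the text does not define XMAT or the exact normalisation, so the sketch assumes that each column is centred and scaled to unit length and that the vector is found by ordinary least-squares regression of the property on the normalised coordinates. The function names (normalise_columns, solve, fit_vector) are hypothetical.

```python
import math

def normalise_columns(config):
    """Centre each column at zero and scale it to unit length (assumed convention)."""
    n, r = len(config), len(config[0])
    cols = []
    for k in range(r):
        col = [row[k] for row in config]
        mean = sum(col) / n
        centred = [v - mean for v in col]
        length = math.sqrt(sum(v * v for v in centred))
        cols.append([v / length for v in centred])
    return [[cols[k][i] for k in range(r)] for i in range(n)]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for i in range(n):
            if i != c:
                f = m[i][c] / m[c][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def pearson(u, v):
    """Product-moment correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def fit_vector(config, prop):
    """Return (direction cosines, projections, correlation) for one property."""
    x = normalise_columns(config)
    r = len(x[0])
    mean_p = sum(prop) / len(prop)
    p = [v - mean_p for v in prop]
    # Normal equations (X'X) b = X'p; assumed stand-in for the XMAT step.
    xtx = [[sum(row[a] * row[b] for row in x) for b in range(r)] for a in range(r)]
    xtp = [sum(row[a] * pv for row, pv in zip(x, p)) for a in range(r)]
    b = solve(xtx, xtp)
    norm = math.sqrt(sum(v * v for v in b))
    cosines = [v / norm for v in b]                      # step 3
    projections = [sum(c * v for c, v in zip(cosines, row)) for row in x]  # step 4
    return cosines, projections, pearson(projections, prop)               # step 5

# Four stimuli on a square; the property equals the first coordinate.
config = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
cosines, projections, r = fit_vector(config, [-1, -1, 1, 1])
print(cosines, r)
```

In this toy case the fitted vector lies along the first dimension and the correlation is 1, since the property is an exact linear function of the coordinates.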

The non-linear procedure:
1. The configuration is normalised.
    For each property:
2. KAPPA and ZSQ measures of alienation and correlation
    respectively are computed.
3. The cosines of the angles between the vectors and the original
    axes are calculated.
4. The projections of the points onto the vectors are calculated.
    When all properties have been treated in this way:
5. The cosine of the angle between each pair of vectors is
    calculated.
6. The configuration of points and vectors is plotted in
    original and normalised co-ordinates.
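
Both procedures end by computing the cosine of the angle between each pair of fitted vectors. As a minimal sketch (the function name pairwise_cosines is hypothetical, and the vectors are assumed to be given as lists of coordinates), this is the dot product of the vectors after scaling each to unit length:

```python
import math

def pairwise_cosines(vectors):
    """Return the N x N matrix of cosines of the angles between vectors."""
    unit = []
    for v in vectors:
        length = math.sqrt(sum(c * c for c in v))
        unit.append([c / length for c in v])
    # The cosine between two unit vectors is their dot product.
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]

# Three property vectors in a two-dimensional configuration.
cosines = pairwise_cosines([[1, 0], [0, 1], [1, 1]])
for row in cosines:
    print(row)
```

A cosine near 1 indicates two properties increasing in nearly the same direction over the space, a cosine near 0 indicates unrelated directions, and a cosine near -1 indicates opposed directions.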

The use of the WEIGHT parameter
The weighting function plays a crucial role in the definition of KAPPA. This function can take on three different values and each value defines a different "flavour" of KAPPA. The choice of flavour depends crucially on the characteristics of the property values:
WEIGHT (0)
This is the general definition of non-linear correlation and no restrictions are placed on the data. Therefore, this index can always be applied to examine the extent to which the property values (data) and the projections of the stimulus points (solution) are related by a smooth or continuous function.
WEIGHT (1)
In this case, it is assumed that the property values are equally spaced, so the properties are in effect treated as ordinal, with the order coded by equal intervals. Any equally spaced values may be chosen, such as 1, 2, 3,...N or 5, 10, 15,...5N.
There is no restriction on the characteristics of the stimulus configuration when using this option. This option limits the calculation of KAPPA to adjacent points. In this case, KAPPA becomes equivalent to Von Neumann's Eta (the ratio of the mean square successive difference to the variance). See below for the use of BCO in conjunction with this option.
WEIGHT (2)
If the property values tend to be highly clustered into two or more groups of values, then PROFIT can be used to determine whether this is also the case for the projections of the stimuli on the fitted vector. To do this we must choose the property values in such a way that it becomes possible to discriminate the clusters. Ordinal level of measurement is sufficient, provided the property values are equally spaced. By defining the maximum distance between two points which are to be taken as falling in the same grouping, the program then selects the clusters. This maximum distance is set using the BCO parameter.
The weight factor will now have the effect of restricting attention to property values which are close to each other (in effect, in the same grouping) and ignoring pairs whose difference exceeds the BCO value. In this case, KAPPA can be shown to be the equivalent of the "correlation ratio".
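
To make the WEIGHT (1) statistic concrete, the following sketch computes Von Neumann's ratio for the projections taken in the order of the property values. It illustrates the statistic itself and not the program's internal calculation; the function name von_neumann_ratio is hypothetical, and the divisors (n-1 for the successive differences, n for the variance) are one common convention assumed here.

```python
def von_neumann_ratio(prop, projections):
    """Mean square successive difference of the projections, taken in
    property order, divided by their variance."""
    order = sorted(range(len(prop)), key=lambda i: prop[i])
    y = [projections[i] for i in order]
    n = len(y)
    mean = sum(y) / n
    variance = sum((v - mean) ** 2 for v in y) / n          # assumed divisor n
    mssd = sum((y[i + 1] - y[i]) ** 2 for i in range(n - 1)) / (n - 1)
    return mssd / variance

ranks = [1, 2, 3, 4, 5]                       # equally spaced property values
smooth = von_neumann_ratio(ranks, [1, 2, 3, 4, 5])
jagged = von_neumann_ratio(ranks, [1, 5, 2, 4, 3])
print(smooth, jagged)
```

The ratio is small when stimuli adjacent on the property also have similar projections, and large when the projections jump about as the property increases.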

The use of the BCO parameter
This parameter has a different use and meaning when used in conjunction with different WEIGHT options:
WEIGHT (0)

In the general case a value of 0 for BCO (the default) will make the weighting function undefined for equal property values. If there are equal property values and BCO is 0, the program will terminate. This option in effect assumes that there are no ties between the property values. If ties do occur among your property values then a small value of BCO (say .001) should be used. This will allow calculation of the weight factor even when the property values are equal. A large value for BCO has the effect of allowing KAPPA to decrease indefinitely and is not recommended.

WEIGHT (1)

When Von Neumann's Eta is approximated, the BCO parameter has a simpler interpretation than in the previous case: it gives the size of the equal intervals. Note that with WEIGHT (1), which is the default value, BCO (0) has no meaning and some other value must be specified.

WEIGHT (2)

In this case the BCO parameter gives the maximum distance allowed between points in the hypothetical clusters described above. Again, in this case, the default value BCO (0) has no meaning, and must be over-ridden by some other value.
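
The role of BCO under WEIGHT (2) can be illustrated with a short sketch. It assumes, as one possible reading of the description above, that stimuli whose sorted property values lie no more than BCO apart from their neighbour fall in the same group, and it then computes the squared correlation ratio of the projections (between-group sum of squares divided by total sum of squares). The function names are hypothetical, and how the program derives KAPPA from this quantity is not shown here.

```python
def group_by_bco(prop, bco):
    """Give each stimulus a group index: neighbours in sorted property
    order that are no more than bco apart share a group (assumed rule)."""
    order = sorted(range(len(prop)), key=lambda i: prop[i])
    groups = [0] * len(prop)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if prop[nxt] - prop[prev] > bco:
            current += 1          # gap larger than bco starts a new group
        groups[nxt] = current
    return groups

def correlation_ratio(groups, projections):
    """Between-group sum of squares over total sum of squares."""
    n = len(projections)
    grand = sum(projections) / n
    total = sum((v - grand) ** 2 for v in projections)
    between = 0.0
    for g in set(groups):
        members = [v for gi, v in zip(groups, projections) if gi == g]
        mean_g = sum(members) / len(members)
        between += len(members) * (mean_g - grand) ** 2
    return between / total

prop = [1, 2, 3, 10, 11, 12]                    # two clusters of property values
groups = group_by_bco(prop, bco=1)
print(groups)
print(correlation_ratio(groups, [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]))
```

A value close to 1 indicates that the groupings found in the property values are reproduced in the projections on the fitted vector.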

INPUT COMMANDS

Keyword              Operand                Function
N OF SUBJECTS/       [number]               Number of subjects or "properties"
  PROPERTIES
N OF STIMULI         [number]               Number of stimuli in the analysis
DIMENSIONS           [number]               Dimensionality for the analysis (one only)
LABELS               [followed by a         Optionally identify the stimuli, followed by
                     series of labels       the properties. All labels required must be
                     each on a              entered, without omissions.
                     separate line]


PARAMETERS

Keyword        Default    Function
REGRESSION     1          1: Linear regression will be performed.
                          2: Non-linear regression will be performed.
                          3: Both regressions performed (independently).
MATFORM        1          1: The configuration is input stimuli
                             (rows) by dimensions (columns).
                          2: The configuration is input dimensions
                             (rows) by stimuli (columns).
WEIGHT         0          (See above for relation to BCO)
                          0: Carroll's index of continuity.
                          1: Von Neumann's ratio of the mean square
                             successive difference.
                          2: The "correlation ratio".
BCO            0          (See above for relation to WEIGHT)

NOTES
1. N OF PROPERTIES may be used in PROFIT in place of
    N OF SUBJECTS.
2. READ CONFIG is obligatory.
3. LABELS allows you to add optional labels, on successive lines
    following the command, to identify the stimuli and properties.
4. Since the non-linear option involves calculation of large powers
    of the data values, exponent overflow may occur. In this case
    the data values should be made smaller. This might be done by
    changing the format statement so as to divide the values by, say,
    100.

PRINT options (to main output file)
Option              Form           Description
INITIAL             p x r          The matrix of stimulus points as
                                   normalised by the program. This will
                                   differ in linear and non-linear approaches.
CORRELATIONS                       (Default) The following are output:
                                   (a) the correlations for each
                                   property (linear regression).
                                   (b) the eigenroots associated with each
                                   vector (non-linear regression).
PROPERTIES                         The following are output:
                    N x r          1. the direction cosines between each
                                   of the fitted vectors and dimensions
                                   in the normalised space.
                    N x r          2. the direction cosines between each
                                   of the fitted vectors and dimensions
                                   of the original space.
                    N x N          3. the cosines of the angles between
                                   the vectors.
RESIDUALS                          A table of residuals is output
                                   i.e. obtained distances - original distances.

PLOT options (to main output file)
Option                      Description
INITIAL            The stimulus configuration plotted in pairs of
                       dimensions with both original and normalised
                       co-ordinates marked (up to r(r-1)/2 plots).
FINAL               Both stimulus points and property vectors
                       plotted together; original and normalised
                       co-ordinates (up to r(r-1)/2 plots).
SHEPARD         N plots of original property values against
                       projections on fitted vectors giving the
                       shape of the linking function.
RESIDUALS      Histogram of residual values.

By default only the first two dimensions of the joint space are plotted.

PUNCH options (to secondary output file)
Option             Description
SPSS               This command produces a file containing the
                   following variables:
                   i       property
                   j       stimulus
                   DATA    original value on property i of stimulus j
                   FITTED  projection on fitted vector
                   RESID   difference between original and fitted values.
SOLUTION           Two matrices are output:
                   i) the matrix of stimulus points as
                   normalised, and
                   ii) the matrix of direction cosines for the
                   fitted vectors.

By default, no secondary output file is produced.
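
The record layout listed for the SPSS option can be pictured with a small sketch that builds one record per property and stimulus. This is an illustration of the variables only, not the file format written by the program: the function name spss_records is hypothetical, indices are assumed to start at 1, and RESID is assumed to be the original value minus the fitted value.

```python
def spss_records(properties, fitted):
    """Build (i, j, DATA, FITTED, RESID) tuples, one per property i
    and stimulus j, from two equally shaped lists of lists."""
    records = []
    for i, (data_row, fitted_row) in enumerate(zip(properties, fitted), start=1):
        for j, (data, fit) in enumerate(zip(data_row, fitted_row), start=1):
            records.append((i, j, data, fit, data - fit))  # assumed sign of RESID
    return records

# One property measured on two stimuli, with its fitted projections.
for record in spss_records([[3.0, 1.0]], [[2.5, 1.5]]):
    print(record)
```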

PROGRAM LIMITS
Maximum no. of stimuli = 200
Maximum no. of subjects/properties = 200
Maximum no. of dimensions = 10
