Sphinx IQ

Qualify

Qualifying the data lets you identify or delete aberrant values. To access this assistant, proceed as follows:

  • In the Data tab, click Qualify.

The dialog box below opens. It lets you assess the overall quality of your data set through a choice of four options.

[Screenshot: Qualify dialog box]

  • Identify singular or poorly documented observations: depending on the selected variables, identifies poorly documented observations, observations with rare values or values far from the mean, and observations whose scale questions are answered in a systematic way.
  • Replace non-responses: replaces non-responses (empty cells) either with the mean of the observations or with the value of the previous observation.
  • Identify irrelevant variables: identifies variables that are poorly documented or that have too little variance.
  • Extract a qualified sample: creates a subset of relevant variables (this subset ignores the poorly documented variables) and a sample that includes only sufficiently documented observations.
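The Replace non-responses and Identify irrelevant variables options can be pictured with the short Python sketch below. It is only an illustration of the logic described above, not Sphinx IQ's implementation: a column is assumed to be a list in which None marks an empty cell, and the function names and the fill-rate and variance thresholds are hypothetical.

```python
from statistics import mean, pvariance
from typing import Optional

Column = list[Optional[float]]  # None stands for an empty cell (non-response)


def fill_with_mean(values: Column) -> Column:
    """Replace each empty cell by the mean of the documented observations."""
    documented = [v for v in values if v is not None]
    if not documented:
        return list(values)  # nothing to average: leave the column as is
    avg = mean(documented)
    return [avg if v is None else v for v in values]


def fill_with_previous(values: Column) -> Column:
    """Replace each empty cell by the value of the previous observation."""
    result: Column = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        result.append(last)  # stays None while no earlier value exists
    return result


def is_irrelevant(values: Column, min_fill_rate: float = 0.5,
                  min_variance: float = 1e-9) -> bool:
    """Flag a variable that is poorly documented or has (almost) no variance.

    Both thresholds are hypothetical defaults, not Sphinx IQ settings.
    """
    documented = [v for v in values if v is not None]
    if not values or len(documented) / len(values) < min_fill_rate:
        return True  # poorly documented: too many empty cells
    if len(documented) < 2:
        return True  # a single value cannot vary
    return pvariance(documented) < min_variance
```

For example, `fill_with_previous([None, 5, None, 7])` leaves the first cell empty because no earlier observation exists, then carries 5 and 7 forward.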

Adjust

This tool lets you define quotas to adjust the sample and ensure that it is representative. To do so, proceed as follows:

  • From the Data tab, click Adjust.

The following dialog box opens:

[Screenshot: quota definition dialog box]

In our example, we selected the country variable (Country_name) because we want our total sample to contain 20% French respondents, 25% German respondents, and so on.
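How Sphinx IQ computes the adjustment is not stated here. A common way to reach quotas on one variable is cell weighting: each respondent receives the weight target share / observed share of their modality. The Python sketch below shows that technique under this assumption; the function name is hypothetical, and the Italy quota in the usage line is invented to complete the "etc." of the example.

```python
from collections import Counter


def quota_weights(values: list[str], targets: dict[str, float]) -> list[float]:
    """One weight per observation so that weighted shares match the quotas.

    weight(modality) = target share / observed share in the sample.
    """
    if abs(sum(targets.values()) - 1.0) > 1e-9:
        raise ValueError("target shares must sum to 1")
    counts = Counter(values)
    n = len(values)
    missing = set(counts) - set(targets)
    if missing:
        raise ValueError(f"no quota given for: {sorted(missing)}")
    factors = {m: targets[m] / (counts[m] / n) for m in counts}
    return [factors[v] for v in values]


# 4 French, 4 German and 2 Italian respondents, adjusted to 20% / 25% / 55%
weights = quota_weights(["France"] * 4 + ["Germany"] * 4 + ["Italy"] * 2,
                        {"France": 0.2, "Germany": 0.25, "Italy": 0.55})
```

French respondents are over-represented (40% instead of 20%), so each counts for 0.5; Italians are under-represented and count for 2.75. The total weight stays equal to the sample size. A modality with a quota but no respondent cannot be reached by weighting.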

You can also adjust on two variables at once (for example, « Gender » and « Age »).
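When quotas are set on two variables, one standard technique is raking (iterative proportional fitting): the weights are rescaled to match the first variable's quotas, then the second's, and so on until both sets of shares are met. Whether Sphinx IQ uses raking is an assumption; the Python sketch below only illustrates the technique, with hypothetical names and data.

```python
from collections import defaultdict


def rake(rows: list[dict[str, str]],
         targets: dict[str, dict[str, float]],
         max_iter: int = 1000, tol: float = 1e-10) -> list[float]:
    """Iterative proportional fitting (raking) on several variables.

    Each step rescales the weights so that one variable's weighted shares
    match its quotas, then moves on to the next variable, until every
    adjustment factor is (almost) 1.
    """
    n = len(rows)
    weights = [1.0] * n
    for _ in range(max_iter):
        largest_change = 0.0
        for var, shares in targets.items():
            totals: dict[str, float] = defaultdict(float)
            for row, w in zip(rows, weights):
                totals[row[var]] += w
            # factor that brings each modality's weighted total to its quota
            factors = {m: shares[m] * n / t for m, t in totals.items()}
            weights = [w * factors[row[var]] for row, w in zip(rows, weights)]
            largest_change = max([largest_change] +
                                 [abs(f - 1.0) for f in factors.values()])
        if largest_change < tol:
            break  # all quotas are met
    return weights
```

Each modality in the data must have a quota in `targets`, and every combination of modalities should be present in the sample for the process to converge.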

Combine variables

This assistant creates a new variable by combining several selected variables. To access it, proceed as follows:

  • From the Data tab, click Combine.

A dialog box opens in which you choose a type of combination among the first four options:

[Screenshot: Combine dialog box]

  • Create a variable from a sub-sample: creates a closed variable whose modalities correspond to conditions set by the user (sub-samples or profiles). The sub-samples already defined are listed, and the new closed variable takes as modalities the sub-samples you select. If necessary, create a new profile to add another modality.
  • Merge several closed variables: several types of merge are then offered:
    - Simple fusion: the modalities of the new multiple closed variable are all those of the selected variables.
    - Composed fusion: the modalities of the new multiple closed variable are the different modalities of the selected variables.
    - Cross fusion: the modalities of the new unique closed variable are the crossings of the modalities of the selected variables.
    - Multiple fusion: the modalities of the new multiple closed variable are the names of the selected variables.
  • Transpose several closed variables: creates a set of variables named after the modalities of the selected variables.
  • Concatenate the texts of the responses: creates a text variable that gathers the content of the responses to the selected variables.

All these combinations are detailed below in the section Operating modes.
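Two of these combinations, cross fusion and concatenation, are easy to picture on plain records. The Python sketch below is a hypothetical illustration, not Sphinx IQ's implementation: each observation is a dict, None marks a non-response, and the " / " separator used to label crossed modalities is an assumption.

```python
from typing import Optional

Row = dict[str, Optional[str]]


def cross_fusion(rows: list[Row], variables: list[str],
                 sep: str = " / ") -> list[Optional[str]]:
    """One modality per observation, crossing its answers to the selected variables.

    An observation with a non-response on any selected variable gets None.
    """
    crossed: list[Optional[str]] = []
    for row in rows:
        answers = [row.get(v) for v in variables]
        crossed.append(None if any(a is None for a in answers) else sep.join(answers))
    return crossed


def concatenate_texts(rows: list[Row], variables: list[str],
                      sep: str = " ") -> list[str]:
    """Gather the text responses of the selected variables, skipping empty ones."""
    return [sep.join(row[v] for v in variables if row.get(v)) for row in rows]
```

Crossing « Gender » and « Age » this way yields unique closed modalities such as "Female / 18-25", one per combination actually observed.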

Remove duplicates

This operation identifies observations that are identical on a given variable and lets you keep a single observation based on the order in which the observations were saved.

Identifying duplicates creates a variable named POSITION that categorizes the observations. It has four possible values: Unique for observations that have no duplicate, and Mini, Maxi and possibly Inter for observations that have duplicates. Observations are removed by eliminating the first or the last responses, as chosen by the user.

To remove duplicates, proceed as follows:

  • In the Data tab, click Remove duplicates.

A dialog box opens offering two actions: identify the duplicates or delete them.


Identify duplicates

[Screenshot: Identify duplicates dialog box]

1 Select the variable on which you want to identify and/or delete duplicates (you can also select several variables, for example « email » and « name », if you consider that several people could share the same email address).

2 Choose the action to perform: Identify duplicates or Delete duplicates.

3 Determine the position of the duplicates found.

If you choose to determine the position of duplicates according to the order in which observations were saved, a variable IDEM is created. It contains the number of the first identical observation for the selected variable. A second variable, POSITION, indicates the « state » of each observation. This variable has four possible values:

  • Unique: the observation has no duplicate.
  • Mini: the observation is the first observation in a list of duplicates.
  • Inter: the observation is an intermediate duplicate in the list of duplicates.
  • Maxi: the observation is the last observation in a list of duplicates.

Note that you can order the duplicates either by the order in which observations were saved (for three duplicates, the observation marked MINI is then the one saved first) or by the value of a variable you select, for example the variable CLE, in which case the duplicates are ranked by the value of this variable.
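The following Python sketch illustrates how the IDEM and POSITION variables described above could be derived. It is not Sphinx IQ code: observations are assumed to be dicts listed in order of saving, IDEM is left empty (None) for unique observations, which is an assumption, and the function name is hypothetical.

```python
from typing import Any, Optional


def identify_duplicates(rows: list[dict[str, Any]], key_vars: list[str],
                        order_var: Optional[str] = None):
    """Return the IDEM and POSITION columns, aligned with `rows`.

    rows are assumed to be in order of saving. IDEM holds the 1-based number
    of the first saved identical observation (None for unique ones, an
    assumption); POSITION is Unique, Mini, Inter or Maxi.
    """
    groups: dict[tuple, list[int]] = {}
    for i, row in enumerate(rows):
        key = tuple(row[v] for v in key_vars)
        groups.setdefault(key, []).append(i)

    idem: list[Optional[int]] = [None] * len(rows)
    position = ["Unique"] * len(rows)
    for members in groups.values():
        if len(members) == 1:
            continue
        first_saved = members[0]  # indices were appended in order of saving
        if order_var is not None:
            # rank by the value of another variable (e.g. CLE) instead
            members = sorted(members, key=lambda i: rows[i][order_var])
        for rank, i in enumerate(members):
            idem[i] = first_saved + 1
            if rank == 0:
                position[i] = "Mini"
            elif rank == len(members) - 1:
                position[i] = "Maxi"
            else:
                position[i] = "Inter"
    return idem, position
```

With the same email address saved three times, the copies are labelled Mini, Inter and Maxi in order of saving; passing `order_var="CLE"` ranks them by CLE instead.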


Delete duplicates

[Screenshot: Delete duplicates dialog box]

In our example, we want to start a new e-mailing campaign. However, our data set may contain the same email address several times, which would result in several mailings to the same person. To avoid this, we delete all duplicates of the email variable, so that each email address appears only once in the data set and each person is contacted only once.

When there are two duplicates, you can keep either the first response (Mini) or the last one (Maxi) and delete the other. Clicking OK generates a new .sphx file so that your current data are not overwritten. When there are at least three duplicates (for example, the same email address appears three times in the database), you must delete all the observations marked « Inter », as well as either the « Mini » or the « Maxi » observation.
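The deletion rule above (keep Mini or Maxi, drop every other copy including all Inter observations) amounts to keeping one observation per key. Here is a minimal Python sketch of that rule, assuming observations are dicts in order of saving; the function name and the `keep` parameter are hypothetical. Like Sphinx IQ, which writes a new .sphx file, it returns a new list and leaves the original data untouched.

```python
from typing import Any


def delete_duplicates(rows: list[dict[str, Any]], key_vars: list[str],
                      keep: str = "first") -> list[dict[str, Any]]:
    """Keep one observation per key: the first saved (Mini) or the last (Maxi).

    All other copies, including every Inter, are dropped. A new list is
    returned and `rows` is left untouched.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    chosen: dict[tuple, int] = {}
    for i, row in enumerate(rows):
        key = tuple(row[v] for v in key_vars)
        if keep == "last" or key not in chosen:
            chosen[key] = i  # "last" keeps overwriting, "first" keeps the earliest
    return [rows[i] for i in sorted(chosen.values())]
```

Sorting the kept indices preserves the original order of saving in the new data set.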

Transform a variable

Transforming variables offers many possibilities: managing the modalities of a closed question, creating classes to categorize numerical questions, regrouping codes or dates, extracting the information contained in text questions, or changing the type of a variable. To transform a variable, proceed as follows:

  • In the Data tab, click Transform.

An assistant opens which, depending on the type of the selected variable, lets you choose the type of transformation to perform.

[Screenshot: Transform assistant]

Change the type is the only transformation available for all types of variables. It modifies the type of a variable, for example changing a unique closed question into a multiple closed question. The other types of transformation are:

  • Manage modalities: for closed and scale variables. Modifies, reorders, regroups or deletes modalities.
  • Group numbers into classes: for numerical variables. Creates interval classes. For example, from the numerical variable « Age » you can create the classes « Youth » for those under 18, « Active youth » for those aged 18 to 25, etc.
  • Regroup the codes: for code variables. For example, for a postal code variable, regroups the codes according to the name of their region.
  • Extract information from texts: for text variables. Identifies the main themes of a text variable with the help of a dictionary, creates lemmatized variables (each word is replaced by its root), and measures the richness, the banality and the word length of the variable.
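Grouping numbers into classes is a lookup of each value against ordered interval limits. The Python sketch below illustrates the « Age » example with the standard `bisect` module; it is not Sphinx IQ's implementation, and the function name, the exact interval bounds (lower limit included) and the « 26 and over » label are hypothetical.

```python
from bisect import bisect_right
from typing import Optional


def group_into_classes(values: list[Optional[float]], bounds: list[float],
                       labels: list[str]) -> list[Optional[str]]:
    """Assign each number to an interval class.

    bounds are the ascending lower limits of every class after the first:
    with bounds [18, 26], the classes are [.., 18), [18, 26) and [26, ..).
    Non-responses (None) stay empty.
    """
    if len(labels) != len(bounds) + 1:
        raise ValueError("need exactly one more label than bounds")
    return [None if v is None else labels[bisect_right(bounds, v)] for v in values]


age_classes = group_into_classes([12, 18, 25, 26, None], [18, 26],
                                 ["Youth", "Active youth", "26 and over"])
```

`bisect_right` places a value equal to a bound in the class that starts at that bound, so an 18-year-old falls into « Active youth ».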

The different types of transformations are detailed below, in the section Operating modes.
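Regrouping codes can be pictured as mapping each code, through its leading characters, to a group. The Python sketch below assumes French postal codes, whose first two digits usually give the département. The lookup table is a hypothetical four-entry excerpt, and a real table would also have to handle exceptions such as Corsica and the overseas départements. It does not reproduce Sphinx IQ's own method.

```python
from typing import Optional

# Hypothetical excerpt of a lookup table: département number -> region
DEPARTMENT_TO_REGION = {
    "13": "Provence-Alpes-Côte d'Azur",
    "33": "Nouvelle-Aquitaine",
    "69": "Auvergne-Rhône-Alpes",
    "75": "Île-de-France",
}


def regroup_codes(codes: list[Optional[str]], prefix_to_group: dict[str, str],
                  prefix_len: int = 2) -> list[Optional[str]]:
    """Map each code to its group via its leading characters (None if unknown)."""
    return [None if c is None else prefix_to_group.get(str(c)[:prefix_len])
            for c in codes]


regions = regroup_codes(["75011", "69003", "33000", "99999", None],
                        DEPARTMENT_TO_REGION)
```

Codes whose prefix is missing from the table come back empty, which makes gaps in the lookup table easy to spot.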

[Screenshot: Change the type settings]

In the example above, we want to transform a unique closed variable (only one response possible) into a multiple closed variable (several responses possible). To do so, we specify the number of possible responses for the transformed variable (6).
