Delete the duplicates
In our example, we want to start a new round of e-mailing. However, our data set may contain the same e-mail address several times, which would result in several mailings to the same person. To avoid this, we will delete all duplicates of the email variable so that each address appears only once in the data set, and each person is therefore contacted only once.
When a value occurs twice, you can keep either the first response (Mini) or the last one (Maxi) and delete the other. When you click OK, a new .sphx file is generated so that your current data is not overwritten. When a value occurs three or more times (for example, the same email is present three times in the database), you must delete all the observations marked « Inter » as well as either the observation « Mini » or the observation « Maxi ».
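The following Python sketch illustrates the deletion logic described above: keeping the first occurrence corresponds to keeping « Mini », keeping the last corresponds to keeping « Maxi », and every other occurrence (the « Inter » ones included) is dropped. It is not Sphinx's implementation; the function name `delete_duplicates` and the representation of observations as dictionaries are assumptions made for illustration.

```python
from typing import Dict, List, Literal

Record = Dict[str, str]  # hypothetical: one observation = {variable: value}


def delete_duplicates(records: List[Record], key: str,
                      keep: Literal["first", "last"] = "first") -> List[Record]:
    """Return a new list in which each value of `key` appears only once.

    keep="first" retains the earliest observation (the "Mini" one);
    keep="last" retains the latest one (the "Maxi" one). The input list is
    left untouched, much as a new file is written instead of overwriting.
    """
    chosen: Dict[str, int] = {}
    for index, record in enumerate(records):
        value = record[key]
        if keep == "first":
            chosen.setdefault(value, index)  # remember only the first index
        else:
            chosen[value] = index            # later indices overwrite earlier ones
    kept = set(chosen.values())
    # Preserve the original saving order of the retained observations.
    return [record for index, record in enumerate(records) if index in kept]


rows = [
    {"email": "a@x.org", "answer": "1"},
    {"email": "b@x.org", "answer": "2"},
    {"email": "a@x.org", "answer": "3"},
    {"email": "a@x.org", "answer": "4"},
]
print([r["answer"] for r in delete_duplicates(rows, "email")])               # ['1', '2']
print([r["answer"] for r in delete_duplicates(rows, "email", keep="last")])  # ['2', '4']
```

With three occurrences of a@x.org, keeping the first discards both the « Inter » and the « Maxi » observations, which matches the rule stated above.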
Identify the duplicates
- 1 Select the variable for which you wish to identify and/or delete duplicates. You can also select several variables, for example « email » and « name », if several people may share the same email address.
- 2 Choose the action to perform: Identify duplicates or Delete duplicates.
- 3 Choose how the position of the duplicates found is determined.
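To see why selecting several variables matters in step 1, the Python sketch below groups observations by the combined value of the selected variables and reports only the combinations that occur more than once. The function name `find_duplicate_groups` and the dictionary representation of observations are hypothetical; this shows the general idea, not how Sphinx computes it.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]  # hypothetical: one observation = {variable: value}


def find_duplicate_groups(records: List[Record],
                          variables: Sequence[str]) -> Dict[Tuple[str, ...], List[int]]:
    """Map each repeated combination of `variables` to the 1-based numbers
    of the observations that share it."""
    groups: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for number, record in enumerate(records, start=1):
        combo = tuple(record[v] for v in variables)  # composite key
        groups[combo].append(number)
    # Keep only combinations that appear at least twice.
    return {combo: numbers for combo, numbers in groups.items() if len(numbers) > 1}


rows = [
    {"email": "team@x.org", "name": "Anna"},
    {"email": "team@x.org", "name": "Ben"},
    {"email": "team@x.org", "name": "Anna"},
]
print(find_duplicate_groups(rows, ["email"]))          # {('team@x.org',): [1, 2, 3]}
print(find_duplicate_groups(rows, ["email", "name"]))  # {('team@x.org', 'Anna'): [1, 3]}
```

On the email alone, Anna and Ben would be treated as the same person; adding the name keeps them apart.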
If you choose to determine the position of duplicates according to the order in which the observations were saved, a variable IDEM is created, containing the number of the first identical observation for the selected variable. A second variable, POSITION, indicates the status of each observation. This variable has four possible values:
- Unique: the observation has no duplicate
- Mini: the observation is the first in a list of duplicates
- Inter: the observation is an intermediate duplicate in a list of duplicates
- Maxi: the observation is the last in a list of duplicates
Note that you can order the duplicates either by the order in which the observations were saved (with three duplicates, the observation marked MINI is then the one saved first) or by the value of a variable you select, for example the variable CLE, in which case the duplicates are ranked according to the value of that variable.
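The Python sketch below models ordering by a variable such as CLE: observations are first sorted by that variable, and Mini, Inter and Maxi are then assigned within each list of duplicates in that sorted order. Ascending order and tie-breaking by saving order (Python's `sorted` is stable) are assumptions, as is the function name `positions_by_key`.

```python
from typing import Any, Dict, List


def positions_by_key(records: List[Dict[str, Any]],
                     dup_var: str, order_var: str) -> List[str]:
    """Return the POSITION of each observation (in saving order) when the
    duplicates of `dup_var` are ranked by ascending `order_var`."""
    # Stable sort: observations with equal order_var keep their saving order.
    order = sorted(range(len(records)), key=lambda i: records[i][order_var])

    # Build each list of duplicates in the sorted order.
    groups: Dict[Any, List[int]] = {}
    for i in order:
        groups.setdefault(records[i][dup_var], []).append(i)

    positions = [""] * len(records)
    for members in groups.values():
        if len(members) == 1:
            positions[members[0]] = "Unique"
            continue
        positions[members[0]] = "Mini"   # smallest CLE value
        positions[members[-1]] = "Maxi"  # largest CLE value
        for i in members[1:-1]:
            positions[i] = "Inter"
    return positions


rows = [
    {"email": "a@x.org", "CLE": 30},
    {"email": "a@x.org", "CLE": 10},
    {"email": "b@x.org", "CLE": 5},
    {"email": "a@x.org", "CLE": 20},
]
print(positions_by_key(rows, "email", "CLE"))  # ['Maxi', 'Mini', 'Unique', 'Inter']
```

In saving order the same rows would be Mini, Inter, Unique, Maxi; ranking by CLE changes which observation is kept when you delete the « Inter » and « Maxi » ones.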