08 Batch Edit


Session 8: Batch Edit and Writing PFF Files

At the end of this session participants will be able to:

• Implement consistency checks in a batch edit application
• Use batch edit to add calculated variables and recoded variables
• Write out a .pff file from CSPro logic
• Convert Excel files to CSPro

Creating a Batch Edit Application


A batch edit application is like a data entry application but without the forms. It is meant to be run after
data entry to detect and fix problems in the data file. A batch edit application takes an input data file
and runs logic on it. It generates both a report, called a listing file, and optionally an output data file
which is a modified version of the input file. A batch edit application never changes the input file.

To create a batch edit application, choose File > New in CSPro and select Batch Edit Application.
Then choose a dictionary. This is usually the same dictionary that you used for data entry.

The user interface for working with batch applications is similar to the one for working with data entry
applications except that there are no forms. Instead there is a tab for edits. Just like in data entry, you
add logic to PROCs. Because a batch application does not run interactively, all the error messages are
written to the listing file for review after the whole program has run.

Checking for Errors


To add consistency checks we proceed just as we did in our data entry application, by adding logic to the
appropriate PROC. Let’s start with a simple check that the age at first marriage is not greater than the
person's current age.

// Check that age at first marriage is not greater than current age
if B20 > B5 then
    errmsg("Age at first marriage greater than age");
endif;

Next we run the application and choose the LesothoCensus2016.dat data file as the input. After the
application runs, the listing file opens and reports that there is one case with this error.

Process Messages

*** Case [0101B0110101011100101101001] has 1 messages (0 E / 0 W / 1 U)
    U    -10 Age at first marriage greater than age

User unnumbered messages:

 Line  Freq  Pct.  Message text                              Denom
 ----  ----  ----  ------------                              -----
   10     1     -  Age at first marriage greater than age        -

To figure out what the problem is we can open up the problem case in data entry. The printout in the
listing file contains the case identifiers which we can use to find the case. You can copy the case id from
the listing file and use it with Find Case on the Edit menu in CSEntry.
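
When there are many messages, it can help to see the offending values directly in the listing file. As a
minimal sketch, assuming B20 and B5 are numeric items as in the check above, errmsg() accepts format
specifiers in the same style as maketext():

```cspro
// Sketch: include the reported values in the message text
if B20 > B5 then
    errmsg("Age at first marriage (%d) greater than age (%d)", B20, B5);
endif;
```

With this variant the message for the case shows both values, so some problems can be diagnosed without
opening the case in CSEntry.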

Correcting Errors
In addition to using batch edit to find errors, you can also use it to correct problems by modifying
variables in your logic. Let’s simply cap the age at first marriage so that it is never greater than the
current age.

// If age at first marriage is greater than current age, cap it at current age
if B20 > B5 then
    errmsg("Age at first marriage greater than age. Capping age at first marriage at age.");
    B20 = B5;
endif;

This time, when we run the application, we specify an output file: LesothoCensusEdited.dat. The changes
are made only to the output file; the input file is left untouched. We can then rerun the batch
application on the output file to make sure that there are no remaining error messages.

Instead of just assigning the value of B20 with “=”, we can use the impute function, which does the same
assignment but also generates a report showing the values that were imputed.

// If age at first marriage is greater than current age, cap it at current age
if B20 > B5 then
    errmsg("Age at first marriage greater than age. Capping age at first marriage at age.");
    impute(B20, B5);
endif;

The imputation report will be opened in TextViewer after you run the batch application but to see it you
will need to go to the Window menu and choose the file that ends in “.frq.lst”.

IMPUTE FREQUENCIES                                                        Page 1
________________________________________________________________________________
Imputed Item B20: Age at first marriage - all occurrences
_______________________________ _____________________________ _____________
Categories                           Frequency        CumFreq       % Cum %
_______________________________ _____________________________ _____________
40                                           1              1   100.0 100.0
_______________________________ _____________________________ _____________
TOTAL                                        1              1   100.0 100.0

Adding Computed Variables


It is often useful to add variables to your data file after data collection that are computed
from the collected variables. For example, let’s add a yes/no/don’t know variable to the individual record
that indicates whether the individual is an orphan. First we add the new variable ORPHAN to the dictionary
(at the end of the record, so that the positions of the existing items in the data file do not change).
Then we add logic to the PROC of our new variable to impute the value. We know that a child is an orphan
if both parents are deceased.

PROC ORPHAN

// Set calculated variable ORPHAN based on whether the parents are alive
// (B17 and B18: 1 = alive, 2 = deceased)
if B17 = 2 and B18 = 2 then
    // Mother and father deceased - is orphan
    impute(ORPHAN, 1);
elseif B17 = 1 or B18 = 1 then
    // Mother or father alive - not orphan
    impute(ORPHAN, 2);
else
    // Not enough info to determine.
    impute(ORPHAN, 9);
endif;

Run the program and look at the imputation report to see how many orphans are in our data set.
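
The same pattern works for recoded variables. The sketch below assumes that a hypothetical variable
AGE_GROUP has been added to the end of the individual record with codes 1 (ages 0-14), 2 (ages 15-64),
3 (ages 65 and over) and 9 (unknown). The variable name, the codes and the upper age limit of 120 are
illustrative only and are not part of the census dictionary.

```cspro
PROC AGE_GROUP

// AGE_GROUP is a hypothetical recoded variable derived from the age in B5
if B5 in 0:14 then
    impute(AGE_GROUP, 1);
elseif B5 in 15:64 then
    impute(AGE_GROUP, 2);
elseif B5 in 65:120 then
    impute(AGE_GROUP, 3);
else
    // Age is blank or outside the assumed range
    impute(AGE_GROUP, 9);
endif;
```

Using explicit ranges rather than a bare comparison such as B5 < 15 means that blank or out-of-range ages
fall through to the final else branch instead of being placed in an age group by accident.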

Pre-filling Id Items in the Pff File


We would like to pre-fill the enumeration area for each enumerator by using customized pff
files. To pass the id items to the application from the pff file, we add them as parameters. We can do
this either via the edit pff dialog or by adding the following lines to the pff file in a text editor:

[Parameters]
DISTRICT=1
CONSTITUENCY=1
COMMUNITY_COUNCIL=B01
ZONE=2
SETTLEMENT=2
ENUMERATION_AREA=1010122018

The parameters that are passed via the pff file are available in the application logic through the function
sysparm(). Note that giving the parameters the same names as the id items does not automatically set the
id items. We still need to set them in logic as follows:

PROC DISTRICT
preproc

    // Fill in id-items if they were passed in as pff file parameters
    if sysparm("DISTRICT") <> "" then
        DISTRICT = tonumber(sysparm("DISTRICT"));
        CONSTITUENCY = tonumber(sysparm("CONSTITUENCY"));
        COMMUNITY_COUNCIL = sysparm("COMMUNITY_COUNCIL");
        ZONE = tonumber(sysparm("ZONE"));
        SETTLEMENT = tonumber(sysparm("SETTLEMENT"));
        ENUMERATION_AREA = tonumber(sysparm("ENUMERATION_AREA"));

        // Protect fields so enumerator cannot modify them
        set attributes (DISTRICT, CONSTITUENCY, COMMUNITY_COUNCIL, ZONE,
                        SETTLEMENT, ENUMERATION_AREA) protect;
    endif;

The sysparm() function always returns an alphanumeric string, so for numeric values we use tonumber() to
convert them before assigning to numeric dictionary variables. Once we have filled in the values, we use
set attributes protect to make the fields protected so that they cannot be modified. We could set these
fields as protected on the form, but that would make it impossible to test our application without a pff
file that fills in the id items.

This works well, but if we need to create a .pff file for each of 5000+ EAs it will take quite some time.
We have a list of all the EAs, so we should be able to generate the pff files automatically. In order to
do this we first need to see how to write out a pff file from CSPro logic.

Generating the Pff File from Logic


The pff file is just a text file and it is easy to write out text files using CSPro logic. To write out files from a
CSPro application we have to first declare a file variable in the PROC GLOBAL:

PROC GLOBAL
file pffFile;

A file variable is a new type of variable that represents a file on the disk. We use the functions setfile(),
filewrite() and close() to open, write to and close the file. The easiest way to figure out the logic for
writing out a pff file is to enable the advanced mode in the pff editor and to copy the logic from the
“View CSPro PFF File Creation Logic”. This will generate code to create and run the .pff. We can extract
just the part that writes out the pff file.
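
As a minimal sketch of opening, writing to and closing a file, the logic below writes a single line of
text. The file variable noteFile and the file name note.txt are hypothetical; DISTRICT is the id item
used earlier, and the format specifier works in the same style as maketext().

```cspro
PROC GLOBAL
file noteFile;    // hypothetical file variable

PROC DISTRICT
    // Create note.txt and associate it with noteFile
    setfile(noteFile, "note.txt", create);
    // Write one line of text, inserting the district code
    filewrite(noteFile, "District %d", DISTRICT);
    // Close the file when we are done writing
    close(noteFile);
```

Because this logic sits in a PROC that runs for every case, the file would be recreated for each case. That
is acceptable here only because the sketch is meant to show the three calls in order.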

Converting Excel files to CSPro


To write a .pff file for each EA we can write a CSPro batch edit application that uses this logic. The batch
application’s input file will be the list of EAs and for each EA we will write out the corresponding pff file.
In order to do that we need to convert our list of EAs from Excel to CSPro. Fortunately there is a tool
called Excel to CSPro which does that.

First we need to create a CSPro dictionary to match our EA spreadsheet. Let’s call our dictionary
“LesothoEAs”. We can simply copy the id items that match the spreadsheet columns from the census
dictionary and paste them into our new dictionary. We don’t need any regular dictionary variables in this
case; just the district, constituency, community council, zone, settlement and enumeration area.

Next we run the Excel to CSPro tool, pick the EA spreadsheet, the dictionary we just created and the
name for the CSPro data file, and click convert. In other cases we might need to adjust the worksheet
number, start row or columns, but in this case the defaults are fine. This generates a CSPro data file with
the same contents as the Excel spreadsheet.

Writing Pff Files in Batch


Now we can create the batch application to write out the pff files. Let’s name it CreatePffsForEAs.
Choose the LesothoEAs dictionary as the main dictionary. Now we can copy the code for creating pff
files into our batch application.

PROC ENUMERATION_AREA

    // Put generated pff files in folder called pffFiles since there will be many
    string pffFileName = maketext("pffFiles/CensusEA%011d.pff", ENUMERATION_AREA);
    setfile(pffFile, pffFileName, create);

    filewrite(pffFile, "[Run Information]");
    filewrite(pffFile, "Version=CSPro 6.1");
    filewrite(pffFile, "AppType=Entry");

    filewrite(pffFile, "[Parameters]");
    // This passes the ids as parameters to the census
    // application which can access them via the sysparm() function.
    filewrite(pffFile, "DISTRICT=%d", DISTRICT);
    filewrite(pffFile, "CONSTITUENCY=%d", CONSTITUENCY);
    filewrite(pffFile, "COMMUNITY_COUNCIL=%s", COMMUNITY_COUNCIL);
    filewrite(pffFile, "ZONE=%d", ZONE);
    filewrite(pffFile, "SETTLEMENT=%d", SETTLEMENT);
    filewrite(pffFile, "ENUMERATION_AREA=%d", ENUMERATION_AREA);

    filewrite(pffFile, "[DataEntryInit]");

    filewrite(pffFile, "[Files]");
    filewrite(pffFile, "Application=%s", "./LesothoCensus2016.ent");

    // Modify data file to be based on EA. This will avoid conflicts when
    // syncing.
    string dataFileName = maketext("CensusEA%011d.dat", ENUMERATION_AREA);
    filewrite(pffFile, "InputData=%s", dataFileName);

    close(pffFile);

We use maketext() to set the name of the pff file and the name of the data file so that both include the
EA code; otherwise, the rest of the logic is copied from the pff editor.

When we run our application we should have a pff file for each EA listed in the Excel spreadsheet. These
can now be copied to the tablet along with the .pen file.

Exercises
1. Modify the batch edit application to convert the assets in H77 from alpha (used for checkboxes)
to a set of numeric yes/no variables. Create the new variables in the dictionary and write logic to
set the value of each of the new variables based on the letters in H77. The pos() function makes
this easy.

2. Modify the batch edit application to add a check for someone with relationship of spouse but
marital status that is not married. Print a message for each case found. This should be done in
the batch edit application NOT in the data entry application.

3. Modify the batch edit application to add a check that the total number of housing units in H68 is
equal to the sum of the numbers of each type of housing unit (rontabole, heisi…). This should be
done in the batch edit application NOT in the data entry application.

4. Modify the batch application that generates the .pff files to add a description to the generated
pff files. The description is what will appear in the list of applications when the file is copied to
an Android device. If no description is present then the name of the pff file is used, which with the
generated pff files is not very user friendly. Make the description something more friendly.
