
Lesson 1: Introduction to GIS modeling and Python


Overview
Welcome to Geography 485. Over the next ten weeks you'll work through four
lessons and a final project dealing with ArcGIS automation in Python. Each lesson
will contain readings, examples, and projects. Since the lessons are two weeks long,
you should plan between 20 - 24 hours of work to complete them, although this
number may vary depending on your prior programming experience. See the
Course Schedule section of this syllabus, below, for a schedule of the lessons and
course projects.

As with GEOG 483 and GEOG 484, the lessons in this course are project-based
with key concepts embedded within. However, because of the nature of computer
programming, there is no way this course can follow the step-by-step instruction
design of the previous courses. You will probably find the course to be more
challenging than the others. For that reason, it is more important than ever that
you stay on schedule and take advantage of the course message boards and private
e-mail. It's quite likely that you will get stuck somewhere during the course, so
before getting hopelessly frustrated, please seek help from me or your classmates!

I hope that by now you have reviewed our Orientation and Syllabus for an
important course site overview. Before we begin our first project, let me share some
important information about the textbook and a related Esri course.

Textbook and readings

The textbook for this course is Python Scripting for ArcGIS by Paul A. Zandbergen.
This book came out in 2012 and has been a hot item among Esri software users; I
suggest you order your copy immediately in case of shortages or delays.

Back when Geog 485 was rewritten as a Python course, there was no textbook
available that tied together ArcGIS and Python scripting. As you read through
Zandbergen's book, you'll see material that closely parallels what is in the Geog 485
lessons. This isn't necessarily a bad thing; when you are learning a subject like
programming, it can be helpful to have the same concept explained from two
angles.

My advice about the readings is this: Read the material on the Geog 485 lesson
pages first. If you feel like you have a good understanding from the lesson pages,
you can skim through some of the more lengthy Zandbergen readings. If you
struggled with understanding the lesson pages, you should pay close attention to
the Zandbergen readings and try some of the related code snippets and exercises. I
suggest you plan about 1 - 2 hours per week of reading if you are going to study the
chapters in detail.

In all cases, you should get a copy of the textbook because it is a relevant and
helpful reference.

Esri Virtual Campus course: Using Python in ArcGIS Desktop 10

There is a free Esri Virtual Campus course, Using Python in ArcGIS Desktop 10 [1],
that introduces a lot of the same things you'll learn this quarter in Geog 485. The
course consists of a one-hour recorded seminar and a walkthrough exercise. If you
want to get a head start, or you feel you want some reinforcement of what we're
learning from a different point of view, it would be worth your time to complete
this Virtual Campus course.

All you need in order to access this course is an Esri Global Account, which you can
create for free. You do not need to obtain an access code from Penn State.

The video moves very quickly and covers a range of concepts that we'll spend 10
weeks studying in depth, so don't worry if you don't understand it all immediately.
You might find it helpful to watch the video again near the end of Geog 485 to
review what you've learned.

Questions?

If you have any questions now or at any point during this week, please feel free to
post them to the Lesson 1 Discussion Forum. (To access the forums, return to
ANGEL via the ANGEL link in the Resources menu. Once in ANGEL, you can
navigate to the Communicate tab and then scroll down to the Discussion
Forums section.) While you are there, feel free to post your own responses if you,
too, are able to help a classmate.

Now, let's begin Lesson 1.

Lesson 1 checklist
This lesson is two weeks in length. (See the Calendar in ANGEL for specific due
dates.) To finish this lesson, you must complete the activities listed below. You may
find it useful to print this page so that you can follow along with the directions.

1. Download the Lesson 1 data [2] and extract it to C:\WCGIS\Geog485\Lesson1 or
a similar path that is easy to remember.

2. Work through the online sections of Lesson 1.

3. Read Zandbergen chapters 2 - 3. In the online lesson pages I have inserted
instructions about when it is most appropriate to read each of these chapters.

4. Complete Project 1, Part I and submit the deliverables to the course drop
box.

5. Complete Project 1, Part II and submit the deliverables to the course drop
box.

6. Complete the Lesson 1 Quiz.

1.1.1 The need for GIS automation


A geographic information system (GIS) can manipulate and analyze spatial
datasets with the purpose of solving geographic problems. GIS analysts perform all
kinds of operations on data to make it useful for solving a focused problem. This
includes clipping, reprojecting, buffering, merging, mosaicking, extracting subsets
of the data, and hundreds of other operations. In the ArcGIS software used in this
course, these operations are known as geoprocessing and they are performed using
tools.

Successful GIS analysis requires selecting the right tools to operate on your data.
ArcGIS uses a toolbox metaphor to organize its suite of tools. You pick the tools you
need and run them in the proper order to make your finished product.

Suppose you’re responsible for selecting sites for video stores. You might use one
tool to select land parcels along a major thoroughfare, another tool to select parcels
no smaller than 0.25 acres, and other tools for other selection criteria. If this
selection process were limited to a small area, it would probably make sense to
perform the work manually.

However, let’s suppose you’re responsible for carrying out the same analysis for
several areas around the country. Because this scenario involves running the same
sequence of tools for several areas, it is one that lends itself well to automation.
There are several major benefits to automating tasks like this:

• Automation makes work easier. Once you automate a process, you don't have to
put in as much effort remembering which tools to use or the proper sequence in
which they should be run.

• Automation makes work faster. A computer can open and execute tools in
sequence much faster than you can accomplish the same task by pointing and
clicking.

• Automation makes work more accurate. Any time you perform a manual task on a
computer, there is a chance for error. The chance multiplies with the number
and complexity of the steps in your analysis. In contrast, once an automated
task is configured, a computer can be trusted to perform the same sequence of
steps every time.

ArcGIS provides three ways for users to automate their geoprocessing tasks. These
three options differ in the amount of skill required to produce the automated
solution and in the range of scenarios that each can address.

The first option is to construct a model using ModelBuilder. ModelBuilder is an
interactive program that allows the user to “chain” tools together, using the output
of one tool as input in another. Perhaps the most attractive feature of
ModelBuilder is that users can automate rather complex GIS workflows without the need
for programming. You will learn how to use ModelBuilder early in this course.

Some automation tasks require greater flexibility than is offered by Model Builder,
and for these scenarios it's recommended that you write scripts. The bulk of this
course is concerned with script writing.

A script is a program that executes a sequential procedure of steps. Within a script,
you can run GIS tools individually or chain them together. You can insert
conditional logic in your script to handle cases where different tools should be run
depending on the output of the previous operation. You can also include iteration,
or loops, in a script to repeat a single action as many times as needed to accomplish
a task.
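
To make these three ideas concrete, here is a minimal sketch in plain Python with no ArcGIS calls. The area names, parcel counts, and the choose_step helper are hypothetical and invented only for illustration; "Buffer" is just a label here, not a call into ArcGIS.

```python
# Hypothetical study areas and the number of candidate parcels found in each.
areas = [("Harrisburg", 12), ("Denver", 0), ("Portland", 4)]

def choose_step(parcel_count):
    """Conditional logic: pick the next step based on the previous result."""
    if parcel_count == 0:
        return "skip"    # nothing to process for this area
    return "Buffer"      # otherwise continue with the next tool

# Iteration: repeat the same sequence of steps for every area.
log = []
for name, parcel_count in areas:
    step = choose_step(parcel_count)
    log.append((name, step))
    print("%s -> %s" % (name, step))
```

The statements run from top to bottom (the sequence), the if statement chooses between two paths (the conditional logic), and the for loop repeats the work once per area (the iteration).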

There are special scripting languages for writing scripts, including Python, JScript,
and Perl. Often these languages have more basic syntax and are easier to learn than
other languages such as C, Java, or Visual Basic.
Although ArcGIS supports various scripting languages for working with its tools,
Esri emphasizes Python in its documentation and includes Python with the ArcGIS
install. In this course we’ll be working strictly with Python. You’ll learn the basics of
the Python language, how to write a script, and how to manipulate and analyze GIS
data using scripts. Finally, you’ll apply your new Python knowledge to a final
project, where you write a script of your choosing that you may be able to apply
directly to your work.

The third option available to ArcGIS users looking to automate geoprocessing is to
build a solution using ArcObjects, the programming building blocks used by Esri’s
own developers to produce the ArcGIS desktop products. With ArcObjects, it is
possible to customize the user interface to include specific commands and tools
that either go outside the abilities of the out-of-the-box ArcGIS tools or modify
them to work in a more focused way. ArcObjects programming and interface
customization are outside the scope of this course, but are covered in the GIS
Application Development course, GEOG 489. GIS customization with ArcObjects
can be an advanced endeavor, and learning a scripting language like Python is a
good way to prepare yourself by learning basic programming concepts.

The tools that you run in ModelBuilder and Python actually use ArcObjects "under
the hood" to run GIS functions; however, the advantage of Python scripting with
ArcGIS is that you don't need to learn all the ArcObjects logic behind the tools.
Your job is just to learn the tools and how to run them in the appropriate order to
accomplish your task.

This first lesson will introduce you to concepts in both model building and script
writing. We’ll start by just getting familiar with how tools run in ArcGIS and how
you can use those tools in the ModelBuilder interface. Then, we’ll cover some of the
basics of Python and see how the tools can be run within scripts.

1.2.1 Exploring the toolbox


The ArcGIS software that you use in this course contains hundreds of tools that you
can use to manipulate and analyze GIS data. Back before ArcGIS had a graphical
user interface (GUI), people would access these tools by typing commands.
Nowadays, you can point and click your way through a whole hierarchy of
toolboxes using ArcCatalog or the Catalog window in ArcMap.

Although you may have seen them before, let’s take a quick look at the toolboxes:

1. Open ArcMap.

2. If the Catalog window isn't visible, click the Windows menu, then click
Catalog. (If you've used previous versions of ArcGIS, this is a new window
at version 10 that allows you to have a lot of the ArcCatalog functionality
available in ArcMap.) If you hover over or click the Catalog item on the right
side of your screen, you can make the Catalog window appear. Optionally,
you can "pin" it down so that it doesn't hide itself.

3. In the Catalog, expand the nodes Toolboxes > System Toolboxes and
continue expanding the toolboxes of your choice until you see some of the
available tools. Notice that they’re organized into toolboxes and toolsets.
Sometimes it’s faster to use the Search window to find the tool you need
instead of browsing this tree.

4. Let’s examine a tool. Expand Analysis Tools > Proximity > Buffer, and
double-click the Buffer tool to open it.

At this point, you’re looking at a dialog with many fields. Each geoprocessing
tool has required inputs and outputs. Those are indicated by the green dots.
They represent the minimum amount of information you need to supply in
order to run a tool. For the Buffer tool, as inputs, you’re required to supply
an input features location (the features that will be buffered) and a buffer
distance. You’re also required to indicate an output feature class location
(for the new buffered features).

Many tools also have optional parameters. You can modify these if you want,
but if you don’t supply them, the tool will still run using default values. For
the Buffer tool, optional parameters are the Side Type, End Type, Dissolve
Type, and Dissolve Fields. Optional parameters are typically specified after
required parameters.

5. Click the Show Help button in the lower-right corner of the tool (if it says
Hide Help then you’re already viewing help). You can now click on any
parameter in the dialog to see an explanation of that parameter appear in
the right-hand window.

If you’re not sure what a parameter means, this is a good way to learn. For
example, with the help still open, click the Side Type input box on the
Buffer tool (right where it says "FULL"). The Help explains what the Side
Type parameter means and lists the different options: FULL, LEFT, RIGHT,
and OUTSIDE_ONLY.
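
In a script, the same split between required and optional parameters shows up as function arguments. The sketch below assumes the arcpy function Buffer_analysis, whose first three arguments correspond to the required parameters in the dialog. The buffer_arguments helper and the example paths are hypothetical, and the default values shown (FULL, ROUND, NONE) are my understanding of the tool's defaults, so check the tool's help page for your version.

```python
def buffer_arguments(in_features, out_feature_class, distance,
                     side_type="FULL", end_type="ROUND",
                     dissolve_type="NONE", dissolve_fields=""):
    """Return the Buffer arguments in the order the tool expects them.

    The first three are required; the rest fall back to defaults,
    just as the tool dialog does when you leave them alone.
    """
    return [in_features, out_feature_class, distance,
            side_type, end_type, dissolve_type, dissolve_fields]

# Only the required values are supplied here (hypothetical paths).
args = buffer_arguments(r"C:\Data\cities.shp",
                        r"C:\Data\cities_buffer.shp",
                        "5 Miles")
print(args)

# With ArcGIS installed, the call itself would look like:
#   import arcpy
#   arcpy.Buffer_analysis(*args)
```

Because the optional arguments come after the required ones, you can stop typing as soon as the remaining defaults suit you.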

If you need even more help, each tool is fully documented in the ArcGIS Desktop
Help. You could go directly to the Buffer tool help by clicking the Tool Help
button in the tool dialog box, but in this course you'll often want to get to these help
pages without opening the tool itself. Below are the steps for doing so.

1. From the main menu of ArcMap, click Help > ArcGIS Desktop Help.
Optionally, for the most up-to-date help, you can use the Web-based help at
http://webhelp.esri.com [3]. (All links to the Help in this course will open
the Web Help.)

2. In the ArcGIS Desktop Help table of contents, expand Professional
Library > Geoprocessing > Geoprocessing tool reference. (If you
are using 10.1, browse to Geoprocessing > Tool Reference instead.)
Notice that the help topics in this section are organized into toolboxes and
toolsets, paralleling the layout of the ArcGIS System Toolboxes.

3. Continue navigating the help table of contents to Analysis toolbox >
Proximity toolset > Buffer. Scroll through the entire topic examining all
the information that is given about the Buffer tool. Here you get tips about
what the Buffer tool does, how to use it, a full list of parameters, and
scripting examples written in Python. These scripting examples will be
extremely valuable to you as you complete the assignments in this course
and you should always check the Geoprocessing Tool Reference in the Help
if you’re having trouble getting a tool to run in Python.

1.2.2 Environments for accessing tools


You can access ArcGIS geoprocessing tools in several different ways:

• Sometimes you just need to run a tool once, or you want to experiment with
a tool and its parameters. In this case, you can open the tool directly from
the Catalog and use the tool’s graphical user interface (GUI, pronounced
gooey) to fill in the parameters.

• ModelBuilder is also a GUI application where you can set up tools to run in a
given sequence, using the output of one tool as input to another tool.

• If you’re familiar with the tool and want to use it quickly in ArcMap, you
may prefer the Python window approach. You type the tool name and
required parameters into a command window. You can use this window to
run several tools in a row and declare variables, effectively doing simple
scripting.

• If you want to run the tool automatically, repeatedly, or as part of a greater
logical sequence of tools, you can run it from a script. Running a tool from a
script is the most powerful and flexible option.

We’ll start with the simplest of these cases, running a tool from its GUI, and work
our way up to scripting.

1.2.3 Running a tool from its GUI


Let’s start by opening a tool from the Catalog window and running it using its
graphical user interface (GUI).

1. If by chance you still have the Buffer tool open from the previous section,
close it for now so you can add some data.

2. Create a folder on your machine at C:\WCGIS\Geog485. If you use a
different path, be sure to substitute your path in the following examples.

3. Download the Lesson 1 data [2] and extract Lesson1.zip into your new folder
so that the data is under the path C:\WCGIS\Geog485\Lesson1. This folder
contains a variety of datasets you will use throughout the lesson.

4. Open ArcMap and create a new empty map.

5. Click the Add Data button and browse to the data you just extracted.
Add the us_boundaries and us_cities shapefiles.

6. Open the Catalog window if necessary and browse to the Buffer tool as you
did in the previous section.

7. Double-click the Buffer tool to open it.

8. Examine the first required parameter: Input Features. Click the Browse
button and browse to the path of your cities dataset
C:\WCGIS\Geog485\Lesson1\us_cities.shp. Notice that once you do this, a
path is automatically supplied for the Output Feature Class. The software
does this for your convenience only and you can change the path if you want.

A more convenient way to supply the Input Features is to just select the
cities map layer from the dropdown menu. This dropdown automatically
contains all the layers in your map document. However, in this example we
browsed to the path of the data because it’s conceptually similar to how we’ll
provide the paths in the command line and scripting environments.

9. Now you need to supply the Distance parameter for the buffer. For this run
of the tool, set a Linear unit of 5 miles. When we run the tool from the
other environments, we’ll make the buffer distance slightly larger so we
know that we got distinct outputs.

10. The rest of the parameters are optional. The Side Type and End Type
parameters apply only to lines and polygons, so they are not even available
for setting in the GUI environment when working with city points. However,
change the Dissolve Type to ALL.

11. Click OK to run the tool.

12. The tool should take just a few seconds to complete. Examine the output
that appears on the map, and do a “sanity check” to make sure that buffers
appear around the cities and they appear to be about 5 miles in radius. You
may need to zoom in to a single state in order to see the buffers.

13. Click the Geoprocessing menu and click Results. This window lists
messages about successes or failures of all recent tools that you've run.

14. Expand the Buffer tool until you can see all the messages. They list the tool
parameters, the time of completion, and any problems that occurred when
running the tool. (See Figure 1.1.) These messages can be a big help later
when you troubleshoot your Python scripts. The text of these messages is
available whether you run the tool from the GUI, from the Python window in
ArcMap, or from scripts.

Figure 1.1 Screen capture showing the Buffer tool and all messages.
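
The same messages can be captured from a script. The sketch below shows one way to do that, written so it can be tried without ArcGIS: run_and_report is a hypothetical helper that takes the tool and the message function as arguments. With ArcGIS installed you would pass arcpy.Buffer_analysis and arcpy.GetMessages; the stand-in functions here are invented for illustration and do not reproduce real ArcGIS output.

```python
def run_and_report(tool, get_messages, *tool_args):
    """Run a tool, then return (succeeded, message text)."""
    try:
        tool(*tool_args)
        succeeded = True
    except Exception:
        # A failed tool raises an error; the messages explain why.
        succeeded = False
    return succeeded, get_messages()

# Stand-ins so the sketch runs anywhere (not real ArcGIS behavior).
def fake_buffer(in_features, out_features, distance):
    if not in_features:
        raise ValueError("no input features supplied")

def fake_messages():
    return "Executing: Buffer ..."

ok, text = run_and_report(fake_buffer, fake_messages,
                          "us_cities.shp", "us_cities_buffer.shp", "5 Miles")
print("%s: %s" % (ok, text))

# With ArcGIS installed, the equivalent call would be along the lines of:
#   import arcpy
#   run_and_report(arcpy.Buffer_analysis, arcpy.GetMessages,
#                  in_path, out_path, "5 Miles")
```

Collecting the messages whether or not the tool succeeds is the point of the design: the text is most useful when something has gone wrong.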

1.2.4 Modeling with tools


When you work with geoprocessing, you’ll frequently want to use the output of one
tool as the input into another tool. For example, suppose you want to find all fire
hydrants within 200 meters of a building. You would first buffer the building, then
use the output buffer as a spatial constraint for selecting fire hydrants. The output
from the Buffer tool would be used as an input to the Select by Location tool.

A set of tools chained together in this way is called a model. Models can be simple,
consisting of just a few tools, or complex, consisting of many tools and parameters
and occasionally some iterative logic. Whether big or small, the benefit of a model
is that it solves a unique geographic problem that cannot be addressed by one of
the “out-of-the-box” tools.
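
The hydrant example can be mimicked in plain Python to show how one step's output feeds the next. This is only a toy sketch with made-up planar coordinates in meters and a building reduced to a single point; buffer_point and select_by_location are hypothetical helpers, not the ArcGIS tools of similar name.

```python
import math

def buffer_point(center, distance):
    """Step 1: describe a circular buffer as (center, radius)."""
    return (center, distance)

def select_by_location(points, buffer_zone):
    """Step 2: keep the points that fall inside the buffer."""
    (cx, cy), radius = buffer_zone
    return [p for p in points
            if math.hypot(p[0] - cx, p[1] - cy) <= radius]

building = (0.0, 0.0)
hydrants = [(50.0, 50.0), (150.0, 150.0), (-100.0, 100.0), (300.0, 0.0)]

# The output of the first step is the input to the second.
zone = buffer_point(building, 200.0)
nearby = select_by_location(hydrants, zone)
print(nearby)
```

Neither function solves the problem alone; the result comes from chaining them, which is what a model does with real tools.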

In ArcGIS, modeling can be done either through the ModelBuilder graphical user
interface (GUI) or through code, using Python. To keep our terms clear, we’ll refer
to anything built in ModelBuilder as a “model” and anything built through Python
as a “script.” However, it’s important to remember that both things are doing
modeling.

1.3.1 Why learn ModelBuilder?


ModelBuilder is Esri’s graphical interface for making models. You can drag and
drop tools from the Catalog window into the model and “connect” them, specifying
the order in which they should run.

Although this is primarily a programming course, we’ll spend some time in
ModelBuilder during the first lesson for two reasons:

• ModelBuilder is a nice environment for exploring the ArcGIS tools, learning how
tool inputs and outputs are used, and visually understanding how GIS modeling
works. When you begin using Python, you will not have the same visual assistance
to see how the tools you’re using are connected, but you may still want to draw your
model on a whiteboard in a similar fashion to what you saw in ModelBuilder.

• ModelBuilder can frequently reduce the amount of Python coding that you need to
do. If your GIS problem does not require advanced conditional and iterative logic,
you may be able to get your work done in ModelBuilder without writing a script.
ModelBuilder also allows you to export any model to Python code, so even if you do
need to write a script, you may be able to use ModelBuilder to get a head start.

1.3.2 Opening and exploring ModelBuilder

Let’s get some practice with ModelBuilder to solve a real scenario. Suppose you are
working on a site selection problem where you need to select all areas that fall
within 10 miles of a major highway and 10 miles of a major city. The selected area
cannot lie in the ocean or outside the United States. Solving the problem requires
that you make buffers around both the roads and the cities, intersect the buffers,
then clip to the US outline. Instead of manually opening the Buffer tool twice,
followed by the Intersect tool, then the Clip tool, you can set this up in
ModelBuilder to run as one process.

When you finish the steps below, you will have used ModelBuilder to chain
together several tools and solve a GIS problem.

You will be able to double-click the finished model any time in the Catalog
window and run it just as you would a tool. When you do, you’ll notice that the
model has no parameters; you can’t change the buffer distance or input features.
The truth is, the model is useful for solving this particular site-selection
problem with these particular datasets, but it’s not very flexible. In the next
section of the lesson, we’ll make this model more versatile by configuring some
of the variables as input and output parameters.

1. Create a new map document in ArcMap and add the us_cities, us_roads,
and us_boundaries shapefiles from the Lesson 1 data folder that you
configured previously in this lesson. Save your map document as
C:\WCGIS\Geog485\Lesson1\ModelPractice.mxd.

2. In ArcGIS, all models are stored in toolboxes. The first thing you need to do
is create a toolbox to hold your new model. If the Catalog window is not
visible already, display it by clicking the menu item Windows > Catalog.

3. In the Catalog window, expand the nodes until you see Toolboxes > My
Toolboxes.

4. Right-click My Toolboxes and click New > Toolbox. Name it Lesson 1.
(The software may add .tbx, which is fine.)

5. Right-click the Lesson 1 toolbox and click New > Model. You’ll see
ModelBuilder appear.

6. In ModelBuilder, click Model > Model Properties.

7. For the Name, type SuitableLand and for the Label, type Find Suitable
Land. The label is what everyone will see when they open your tool from the
Catalog. That’s why it can contain spaces. The name is what people will use
if they ever run your model from Python. That’s why it cannot contain
spaces.

8. Click OK to dismiss the Model Properties dialog.

You now have a blank canvas on which you can drag and drop the tools.
When creating a model (and when writing Python scripts), it’s best to break
your problem into manageable pieces. The simple site selection problem
here can be thought of as four steps:

o Buffer the cities

o Buffer the roads

o Intersect the buffers

o Clip to the US boundary

Let’s tackle these items one at a time, starting with buffering the cities.

9. With ModelBuilder still open, go to the Catalog window and browse to
Toolboxes > System Toolboxes > Analysis Tools > Proximity.

10. Click the Buffer tool and drag it onto the ModelBuilder canvas. You’ll see a
white rectangular box representing the buffer tool and a white oval
representing the output buffers. These are connected with a line, showing
that the Buffer tool will always produce an output dataset.

In ModelBuilder, tools are represented with boxes and variables are
represented with ovals. Right now, the Buffer tool, at center, is white
because you have not yet supplied the required parameters. Once you do
this, the tool and the variable will fill in with color.

11. In your ModelBuilder window, double-click the Buffer box. The tool dialog
here is the same as if you had opened the Buffer directly out of ArcToolbox.
This is where you can supply parameters for the tool.

12. For Input Features, browse to the path of your us_cities shapefile on disk.
The Output Feature Class will populate automatically.

13. For Distance [value or field] enter 10 miles.

14. For Dissolve Type, select ALL, then click OK to close the Buffer dialog.
The model elements (tools and variables) should be filled in with color, and
you should see a new element to the left of the tool representing the input
cities feature class.

15. An important part of working with ModelBuilder is supplying clear labels for
all the elements. This way, if you share your model, others can easily
understand what will happen when it runs. Supplying clear labels also helps
you remember what the model does, especially if you haven’t worked with
the model for a while.

In ModelBuilder, right-click the us_cities.shp element (blue oval, at far left)
and click Rename. Name this element "US Cities."

16. Right-click the Buffer tool (yellow-orange box, at center) and click Rename.
Name this “Buffer the cities.”

17. Right-click the us_citiesBuffer1.shp element (green oval, at far right) and
click Rename. Name this “Buffered cities.” Your model should look like
this.

Fig 1.2 The model's appearance following step 17, above.

18. Save your model (Model > Save). This is the kind of activity where you
want to save often.

19. Practice what you just learned by adding another Buffer tool to your model.
This time, configure the tool so that it buffers the us_roads shapefile by 10
miles. Remember to set the Dissolve type to ALL and to add meaningful
labels. Your model should now look like this.

Figure 1.3 The model's appearance following step 19, above.

20. The next task is to intersect the buffers. In the Catalog window's list of
toolboxes, browse to Analysis Tools > Overlay and drag the Intersect
tool onto your model. Position it to the right of your existing Buffer tools.

21. Here’s the pivotal moment when you chain the tools together, setting the
outputs of your Buffer tools as the inputs of the Intersect tool. Click the
Connect tool, then click the Buffered cities element followed by the
Intersect element. If you see a small menu appear, click Input Features to
denote that the buffered cities will act as inputs to the Intersect tool. An
arrow will now point from the Buffered cities element to the Intersect
element.

22. Use the same process to connect the Buffered roads to the Intersect element.
Again, if prompted, click Input Features.


23. Rename the output of the Intersect operation "Intersected buffers." If the
text runs onto multiple lines, you can click and drag the edges of the element
to resize it. You can also rearrange the elements on the page however you
like. Because models can get large, ModelBuilder contains several navigation
buttons for zooming in and zooming to the full extent of the model.
Your model should now look like this:

Fig. 1.4 The model's appearance following step 23, above.

24. The final step is to clip the intersected buffers to the outline of the United
States. This prevents any of the selected area from falling outside the
country or in the ocean. In the Catalog window, browse to Analysis Tools
> Extract and drag the Clip tool into ModelBuilder. Position this tool to
the right of your existing tools.

25. Use the Connect tool again to set the Intersected buffers as an input to the
Clip tool, choosing Input Features when prompted. Notice that even when
you do this, the Clip tool is not ready to run (it’s still shown as a white
rectangle, located at right). You need to supply the clip features, which is the
shape to which the buffers will be clipped.

26. In ModelBuilder (not in the Catalog window), double-click the Clip tool. Set
the Clip Features by browsing to the path of us_boundaries.shp, then click
OK to dismiss the dialog. You’ll notice that a blue oval appeared
representing the Clip Features (US Boundaries).

27. Set meaningful labels for the remaining tools. The figure below shows one
way you can label and arrange the model elements.

Fig 1.5 The completed model with the clip tool included.

28. Double-click the final output element (named "Suitable land" in the image
above) and set the path to C:\WCGIS\Geog485\Lesson1\suitable_land.shp.
This is where you can expect your model output feature class to be written to
disk.

29. Right-click Suitable land and click Add to display.

30. Save your model again.

31. Test the model by clicking the Run button. You’ll see the by-now-familiar
geoprocessing message window that will report any errors that may
occur. ModelBuilder also gives you a visual cue of which tool is running by
turning the tool red. (If the model crashes, try closing ModelBuilder and
running the model by double-clicking it from the Catalog window. You'll get
a message that the model has no parameters. This is okay [and true, as you'll
learn below]. Go ahead and run the model anyway.)

32. When the model has finished running (it may take a while), examine the
output in ArcMap. Zoom in to Washington state to verify that the Clip has
worked on the coastal areas. The output should look similar to this.

Fig. 1.6 The model's output in ArcMap.
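
For comparison, the same four steps can be written down as an ordered plan in Python. The sketch below assumes the arcpy function names Buffer_analysis, Intersect_analysis, and Clip_analysis and the lesson's folder layout; the intermediate file names and the in_folder and run_plan helpers are hypothetical. Nothing here touches ArcGIS unless you call run_plan on a machine where arcpy is installed.

```python
FOLDER = r"C:\WCGIS\Geog485\Lesson1"

def in_folder(name):
    """Build a full Windows path inside the lesson folder."""
    return FOLDER + "\\" + name

# Each entry is (arcpy function name, arguments), in the order the model runs.
plan = [
    ("Buffer_analysis", [in_folder("us_cities.shp"),
                         in_folder("cities_buffer.shp"),
                         "10 Miles", "FULL", "ROUND", "ALL"]),
    ("Buffer_analysis", [in_folder("us_roads.shp"),
                         in_folder("roads_buffer.shp"),
                         "10 Miles", "FULL", "ROUND", "ALL"]),
    ("Intersect_analysis", [[in_folder("cities_buffer.shp"),
                             in_folder("roads_buffer.shp")],
                            in_folder("intersected_buffers.shp")]),
    ("Clip_analysis", [in_folder("intersected_buffers.shp"),
                       in_folder("us_boundaries.shp"),
                       in_folder("suitable_land.shp")]),
]

def run_plan(steps):
    """Run each step with arcpy (requires ArcGIS; not called in this sketch)."""
    import arcpy
    for tool_name, args in steps:
        getattr(arcpy, tool_name)(*args)

for tool_name, args in plan:
    print(tool_name)
```

Notice that each output path reappears as an input to a later step, which is the script equivalent of the connector arrows in ModelBuilder.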

1.3.3 Model parameters


Most tools, models, and scripts that you create with ArcGIS have parameters. Input
parameters are values with which the tool (or model or script) starts its work, and
output parameters represent what the tool gives you after its work is finished.
A tool, model, or script without parameters is only good in one scenario. Consider
the model you just built that used the Buffer, Intersect, and Clip tools. This model
was hard-coded to use the us_cities, us_roads, and us_boundaries shapefiles and
output a shapefile called suitable_land. If you wanted to run the model with other
datasets, you would have to open ModelBuilder, double-click each element (US
Cities, US Roads, US Boundaries, and Suitable land), and change the paths. You
would have to follow a similar process if you wanted to change the buffer distances,
too, since those were hard-coded to 10 miles.

Let’s modify that model to use some parameters, so that you can easily run it with
different datasets and buffer distances.
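
The same distinction applies to scripts: values written into the code are hard-coded, while values passed in as arguments are parameters. A minimal sketch, using a hypothetical describe_run function and made-up dataset names, shows the difference. It only builds a description of the run and does not call ArcGIS.

```python
def describe_run(cities, roads, boundaries, output,
                 cities_distance="10 Miles", roads_distance="10 Miles"):
    """Datasets are parameters; the distances default to 10 miles."""
    return ("Buffer %s by %s and %s by %s, intersect, clip to %s, write %s"
            % (cities, cities_distance, roads, roads_distance,
               boundaries, output))

# The original model, expressed through parameters.
print(describe_run("us_cities.shp", "us_roads.shp",
                   "us_boundaries.shp", "suitable_land.shp"))

# A different run needs no code changes, only different argument values.
other = describe_run("ca_cities.shp", "ca_roads.shp", "ca_boundary.shp",
                     "ca_suitable.shp", cities_distance="5 Miles")
print(other)
```

Giving the distances default values mirrors what you will do in the model: the parameter is exposed, but a sensible value is already filled in.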

1. If it's not open already, open the map document
C:\WCGIS\Geog485\Lesson1\ModelPractice.mxd in ArcMap.

2. In the Catalog window, find the model you created in the previous section,
which should be under Toolboxes > My Toolboxes > Lesson 1 > Find
Suitable Land.

3. Right-click the model Find Suitable Land and click Copy. Now right-click
the Lesson 1 toolbox and click Paste. This creates a new copy of your model
that you can work with to create model parameters. Using a copy of the
model like this allows you to easily start over if you make a mistake.

4. Rename the copy of your model Find Suitable Land With Parameters
or something similar.

5. In your Lesson 1 toolbox, right-click Find Suitable Land With
Parameters and click Edit. You'll see the model appear in ModelBuilder.

6. Right-click the element US Cities (should be a blue oval) and click Model
Parameter. This means that whoever runs the model must specify the
cities dataset to use before the model can run.

7. You need a more general name for this parameter now, so right-click the US
Cities element and click Rename. Change the name to just "Cities."

8. Even though you "parameterized" the cities, your model still defaults to
using the C:\WCGIS\Geog485\Lesson1\us_cities.shp dataset. This isn't
going to make much sense if you share your model or toolbox with other
people because they may not have the same us_cities shapefile, and even if
they do, it probably won't be sitting at the same path on their machines.

To remove the default dataset, double-click the Cities element and delete the
path, then click OK. Some of the elements in your model may turn white.
This signifies that a value has to be provided before the model can
successfully run.

9. Now you need to create a parameter for the distance of the buffer to be

created around the cities. Right-click the element that you named "Buffer

the cities" and click Make Variable > From Parameter > Distance

[value or field].

10. The previous step created a new element Distance [value or field]. Rename

this element to "Cities buffer distance" and make it a model parameter.

(Review the steps above if you're unsure about how to rename an element or

make it a model parameter.) For this element, you can leave the default at 10

miles. Your model should look similar to this, although the title bar of your

window may vary:


Figure 1.7 The "Find Suitable Land With Parameters" model following Step

10, above, and showing two parameters.

11. Repeating what you learned above, rename the US Roads element to

"Roads," make it a model parameter, and remove the default value.

12. Repeating what you learned above, make a parameter for the Roads buffer

distance. Leave the default at 10 miles.

13. Repeating what you learned above, rename the US Boundaries element to

Boundaries, make it a model parameter, and remove the default value. Your

model should look like this (notice the five parameters indicated by "P"s):

Figure 1.8 The "Find Suitable Land With Parameters" model following Step

13, above, and showing five parameters.

14. Save your model and close ModelBuilder.


15. Double-click your model Lesson 1 > Find Suitable Land With
Parameters and examine the tool dialog. It should look similar to this:

Figure 1.9 The model interface, or tool dialog, for the model "Find Suitable

Land With Parameters."

People who run this model will be able to browse to any cities, roads, and
boundaries datasets, and will be able to control the buffer distance. The
green dots indicate parameters that must be supplied with valid values
before the model can run.

16. Test your model by supplying the us_cities, us_roads, and us_boundaries

shapefiles for the model parameters. If you like, you can try changing the

buffer distance.

Note that sometimes the result does not add itself to the display like it

should. You should just be able to add it to the display by using the Add

Data button and browsing to the suitable_land.shp location.


The above exercise demonstrated how you can expose values as parameters using
ModelBuilder. You need to decide which values you want the user to be able to
change and designate those as parameters. When you write Python scripts, you'll
also need to choose parameters in a similar way.
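
As a small preview of how this idea might look in Python, here is a minimal sketch that contrasts a value the user must supply with one that has a default, much like the buffer distances you left at 10 miles. The function name describe_buffer and its arguments are made up for illustration; they are not part of ArcGIS, and the function only builds a message rather than running any tool.

```python
# Hypothetical helper, not an ArcGIS function: it only builds a message
# so that the effect of parameters is easy to see.
def describe_buffer(dataset, distance="10 miles"):
    # "dataset" must always be supplied; "distance" falls back to a default,
    # much like the buffer distance parameters in the model.
    return "Buffer " + dataset + " by " + distance

# Use the default distance
print(describe_buffer("us_cities.shp"))

# Override the default, just as a user could in the tool dialog
print(describe_buffer("us_roads.shp", "25 miles"))
```

Because nothing inside the function is tied to one dataset, the same code can be reused with any input.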

1.3.4 Advanced geoprocessing and ModelBuilder concepts

By now you've had some practice with ModelBuilder and you're about ready to get
started with Python. This page of the lesson contains some optional advanced
material that you can read about ModelBuilder. This is particularly helpful if you
anticipate using ModelBuilder frequently in your employment. Some of the items
are common to the ArcGIS geoprocessing framework, meaning that they also apply
when writing Python scripts with ArcGIS.

Managing intermediate data

GIS analysis sometimes gets messy. Most of the tools that you run produce an
output dataset, and when you chain many tools together those datasets start piling
up on disk. Even if you're diligent about naming your datasets intuitively, it's easy
to wind up with a folder full of datasets with names like buffers1, clippedbuffers1,
intersectedandclippedbuffers1, raster2reclassified, etc.

In most cases, you are concerned with just the final output dataset. The
intermediate data is just temporary; you only need to keep it around for as long as
it takes to run the model, and then it can be deleted.

ModelBuilder can manage your intermediate data for you, placing it in a temporary
directory called the scratch workspace. By default, the scratch workspace is your
operating system's temp directory, but you can configure it to exist in another
location.

You can force data to go into the scratch workspace by using the
%SCRATCHWORKSPACE% variable in the path. For example:
%SCRATCHWORKSPACE%\myOutput.shp

You can also mark any element in ModelBuilder as Intermediate and it will be
deleted after the model is run. By default, all derived data is Intermediate.
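
If it helps to see the idea of temporary data in code, here is a rough plain-Python analogy. It uses only the standard tempfile, os, and shutil modules, not ArcGIS, and it is not a description of how ModelBuilder manages its scratch workspace internally. The file name buffers1.txt is a made-up stand-in for an intermediate dataset.

```python
import os
import shutil
import tempfile

# Create a throwaway folder in the operating system's temp directory
scratch_folder = tempfile.mkdtemp()

# Write a stand-in "intermediate" file into it (the file name is made up)
intermediate_path = os.path.join(scratch_folder, "buffers1.txt")
with open(intermediate_path, "w") as f:
    f.write("temporary results")

existed_during_run = os.path.exists(intermediate_path)

# Clean up once the final output no longer depends on the intermediate data
shutil.rmtree(scratch_folder)
exists_after_cleanup = os.path.exists(intermediate_path)

print(existed_during_run)
print(exists_after_cleanup)
```

The intermediate file lives only as long as the work that needs it, which is the same housekeeping that marking an element as Intermediate gives you in a model.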

The following topics from Esri go into more detail on intermediate data and are
important to understand as you work with the geoprocessing framework. I suggest
reading them once now and returning to them occasionally throughout the course.
Some of the concepts in them are easier to understand once you've worked with
geoprocessing for a while.

• A quick tour of managing intermediate data [4]
• Using the current and scratch workspace environments [5]
• Setting current and scratch workspace environments [6]
• Managing intermediate data in shared models [7] (Skip the section about
  ArcGIS Server)

Looping in ModelBuilder

Looping, or iteration, is the act of repeating a process. A main benefit of computers
is their ability to quickly repeat tasks that would otherwise be mundane,
cumbersome, or error-prone for a human to repeat and record. Looping is a key
concept in computer programming and you will use it often as you write Python
scripts for this course.

ModelBuilder contains a number of elements called Iterators that can do looping in
various ways. The names of these iterators, such as For and While, actually mimic
the types of looping that you can program in Python and other languages. In this
course, we'll focus on learning iteration in Python, which may actually be just as
easy as learning how to use a ModelBuilder iterator.

To take a peek at how iteration works in ModelBuilder, you can visit the ArcGIS
Desktop help book for model iteration [8]. If you're having trouble understanding
looping in later lessons, ModelBuilder might be a good environment to visualize
what a loop does. You can come back and visit this book as needed.
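
For a first look at what the two kinds of loops named above might look like in Python, here is a minimal sketch. The dataset names are simply text borrowed from this lesson's data; no GIS processing takes place.

```python
# A "for" loop repeats once for each item in a list
datasets = ["us_cities", "us_roads", "us_boundaries"]
messages = []
for name in datasets:
    messages.append("Processing " + name)
print(messages)

# A "while" loop repeats as long as its condition stays true
distance = 10
while distance <= 30:
    print(distance)
    distance = distance + 10
```

The for loop runs three times, once per dataset name. The while loop prints 10, 20, and 30, then stops when distance reaches 40.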

Readings

Read Zandbergen Chapter 2.1 - 2.9 to reinforce what you learned about
geoprocessing and ModelBuilder.

1.4.1 Introducing Python using the Python window in ArcGIS

The best way to introduce Python may be to look at a little bit of code. Let’s take the
Buffer tool which you recently ran from the ArcToolbox GUI and run it in the
ArcGIS Python window. This window allows you to type a simple series of Python
commands without writing full permanent scripts. The Python Window is a great
way to get a taste of Python.

This time, we’ll make buffers of 15 miles around the cities.


1. Open ArcMap to a new empty map.

2. Add the us_cities.shp dataset from the Lesson 1 data.

3. On the Standard toolbar, click the Python window button. Once the
window appears, drag it over to the side or bottom of the screen to dock it.

4. Type the following in the Python window. (Don't type the >>>. These are just
included to show you where the new lines begin in the Python window.)

>>> import arcpy
>>> arcpy.Buffer_analysis("us_cities", "us_cities_buffered", "15 miles", "", "", "ALL")

5. Zoom in and examine the buffers that were created.

You’ve just run your first bit of Python. You don’t have to understand everything
about the code you wrote in this window, but here are a few important things to
note.

The first line of the script import arcpy tells the Python interpreter (which was
installed when you installed ArcGIS) that you’re going to work with some special
scripting functions and tools included with ArcGIS. Without this line of code,
Python knows nothing about ArcGIS, so you'll put it at the top of all ArcGIS-related
code that you write in this class. You technically don't need this line when you work
with the Python window in ArcMap because arcpy is already imported, but I
wanted to show you this pattern early; you'll use it in all the scripts you write
outside the Python window.

The second line of the script actually runs the tool. You can type arcpy, plus a dot,
plus any tool name to run a tool in Python. Notice here that you also put an
underscore followed by the alias of the toolbox that includes the Buffer tool (in
this case, analysis). This is necessary because some tools in different toolboxes
actually have the same name (like Clip, which is in both the Data Management and
Analysis toolboxes).

After you typed arcpy.Buffer_analysis, you typed all the parameters for the
tool. Each parameter was separated by a comma, and the whole list of parameters
was enclosed in parentheses. Get used to this pattern, since you'll follow it with
every tool you run in this course.

In this code, we also supplied some optional parameters, leaving empty quotes
where we wanted to take the default values, and truncating the parameter list at the
final optional parameter we wanted to set.
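
To make the role of the empty quotes more concrete, here is a small stand-in function written in plain Python. It is not arcpy.Buffer_analysis and it creates no data; the name buffer_stub and its parameter names are invented for this sketch. The default values it fills in (FULL, ROUND, NONE) are the ones I understand the Buffer tool help to list, so check the help topic for your version.

```python
# Stand-in for illustration only; this is NOT arcpy.Buffer_analysis and
# creates no data. It just reports which values it would use.
def buffer_stub(in_features, out_features, distance,
                line_side="", line_end_type="", dissolve_option=""):
    # Treat an empty string as "use the default", mirroring the empty
    # quotes in the Buffer call. Default names are assumed from the tool help.
    if line_side == "":
        line_side = "FULL"
    if line_end_type == "":
        line_end_type = "ROUND"
    if dissolve_option == "":
        dissolve_option = "NONE"
    return [in_features, out_features, distance,
            line_side, line_end_type, dissolve_option]

settings = buffer_stub("us_cities", "us_cities_buffered", "15 miles",
                       "", "", "ALL")
print(settings)
```

The values are matched to parameters by position, which is why the two empty strings are needed as placeholders before "ALL".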

How do you know the syntax, or structure, of the parameters to enter? For
example, for the buffer distance, should you enter 15MILES, '15MILES', 15 Miles,
or '15 Miles'? The best way to answer questions like these is to return to the
Geoprocessing tool reference help topic for the Buffer tool [9]. All of the topics in
this reference section have a command line usage and example section to help you
understand how to structure the parameters. All the required parameters are
shown inside angle brackets (<>), while the optional parameters are shown inside
braces ({}). From the example in this topic, you can see that the buffer distance
should be specified as '15 miles'. Because this value is text, or a string, you need
to surround it with quotes. Python accepts either single or double quotes, which is
why "15 miles" worked in the code you typed.

You might have noticed that the Python window helps you by popping up different
options you can type for each parameter. This is called autocompletion, and it can
be very helpful if you're trying to run a tool for the first time and you don't know
exactly how to type the parameters. When you write code in PythonWin, you don't
get the autocompletion, so you may want to return to the Python window for tips as
you practice writing lines of code. If you can get a line of code to work in the Python
window, it will probably work in your script that you're writing in PythonWin.

There are a couple of differences between writing code in the Python window and
writing code in some other program, such as Notepad or PythonWin (which we'll
use throughout the course). In the Python window, you can reference layers in the
map document by their names only, instead of their file paths. Thus, we were able
to type "us_cities" instead of something like "C:\\data\\us_cities.shp". We were
also able to make up the name of a new layer "us_cities_buffered" and get it added
to the map by default after the code ran. If you're going to use your code outside the
Python window, make sure you use the full paths.
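
Full paths on Windows contain backslashes, which Python also uses to mark special characters inside strings. The sketch below shows three common ways to write the same path safely. The folder C:\data is only an example location, and nothing is read from disk.

```python
# Three ways to write the same Windows path in Python.
# The folder C:\data is just an example location.
escaped = "C:\\data\\us_cities.shp"   # each backslash doubled
raw = r"C:\data\us_cities.shp"        # raw string: backslashes taken literally
forward = "C:/data/us_cities.shp"     # forward slashes

print(escaped)
print(raw)
print(escaped == raw)

# A single backslash can silently turn into a special character
risky = "C:\temp\new.shp"   # \t becomes a tab and \n a line break
print(len(risky) == len(r"C:\temp\new.shp"))
```

The last line prints False because the tab and line break each replaced two typed characters, so the "risky" string no longer holds the path you intended.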

When you write more complex scripts, it will be helpful to use an integrated
development environment (IDE), meaning a program specifically designed to help
you write and test Python code. Later in this course we’ll explore the PythonWin
IDE.

Earlier in this lesson you saw how tools can be chained together to solve a problem
using ModelBuilder. The same can be done in Python, but it’s going to take a little
groundwork to get to that point. For this reason we’ll spend the rest of Lesson 1
covering some of the basics of Python.

Readings

Take a few minutes to read Zandbergen Chapter 3, a fairly short chapter where he
explains the Python window and some things you can do with it.

1.4.2 What is Python?


Python is a language that is used to automate computing tasks through programs
called scripts. In the introduction to this lesson, you learned that automation
makes work easier, faster, and more accurate. This applies to GIS and many other
areas of computer science. Learning Python will make you a more effective GIS
analyst, but Python programming is a technical skill that can be beneficial to you
even outside the field of GIS.

Python is a good language for beginning programming. Python is a high-level
language, meaning you don’t have to understand the “nuts and bolts” of how
computers work in order to use it. Python syntax (how the code statements are
constructed) is relatively simple to read and understand. Finally, Python requires
very little overhead to get a program up and running.

Python is an open-source language and there is no fee to use it or deploy programs
with it. Python can run on Windows, Linux, and Unix operating systems.

In ArcGIS, Python can be used for coarse-grained programming, meaning that you
can use it to easily run geoprocessing tools such as the Buffer tool that we just
worked with. You could code all the buffer logic yourself, using more detailed, fine-
grained programming with ArcObjects, but this would be time consuming and
unnecessary in most scenarios; it’s easier just to call the Buffer tool from a Python
script using one line of code.

1.4.3 Installing Python and PythonWin

If you installed the student version of ArcGIS, you should already have Python on
your computer. You can write Python code at any time in Notepad or other editors
and save it as a .py file, but you need to have Python installed in order for your
computer to understand and run the program.

In this course we’ll be working with Python version 2.6 (if you have ArcGIS 10.0) or
version 2.7 (if you have ArcGIS 10.1). If you download Python from its home page
at www.python.org [10], you’ll see that there are actually higher versions of Python
available. Python versions 3 and above contain some big changes and are going to
take some time for the Python user community to adopt. You may see some
information about Python 3 in your textbook that will give you an idea of the
changes coming in that version. You can read this information if you're interested,
but it's not applicable to this course.
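
If you're ever unsure which version of Python is running your code, one way to check is with the standard sys module, as in this short sketch. The variable names major and minor are my own.

```python
import sys

# sys.version_info holds the version as numbers: (major, minor, ...)
major = sys.version_info[0]
minor = sys.version_info[1]
print(str(major) + "." + str(minor))
```

On a machine set up for this course you would expect to see 2.6 or 2.7.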

Python comes with a simple default editor called IDLE; however, in this course
you’ll use the PythonWin integrated development environment (IDE) to help you
write code. PythonWin is free, has basic debugging capabilities, and is included
with ArcGIS. The only catch is that it is not installed by default with ArcGIS; you
have to do it manually by following the steps below. If you are using ArcGIS 10.1,
replace any instances of 2.6 or 26 below with 2.7 or 27, respectively.

1. Insert the ArcGIS Education Edition DVD into your computer.

2. Dismiss any welcome screens that appear and choose Start > My

Computer (or “Computer” on Windows Vista or Windows 7).

3. Find your DVD drive, right-click it, and click Open. Your goal is to get to the

folder structure of the DVD, not run the Auto Play that shows the Esri

welcome screen.

4. Once you’ve successfully displayed the folder structure, open the Desktop

folder.

5. Open the PythonWin folder.

6. Start the install by launching pywin32-210.win32-py2.6 (or whatever file is

not the PythonWin readme file). If you are using Windows Vista or Windows

7, right-click this file and choose Run as Administrator and, when

prompted, choose to Allow it to run.

7. Click Next through the wizard to install PythonWin.

8. PythonWin doesn't put a Windows shortcut anywhere, so you get to make

one yourself. Once the install completes, use My Computer (or "Computer")

to browse to the location where you installed PythonWin. It's probably in

C:\Python26\ArcGIS10.0\Lib\site-packages\pythonwin.

9. Right-click the item Pythonwin and click Create Shortcut. You should

see a Windows shortcut appear immediately below the Pythonwin item.

10. Drag and drop the shortcut onto your Desktop or wherever else you want to

put it.
On Windows Vista or Windows 7, if you see error messages during install, it’s likely
that you did not run the install as an Administrator. When you launch the install,
make sure you right-click and choose Run as Administrator.

1.4.4 Exploring PythonWin


Here’s a brief explanation of the main parts of PythonWin. Before you begin
reading, open PythonWin so you can follow along.

When PythonWin opens, you’ll see what’s known as the Interactive Window. You
can type a line of Python at the >>> prompt and it will immediately execute and
print the result, if there is a printable result. The Interactive Window can be a good
place to practice with Python in this course, and whenever you see some Python
code next to the >>> prompt in the lesson materials, this means you can type it in
the Interactive Window to follow along. In these ways, the Interactive Window is
very similar to the Python window in ArcGIS.

To actually write a new script, click File > New and choose Python Script.
Notice a blank page opens that looks a whole lot like Notepad. However, the nice
thing about this interface is that the code is color-coded and the default font,
Courier, is one typically used by programmers. Spacing and indentation, which are
important in Python, are also easy to keep track of in this interface.

The Standard toolbar contains tools for loading, running, and saving scripts. This
toolbar is visible by default. Notice the Undo / Redo buttons, which can be useful
to you as a programmer if you start coding something and realize you’ve gone down
the wrong path, or if you delete a line of code and want to get it back. Also notice
the Run button, which looks like a little running person. This is a good way to test
your scripts without having to double-click the file in Windows Explorer.

The Debugging toolbar contains tools for carefully reviewing your code line-by-line
to help you detect errors. This toolbar is visible by clicking View > Toolbars >
Debugging. The Debugging toolbar is extremely valuable to you as a programmer and
you’ll learn how to use it later in this course. This toolbar is one of the main
reasons to use an Integrated Development Environment (IDE) instead of writing your
code in a simple text editor like Notepad.

1.5.1 Working with variables


It’s time to get some practice with some beginning programming concepts that will
help you write some simple scripts in Python by the end of Lesson 1. We’ll start by
looking at variables.

Remember your first introductory algebra class where you learned that a letter
could represent any number, like in the expression x + 3? This may have been your
first exposure to variables. (Sorry if the memory is traumatic!) In computer science,
variables represent values or objects you want the computer to store in its memory
for use later in the program.

Variables are frequently used to represent not only numbers, but also text and
“Boolean” values (‘true’ or ‘false’). A variable might be used to store input from the
program’s user, to store values returned from another program, to represent
constant values, and so on.

Variables make your code readable and flexible. If you hard-code your values,
meaning that you always use the literal value, your code is useful only in one
particular scenario. You could manually change the values in your code to fit a
different scenario, but this is tedious and exposes you to greater risk of making a
mistake (suppose you forget to change a value). Variables, on the other hand, allow
your code to be useful in many scenarios and are easy to parameterize, meaning
you can let users change the values to whatever they need.

To see some variables in action, open PythonWin and type this in the Interactive
Window:

>>> x = 2

You’ve just created, or declared, a variable, x, and set its value to 2. In some
statically typed programming languages, such as Java, you would be required to tell
the program that you were creating a numerical variable, but Python assumes this
when it sees the 2.

When you hit Enter, nothing happens, but the program now has this variable in
memory. To prove this, type:

>>> x + 3

You see the answer of this mathematical expression, 5, appear immediately in the
Interactive Window, proving that your variable was remembered and used.

You can also use the print command to write the results of operations. We’ll use
this a lot when practicing and testing code.

>>> print x + 3
5

Variables can also represent words, or strings, as they are referred to by
programmers. Try typing this in the Interactive Window:

>>> myTeam = "Nittany Lions"
>>> print myTeam
Nittany Lions

In this example, the quotation marks tell Python that you are declaring a string
variable. Python is a powerful language for working with strings. A very simple
example of string manipulation is to add, or concatenate, two strings, like this:

>>> string1 = "We are "
>>> string2 = "Penn State!"
>>> print string1 + string2
We are Penn State!

You can include a number in a string variable by putting it in quotes, but you must
thereafter treat it like a string; you cannot treat it like a number. For example, this
results in an error:

>>> myValue = "3"
>>> print myValue + 2
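
The error in this case is a TypeError, because Python will not guess whether you meant arithmetic or concatenation. The following sketch shows one way to catch the error and two ways around it, using the built-in int() and str() functions. It is written as a script rather than at the >>> prompt, and print is given parentheses so the same code runs in Python 2.7 and Python 3. The variable names result, total, and joined are my own.

```python
myValue = "3"

# Adding a string and a number raises a TypeError
try:
    result = myValue + 2
except TypeError:
    result = None
    print("Cannot add a string and a number")

# Convert the string to a number to do arithmetic
total = int(myValue) + 2
print(total)

# Or convert the number to a string to concatenate
joined = myValue + str(2)
print(joined)
```

Here total holds the number 5, while joined holds the string "32".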

In these examples you’ve seen the use of the = sign to assign the value of the
variable. You can always reassign the variable. For example:

>>> x = 5
>>> x = x - 2
>>> print x
3

When naming your variables, the following tips will help you avoid errors.

• Variable names are case sensitive. myVariable is a different variable than
  MyVariable.
• Variable names cannot contain spaces.
• Variable names cannot begin with a number.
• A recommended practice for Python variables is to name the variable
  beginning with a lower-case letter, then begin each subsequent word with a
  capital letter. This is sometimes known as camel casing. For example:
  myVariable, mySecondVariable, roadsTable, bufferField1, etc.
• Variables cannot be any of the special Python reserved words such as
  "import" or "print."

Make variable names meaningful so that others can easily read your code. This will
also help you read your code and avoid making mistakes.
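
Here is a short sketch that tests two of these tips. It uses the standard keyword module, whose iskeyword function reports whether a word is reserved. I've only checked "import" here because the list of reserved words differs slightly between Python versions (for example, "print" is reserved in Python 2 but not in Python 3).

```python
import keyword

# Case matters: these are two separate variables
myVariable = 1
MyVariable = 2
print(myVariable == MyVariable)

# keyword.iskeyword() reports whether a word is reserved
print(keyword.iskeyword("import"))
print(keyword.iskeyword("roadsTable"))
```

The first line of output is False because the two names refer to different variables holding different values.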

You’ll get plenty of experience working with variables throughout this course and
will learn more in future lessons.

1.5.2 Objects and object-oriented programming

The number and string variables that we worked with above represent data types
that are built into Python. Variables can also represent other things, such as GIS
datasets, tables, rows, and the geoprocessor that we saw earlier that can run tools.
All of these things are objects that you use when you work with ArcGIS in Python.

In Python, everything is an object. All objects have:

• A unique ID, or location in the computer’s memory
• A set of properties that describe the object
• A set of methods, or things that the object can do
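
You can see all three of these on an ordinary Python object. The sketch below uses a date object from the standard datetime module; the date itself and the variable names are arbitrary choices for illustration.

```python
import datetime

startDate = datetime.date(2012, 1, 9)   # an arbitrary example date

# A unique ID (the actual number varies from run to run)
objectId = id(startDate)

# Properties describe the object
print(startDate.year)

# Methods are things the object can do
print(startDate.isoformat())
```

Notice that a property is read with just a dot and a name, while a method is followed by parentheses because it performs an action.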

One way to understand objects is to compare performing an operation in a
procedural language (like FORTRAN) to performing the same operation in an
object-oriented language. We'll pretend that we are writing a program to make a
peanut butter and jelly sandwich. If we were to write the program in a procedural
language, it would flow something like this:

1. Go to the refrigerator and get the jelly and bread.

2. Go to the cupboard and get the peanut butter.

3. Take out two slices of bread.

4. Open the jars.

5. Get a knife.

6. Put some peanut butter on the knife.


7. etc.

8. etc.

If we were to write the program in an object-oriented language, it might look like
this:

1. mySandwich = Sandwich.Make

2. mySandwich.Bread = Wheat

3. mySandwich.Add(PeanutButter)

4. mySandwich.Add(Jelly)

In the object-oriented example, the bulk of the steps have been eliminated. The
sandwich object "knows how" to build itself, given just a few pieces of information.
This is an important feature of object-oriented languages known as encapsulation.

Notice that you can define the properties of the sandwich (like the bread type) and
perform methods (remember that these are actions) on the sandwich, such as
adding the peanut butter and jelly.

1.5.3 Classes
The reason it’s so easy to "make a sandwich" in an object-oriented language is that
some programmer, somewhere, already did the work to define what a sandwich is
and what you can do with it. He or she did this using a class. A class defines how to
create an object, the properties and methods available to that object, how the
properties are set and used, and what each method does.

A class may be thought of as a blueprint for creating objects. The blueprint
determines what properties and methods an object of that class will have. A
common analogy is that of a car factory. A car factory produces thousands of cars of
the same model that are all built on the same basic blueprint. In the same way, a
class produces objects that have the same predefined properties and methods.
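
To connect this back to the sandwich pseudocode from the previous section, here is one way such a class might be sketched in real Python. Everything in it (the Sandwich class, its bread and fillings properties, and its add and describe methods) is invented for illustration. You won't need to write your own classes to run geoprocessing tools, so treat this only as a look behind the curtain.

```python
# A toy class: the "blueprint" for sandwich objects
class Sandwich(object):
    def __init__(self):
        # Properties, with starting values
        self.bread = "White"
        self.fillings = []

    # A method: something a sandwich can do
    def add(self, filling):
        self.fillings.append(filling)

    # Another method that summarizes the sandwich as text
    def describe(self):
        return " and ".join(self.fillings) + " on " + self.bread

# Create an object from the class, then set a property and call methods
mySandwich = Sandwich()
mySandwich.bread = "Wheat"
mySandwich.add("Peanut butter")
mySandwich.add("Jelly")
print(mySandwich.describe())
```

The last four lines are all that a user of the class has to write; the details stay tucked away inside the class definition.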

In Python, classes are grouped together into modules. You import modules into
your code to tell your program what objects you’ll be working with. You can write
modules yourself, but most likely you'll bring them in from other parties or
software packages. For example, the first line of most scripts you write in this
course will be:

import arcpy
Here you're using the import keyword to tell your script that you’ll be working
with the arcpy module, which is provided as part of ArcGIS. After importing this
module, you can create objects that leverage ArcGIS in your scripts.

Other modules that you may import in this course are os (allows you to work with
the operating system), random (allows for generation of random numbers), and
math (allows you to work with advanced math operations). These modules are
included with Python, but they aren't imported by default. A best practice for
keeping your scripts fast is to import only the modules that you need for that
particular script. For example, although it might not cause any errors in your
script, you wouldn't include import arcpy in a script not requiring any ArcGIS
functions.
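
Here is a brief sketch of what importing and using these three standard modules can look like. The folder and file names are just sample text; nothing is read from disk.

```python
import math
import os
import random

# math: advanced math operations
print(math.sqrt(16))

# os: work with the operating system, such as building file paths
shapefilePath = os.path.join("Lesson1", "us_cities.shp")
print(os.path.basename(shapefilePath))

# random: generate random numbers (the result differs on each run)
roll = random.randint(1, 6)
print(roll)
```

In each case you type the module name, a dot, and then the function you want, which is the same pattern you saw with arcpy.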

1.5.4 Inheritance
Another important feature of object-oriented languages is inheritance. Classes are
arranged in a hierarchical relationship such that each class inherits its properties
and methods from the class above it in the hierarchy (its parent class or
superclass). A class also passes along its properties and methods to the class below
it (its child class or subclass). A real-world analogy involves the classification of
animal species. As a species, we have many characteristics that are unique to
humans. However, we also inherit many characteristics from classes higher in the
class hierarchy. We have some characteristics as a result of being vertebrates. We
have other characteristics as a result of being mammals. To illustrate the point,
think of the ability of humans to run. Our bodies respond to our command to run
not because we belong to the "human" class, but because we inherit that trait from
some class higher in the class hierarchy.

Back in the programming context, the lesson to be learned is that it pays to know
where a class fits into the class hierarchy. Without that piece of information, you
will be unaware of all of the operations available to you. This information about
inheritance can often be found in informational posters called object model
diagrams.
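
The running analogy can be sketched in a few lines of Python. The Mammal and Human classes below are toy classes made up for this illustration and have nothing to do with ArcGIS.

```python
# Parent class (superclass)
class Mammal(object):
    def run(self):
        return "running"

# Child class (subclass): the parent is named in parentheses
class Human(Mammal):
    # Human adds its own method and inherits run() from Mammal
    def speak(self):
        return "talking"

person = Human()
print(person.run())
print(person.speak())
print(issubclass(Human, Mammal))
```

Even though run() is never defined inside Human, a Human object can still run because the method is inherited from Mammal.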

Here's an example of the object model diagram for ArcGIS Python scripting at 9.3
[11] (unfortunately, there is no poster at ArcGIS 10, but the 9.3 poster still comes in
handy for some things like this). Take a look at the green box titled FeatureClass
and notice at the bottom it says Dataset properties. This is because FeatureClass
inherits all properties from Dataset. Therefore any properties on a Dataset object,
such as Extent or SpatialReference, can also be obtained if you create a
FeatureClass object. Apart from all the properties it inherits from Dataset, the
FeatureClass has its own specialized properties such as FeatureType and
ShapeType.

1.5.5 Python syntax


Every programming language has rules about capitalization, white space, how to
set apart lines of code and procedures, and so on. Here are some basic syntax rules
to remember for Python:

• Python is case-sensitive both in variable names and reserved words. That
  means it’s important whether you use upper or lower-case. The all lower-case
  "print" is a reserved word in Python that will print a value, while "Print"
  is unrecognized by Python and will return an error. Likewise arcpy is very
  sensitive about case and will return an error if you try to run a tool without
  capitalizing the tool name.

• You end a Python statement by pressing Enter and literally beginning a new
  line. (In some other languages, a special character, such as a semicolon,
  denotes the end of a statement.) It’s okay to add empty lines to divide your
  code into logical sections.

• If you have a long statement that you want to display on multiple lines for
  readability, you need to use a line continuation character, which in Python
  is a backslash (\). You can then continue typing on the line below and
  Python will interpret the line sequence as one statement. One exception is if
  you’re in the middle of parentheses () or brackets []; Python understands
  that you are continuing lines and no backslash is required.

• Indentation is required in Python to logically group together certain lines, or
  blocks, of code. You should indent your code four spaces inside loops,
  if statements, and try/except statements. In most programming
  languages developers are encouraged to use indentation to logically group
  together blocks of code; however in Python, indentation of these language
  constructs is not only encouraged, but required. Though this requirement
  may seem burdensome, it does result in greater readability.

• You can add a comment to your code by beginning the line with a pound (#)
  sign. Comments are lines that you include to explain what the code is doing.
  Comments are ignored by Python when it runs the script, so you can add
  them at any place in your code without worrying about their effect.
  Comments help others who may have to work with your code in the future,
  and they may even help you remember what the code does.
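
The short script below is a sketch that exercises several of these rules at once: comments, both styles of line continuation, and indented blocks. The elevation values are made-up numbers, and print is written with parentheses so the code runs in both Python 2.7 and Python 3.

```python
# A comment: Python ignores this line when the script runs
elevations = [1200, 3600, 4100, 800]

# Inside parentheses, a statement can span lines with no backslash
total = (elevations[0] + elevations[1] +
         elevations[2] + elevations[3])

# Outside of them, a backslash continues the statement on the next line
average = total / \
          4.0

# The indented lines form the blocks belonging to the loop and the if statement
highCount = 0
for value in elevations:
    if value > 3500:
        highCount = highCount + 1

print(average)
print(highCount)
```

If you remove the indentation under the for or if lines, Python reports an error instead of running the script, which is a quick way to convince yourself that indentation is required.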

1.6.1 Introductory Python examples


Let’s look at a few example scripts to see how these rules are applied. The first
example script is accompanied by a walkthrough video that explains what
happens in each line of the code. You can also review the main points about each
script after reading the code.

1.6.2 Example: Printing the spatial reference of a feature class

This first example script reports the spatial reference (coordinate system) of a
feature class stored in a geodatabase:

# Opens a feature class from a geodatabase and prints the spatial reference

import arcpy

featureClass = "C:/Data/USA/USA.gdb/StateBoundaries"

# Describe the feature class and get its spatial reference
desc = arcpy.Describe(featureClass)
spatialRef = desc.SpatialReference

# Print the spatial reference name
print spatialRef.Name

This may look intimidating at first, so let’s go through what’s happening in this
script, line by line. Watch this video to get a visual walkthrough of the code.
Again, notice that:

• A comment begins the script to explain what’s going to happen.

• Case sensitivity is applied in the code. "import" is all lower-case. So is
  "print". The module name "arcpy" is always referred to as "arcpy," not
  "ARCPY" or "Arcpy". Similarly, "Describe" is capitalized in arcpy.Describe.

• The variable names featureClass, desc, and spatialRef that the programmer
  assigned are short, but intuitive. By looking at the variable name, you can
  quickly guess what it represents.

• The script creates objects and uses a combination of properties and methods
  on those objects to get the job done. That’s how object-oriented
  programming works.

Trying the example for yourself

The best way to get familiar with a new programming language is to look at
example code and practice with it yourself. See if you can modify the script above to
report the spatial reference of a feature class on your computer. In my example the
feature class is in a file geodatabase; you’ll need to modify the structure of the
featureClass path if you are using a shapefile (for example, you'll put .shp at the end
of the file name, and you won't have .gdb in your path).

Follow this pattern to try the example:

1. Open PythonWin and click File > New.

2. Choose to make a Python script and click OK.

3. Paste in the code above and modify it to fit your data (change the path).

4. Save your script as a .py file.

5. Click the Run button to run the script. Make sure the Interactive Window is

visible when you do this, because this is where you’ll see the output from the
print keyword. The print keyword does not actually cause a hard copy to be

printed!

1.6.3 Example: Performing map algebra on a raster

Here’s another simple script that finds all cells over 3500 meters in an elevation
raster and makes a new raster that codes all those cells as 1. Remaining values in
the new raster are coded as 0. This type of “map algebra” operation is common in
site selection and other GIS scenarios.

Something you may not recognize below is the expression Raster(inRaster). This
function just tells ArcGIS that it needs to treat your inRaster variable as a raster
dataset so that you can perform map algebra on it. If you didn't do this, the script
would treat inRaster as just a literal string of characters (the path) instead of a
raster dataset.

# This script uses map algebra to find values in an
# elevation raster greater than 3500 (meters).

import arcpy
from arcpy.sa import *

# Specify the input raster
inRaster = "C:/Data/Elevation/foxlake"
cutoffElevation = 3500

# Check out the Spatial Analyst extension
arcpy.CheckOutExtension("Spatial")

# Make a map algebra expression and save the resulting raster
outRaster = Raster(inRaster) > cutoffElevation
outRaster.save("C:/Data/Elevation/foxlake_hi_10")

# Check in the Spatial Analyst extension now that you're done
arcpy.CheckInExtension("Spatial")

Begin by examining this script and trying to figure out as much as you can based on
what you remember from the previous scripts you’ve seen.

The main points to remember on this script are:

 Notice the lines of code that check out the Spatial Analyst extension before

doing any map algebra and check it back in after finishing. Because each line
of code takes some time to run, avoid putting unnecessary code between

checkout and checkin. This allows others in your organization to use the

extension if licenses are limited. The extension automatically gets checked

back in when your script ends, thus some of the Esri code examples you will

see do not check it in. However, it is a good practice to explicitly check it in,

just in case you have some long code that needs to execute afterward, or in

case your script crashes and against your intentions "hangs onto" the

license.

 inRaster begins as a string, but is then cast to, or treated as, a Raster
object [12] once you run Raster(inRaster). A Raster object is a special object
used for working with raster datasets in ArcGIS. It's not available in just any
Python script: you can use it only if you import the arcpy module at the top
of your script.

 cutoffElevation is a number variable that you declare early in your script and

then use later on when you build the map algebra expression for your

outRaster.

 Your expression outRaster = Raster(inRaster) > cutoffElevation is saying,
in plain terms, "Make a new raster and call it outRaster. Do this by testing
every cell of the raster dataset at the path of inRaster against the number I
assigned to the variable cutoffElevation. Code the cells with greater values
as 1 and all remaining cells as 0."

 outRaster is also a Raster object, but you have to call the method

outRaster.save() in order to make it permanent on disk. The save() method

takes one argument, which is the path to which you want to save.
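
To see what the greater-than comparison does to each cell, here is a small sketch
in plain Python that applies the same test to a tiny grid stored as a list of
lists. It does not use arcpy or a real raster: the grid values are invented, and
codeCells is a hypothetical helper name chosen for this illustration. Spatial
Analyst carries out the equivalent comparison for every cell of the raster.

```python
def codeCells(grid, cutoff):
    """Return a new grid with 1 where a cell exceeds cutoff and 0 elsewhere."""
    coded = []
    for row in grid:
        codedRow = []
        for cell in row:
            if cell > cutoff:
                codedRow.append(1)
            else:
                codedRow.append(0)
        coded.append(codedRow)
    return coded

# A made-up 2 x 3 block of elevation cells, in meters
elevationGrid = [[3400, 3550, 3600],
                 [3500, 3499, 3720]]

codedGrid = codeCells(elevationGrid, 3500)
print(codedGrid)
```

Note that a cell exactly equal to the cutoff is coded 0, because the test is
"greater than," not "greater than or equal to."
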
Now try to run the script yourself using the FoxLake digital elevation model (DEM)
in your Lesson 1 data folder. If it doesn’t work the first time, verify that:

 You have supplied the correct input and output paths.

 Your path name contains forward slashes (/) or double backslashes (\\), not

single backslashes (\).

 You have the Spatial Analyst Extension installed and enabled. To check this,

open ArcMap, click Customize > Extensions and ensure Spatial

Analyst is checked.

 You do not have any of the datasets open in ArcMap.

 The output data does not exist yet. If you want to be able to overwrite the

output, you need to add the line arcpy.env.overwriteOutput = True.

This line can be placed immediately after import arcpy.
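
The path rule in the checklist above exists because Python treats a backslash
inside an ordinary string as the start of an escape sequence. The sketch below
uses an invented path (C:\temp\new_rasters is not part of the lesson data) to
show how a path can be silently corrupted, along with ways of writing it safely.
The raw string form (a leading r) is not mentioned in the lesson text; it is
included here as a third standard Python option.

```python
# Each backslash below starts an escape sequence: "\t" becomes a tab
# and "\n" becomes a newline, so the path is silently corrupted
badPath = "C:\temp\new_rasters"

# Three ways to write the intended path safely
forwardSlashes = "C:/temp/new_rasters"
doubleBackslashes = "C:\\temp\\new_rasters"
rawString = r"C:\temp\new_rasters"

# Check whether the bad path picked up a tab character
print("\t" in badPath)

# Check whether the last two forms hold the same characters
print(doubleBackslashes == rawString)
```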

You can experiment with this script using different values in the map algebra
expression (try 3000 for example).

1.6.4 Example: Creating buffers


Think about the previous example where you ran some map algebra on an elevation
raster. If you wanted to change the value of your cutoff elevation to 2500 instead of
3500, you would have to open the script itself and change the value of the
cutoffElevation variable in the code.

This third example is a little different. Instead of hard-coding the values needed for
the tool (in other words, literally including the values in the script) we’ll use some
user input variables, or parameters. This allows people to try different values in the
script without altering the code itself. Just like in ModelBuilder, parameters make
your script available to a wider audience.

The simple example below just runs the Buffer tool, but it allows the user to enter
the path of the input and output datasets as well as the distance of the buffer. The
user-supplied parameters make their way into the script with the
arcpy.GetParameterAsText() method.
Examine the script below carefully, but don't try to run it yet. You'll do that in the
next part of the lesson.

# This script runs the Buffer tool. The user supplies the input
# and output paths, and the buffer distance.

import arcpy
arcpy.env.overwriteOutput = True

try:
    # Get the input parameters for the Buffer tool
    inPath = arcpy.GetParameterAsText(0)
    outPath = arcpy.GetParameterAsText(1)
    bufferDistance = arcpy.GetParameterAsText(2)

    # Run the Buffer tool
    arcpy.Buffer_analysis(inPath, outPath, bufferDistance)

    # Report a success message
    arcpy.AddMessage("All done!")

except:
    # Report an error message
    arcpy.AddError("Could not complete the buffer")

    # Report any error messages that the Buffer tool might have generated
    arcpy.AddMessage(arcpy.GetMessages())

Again, examine the above code line by line and figure out as much as you can about
what the code does. If necessary, print the code and write notes next to each line.
Here are some of the main points to understand:

 GetParameterAsText() is a function in the arcpy module. Notice that it takes

a zero-based integer (0, 1, 2, 3, etc.) as an argument. If you’re going to go

ahead and make a tool out of this script, as we are going to do in the next

page of this lesson, then it’s important you define the parameters in the

same order you want them to appear in the tool’s dialog.

 When we called the Buffer tool in this script, we supplied only three
parameters. By not supplying any more, we accepted the default values for
the rest of the tool’s parameters (Side Type, End Type, etc.).

 The try and except blocks of code are a way that you can prevent your script

from crashing if there is an error. Your script attempts to run all of the code
in the try block. If the script cannot continue for some reason, it jumps down

and runs the code in the except block. Inserting try/except blocks like this is

a good practice to do once you think you've gotten all the errors out of your

script, or when you want to make sure your code will run a certain line at the

end no matter what happens.

When you are first writing and debugging your script, sometimes it's more

useful to leave out try/except and let the code crash, because the (red) error

messages reported in the Interactive Window sometimes give you better

clues on how to diagnose the problem in your code. Suppose you put a print

statement in your except block saying "There was an error. Please try again."

For the end user of your script, this is nicer than seeing a nasty (red) error

message; however, as a programmer debugging the script, you want to see

the (red) error message to get any insight you can about what went wrong.

Projects that you submit in this course require error handling using

try/except in order to receive full credit.

 The arcpy.AddMessage() and arcpy.AddError() methods are ways of adding

additional messages to the user of the tool. Whenever you run a tool, the

geoprocessor prints messages, which you have probably seen before (for

example, “Executed (Buffer) successfully. End time: Sat Oct 03 07:37:31

2009”). You have the power to add more messages through these methods.

The messages have differing levels of severity, hence different methods for

AddMessage and AddError. Sometimes people choose to view only the


errors instead of all the messages, or they do a quick visual scan of the

messages for errors.

When you use arcpy.GetMessages(), you get all the messages generated by

the tool itself. These will tell you things such as whether the user entered

invalid parameters. Notice in this script the somewhat complex syntax you

have to use to first get the messages, then add them:

arcpy.AddMessage(arcpy.GetMessages()). If this line of code is

confusing to understand, remember that the order of functions works like

math operations: you start by working inside the parentheses first to get the

messages, then you add them.

The AddError and AddMessage methods are only used when making script

tools (which you'll learn about in the very next section). When you are just

running a script in PythonWin (not making a script tool), you can still get

the messages using a print statement with GetMessages(), like this: print

arcpy.GetMessages().
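
The try/except behavior described above can be tried without any GIS data. In
this sketch, parseDistance is a hypothetical helper (it is not part of arcpy)
that converts user-supplied text to a number. A bad value raises an error inside
the try block and control jumps to the except block. The finally block, which
the buffer script does not use, shows one way to guarantee that a line runs at
the end no matter what happens.

```python
def parseDistance(text):
    """Convert text such as "5" to a number, recording what happened."""
    messages = []
    distance = None
    try:
        # float() raises a ValueError if the text is not a number
        distance = float(text)
        messages.append("Parsed the distance")
    except ValueError:
        # Runs only if the try block could not continue
        messages.append("Could not read the distance")
    finally:
        # Runs whether or not an error occurred
        messages.append("Finished")
    return distance, messages

print(parseDistance("5"))
print(parseDistance("five miles"))
```

Unlike the bare except in the buffer script, this sketch names the specific
error it expects (ValueError), so an unrelated mistake such as a misspelled
variable name would still surface as a normal error message while you debug.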

1.7.1 Making a script tool


User input variables that you retrieve through GetParameterAsText() make your
script very easy to convert into a tool in ArcGIS. A few people know how to alter
Python code, a few more can run a Python script and supply user input variables,
but almost all ArcGIS users know how to open ArcToolbox and run a tool. To finish
off this lesson, we’ll take the previous script and make it into a tool that can easily
be run in ArcGIS.

Before you begin this exercise, I strongly recommend that you read the first four
topics in the ArcGIS Desktop Help section Creating script tools with Python scripts
[13]. You likely will not understand all the parts of this section yet, but it will give
you some familiarity with script tools that will be helpful during the exercise.

Follow these steps to make a script tool:

1. Copy the code from Lesson 1.6.4 "Example: Creating Buffers" into a new

PythonWin script and save it as buffer_user_input.py.

2. Open ArcMap and display the Catalog window.

3. Expand the nodes Toolboxes > My Toolboxes.

4. Right-click My Toolboxes and click New > Toolbox.

5. Give your toolbox a name, such as "MyScriptTools".

6. Right-click your new toolbox and click Add > Script.

7. Fill in the Name, Label, and Description properties for your Script tool

as shown below:
Figure 1.10 Entering information for your script tool.

8. Click Next and supply the Script File. To do this, click the folder icon and

browse to your buffer_user_input.py file.

9. Click Next and examine the dialog that appears. This is where you can
specify the parameters of your script. The parameters are the values for
which you used arcpy.GetParameterAsText() in your script, namely inPath,
outPath, and bufferDistance. You will use this dialog to list those parameters in
the same order, except you can give the parameters names that are easier to
understand.

10. In the Display Name column that you see at the top of this wizard, click

the first empty cell and type “Input Feature Class”.

11. Immediately to the right, click the first empty cell in the Data Type column

and choose Feature Class. Here is one of the huge advantages of making a

script tool. Instead of accepting any string as input (which could contain an

error), your tool will now enforce the requirement that a feature class be

used as input. ArcGIS will help you by confirming that the value entered is a

path to a valid feature class. It will even supply the users of your tool with a

browse button so they can browse to the feature class.

Figure 1.11 Choosing "Feature Class."

12. Just as you did in the previous steps, add a second parameter named

“Output Feature Class”. The data type should again be Feature Class.

13. With the Output Feature Class parameter still highlighted, look down at

the Parameter Properties portion of the dialog. Change the Direction

property to Output.

14. Add a third parameter named “Buffer Distance”. Choose Linear Unit as the
data type. This data type will allow the user of the tool to select both the
distance value and the units (for example, miles, kilometers, etc.).

15. With the Buffer Distance parameter still highlighted, look down at the

Parameter Properties section again. Set the Default property to “5

Miles” (do not include the quotes). Your dialog should look like what you see

below:

Figure 1.12 Setting the Default property to "5 Miles."

16. Click Finish and, in the Catalog window, open your new script tool by

double-clicking it.

17. Try out your tool by buffering any feature class on your computer. Notice

that once you supply the input feature class, an output feature class path is
suggested for you. This is because you specifically set Output Feature Class

as an output parameter. Also, when the tool is complete, examine the

Results window for the custom message "All done!" that you added in your

code.

Figure 1.13 The tool is complete.

This is a very simple example and obviously you could just run the out-of-the-box
Buffer tool with similar results. Normally when you create a script tool, it will be
backed with a script that runs a combination of tools and applies some logic that
makes those tools uniquely useful.

There’s another benefit to this example, though. Notice the simplicity of our script
tool dialog compared to the main Buffer tool:
Figure 1.14 Comparison of our script tool with the main buffer tool.

At some point you may need to design a set of tools for beginning GIS users where
only the most necessary parameters are exposed. You may also do this to enforce
quality control if you know that some of the parameters must always be set to
certain defaults and you want to avoid the scenario where a beginning user (or a
rogue user) might change the required values. A simple script tool is effective for
simplifying the tool dialog in this way.

Readings

Read Zandbergen 2.10 - 2.13 to reinforce what you learned during this lesson about
scripts and script tools.

Lesson 1 Practice Exercises


Each lesson in this course includes some simple practice exercises with Python.
These are not submitted or graded, but they are highly recommended if you are
new to programming or if the project initially looks challenging. Lessons 1 and 2
contain shorter exercises, while Lessons 3 and 4 contain longer, more holistic
exercises. Each practice exercise has an accompanying solution that you should
carefully study.
Remember to choose File > New in PythonWin to create a new script (or click the
empty page icon). You can name the scripts something like Practice1, Practice2, etc.
To execute a script in PythonWin, click the "running man" icon.

1. Say hello

Create a string variable called x and assign it the value "Hello". Display the

contents of the x variable in the Interactive Window.

Practice 1 Solution [14]

2. Concatenate two strings

Create a string variable called first and assign to it your first name. Likewise,

create a string variable called last and assign to it your last name.

Concatenate (merge) the two strings together, making sure to also include a

space between them.

Practice 2 Solution [15]

3. Pass a value to a script as a parameter

Example 1.6.4 shows the use of the arcpy.GetParameterAsText() method.

This method is typically used in conjunction with an ArcGIS script tool that

has been designed to prompt the user to enter the required parameters.

However, you may have noticed that the little dialog that appears after

clicking the Run button in PythonWin also includes a place to supply

arguments.

For this exercise, write a script that accepts a single string value using the

GetParameterAsText method. The value entered should be a name and that


name should be concatenated with the literal string "Hi, " and displayed in

the Interactive Window. Test the script from within PythonWin, entering a

name (in quotes) in the Arguments text box after clicking the Run button.

Practice 3 Solution [16]

4. Report the geometry type of a feature class

Example 1.6.2 demonstrates the use of the Describe method to report the

spatial reference of a feature class. The Describe method returns an object

that has a number of properties that can vary depending on what type of

object you Described. A feature class has a spatialReference property by

virtue of the fact that it is a type of Dataset. (Rasters are another type of

Dataset and also have a spatialReference property.)

The Describe method's page in the Help [17] lists the types of objects that

the method can be used on. Clicking the Dataset link pulls up a list of the

properties available when you Describe a dataset, spatialReference being

just one of several.

For this exercise, use the Describe method again; this time, to determine the

type of geometry (point, polyline or polygon) stored in a feature class. I

won't tell you the name of the property that returns this information. But I

will give you the hint that feature classes have this mystery property not

because they're a type of Dataset as with the spatialReference property, but

because they're objects of the type FeatureClass.


Practice 4 Solution [18]

Project 1, Part I: Modeling precipitation zones in Nebraska

Suppose you're working on a project for the Nebraska Department of Agriculture
and you are tasked with making some maps of precipitation in the state. Members
of the department want to see which parts of the state were relatively dry and wet
in the past year, classified in zones. All you have is a series of weather station
readings of cumulative rainfall for 2008 that you've obtained from within Nebraska
and surrounding areas. This is a shapefile of points called
Precip2008Readings.shp. It is in your Lesson 1 data folder.

Precip2008Readings.shp is a fictional dataset created for this project. The
locations do not correspond to actual weather stations. However, the
measurements are derived from real 2008 precipitation data created by the
PRISM Climate Group [19] at Oregon State University, 2009.

You need to do several tasks in order to get this data ready for mapping:

 Interpolate a precipitation surface from your points. This creates a raster

dataset with estimated precipitation values for your entire area of interest.

You've already planned for this, knowing that you are going to use inverse

distance weighted (IDW) interpolation. Click the following link to learn how

the IDW technique works. [20] You've also selected your points to include

some areas around Nebraska to avoid edge effects in the interpolation.

 Reclassify the interpolated surface into an ordinal classification of

precipitation "zones" that delineate relatively dry, medium, and wet regions.

 Create vector polygons from the zones.

 Clip the zone polygons to the boundary of Nebraska.
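
To make the interpolation step more concrete, here is a simplified sketch of the
general inverse distance weighted formula in plain Python: each station's reading
is weighted by 1 divided by its distance raised to a power, and the estimate is
the weighted average. This is an illustration under stated assumptions, not the
implementation used by the Spatial Analyst IDW tool. It ignores the search radius
and barriers, idwEstimate is a hypothetical function name, and the station
coordinates and readings are invented.

```python
import math

def idwEstimate(target, stations, power=2):
    """Estimate a value at target from (x, y, value) stations using IDW."""
    weightedSum = 0.0
    weightTotal = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0:
            # The target sits exactly on a station, so use its reading
            return value
        # Nearby stations get larger weights; a higher power makes
        # the weight fall off more quickly with distance
        weight = 1.0 / distance ** power
        weightedSum += weight * value
        weightTotal += weight
    return weightedSum / weightTotal

# Made-up stations: (x, y, precipitation reading)
stations = [(0, 0, 100.0), (10, 0, 200.0), (0, 10, 300.0)]

print(idwEstimate((5, 0), stations, power=2))
print(idwEstimate((1, 0), stations, power=2))
```

The second estimate lies closer to the first station's reading of 100 because
the target point is much nearer to that station than to the other two.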


Figure 1.15 Mapping the data.

It's very possible that you'll want to repeat the above process in order to test
different IDW interpolation parameters or make similar maps with other datasets
(such as next year's precipitation data). Therefore, the above series of tasks is well-
suited to ModelBuilder. Your job is to create a model that can complete the
above series of steps without you having to manually open four
different tools.

Model parameters

Your model should have these (and only these) parameters:


1. Input precipitation readings- This is the location of your precipitation

readings point data. This is a model parameter so that the model can be

easily re-run with other datasets.

2. Power- An IDW setting specifying how quickly influence of surrounding

points decreases as you move away from the point to be interpolated.

3. Search radius- An IDW setting determining how many surrounding

points are included in the interpolation of a point. The search radius can be

fixed at a certain distance, including whatever number of points happen to

fall within, or its distance can vary in order for it to always include a

minimum number of points. When you use ModelBuilder, you don't have to

set up any of these choices; ModelBuilder does it for you when you set the

Search Radius as a model parameter.

4. Zone boundaries- This is a table allowing the user of the model to specify

the zone boundaries. For example, you could configure precipitation values

of 0 - 30000 to result in a reclassification of 1 (to correspond with Zone 1),

30000 - 60000 could result in a classification of 2 (to correspond with Zone

2), and so on. The way to get this table is to make a variable from the

Reclassification parameter of the Reclassify tool and set it as a model

parameter.

5. Output precipitation zones- This is the location where you want the

output dataset of clipped vector zones to be placed on disk.
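
The zone boundaries table can be pictured as a list of ranges, each mapped to a
zone number. The sketch below shows that lookup in plain Python. It is only an
illustration of the idea and does not call the Reclassify tool: the reclassify
function and the third range are hypothetical, and treating each range as
including its lower bound but not its upper bound is an assumption made for this
example rather than a statement about how the tool handles boundary values.

```python
def reclassify(value, zoneTable):
    """Return the zone whose range contains value, or None if no range matches."""
    for lower, upper, zone in zoneTable:
        # Assumption: a range includes its lower bound but not its upper bound
        if lower <= value < upper:
            return zone
    return None

# The first two ranges come from the example above; the third is hypothetical
zoneTable = [(0, 30000, 1),
             (30000, 60000, 2),
             (60000, 90000, 3)]

for reading in [12500, 30000, 71000]:
    print(reclassify(reading, zoneTable))
```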

As you build your model, you will need to configure some settings that will not be
exposed as parameters. These include the clip feature, which is the state of
Nebraska outline Nebraska.shp in your Lesson 1 data folder. There are many
other settings such as "Z Value field" and "Input barrier polyline features" (for
IDW) or "Reclass field" (for Reclassify) that should not be exposed as parameters.
You should just set these values once when you build your model. If you ever ask
someone else to run this model, you don't want them to be overwhelmed with
choices stemming from every tool in the model; you should just expose the
essential things they might want to change.

For this particular model, you should assume that any input dataset will conform to
the same schema as your Precip2008Readings.shp feature class. For example, an
analyst should be able to submit a similar Precip2009Readings dataset with the
same fields, field names, and data types. However, he or she should not expect to
provide any feature class with a different set of fields and field names, etc. As you
might discover, handling all types of feature class schemas would make your model
more complex than we want for this assignment.

When you double-click the model to run it, the interface should look like the
following:

Figure 1.16 The model interface.


Running the model with the exact parameters listed above should result in the
following (I have symbolized the zones in ArcMap with different colors to help
distinguish them). This is one way you can check your work:

Figure 1.17 The completed model output.

Deliverables

The deliverables for this project are:

 The .tbx file of the toolbox containing your model. The easiest way to find it

is to right-click your toolbox in the Catalog window, click Properties, and

note the Location. If you can't browse to this path in Windows Explorer,

you'll need to enable the Windows option to show hidden files and folders.

 A screen capture of the model interface before you run the model (it should

look a lot like the above image, although you can set your own

reclassification values, power, etc.)

 A screen capture of your model result in ArcMap, with zones symbolized
in different colors. You don't have to use the Layout view for this project.

Successful delivery of the above requirements is sufficient to earn 90% on the
project. The remaining 10% is reserved for efforts that go "over and above" the
minimum requirements. This could include (but is not limited to) meaningful
labels on and around model elements, analysis of how different input values affect
the output, substitution of some other interpolation method instead of IDW (for
example Kriging), documentation for your model parameters that appears in the
side-panel help, or demonstration of how your model was successfully run on a
different input dataset.

Tips

The following tips may help you as you build your model:

 Your model needs to include the following tools in this order: IDW (from

the Spatial Analyst toolbox), Reclassify, Raster to Polygon, Clip

(from the Analysis toolbox).

 An easy way to find the tools you need in ArcMap is to click Windows >

Search and type the name of the tool you want in the search box. Be careful

when multiple tools have the same name. You'll typically be using tools from

the Spatial Analyst toolbox in this assignment.

 Once you drag and drop a tool onto the ModelBuilder canvas, double-click it

and set all the parameters the way you want. These will be the default

settings for your model.

 If there is a certain parameter for a tool that you want to expose as a model

parameter, right-click the tool in the ModelBuilder canvas, then click Make

Variable > From Parameter and choose the parameter. Once the oval

appears for the variable, right-click it and click Model Parameter.

 If you receive errors that a tool is not able to run, or that no Spatial Analyst

Extension is installed, you may need to enable the extension. In ArcMap,

click Customize > Extensions and then check the Spatial Analyst

checkbox.

Project 1, Part II: Creating contours for the Fox Lake DEM

The second part of Project 1 will help you get some practice with Python. At the end
of Lesson 1, you saw three simple scripting examples; now your task is to write your
own script. This script will create vector contour lines from a raster elevation
dataset. Don't forget that the ArcGIS Desktop Help [21] can indeed be helpful if you
need to figure out the syntax for a particular command.

Earlier in the lesson you were introduced to the Fox Lake DEM in your Lesson 1
data folder. It represents elevation in the Fox Lake Quadrangle, Utah. Write a
script that uses the Contour tool in the Spatial Analyst toolbox to create contour
lines for the quadrangle. The contour interval should be 25 meters and the base
contour should be 0. Remember that the native units of the DEM are meters, so no
unit conversions are required.
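
If the roles of the contour interval and base contour are unclear, the following
plain Python sketch lists the elevations at which contour lines would fall for a
given elevation range. It is only an illustration of the arithmetic and is not
part of the project solution: the Contour tool works this out for you,
contourLevels is a hypothetical function name, and the minimum and maximum
elevations are invented rather than taken from the Fox Lake DEM.

```python
def contourLevels(minElevation, maxElevation, interval, base=0):
    """List the contour elevations between minElevation and maxElevation."""
    # Find the first contour at or above the minimum elevation.
    # Floor division keeps the levels aligned to the base contour.
    stepsToFirst = -((base - minElevation) // interval)
    level = base + stepsToFirst * interval
    levels = []
    while level <= maxElevation:
        levels.append(level)
        level += interval
    return levels

# Invented elevation range in meters, with a 25 meter interval and base of 0
print(contourLevels(2510, 2600, 25))
```

Every level is the base contour plus a whole number of intervals, which is why a
base of 0 and an interval of 25 give contours at multiples of 25 meters.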

Running the script should immediately create a shapefile of contour lines on disk.

Follow these guidelines when writing the script:

 The purpose of this exercise is just to get you some practice writing Python

code. Therefore, you are not required to use arcpy.GetParameterAsText() to

get the input parameters. Go ahead and hard-code the values (such as the

path name to the dataset).

 Consequently, you are not required to create a script tool for this exercise.

This will be required in Project 2.

 Your code should run correctly from PythonWin. For full credit, it should

also contain comments, attempt to handle errors, and use legal and intuitive

variable names.

Deliverables

The deliverables for Project 1, Part II are:

 The .py file containing your script.


 A short writeup (about 300 words) describing what you learned during this

project and how you approached the problem. These writeups will be

required on all projects.

Finishing Lesson 1

To complete Lesson 1, please zip all your Project 1 deliverables (for parts I and II)
into one file and submit them to the Lesson 1 Drop Box in ANGEL. Then take the
Lesson 1 Quiz if you haven't taken it already.

Author(s) and/or Instructor(s): Sterling Quinn, John A. Dutton e-Education
Institute, College of Earth and Mineral Sciences, The Pennsylvania State
University;
Jim Detwiler, John A. Dutton e-Education Institute, College of Earth and Mineral
Sciences, The Pennsylvania State University;
Frank Hardisty, John A. Dutton e-Education Institute, College of Earth and
Mineral Sciences, The Pennsylvania State University;
James O'Brien, John A. Dutton e-Education Institute, College of Earth and Mineral
Sciences, The Pennsylvania State University

Penn State Professional Masters Degree in GIS: Winner of the 2009 Sloan
Consortium award for Most Outstanding Online Program

© 1999-2012 The Pennsylvania State University. Except where otherwise noted,
this courseware module is licensed under the Creative Commons
Attribution-Non-Commercial-Share-Alike 3.0 License and is freely available through
Penn State's College of Earth and Mineral Sciences' Open Educational Resources
Initiative.


Source URL: https://www.e-education.psu.edu/geog485/node/17

Links:
[1] http://training.esri.com/acb2000/showdetl.cfm?DID=6&Product_ID=971
[2] https://www.e-education.psu.edu/drupal6/files/geog485py/data/Lesson1.zip
[3] http://webhelp.esri.com
[4] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/A_quick_tour_of_managing_intermediate_data/002w0000000z000000/
[5] http://help.arcgis.com/en/arcgisdesktop/10.0/help/0021/002100000037000000.htm
[6] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Setting_Current_and_Scratch_Workspace_environments/002w00000037000000/
[7] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//00570000000q000000.htm
[8] http://help.arcgis.com/en/arcgisdesktop/10.0/help/002w/002w0000001w000000.htm
[9] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//000800000019000000.htm
[10] http://www.python.org
[11] http://webhelp.esri.com/arcgisdesktop/9.3/pdf/Geoprocessor_93.pdf
[12] http://help.arcgis.com/en/arcgisdesktop/10.0/help/00p6/00p60000000r000000.htm
[13] http://help.arcgis.com/en/arcgisdesktop/10.0/help/0015/001500000006000000.htm
[14] https://www.e-education.psu.edu/geog485/../L01_Prac1.html
[15] https://www.e-education.psu.edu/geog485/../L01_Prac2.html
[16] https://www.e-education.psu.edu/geog485/../L01_Prac3.html
[17] http://help.arcgis.com/en/arcgisdesktop/10.0/help/000v/000v00000026000000.htm
[18] https://www.e-education.psu.edu/geog485/../L01_Prac4.html
[19] http://www.prismclimate.org
[20] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_IDW_works/009z00000075000000/
[21] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html

Lesson 2: Python and programming basics

In Lesson 1 you received an introduction to Python. Lesson 2 builds on that
experience, diving into Python fundamentals. Many of the things you'll learn are
common to programming in other languages. If you already have coding
experience, this lesson may contain some review.

This lesson has a relatively large amount of reading from the course materials, the
Zandbergen text, and the ArcGIS help. I believe you will get a better understanding
of the Python concepts as they are explained and demonstrated from several
different perspectives. Whenever the examples use the Interactive Window, I
strongly suggest that you type in the code yourself as you follow the examples. This
can take some time, but you'll be amazed at how much more information you retain
if you try the examples yourself instead of just reading them.

At the end of the lesson you'll be required to write a Python script that puts
together many of the things you've learned. This will go much faster if you've taken
the time to read all the required text and work through the examples.

Lesson 2 checklist
Lesson 2 covers Python fundamentals (many of which are common to other
programming languages) and gives you a chance to practice these in a project. To
complete Lesson 2, you are required to do the following:

1. Download the Lesson 2 data [1] and extract it to C:\WCGIS\Geog485\Lesson2.

2. Work through the online sections of the lesson.

3. Read Zandbergen chapters 4 - 6, 11.1 - 11.5, and 11.11. In the online lesson
   pages I have inserted instructions about when it is most appropriate to read
   each of these chapters. There is more reading in this lesson than in a typical
   week. If you are new to Python, please plan some extra time to read these
   chapters. There are also some readings this week from the ArcGIS Help.

4. Complete Project 2 and upload its deliverables to the Lesson 2 drop box. The
   deliverables are listed in the Project 2 description page.

5. Complete the Lesson 2 Quiz.

2.1 More Python fundamentals


At this point you've learned most of what you need to know about ModelBuilder,
and this may be enough to address many of the GIS tasks that you face in your
work. However, as useful as ModelBuilder is, you'll find that sometimes you need
Python to build extra intelligence into your geoprocessing. For example, you may
need to construct complex query strings, or employ conditional logic. You may
need to read or parse varying types of user input before you can send it to a tool as
a parameter. Or you might need to do complex looping that, at some threshold,
probably becomes easier to write in Python than to figure out with ModelBuilder.

In Lesson 1, you saw your first Python scripts and were introduced to the basics,
such as importing modules, using arcpy, working with properties and methods, and
indenting your code in try/except blocks. In the following sections, you'll learn about
more Python programming fundamentals such as working with lists, looping,
if/then decision structures, manipulating strings, and casting variables.

Although this might not be the most thrilling section of the course, it's probably the
most important section for you to spend time understanding and experimenting
with on your own, especially if you are new to programming.

Programming is similar to playing sports: if you take time to practice the
fundamentals, you'll have an easier time when you need to put all your skills
together. For example, think about the things you need to learn in order to play
basketball. A disciplined basketball player practices dribbling, passing, long-range
shooting, layup shots, free throws, defense, and other skills. If you practice each of
these fundamentals well individually, you'll be able to put them together when it's
time to play a full game.

Learning a programming language is the same way. When faced with a problem,
you'll be forced to draw on your fundamental skills to come up with a workable
plan. You may need to include a loop in your program, store items in a list, or make
the program do one of four different things based on certain user input. If you
know how to do each of these things individually, you'll be able to fit the pieces
together, even if the required task seems daunting.

Take time to make sure you understand what's happening in each line of the code
examples, and if you run into a question, please jot it down and post to the forums.
2.1.1 Lists
In Lesson 1 you learned about some common data types in Python, such as strings
and integers. Sometimes you need a type that can store multiple related values
together. Python offers several ways of doing this, and the first one we'll learn
about is the list.

Here's a simple example of a list. You can type this in the PythonWin Interactive
Window to follow along:

>>> suits = ['Spades', 'Clubs', 'Diamonds', 'Hearts']

This list named 'suits' stores four related string values representing the suits in a
deck of cards. In many programming languages, storing a group of objects in
sequence like this is done with arrays. While the Python list could be thought of as
an array, it's a little more flexible than the typical array in other programming
languages. This is because you're allowed to put multiple data types into one list.

For example, suppose we wanted to make a list for the card values you could draw.
The list might look like this:

>>> values = ['Ace', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King']

Notice that you just mixed string and integer values in the list. Python doesn't care.
However, each item in the list still has an index, meaning an integer that denotes
each item's place in the list. The list starts with index 0 and for each item in the list,
the index increments by one. Try this:

>>> print suits[0]
Spades
>>> print values[12]
King

In the above lines, you just requested the item with index 0 in the suits list and got
'Spades'. Similarly, you requested the item with index 12 in the values list and got
'King'.

It may take some practice initially to remember that your lists start with a 0 index.
Testing your scripts can help you avoid off-by-one errors that might result from
forgetting that lists are zero-indexed. For example, you might set up a script to
draw 100 random cards and print the values. If none of them is an Ace, you've
probably stacked the deck against yourself by making the indices begin at 1.
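
The card-drawing test described above might be sketched as follows. This is only an illustration: the seed value is arbitrary, the names goodDraws and badDraws are made up for this example, and it uses a for loop and len(), both of which are covered later in this lesson. The print calls are written with parentheses so the sketch runs in both Python 2 and Python 3.

```python
import random

values = ['Ace', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King']

random.seed(485)  # arbitrary seed so the run is repeatable

# Correct: valid indices run from 0 to len(values) - 1
goodDraws = []
for i in range(100):
    goodDraws.append(values[random.randrange(0, len(values))])

# Off by one: starting at 1 means index 0 ('Ace') can never be chosen
badDraws = []
for i in range(100):
    badDraws.append(values[random.randrange(1, len(values))])

print('Ace' in badDraws)   # always False, no matter how many cards are drawn
```

If a test like this never produces an Ace, check whether your indices start at 0.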

Remember you learned that everything is an object in Python? That applies to lists
too. In fact, lists have a lot of useful methods that you can use to change the order
of the items, insert items, sort the list, and so on. Try this:
>>> suits = ['Spades', 'Clubs', 'Diamonds', 'Hearts']
>>> suits.sort()
>>> print suits
['Clubs', 'Diamonds', 'Hearts', 'Spades']

Notice that the items in the list are now in alphabetical order. You may have also
noticed that when you typed "suits." (including the period) you got some help to see
what methods were available for the list. This is called autocompletion, and it can
be a great help when you're writing code because it shows what you can and cannot do
with an object. It can also be a way of avoiding typing errors because you can arrow
down and press Tab to insert the method you want. Autocompletion is a feature of
PythonWin, but this type of help can be found in other integrated development
environments (IDEs) like Microsoft Visual Studio. Microsoft has branded their
version of autocompletion "IntelliSense," and this name has been catchy enough that
you may hear people using it conversationally even when using non-Microsoft IDEs.

The sort() method you used above allowed you to do something in one line of code
that would have otherwise taken many lines. Another helpful method like this is
reverse(), which reverses the order of the items in a list. It does not sort them,
but because the list was just sorted, the result here is reverse alphabetical order:

>>> suits.reverse()
>>> print suits
['Spades', 'Hearts', 'Diamonds', 'Clubs']

Before you attempt to write list-manipulation code, check your textbook or the
Python list reference documentation [2] to see if there's an existing method that
might simplify your work.
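
As a brief illustration, here are a few other built-in list methods (index(), count(), remove(), and pop()) applied to the suits list. The variable lastSuit is just a name chosen for this sketch, and print is written with parentheses so the code runs in both Python 2 and Python 3.

```python
suits = ['Spades', 'Clubs', 'Diamonds', 'Hearts']

# index() returns the position of the first matching item
print(suits.index('Diamonds'))   # 2

# count() returns how many times a value appears in the list
print(suits.count('Clubs'))      # 1

# remove() deletes the first matching item
suits.remove('Clubs')
print(suits)                     # ['Spades', 'Diamonds', 'Hearts']

# pop() removes the last item and returns it
lastSuit = suits.pop()
print(lastSuit)                  # Hearts
```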

Inserting items and combining lists

What happens when you want to combine two lists? Type this in the Interactive
Window:

>>> listOne = [101,102,103]
>>> listTwo = [104,105,106]
>>> listThree = listOne + listTwo
>>> print listThree
[101, 102, 103, 104, 105, 106]

Notice that you did not get [205,207,209]; rather, Python treats the addition as
joining the two lists end to end, producing a new list that holds the items of
listOne followed by the items of listTwo. Next, try these other ways of adding
items to the list:

>>> listThree += [107]
>>> print listThree
[101, 102, 103, 104, 105, 106, 107]
>>> listThree.append(108)
>>> print listThree
[101, 102, 103, 104, 105, 106, 107, 108]

To put an item at the end of the list, you can either add a one-item list (how we
added 107 to the list) or use the append() method on the list (how we added 108 to
the list). Notice that listThree += [107] is a shortened form of saying listThree =
listThree + [107].

If you need to insert some items in the middle of the list, you can use the insert()
method:

>>> listThree.insert(4, 999)
>>> print listThree
[101, 102, 103, 104, 999, 105, 106, 107, 108]

Notice that the insert() method above took two parameters. You might have even
noticed a tooltip that shows you what the parameters mean.

The first parameter is the index position that the new item will take, and the
second parameter is the item to insert. This method call inserts 999 between 104
and 105. Now 999 is at index 4.

Getting the length of a list

Sometimes you'll need to find out how many items are in a list, particularly when
looping. Here's how you can get the length of a list:

>>> myList = [4,9,12,3,56,133,27,3]
>>> print len(myList)
8

Notice that len() gives you the exact number of items in the list. To get the index of
the final item, you would need to use len(myList) - 1. Again, this distinction can
lead to off-by-one errors if you're not careful.
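
The difference between the length and the final index can be seen in a short sketch. The variable lastIndex is a name chosen for this example, and print is written with parentheses so the code runs in both Python 2 and Python 3.

```python
myList = [4, 9, 12, 3, 56, 133, 27, 3]

# The final index is always one less than the length
lastIndex = len(myList) - 1
print(lastIndex)            # 7
print(myList[lastIndex])    # 3

# Using len(myList) itself as an index goes one position too far
try:
    print(myList[len(myList)])
except IndexError:
    print("Index " + str(len(myList)) + " is out of range")
```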

Other ways to store collections of data

Lists are not the only way to store collections of items in Python; you can
also use tuples and dictionaries. Tuples are like lists, but once a tuple is created
you can't add, remove, or replace its items. In some cases a tuple might actually be
a better structure for storing values like the suits in a deck of cards, because this
is a fixed set of values that you wouldn't want your program to change by accident.

Dictionaries differ from lists in that items are not indexed; instead, each item is
stored with a key value which can be used to retrieve the item. We'll use
dictionaries later in the course, and your reading assignment for this lesson covers
dictionary basics. The best way to understand how dictionaries work is to play with
some of the textbook examples in the Interactive Window (see Zandbergen 6.8).
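
Before turning to the textbook, here is a minimal sketch of both structures. The point values in the cardPoints dictionary are invented for illustration, and print is written with parentheses so the code runs in both Python 2 and Python 3.

```python
# A tuple is written with parentheses and cannot be changed once created
suits = ('Spades', 'Clubs', 'Diamonds', 'Hearts')
print(suits[0])             # indexing works the same way as with a list

try:
    suits[0] = 'Stars'      # attempting to replace an item fails
except TypeError:
    print("Tuples cannot be modified")

# A dictionary pairs each key with a value (these point values are made up)
cardPoints = {'Ace': 11, 'King': 10, 'Queen': 10, 'Jack': 10}
print(cardPoints['Ace'])    # 11

cardPoints['Ten'] = 10      # add a new key/value pair
print(len(cardPoints))      # 5
```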

2.1.2 Loops
A loop is a section of code that repeats an action. Remember, the power of scripting
(and computing in general) is the ability to quickly repeat a task that might be
time-consuming or error-prone for a human. Looping is how you repeat tasks with
code, whether it's reading a file, searching for a value, or performing the same
action on each item in a list.

for Loop

A for loop does something with each item in a list. Type this in the PythonWin
Interactive Window to see how a simple for loop works:

>>> for name in ["Carter", "Reagan", "Bush"]:
...     print name + " was a U.S. president."

After typing this, you'll have to hit Enter twice in a row to tell PythonWin that you
are done working on the loop and that the loop should be executed. You should see:

Carter was a U.S. president.
Reagan was a U.S. president.
Bush was a U.S. president.

Notice a couple of important things about the loop above. First, you declared a new
variable, "name," to represent each item in the list as you iterated through. This is
okay to do; in fact, it's expected that you'll do this at the beginning of the for loop.

The second thing to notice is that after the condition, or the first line of the loop,
you typed a colon (:), then started indenting subsequent lines. Some programming
languages require you to type some kind of special line or character at the end of
the loop (for example, "Next" in Visual Basic, or "}" in JavaScript), but Python just
looks for the place where you stop indenting. By pressing Enter twice, you told
Python to stop indenting and that you were ready to run the loop.

for loops can also work with lists of numbers. Try this one in the Interactive
Window:

>>> x = 2
>>> multipliers = [1,2,3,4]
>>> for num in multipliers:
...     print x * num
...
2
4
6
8

In the loop above, you multiplied each item in the list by 2. Notice that you can set
up your list before you start coding the loop.

You could have also done the following with the same result:

>>> multipliers = [1,2,3,4]
>>> for num in multipliers:
...     x = 2
...     print x * num

The above code, however, is less efficient than what we did initially. Can you see
why? This time you are declaring and setting the variable x = 2 inside the loop. The
Python interpreter will now have to read and execute that line of code four times
instead of one. You might think this is a trivial amount of work, but if your list
contained thousands or millions of items the difference in execution time would
become noticeable. Declaring and setting variables outside a loop, whenever
possible, is a best practice in programming.

While we're on the subject, what would you do if you wanted to multiply 2 by every
number from 1 to 1000? It would definitely be too much typing to manually set up
a multipliers list as in the previous example. In this case, you can use Python's
built-in range function. Try this:

>>> x = 2
>>> for num in range(1,1001):
...     print x * num

The range function is your way of telling Python, "Start here and stop there." We
used 1001 because the loop stops one item before the function's second argument
(the arguments are the values you put in parentheses to tell the function how to
run). If you need the function to multiply by 0 at the beginning as well, you could
even get away with using one argument:

>>> x = 2
>>> for num in range(1001):
...     print x * num

The range function has many interesting uses which are detailed in this section's
reading assignment.
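
As a quick illustration of those uses, range also accepts an optional third argument, the step. The sketch below wraps each call in list() so the numbers print as a list in both Python 2 and Python 3, and print is written with parentheses for the same reason.

```python
# One argument: start at 0 and stop before 5
print(list(range(5)))            # [0, 1, 2, 3, 4]

# Two arguments: start at 1 and stop before 6
print(list(range(1, 6)))         # [1, 2, 3, 4, 5]

# Three arguments: the third is the step between numbers
print(list(range(0, 20, 5)))     # [0, 5, 10, 15]

# A negative step counts downward
print(list(range(10, 0, -2)))    # [10, 8, 6, 4, 2]
```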

while Loops

A while loop executes as long as some condition remains true and stops once the
condition is no longer met. Here's how to code our example above using a while loop:

>>> x = 0
>>> while x < 1001:
...     print x * 2
...     x += 1

while loops often involve the use of some counter that keeps track of how many
times the loop has run. Sometimes you'll perform operations with the counter. For
example, in the above loop, x was the counter, and we also multiplied the counter
by 2 each time during the loop. To increment the counter we used x += 1 which is
shorthand for x = x + 1, or "add one to x".
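
A while loop is especially handy when you don't know in advance how many times the loop must run. The following sketch, with an arbitrary starting value and variable names invented for this example, halves a number until it drops below 1 and counts the repetitions. print is written with parentheses so the code runs in both Python 2 and Python 3.

```python
# Halve a distance until it drops below 1 (the starting value is arbitrary)
distance = 100.0
halvings = 0

while distance >= 1:
    distance = distance / 2
    halvings += 1      # the counter tracks how many times the loop has run

print(halvings)        # 7
print(distance)        # 0.78125
```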

Nesting loops

Some situations call for putting one loop inside another, a practice called nesting.
Nested loops could help you print every card in a deck (minus the Jokers):

>>> suits = ['Spades', 'Clubs', 'Diamonds', 'Hearts']
>>> values = ['Ace', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King']
>>> for suit in suits:
...     for value in values:
...         print str(value) + " of " + str(suit)

In the above example you start with a suit, then loop through each value in the suit,
printing out the card name. When you've reached the end of the list of values, you
jump out of the nested loop and go back to the first loop to get the next suit. Then
you loop through all values in the second suit and print the card names. This
process continues until all the suits and values have been looped through.
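
One way to convince yourself that the nested loops visit every combination is to store the card names in a list instead of printing them, then check the length. This is a sketch; the list name deck is chosen for this example, and print is written with parentheses so the code runs in both Python 2 and Python 3.

```python
suits = ['Spades', 'Clubs', 'Diamonds', 'Hearts']
values = ['Ace', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King']

deck = []
for suit in suits:
    for value in values:
        # The inner loop runs 13 times for each of the 4 suits
        deck.append(str(value) + " of " + suit)

print(len(deck))    # 52
print(deck[0])      # Ace of Spades
print(deck[51])     # King of Hearts
```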

Looping in GIS models

You will use looping repeatedly (makes sense!) as you write GIS scripts in Python.
Often you'll need to iterate through every row in a table, every field in a table, or
every feature class in a folder or a geodatabase. You might even need to loop
through the vertices of a geographic feature.

You saw above that loops work particularly well with lists. arcpy has some methods
that can help you create lists. Here's an example you can try that uses
arcpy.ListFeatureClasses(). First, manually create a new folder
C:\WCGIS\Geog485\Lesson2\PracticeData. Then copy the code below into a new
script in PythonWin and run the script. The script copies all the data in your
Lesson1 folder into the new Lesson2\PracticeData folder you just created.

# Copies all feature classes from one folder to another
import arcpy

try:
    arcpy.env.workspace = "C:/WCGIS/Geog485/Lesson1"

    # List the feature classes in the Lesson 1 folder
    fcList = arcpy.ListFeatureClasses()

    # Loop through the list and copy the feature classes to the Lesson 2
    # PracticeData folder
    for featureClass in fcList:
        arcpy.CopyFeatures_management(featureClass,
            "C:/WCGIS/Geog485/Lesson2/PracticeData/" + featureClass)

except:
    print "Script failed to complete"
    print arcpy.GetMessages(2)

Notice above that once you have a Python list of feature classes (fcList), it's very
easy to set up the loop condition (for featureClass in fcList:).

Another common operation in GIS scripts is looping through tables. In fact, the
arcpy module contains some special objects called cursors that help you do this.
Here's a short script showing how a cursor can loop through each row in a feature
class and print the name. We'll cover cursors in detail in the next lesson, so don't
worry if some of this code looks confusing right now. The important thing is to
notice how a loop is used to iterate through each record:

import arcpy
inTable = "C:/WCGIS/Geog485/Lesson2/CityBoundaries.shp"
inField = "NAME"

rows = arcpy.SearchCursor(inTable)

# This loop goes through each row in the table
# and gets a requested field value
for row in rows:
    currentCity = row.getValue(inField)
    print currentCity

In the above example, a search cursor named rows retrieves records from the table.
The for loop makes it possible to perform an action on each individual record.

ArcGIS Help reading

Read the following in the ArcGIS Desktop Help:

• Listing data [3]

2.1.3 Decision structures


Many scripts that you write will need to have conditional logic that executes a block
of code given a condition and perhaps executes a different block of code given a
different condition. The "if," "elif," and "else" statements in Python provide this
conditional logic. Try typing this example in the PythonWin Interactive Window:

>>> x = 3
>>> if x > 2:
...     print "Greater than two"
...
Greater than two

In the above example, the keyword "if" denotes that some conditional test is about
to follow. In this case, the condition of x being greater than two was met, so the
script printed "Greater than two." Notice that you are required to put a colon (:)
after the condition and indent any code executing because of the condition. For
consistency in this class, all indentation is done using four spaces.

Using "else" is a way to run code if the condition isn't met. Try this:

>>> x = 1
>>> if x > 2:
...     print "Greater than two"
... else:
...     print "Less than or equal to two"
...
Less than or equal to two

Notice that you don't have to put any condition after "else." It's a way of catching all
other cases. Again the conditional code is indented four spaces, which makes the
code very easy for a human to scan. The indentation is required because Python
doesn't require any type of "end if" statement (like many other languages) to
denote the end of the code you want to execute.

If you want to run through multiple conditions, you can use "elif", which is Python's
abbreviation for "else if":

>>> x = 2
>>> if x > 2:
...     print "Greater than two"
... elif x == 2:
...     print "Equal to two"
... else:
...     print "Less than two"
...
Equal to two

In the code above, "elif x == 2:" tests whether x is equal to two. The == is a way to
test whether two values are equal. Using a single = in this case would result in an
error because = is used to assign values to variables. In the code above, you're not
trying to assign x the value of 2, you want to check if x is already equal to 2, hence
you use ==.

Caution: Using = instead of == to check for equivalency is a very common Python
mistake, especially if you've used other languages where = is allowed for
equivalency checks.
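
To see the difference between the two operators, it can help to print the result of a comparison directly. In this sketch the variable isTwo is a name chosen for illustration, != (not equal) is shown alongside ==, and print is written with parentheses so the code runs in both Python 2 and Python 3.

```python
x = 2              # a single = assigns the value 2 to x

print(x == 2)      # True
print(x == 3)      # False
print(x != 3)      # True

# The result of a comparison can itself be stored in a variable
isTwo = (x == 2)
if isTwo:
    print("x is equal to two")
```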

You can also use if, elif, and else to handle multiple possibilities in a set. The code
below picks a random school from a list (notice we had to import the random
module to do this and call a special method random.randrange()). After the school
is selected and its name is printed, a series of if/elif/else statements appears that
handles each possibility. Notice that the else statement is left in as an error
handler; you should not run into that line if your code works properly, but you can
leave the line in there to fail gracefully if something goes wrong.

import random

# Choose a random school from a list and print it
schools = ["Penn State", "Michigan", "Ohio State", "Indiana"]
randomSchoolIndex = random.randrange(0,4)
chosenSchool = schools[randomSchoolIndex]
print chosenSchool

# Depending on the school, print the mascot
if chosenSchool == "Penn State":
    print "You're a Nittany Lion"
elif chosenSchool == "Michigan":
    print "You're a Wolverine"
elif chosenSchool == "Ohio State":
    print "You're a Buckeye"
elif chosenSchool == "Indiana":
    print "You're a Hoosier"
else:
    print "This program has an error"

Some other programming languages have special keywords for doing the above,
such as switch or select case. In Python, however, it's usually just done with a long
list of "if"s and "elif"s.
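
As an aside, some Python programmers replace a long if/elif chain like this with a dictionary lookup. The sketch below is one possible alternative, not the approach this lesson requires; the dictionary name mascots is invented for the example, and print is written with parentheses so the code runs in both Python 2 and Python 3.

```python
import random

schools = ["Penn State", "Michigan", "Ohio State", "Indiana"]

# Each school name is a key, and its mascot is the value
mascots = {
    "Penn State": "Nittany Lion",
    "Michigan": "Wolverine",
    "Ohio State": "Buckeye",
    "Indiana": "Hoosier",
}

chosenSchool = schools[random.randrange(0, 4)]
print(chosenSchool)

# get() returns its second argument (None here) when the key is missing
mascot = mascots.get(chosenSchool, None)

if mascot is None:
    print("This program has an error")
else:
    print("You're a " + mascot)
```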

2.1.4 String manipulation


You've previously learned how the string variable can contain numbers and letters
and represent almost anything. When using Python with ArcGIS, strings can be
useful for storing paths to data and printing messages to the user. There are also
some geoprocessing tool parameters that you'll need to supply with strings.

Python has some very useful string manipulation abilities. We won't get into all of
them in this course, but following are a few techniques that you need to know.

Concatenating strings

To concatenate two strings means to append or add one string on to the end of
another. For example, you could concatenate the strings "Python is " and "a
scripting language" to make the complete sentence "Python is a scripting
language." Since you are adding one string to another, it's intuitive that in Python
you can use the + sign to concatenate strings.

You may need to concatenate strings when working with path names. Sometimes
it's helpful or required to store one string representing the folder or geodatabase
from which you're pulling datasets and a second string representing the dataset
itself. You put both together to make a full path.
The following example, modified from one in the ArcGIS Help, demonstrates this
concept. The script first builds a list of strings representing the feature classes
that you want to clip. The list is represented by "featureClassList" in this script:

# This script clips all datasets in a folder
import arcpy

inFolder = "c:\\data\\inputShapefiles\\"
resultsFolder = "c:\\data\\results\\"
clipFeature = "c:\\data\\states\\Nebraska.shp"

# List feature classes
arcpy.env.workspace = inFolder
featureClassList = arcpy.ListFeatureClasses()

# Loop through each feature class and clip
for featureClass in featureClassList:

    # Make the output path by concatenating strings
    outputPath = resultsFolder + featureClass
    # Clip the feature class
    arcpy.Clip_analysis(featureClass, clipFeature, outputPath)

String concatenation is occurring in this line: outputPath =
resultsFolder + featureClass. In longhand, the output folder
"c:\\data\\results\\" is getting the feature class name added on the end. If the
feature class name were "Roads.shp" the resulting output string would be
"c:\\data\\results\\Roads.shp".

The above example shows that string concatenation can be useful in looping.
Constructing the output path by using a set workspace or folder name followed by a
feature class name from a list gives much more flexibility than trying to create
output path strings for each dataset individually. You may not know how many
feature classes are in the list or what their names are. You can get around that if
you construct the output paths on the fly through string concatenation.
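
You can experiment with this idea without ArcGIS. In the sketch below, the file names in featureClassList are hypothetical stand-ins for whatever arcpy.ListFeatureClasses() might return, and the list outputPaths is added only so the results can be inspected afterward. print is written with parentheses so the code runs in both Python 2 and Python 3.

```python
resultsFolder = "c:\\data\\results\\"

# Hypothetical names standing in for the result of arcpy.ListFeatureClasses()
featureClassList = ["Roads.shp", "Rivers.shp", "Cities.shp"]

outputPaths = []
for featureClass in featureClassList:
    # Build each output path on the fly by concatenating strings
    outputPath = resultsFolder + featureClass
    outputPaths.append(outputPath)
    print(outputPath)
```

When the paths are printed, each doubled backslash in the code appears as a single backslash (for example, c:\data\results\Roads.shp), because \\ is how a single backslash is written inside a Python string.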

Casting to a string

Sometimes in programming you have a variable of one type that needs to be treated
as another type. For example, 5 can be represented as a number or as a string.
Python can only perform math on 5 if it is treated as a number, and it can only
concatenate 5 onto an existing string if it is treated as a string.

Casting is a way of forcing your program to think of a variable as a different type.
Create a new script in PythonWin and type or paste the following code:

x = 0
while x < 10:
    print x
    x += 1
print "You ran the loop " + x + " times."

Now try to run it. The script attempts to concatenate strings with the variable x to
print how many times you ran a loop, but it results in an error: "TypeError: cannot
concatenate 'str' and 'int' objects." Python doesn't have a problem when you want
to print the variable x on its own, but Python cannot use + to join a string and an
integer. To get the code to work, you have to cast the variable x to a string before
concatenating it.

x = 0
while x < 10:
    print x
    x += 1

print "You ran the loop " + str(x) + " times."

You can force Python to think of x as a string by using str(x). Python has other
casting functions such as int() and float() that you can use if you need to go from a
string to a number. Use int() for integers and float() for decimals.
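
Here is a short sketch of casting in the other direction, from strings to numbers. The variable names and values are invented for illustration, and print is written with parentheses so the code runs in both Python 2 and Python 3.

```python
countText = "12"
areaText = "3.5"

count = int(countText)      # string to integer
area = float(areaText)      # string to decimal (floating point) number

print(count + 1)            # 13
print(area * 2)             # 7.0
print(countText + "1")      # 121, because + concatenates strings

# Text that doesn't look like a number cannot be cast
try:
    int("twelve")
except ValueError:
    print("That text cannot be converted to an integer")
```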

Readings

It's time to take a break and do some readings from another source. If you are new
to Python scripting this will help you see the concepts from a second angle.

Read Zandbergen chapters 4 - 6. This can take a few hours but it will save you
hours of time if you make sure you understand this material now.

• Chapter 4 covers the basics of Python syntax, loops, strings and other things
  we just learned.

• Chapter 5 talks about working with arcpy and ArcGIS tools, which you
  briefly tasted in Lesson 1.

• Chapter 6 gives some specific instructions about working with ArcGIS
  datasets, which will be valuable during this week's assigned project.

If you still don't feel like you understand the material after reading the above
chapters, don't re-read it just yet. Try some coding from the Lesson 2 practice
exercises and assignments, then come back and re-read if necessary. If you are
really struggling with a particular concept, type the examples in the interactive
window. Programming is like a sport in the sense that you cannot learn all about it
by reading; at some point you have to get up and do it.
2.1.5 Putting it all together
In this section of the lesson you've learned the basic programming concepts of lists,
loops, decision structures, and string manipulation. You might be surprised at what
you can do with just these skills. In this section, we'll practice putting them all
together to address a scenario. This will give us an opportunity to talk about
strategies for approaching programming problems in general.

The scenario we'll tackle is to simulate a one-player game of Hasbro's children's
game "Hi Ho! Cherry-O." In this simple game of chance, you begin with 10 cherries
on a tree. You take a turn by spinning a random spinner which tells you whether
you get to add or remove cherries on the turn. The possible spinner results are:

• Remove 1 cherry
• Remove 2 cherries
• Remove 3 cherries
• Remove 4 cherries
• Bird visits your cherry bucket (Add 2 cherries)
• Dog visits your cherry bucket (Add 2 cherries)
• Spilled bucket (Place all 10 cherries back on your tree)

You continue taking turns until you have 0 cherries left on your tree, at which point
you have won the game. Your objective here is to write a script that simulates the
game, printing the following:

• The result of each spin
• The number of cherries on your tree after each turn. This must always be
  between 0 and 10.
• The final number of turns needed to win the game

Approaching a programming problem

Although this example may seem juvenile, it's an excellent way to practice
everything you just learned. As a beginner, you may feel overwhelmed by the
above problem. A common question is, "Where do I start?" The best approach is to
break down the problem into smaller chunks of things you know how to do.

One of the most important programming skills you can acquire is the ability to
verbalize a problem and translate it into a series of small programming steps.
Here's a list of things you would need to do in this script. Programmers call this
pseudocode because it's not written in code, but it follows the sequence their code
will need to take.

1. Spin the spinner

2. Print the spin result

3. Add or remove cherries based on the result

4. Make sure the number of cherries is between 0 and 10

5. Print the number of cherries on the tree

6. Take another turn or print the number of turns it took to win the game

It also helps to list the variables you'll need to keep track of:

• Number of cherries currently on the tree (Starts at 10)
• Number of turns taken (Starts at 0)
• Value of the spinner (Random)

Let's try to address each of the pseudocode steps. Don't worry about the full flow of
the script yet. Rather, try to understand how each step of the problem should be
solved with code. Assembling the blocks of code at the end is relatively trivial.

Spin the spinner

How do you simulate a random spin? In one of our previous examples, we used the
random module to generate a random number within a range of integers; however,
the choices on this spinner are not linear. A good approach here is to store all spin
possibilities in a list and use the random number generator to pick the index for
one of the possibilities. On its own, the code would look like this:

import random
spinnerChoices = [-1, -2, -3, -4, 2, 2, 10]
spinIndex = random.randrange(0, 7)
spinResult = spinnerChoices[spinIndex]
The list spinnerChoices holds all possible mathematical results of a spin (remove 1
cherry, remove 2 cherries, etc.). The final value 10 represents the spilled bucket
(putting all cherries back on the tree).

You need to pick one random value out of this list to simulate a spin. The variable
spinIndex represents a random integer from 0 to 6 that is the index of the item
you'll pull out of the list. For example, if spinIndex turns out to be 2, your spin is -3
(remove 3 cherries from the tree). The spin is held in the variable spinResult.

The random.randrange() method is used to pick the random numbers. At the
beginning of your script, you have to import the random module in order to use
this method.
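
The random module also provides random.choice(), which picks an item from a list directly without a separate index. The sketch below shows both approaches side by side; the variable otherSpin is a name invented for this comparison, and the rest of this lesson continues to use the index-based version. print is written with parentheses so the code runs in both Python 2 and Python 3.

```python
import random

spinnerChoices = [-1, -2, -3, -4, 2, 2, 10]

# Index-based approach: pick a random index, then look up the item
spinIndex = random.randrange(0, len(spinnerChoices))
spinResult = spinnerChoices[spinIndex]

# Alternative: random.choice() returns a random item from the list
otherSpin = random.choice(spinnerChoices)

print(spinResult)
print(otherSpin)
```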

Print the spin result

Once you have a spin result, it only takes one line of code to print it. You'll have to
use the str() function to cast it to a string, though.

print "You spun " + str(spinResult) + "."

Add or remove cherries based on the result

As mentioned above, you need to have some variable to keep track of the number of
cherries on your tree. This is one of those variables that it helps to name intuitively:

cherriesOnTree = 10

After you complete a spin, you need to modify this variable based on the result.
Remember that the result is held in the variable spinResult and that a negative
spinResult removes cherries from your tree. So your code to modify the number of
cherries on the tree would look like:

cherriesOnTree += spinResult

Remember, the above is shorthand for saying cherriesOnTree = cherriesOnTree +
spinResult.

Make sure the number of cherries is between 0 and 10

If you win the game you have 0 cherries. You don't have to reach 0 exactly, but it
doesn't make sense to say that you have negative cherries. Similarly, you might spin
the spilled bucket, which for simplicity we represented with positive 10 in the
spinnerChoices. You are not allowed to have more than 10 cherries on the tree.

A simple if/elif decision structure can help you keep the cherriesOnTree within 0
and 10:
if cherriesOnTree > 10:
    cherriesOnTree = 10
elif cherriesOnTree < 0:
    cherriesOnTree = 0

This means, if you wound up with more than 10 cherries on the tree, set
cherriesOnTree back to 10. If you wound up with fewer than 0 cherries, set
cherriesOnTree to 0.
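
For comparison, the same limits could also be enforced in one line with Python's built-in min() and max() functions. This is only an alternative sketch; the sample values and the names clamped and results are invented for the example, and the if/elif version above is the one used in the final script. print is written with parentheses so the code runs in both Python 2 and Python 3.

```python
results = []
for cherriesOnTree in [12, -3, 4]:
    # min() caps the value at 10, then max() keeps it from dropping below 0
    clamped = max(0, min(10, cherriesOnTree))
    results.append(clamped)
    print(str(cherriesOnTree) + " becomes " + str(clamped))
```

Running this prints "12 becomes 10", "-3 becomes 0", and "4 becomes 4".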

Print the number of cherries on the tree

All you have to do for this step is to print your cherriesOnTree variable, casting it to
a string so it can legally be inserted into a sentence.

print "You have " + str(cherriesOnTree) + " cherries on your tree."

Take another turn or print the number of turns it took to win the game

You probably anticipated that you would have to figure out a way to take multiple
turns. This is the perfect scenario for a loop.

What is the loop condition? There have to be some cherries left on the tree in order
to start another turn, so you could begin the loop this way:

while cherriesOnTree > 0:

Much of the code we wrote above would go inside the loop to simulate a turn. Since
we need to keep track of the number of turns taken, at the end of the loop we need
to increment a counter:

turns += 1

This turns variable would have to be initialized at the beginning of the script,
before the loop.

This code could print the number of turns at the end of the game:

print "It took you " + str(turns) + " turns to win the game."

Final code

Your only remaining task is to assemble the above pieces of code into a script.
Below is an example of how the final script would look. Copy this into a new
PythonWin script and try to run it:

# Simulates one game of Hi Ho! Cherry-O

import random

spinnerChoices = [-1, -2, -3, -4, 2, 2, 10]
turns = 0
cherriesOnTree = 10

# Take a turn as long as you have more than 0 cherries
while cherriesOnTree > 0:

    # Spin the spinner
    spinIndex = random.randrange(0, 7)
    spinResult = spinnerChoices[spinIndex]

    # Print the spin result
    print "You spun " + str(spinResult) + "."

    # Add or remove cherries based on the result
    cherriesOnTree += spinResult

    # Make sure the number of cherries is between 0 and 10
    if cherriesOnTree > 10:
        cherriesOnTree = 10
    elif cherriesOnTree < 0:
        cherriesOnTree = 0

    # Print the number of cherries on the tree
    print "You have " + str(cherriesOnTree) + " cherries on your tree."

    turns += 1

# Print the number of turns it took to win the game
print "It took you " + str(turns) + " turns to win the game."
lastline = raw_input(">")

Analysis of the final code

Review the final code closely and consider the following things.

The first thing you do is import whatever supporting modules you need; in this case,
it's the random module.

Next, you declare the variables that you'll use throughout the script. Each variable
has a scope, which determines how broadly it can be used throughout the script. The
variables spinnerChoices, turns, and cherriesOnTree are needed through the entire
script, so they are declared at the beginning, outside the loop. Variables declared at
the top level of your program like this have global scope. The variables spinIndex
and spinResult, on the other hand, are used only inside the loop. (Strictly speaking,
a Python loop does not create a scope of its own, so these two variables are also
global and still hold their last values after the loop ends; only functions and
classes create a local scope.) Each time the loop runs, these variables are
reassigned and their values change.

You could potentially declare the variable spinnerChoices inside the loop and get
the same end result, but performance would be slower because the variable would
have to be re-initialized every time you ran the loop. When possible, you should
declare variables outside loops for this reason.

If you had declared the variables turns or cherriesOnTree inside the loop, your code
would have logical errors. You would essentially be starting the game anew on
every turn with 10 cherries on your tree, having taken 0 turns. In fact, you would
create an infinite loop because there is no way to remove 10 cherries during one
turn, and the loop condition would always evaluate to true. Again, be very careful
about where you declare your variables when the script contains loops.
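The sketch below illustrates the infinite-loop problem just described. Only cherriesOnTree is initialized inside the loop; turns stays outside so that a safety cap can stop the demonstration. The cap, maxTurns, is a hypothetical addition that is not part of the game script, and random.choice is used here as a shortcut for picking an item from the list.

```python
import random

spinnerChoices = [-1, -2, -3, -4, 2, 2, 10]
maxTurns = 50   # hypothetical safety cap so this demonstration always stops
turns = 0

while turns < maxTurns:
    cherriesOnTree = 10   # BUG: the tree is refilled at the start of every turn
    cherriesOnTree += random.choice(spinnerChoices)
    if cherriesOnTree > 10:
        cherriesOnTree = 10
    turns += 1
    if cherriesOnTree <= 0:
        break   # never reached: a single spin removes at most 4 cherries

print("Gave up after " + str(turns) + " turns with "
      + str(cherriesOnTree) + " cherries still on the tree.")
```

Without the cap, the loop would never end, because the tree can never drop below 6 cherries before it is refilled.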

Notice that the total number of turns is printed outside the loop once the game has
ended. The final line lastline = raw_input(">") gives you an empty cursor
prompting for input and is just a trick to make sure the application doesn't
disappear when it's finished (if you run the script from a command console).

Summary

In the above example, you saw how lists, loops, decision structures, and variable
casting can work together to help you solve a programming challenge. You also
learned how to approach a problem one piece at a time and assemble those pieces
into a working script. You'll have a chance to practice these concepts on your own
during this week's assignment. The next and final section of this lesson will provide
you with some sources of help if you get stuck.

Challenge activity

If the above activity made you enthusiastic about writing some code yourself, take
the above script and try to find the average number of turns it takes to win a game
of Hi Ho! Cherry-O. To do this, add another loop that runs the game a large
number of times, say 10000. You'll need to record the total number of turns
required to win all the games, then divide by the number of games (use "/" for the
division). Send me your final result and I'll let you know if you've found the correct
average.

2.2 Troubleshooting and getting help

If you find writing code to be a slow, mystifying, and painstaking process, fraught
with all kinds of opportunities to make mistakes, welcome to the world of a
programmer! Perhaps to their chagrin, programmers spend the majority of their
time hunting down and fixing bugs. Programmers also have to continually expand
and adapt their skills to work with new languages and technologies, which requires
research, practice, and lots of trial and error.

The best candidates for software engineering jobs are not the ones who list the
most languages or acronyms on their resumes. Instead, the most desirable
candidates are self-sufficient, meaning they know how to learn new things and find
answers to problems on their own. This doesn't mean that they never ask for help;
on the contrary, a good programmer knows when to stop banging his or her head
against the wall and consult peers or a supervisor for advice. However, most
everyday problems can be solved using the help documentation, online code
examples, online forums, existing code that works, programming books, and
debugging tools in the software.

Suppose you're in a job interview and your prospective employer asks, "What do
you do when you run into a 'brick wall' when programming? What sources do you
first go to for help?" If you answer, "My supervisor" or "My co-workers," this is a
red flag signifying that you could be a potential time sink to the development team.
Although the more difficult problems require group collaboration, a competitive
software development team cannot afford to hold an employee's hand through
every issue that he or she encounters. From the author's experience, many of the
most compelling candidates answer this question, "Google." They know that most
programming problems, although vexing, are common and the answer may be at
their fingertips in less than 30 seconds through a well-phrased Internet search.
Believe it or not, this can actually be faster than walking down the hall and asking a
co-worker, and it saves everybody time.

In this section of the lesson, you'll learn about places where you can go for help
when working with Python and when programming in general. You will have a
much easier experience in this course if you remember these resources and use
them as you complete your assignments.

2.2.1 Potential problems and quick diagnosis

Don't be surprised if something goes wrong when you run your code the first time.
Debugging, or finding mistakes in code, is a part of life for programmers. Here are
some things that can happen:

• Your code doesn't run at all, usually because of a syntax error (you typed
some illegal Python code).

• Your code runs, but the script doesn't complete and reports an error.

• Your code runs, but the script never completes. Often this occurs when
you've created an infinite loop.

• Your code runs and the script completes, but it doesn't give you the expected
result. This is called a logical error and it is often the type of error that takes
the most effort to debug.

Don't be afraid of errors

Errors happen. There are very few programmers who can sit down and, off the top
of their heads, write dozens of lines of bug-free code. This means a couple of things
for you:

• Expect to spend some time dealing with errors during the script-writing
process. Beginning programmers sometimes underestimate how much time
this takes. To get an initial estimate, you can take the amount of time it takes
to draft your lines of code, then double it or triple it to allow for
error handling and final polishing of your script and tool.

• Don't be afraid to run your script and hit the errors. A good strategy is to
write a small piece of functionality, run it to make sure it's working, then add
on the next piece. It's less effective to write dozens of lines of code off the top
of your head before you ever run the script. Think of it this way: it's much
harder to find the errors in 50 new lines of code than in 10 new lines of code.
If you're building your script piece by piece and debugging often, you'll have
a better idea of where you introduced new errors.

Catching syntax errors

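Syntax errors occur when you typed something incorrectly and your code refuses to
run. Common syntax errors include forgetting a colon when setting a loop or an if
condition, leaving a parenthesis or quotation mark unclosed, or indenting a line
incorrectly. Other common mistakes are not always reported as syntax errors.
Single backslashes in a file name may be misread as escape sequences, and
providing the wrong number of arguments to a function or mixing variable types
incorrectly (such as dividing a number by a string) causes an error only when that
line actually runs.

When you try to run code with a syntax error in PythonWin, you may not notice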
anything happen. At the bottom of the window, look for a message such as "Failed
to run script - syntax error - invalid syntax."

Sometimes the message is clearer. For example, if you indent a line only three
spaces instead of four, you get: "Failed to run script - syntax error - unexpected
indent."

You can check for syntax errors before you run your code using the Check button
on the PythonWin Standard toolbar. This button checks for errors and reports
them in a small message at the bottom of the window, just as you would see if you
tried to run your code with a syntax error. If there are no errors, you'll see a
message such as "Python and the TabNanny successfully checked the file
'myScript.py.'" (The TabNanny is a module that PythonWin uses to check for
correct indentation.)
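If you are curious how a syntax check can work without running the code, the following minimal sketch uses Python's built-in compile() function, which parses source text and raises a SyntaxError if the text is not legal Python. This is only an illustration and is not a description of how PythonWin's Check button is implemented; the helper name checkSyntax and the two sample strings are assumptions made for the example. The "except ... as" form requires Python 2.6 or later.

```python
def checkSyntax(source):
    # compile() parses the source text without executing it
    try:
        compile(source, "<example>", "exec")
        return "No syntax errors found"
    except SyntaxError as err:
        # lineno tells you which line the parser objected to
        return "Syntax error on line " + str(err.lineno)

goodCode = "for i in range(3):\n    print(i)\n"
badCode = "x = 1\nfor i in range(3)\n    print(i)\n"   # colon missing on line 2

print(checkSyntax(goodCode))
print(checkSyntax(badCode))
```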

Dealing with crashes

If your code crashes, you may see an error message in the Interactive Window or
the console. Instead of allowing your eyes to glaze over or banging your head
against the desk, you should rejoice at the fact that the software possibly reported
to you exactly what went wrong! Scour the message for clues as to what line of code
caused the error and what the problem was. Do this even if the message looks
intimidating. For example, see if you can understand what caused this error
message:

Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\PSU_Python_Practice\syntax_error_practice.py", line 4, in <module>
    x = x / 0
ZeroDivisionError: integer division or modulo by zero

Although the message begins with some content you probably don't understand
and uses an unfamiliar term ("modulo," the operation that returns the remainder
of a division), you can reasonably guess that the error was caused in Line 4:
x = x / 0. Dividing by 0 is not possible and the computer won't try to do it.
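If you would like your script to keep running after a crash like this while still capturing the same report, one option is the standard traceback module. The sketch below is a minimal illustration; the function name divide and the variable name report are assumptions made for the example.

```python
import traceback

def divide(numerator, denominator):
    return numerator / denominator

try:
    divide(10, 0)
    report = "No error occurred."
except ZeroDivisionError:
    # format_exc() returns the same text Python would have printed on a crash
    report = traceback.format_exc()

print(report)
```

As with the message above, read the report from the bottom up: the last line names the type of error, and the lines above it point to where the error happened.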

It's easier to interpret messages like this if you've displayed line numbers for your
code in PythonWin. To get the line numbers:

1. In PythonWin click View > Options.

2. Click the Editor tab.


3. Set the Line Numbers property to a higher number such as 30.

The line numbers are also helpful if you make an e-mail or forum posting about
your code and include the script. You can immediately point out the line of the
crash to your colleagues. If you e-mail code to the instructors during this course
asking for help, be prepared to get a response pointing out specific line numbers
that need attention.

Ad-hoc debugging

Sometimes it's easy to sprinkle a few 'print' statements throughout your code to
figure out how far it got before it crashed, or what's happening to certain values in
your script as it runs. This can also be helpful to verify that your loops are doing
what you expect and that you are avoiding off-by-one errors.

Suppose you are trying to find the mean (average) value of the items in a list with
the code below.

#Find average of items in a list

list = [22,343,73,464,90]

for item in list:
    total = 0
    total += item

average = total / len(list)

print "Average is " + str(average)

The script reports "Average is 18," which doesn't look right. From a quick visual
check of this list you could guess that the average would be over 100. The script
isn't erroneously getting the number 18 from the list; it's not one of the values. So
where is it coming from? You can place a few strategic print statements in the
script to get a better report of what's going on:

#Find average of items in a list

list = [22,343,73,464,90]

for item in list:
    print "Processing loop..."
    total = 0
    total += item
    print total

print len(list)
average = total / len(list)
print "Performing division..."
print "Average is " + str(average)

Now when you run the script you see:

Processing loop...
22
Processing loop...
343
Processing loop...
73
Processing loop...
464
Processing loop...
90
5
Performing division...
Average is 18

The error now becomes more clear. The running total isn't being kept successfully;
instead, it's resetting each time the loop runs. This causes the last value, 90, to be
divided by 5, yielding an answer of 18. You need to initialize the variable for the
total outside the loop to prevent this from happening. After fixing the code and
removing the print statements, you get:

#Find average of items in a list

list = [22,343,73,464,90]
total = 0

for item in list:
    total += item

average = total / len(list)

print "Average is " + str(average)

The resulting "Average is 198" looks a lot better. You've fixed a logical error in your
code: an error that doesn't make your script crash, but produces the wrong result.
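One more detail is worth noticing: 198 is itself a rounded-down value. The five items sum to 992, and 992 divided by 5 is 198.4. In Python 2.x, dividing one integer by another with "/" discards the fractional part. The sketch below shows both results side by side. It renames the list to values (an assumption for this example, to avoid reusing the name of Python's built-in list type) and uses "//" to reproduce the whole-number behavior in any Python version.

```python
values = [22, 343, 73, 464, 90]
total = 0

for item in values:
    total += item

# // always discards the fractional part (what "/" does for two integers in 2.x)
wholeAverage = total // len(values)

# Converting one operand to a float keeps the fractional part
exactAverage = float(total) / len(values)

print("Whole-number average is " + str(wholeAverage))
print("Exact average is " + str(exactAverage))
```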

Although debugging with print statements is quick and easy, you need to be careful
with it. Once you've fixed your code, you need to remember to remove the
statements in order to make your code faster and less cluttered. Also, adding print
statements becomes impractical for long or complex scripts. You can pinpoint
problems more quickly and keep track of many variables at a time using the
PythonWin debugger, which is covered in the next section of this lesson.
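A middle ground between scattering print statements and deleting them afterward is to route them through one small function that can be switched off in a single place. The sketch below is one possible approach rather than a required technique; the names DEBUG, debug, and values are hypothetical and chosen for this illustration.

```python
DEBUG = True   # hypothetical switch: set to False to silence all debug output

def debug(message):
    # Print the message only while debugging is switched on
    if DEBUG:
        print("DEBUG: " + str(message))

values = [22, 343, 73, 464, 90]
total = 0

for item in values:
    total += item
    debug("running total is " + str(total))

average = total // len(values)   # whole-number division, as in the example above
print("Average is " + str(average))
```

Setting DEBUG to False leaves only the final "Average is" line in the output, with no statements to hunt down and remove.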

2.2.2 Using the PythonWin debugger

Sometimes when other quick attempts at debugging fail, you need a way to take a
deeper look into your script. Most integrated development environments (IDEs)
like PythonWin include some debugging tools that allow you to step through your
script line-by-line to attempt to find an error. These tools allow you to keep an eye
on the value of all variables in your script to see how they react to each line of code.
The debugging toolbar can be a good way to catch logical errors where an offending
line of code is preventing your script from returning the correct outcome. The
debugging toolbar can also help you find which line of code is causing a crash.

The best way to explain the aspects of debugging is to work through an example.
This time we'll look at some code that tries to calculate the factorial of an integer
(the integer is hard-coded to 5 in this case). In mathematics, a factorial is the
product of an integer and all positive integers below it. Thus, 5! (or "5 factorial")
should be 5 * 4 * 3 * 2 * 1 = 120.

The code below attempts to calculate a factorial through a loop that increments the
multiplier by 1 until it reaches the original integer. This is a valid approach since 1 *
2 * 3 * 4 * 5 would also yield 120.

# This script calculates the factorial of a given
# integer, which is the product of the integer and
# all positive integers below it.

number = 5
multiplier = 1

while multiplier < number:
    number *= multiplier
    multiplier += 1

print number

Even if you can spot the error, follow along with the steps below to get a feel for the
debugging process and the PythonWin Debugging toolbar.

1. Open PythonWin and copy the above code into a new script.

2. Save your script as debugger_walkthrough.py. You can optionally run
the script, but you won't get a result and you may have to shut down
PythonWin in order to get back to where you were.

3. Click View > Toolbars and ensure Debugging is checked. Many IDEs
have debugging toolbars like this, and the tools they contain are pretty
standard: a way to run the code, a way to set breakpoints, a way to step
through the code line by line, and a way to watch the value of variables while
stepping through the code. We'll cover each of these in the steps below.

4. Set your cursor on the first line (number = 5) and click the Toggle
Breakpoint button. A breakpoint is a place where you want your code
to stop running so you can examine it line by line using the debugger. Often
you'll set a breakpoint deep in the middle of your script so you don't have to
examine every single line of code. In this example, the script is very short so
we're putting the breakpoint right at the beginning. The breakpoint is
represented by a circle next to the line of code and this is common in other
debuggers too.

5. Press the Go button. This runs your script up to the breakpoint. You
now have a small yellow arrow indicating which line of the script you are
about to run.

6. Click the Watch button. What's commonly known as a watch window
appears. This will help you track what happens to your variables as you
execute the code line by line. Before you run any more code, however, you
need to tell the watch window which variables to track.

7. In the Expression column of the watch window, double-click <New
Item> and type the name of your first variable "number" (omit the quotes).
In the Value column you'll see "NameError: name 'number' is not defined."
This makes sense because you haven't run the line of code yet that creates
this variable.

8. Similar to the previous step, click <New Item> again and set up a watch
for the "multiplier" variable. You should get the same error about the
variable not being defined yet.

9. Click the Step button. This executes one line of your code. Notice in
your watch window that the variable "number" now has a value of 5.

10. Click the Step button again. This time the "multiplier" variable has been
assigned a value.

11. Click the Step button a few more times to cycle through the loop. Go slowly
and use the watch window to understand the effect that each line has on the
two variables.

12. Step through the loop until "multiplier" reaches a value of 10. It should be
obvious at this point that the loop has not exited at the desired point. Our
intent was for it to quit when "number" reached 120.

Can you spot the error now? The fact that the loop has failed to exit should
draw your attention to the loop condition. The loop will only exit when
"multiplier" is greater than or equal to "number." That is obviously never
going to happen as "number" keeps getting bigger and bigger as it is
multiplied each time through the loop.

In this example, the code contained a logical error. It re-used the variable for
which we wanted to find the factorial (5) as a variable in the loop condition,
without considering that the number would be repeatedly increased within
the loop. Changing the loop condition to the following would cause the script
to work:

while multiplier < 5:

Even better than hard-coding the value 5 in this line would be to initialize a
variable early and set it equal to the number whose factorial we want to find.
The number could then get multiplied independent of the loop condition
variable.

13. Close PythonWin and re-open to a new script. Paste in the code below and
save the script as debugger_walkthrough2.py.

# This script calculates the factorial of a given
# integer, which is the product of the integer and
# all positive integers below it.

number = 5
loopStop = number
multiplier = 1

while multiplier < loopStop:
    number *= multiplier
    multiplier += 1

print number

14. Display the Debugging toolbar and step through the loop a few times as
you did above. Watch the values of the "number" and "multiplier" variables,
but this time, also add a watch on the "loopStop" variable. This variable
allows the loop condition to remain constant while "number" is multiplied.
Indeed you should see "loopStop" remain fixed at 5 while "number"
increases to 120.

15. Keep stepping until "number" reaches 120 and you reach the "print number"
line. At this line, don't press the Step button; instead, just press Go to finish
out the script. (You don't want to step through all the internal Python code
required to print the variable.) At this point the value of "number" should be
120, which is 5 factorial. If you want, you can try substituting other integers
as the "number" value to find their factorials.

In the above example you used the Debugging toolbar to find a logical error that
had caused an endless loop in your code. Debugging tools are often your best
resource for hunting down subtle errors in your code.
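Once the debugger has helped you find a fix, it is good practice to confirm the result against a trusted source. The sketch below is one way to do that under a few assumptions: it wraps the loop in a hypothetical function named factorial, keeps the running product in its own variable (result) so the loop condition never changes, and compares the answers with math.factorial from Python's standard library.

```python
import math

def factorial(n):
    # Keep the running product separate from the loop limit
    result = 1
    multiplier = 1
    while multiplier <= n:
        result *= multiplier
        multiplier += 1
    return result

# Compare the loop's answers with the standard library's
for n in [1, 5, 7]:
    matches = factorial(n) == math.factorial(n)
    print(str(n) + "! = " + str(factorial(n)) + " (matches: " + str(matches) + ")")
```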

If you reach an internal Python function such as print while using the Debugger,
the debugger will dive right into all the Python code needed to run the function.
You'll know when this happens because you'll see one or more windows open with
code that's difficult to understand. This is also the case sometimes when you run
arcpy functions. The problem is compounded because this type of code tends to call
other functions, which winds up opening many windows.

If you don't want to see all this code, you can try shortcutting around it by using the
Step Over button to jump over a complex function or the Step Out button to get out
of the function. If stepping over or through or out of all that code is too confusing,
you can set another breakpoint one or two lines beyond the line with the function
and just press the Go button again to run to that next breakpoint. When you press
the Go button, the debugger doesn't stop until it hits the next breakpoint.

You can and should practice using the Debugging toolbar in the script-writing
assignments that you receive in this course. You may save a lot of time this way. As
a teaching assistant in a university programming lab years ago, the author of this
course saw many students wait a long time to get one-on-one help, when a simple
walk through their code using the debugger would have revealed the problem.

Readings

Read Zandbergen 11.1 - 11.5 to get his tips for debugging. Then read 11.11 and dog-
ear this section as a checklist for you to review any time you hit a problem in your
code during the next few weeks.

2.2.3 Printing messages from the Esri geoprocessing framework

When you work with geoprocessing tools in Python, sometimes a script will fail
because something went wrong with the tool. It could be that you wrote flawless
Python syntax, but your script doesn't work as expected because Esri geoprocessing
tools cannot find a dataset or otherwise digest a tool parameter. You won't be able
to catch these errors with the debugger, but you can get a view into them by
printing the messages returned from the Esri geoprocessing framework.

Esri has configured its geoprocessing tools to frequently report what they're doing.
When you run a geoprocessing tool from ArcMap or ArcCatalog, you see a box with
these messages, sometimes accompanied by a progress bar. You learned in Lesson 1
that you can use arcpy.GetMessages() to access these messages from your script. If
you only want to view the messages when something goes wrong, you can include
them in an except block of code, like this:

try:
    . . .
except:
    print arcpy.GetMessages()

Geoprocessing messages have three levels of severity: Message, Warning, and
Error. You can pass an index to the arcpy.GetMessages() function to return
only the messages of a given severity level. For example,
arcpy.GetMessages(2) would return only the messages with a severity of "Error."
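One way to put the severity filter to work is sketched below. It is only an illustration under stated assumptions: runTool and alwaysFails are hypothetical names that are not part of arcpy, and the import is guarded so the sketch can be run even on a machine without ArcGIS, where arcpy is unavailable. The only arcpy call used is arcpy.GetMessages(2), described above.

```python
try:
    import arcpy              # available only where ArcGIS is installed
except ImportError:
    arcpy = None              # assumption: lets the sketch run without ArcGIS

def runTool(tool, *args):
    # Hypothetical helper: runs any tool function and reports failures
    try:
        tool(*args)
        return "Tool completed."
    except Exception:
        if arcpy is not None:
            # Severity 2 returns only the "Error" messages
            return arcpy.GetMessages(2)
        return "Tool failed, and arcpy is not available to report messages."

def alwaysFails():
    # Stand-in for a geoprocessing tool that cannot find its dataset
    raise RuntimeError("dataset not found")

print(runTool(alwaysFails))
```

In a real script you would pass a geoprocessing function and its parameters in place of alwaysFails.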

Error and warning messages sometimes include a unique code that you can use to
look up more information about the message. The ArcGIS Desktop Help contains
topics that list the message codes and provide details on each. Some of the entries
have tips for fixing the problem.

Further reading

Please take a look at the official ArcGIS documentation for more detail about
geoprocessing messages. Be sure to read these topics:

• Understanding message types and severity [4]

• Understanding geoprocessing tool errors and warnings [5] - This is the
gateway into the error and warning reference section of the help that
explains all the error codes. Sometimes you'll see these codes in the
messages you get back, and the specific help topic for the code can help you
understand what went wrong. The article also talks about how you can trap
for certain conditions in your own scripts and cause specific error codes to
appear. This type of thing is optional, for over and above credit, in this
course.

2.2.4 Other sources of help


Besides the above approaches, there are many other places you can get help. A few
of them are described below. If you're new to programming, just knowing that
these resources exist and how to use them can help you feel more confident. Find
the ones that you prefer and return to them often. This habit will help you become
a self-sufficient programmer and will improve your potential to learn any new
programming language or technology.

Drawing on the resources below takes time and effort. Many people don't like
combing through computer documentation, and this is understandable. However,
you may ultimately save time if you look up the answer for yourself instead of
waiting for someone to help you. Even better, you will have learned something new
from your own experience, and things you learn this way are much easier to
remember in the future.

Sources of help

• Search engines - Search engines are useful for both quick answers and
obscure problems. Did you forget the syntax for a loop? The quickest remedy
may be to Google "for loop python" or "while loop python" and examine one
of the many code examples returned. Search engines are extremely useful
for diagnosing error messages. Google the error message in quotes and you
can read experiences from others who have had the same issue. If you don't
get enough hits, remove the quotes to broaden the search.

One risk you run from online searches is finding irrelevant information.
Even more dangerous is using irrelevant information. Research any sample
code to make sure it is applicable to the version of Python you're using.
Some syntax in Python 3.x is different from the Python 2.x that you're using
in this course.

• Esri online help - Esri maintains their entire help system online, and
you'll find most of their scripting topics in the sections Geoprocessing with
Python [6] and The ArcPy site package [7].

Another section, which you should visit repeatedly, is the Geoprocessing
Tool Reference [8], which describes every tool in the toolbox and contains
Python scripting examples for each. If you're having trouble understanding
what parameters go in or out of a tool, or if you're getting an error back from
the geoprocessing framework itself, try the Geoprocessing Tool Reference
before you do a random Internet search. You will have to visit the
Geoprocessing Tool Reference in order to be successful in some of the course
projects and quizzes.

• Python online help - The official Python documentation [9] is available
online. Some of it gets very detailed and takes the tone of being written by
programmers for programmers. The part you'll probably find most helpful is
the Python Standard Library reference [10], which is a good place to learn
about Python's modules such as "os," "math," or "random."

• Printed books, including your textbook - Programming books can be
very hit or miss. Many books are written for people who have already
programmed in other languages. Others proclaim they're aimed at
beginners, but the writing or design of the book may be unintuitive or
difficult to digest. Before you drop $40 on a book, try to skim through it
yourself to see if the writing generally makes sense to you (don't worry about
not understanding the code--that will come along as you work through the
book).

The course text Python Scripting for ArcGIS is a generally well-written
introduction to just what the title says: working with ArcGIS using Python.
There are a few other books that have started to appear on this subject and I
anticipate there will be more. If you've struggled with the material, or if you
want to do a lot of scripting in the future, I recommend picking up one
of these. Your textbook can come in handy if you need to look at a very basic
code example, or if you're going to use a certain type of code construct for
the first time and you want to review the basics before you write anything.

A good general Python reference is Learning Python by Mark Lutz. We
previously used this text in Geog 485 before there was a book about scripting
with ArcGIS. It covers beginning to advanced topics, so don't worry if some
parts of it look intimidating.

• Esri forums and other online forums - The Esri forums are a place
where you can pose your question to other Esri software users, or read about
issues other users have encountered that may be similar to yours. There is a
Python Esri forum [11] that relates to scripting with ArcGIS, and also a more
general Geoprocessing Esri forum [12] you might find useful.

Before you post a question on the Esri forums, do a little research to make
sure the question hasn't been answered already, at least recently. I also
suggest that you post the question to our class forums first, since your peers
are working on the same problems and you are more likely to find someone
who's familiar with your situation and has found a solution.

There are many other online forums that address GIS or programming
questions. You'll see them all over the Internet if you perform a Google
search on how to do something in Python. Some of these sites are laden with
annoying banner ads or require logins, while others are more immediately
helpful. Stack Exchange [13] is an example of a well-traveled technical
forum, light on ads, that allows readers to promote or demote answers
depending on their helpfulness. One of its child sites, GIS Stack Exchange
[14], specifically addresses GIS and cartography issues.

If you do post to online forums, be sure to provide detailed information on
the problem and list what you've tried already. Avoid posts such as "Here's
some code that's broken and I don't know why" followed by dozens of lines
of pasted-in code.

In the interest of preserving academic integrity, pasting large sections of
assignment code on any Internet forum may result in a penalty applied to
your grade.

• Class forums - Our course has discussion boards that you may use to
consult your peers about any Python problem that you encounter. I
encourage you to check them often and to participate by both asking and
answering questions. I request that you make your questions focused and
avoid pasting large blocks of code that would rob someone of the benefit of
completing the assignment on their own. Short, focused blocks of code that
solve a specific question are definitely okay. Code blocks that are not copied
directly from your assignment are also okay.

I monitor all discussion boards closely; however, sometimes I may not
respond immediately because I want to give you a chance to help each other
and work through problems together. If you post a question and wind up
solving your own problem, please post the solution as a courtesy to other
students who may have the same problem.

• Consulting the instructor - I am available to help you at any point in the
course and my goal is to respond to any personal message or e-mail within
24 hours (notice the obvious problem if you have waited to begin your
assignment until 24 hours before it's due!). I am happy to consult with you
through e-mail, telephone, or whatever technology is necessary to help you
be successful.

I ask that you try some of the many troubleshooting and help resources
above before you contact me. If the issue is with your code and I cannot
immediately see the problem, the resources we will use to find the answer
will be the same that I listed above: the debugger, printing geoprocessing
messages, looking for online code examples, etc. If you feel unsure about
what you're doing, I'm available to talk through these approaches with you.

Lesson 2 Practice Exercises


Before trying to tackle Project 2, you may want to try some simple practice
exercises, particularly if the concepts in this lesson were new to you. Remember to
choose File > New in PythonWin to create a new script (or click the empty page
icon). You can name the scripts something like Practice1, Practice2, etc. To execute
a script in PythonWin, click the "running man" icon.

1. Find the spaces in a list of names

Python String objects have an index method that enables you to find a

substring within the larger string. For example, if I had a variable defined as

name = "Joe Paterno" and followed that up with the expression

name.index("Pa"), it would return the value 4 because the substring "Pa"

begins at character 4 in the string held in name. (The first character in a

string is at position 0.)

For this practice exercise, start by creating a list of names like the following:

beatles = ["John Lennon", "Paul McCartney", "Ringo Starr", "George Harrison"]

Then write code that will loop through all the items in the list, printing a

message like the following:

"There is a space in ________'s name at character ____."

where the first blank is filled in with the name currently being processed by

the loop and the second blank is filled in with the position of the first space

in the name as returned by the index method. (You should obtain values of

4, 4, 5 and 6, respectively, for the items in the list above.)

This is a good example in which it is smart to write and test versions of the

script that incrementally build toward the desired result rather than trying
to write the final version in one fell swoop. For example, you might start by

setting up a loop and simply printing each name. If you get that to work, give

yourself a little pat on the back and then see if you can simply print the

positions of the space. Once you get that working, then try plugging the

name and space positions into the larger message.

Practice 1 Solution [15]

2. Convert the names to a "Last, First" format

Build on Exercise 1 by printing each name in the list in the following format:

Last, First

To do this, you'll need to find the position of the space just as before. To

extract part of a string, you can specify the start character and the end

character in brackets after the string's name, as in the following:

name = "Joe Paterno"
print name[4:11] # prints Paterno

One quirky thing about this syntax is that you need to specify the end

character as 1 beyond the one you really want. The "o" in "Paterno" is really

at position 10, but I needed to specify a value of 11.

One handy feature of the syntax is that you may omit the end character
index if you want everything after the start character. Thus, name[4:] will
return the same string as name[4:11] in this example. Likewise, the start
character may be omitted to obtain everything from the beginning of the
string up to, but not including, the specified end index. For example,
name[:3] returns Joe.

Practice 2 Solution [16]
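
If you'd like to experiment with the index method and the bracket syntax before starting on these exercises, here is a minimal sketch that uses only the sample string from the text. The variable names are my own, and print is written with parentheses so that the lines behave the same way in Python 2.7 and Python 3.

```python
# Locate a substring, then slice around it, using the sample string from the text
name = "Joe Paterno"

start = name.index("Pa")        # position where "Pa" begins, counting from 0
last_name = name[start:11]      # the end index is one past the last character wanted
same_last_name = name[start:]   # omitting the end index runs to the end of the string
first_name = name[:3]           # omitting the start index begins at position 0

print(start)
print(last_name)
print(first_name)
```

Note that index raises an error if the substring is not found, so this sketch assumes the substring is present.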

3. Convert scores to letter grades

Write a script that accepts a score from 1-100 as an input parameter, then

reports the letter grade for that score. Assign letter grades as follows:

A: 90-100

B: 80-89

C: 70-79

D: 60-69

F: <60

Practice 3 Solution [17]

4. Create copies of a template shapefile

Imagine that you're again working with the Nebraska precipitation data

from Lesson 1 and that you want to create copies of the Precip2008Readings

shapefile for the next 4 years after 2008 (e.g., Precip2009Readings,

Precip2010Readings, etc.). Essentially, you want to copy the attribute

schema of the 2008 shapefile, but not the data points themselves. Those will

be added later. The tool for automating this kind of operation is the Create
Feature Class tool in the Data Management toolbox. Look up this tool in the

Help system and examine its syntax and the example script. Note the

optional template parameter, which allows you to specify a feature class

whose attribute schema you want to copy.

To complete this exercise, you should invoke the Create Feature Class tool

inside a loop that will cause the tool to be run once for each desired year.

Note that Esri uses some inconsistent casing with this tool and you will have

to call arcpy.CreateFeatureclass_management() using a lower-case "c" on

"class." If you follow the examples in the Geoprocessing Tool Reference help

you will be fine.

Practice 4 Solution [18]

5. Clip all feature classes in a geodatabase

The data for this practice exercise consists of two file geodatabases: one for

the USA and one for just the state of Iowa. The USA dataset contains

miscellaneous feature classes. The Iowa file geodatabase is empty except for

an Iowa state boundary feature class.

Download the data [19]

Your task is to write a script that programmatically clips all the feature

classes in the USA geodatabase to the Iowa state boundary. The clipped

feature classes should be written to the Iowa geodatabase. Add "Iowa" to

the beginning of all the clipped feature class names.


Your script should be flexible enough that it could handle any number of

feature classes in the USA geodatabase. For example, if there were 15 feature

classes in the USA geodatabase instead of three, your final code should not

need to change in any way.

Practice 5 Solution [20]

Project 2: Batch reprojection tool for vector datasets
Some GIS departments have determined a single, standard projection in which to
maintain their source data. The raw datasets, however, can be obtained from third
parties in other projections. These datasets then need to be reprojected into the
department's standard projection. Batch reprojection, or the reprojection of many
datasets at once, is a task well suited to scripting.

In this project you'll practice Python fundamentals by writing a script that
re-projects the vector datasets in a folder. From this script, you will then create a
script tool that can easily be shared with others.

The tool you will write should look like the image below. It has two input
parameters and no output parameters. The two input parameters are:

1. A folder on disk containing vector datasets to be re-projected.

2. The path to a vector dataset whose spatial reference will be used in the re-

projection. For example, if you want to re-project into NAD 1983 UTM Zone

10, you would browse to some vector dataset already in NAD 1983 UTM

Zone 10. This could be one of the datasets in the folder you supplied in the

first parameter, or it could exist elsewhere on disk.


Figure 2.1 The Project 2 tool with two input parameters and no output

parameters.

Running the tool causes re-projected datasets to be placed on disk in the target
folder.

Requirements

To receive full credit, your script:

 Must re-project shapefile vector datasets in the folder to match the target

dataset's projection.

 Must append "_projected" to the end of each projected dataset name. For

example: CityBoundaries_projected.shp.

 Must skip projecting any datasets that are already in the target projection.

 Must report a geoprocessing message telling which datasets were projected.

In this message, the dataset names can be separated by spaces. In the

message, do not include datasets that were skipped because they were

already in the target projection. Notice an example of this type of custom

message below in the line "Projected . . . :"


Figure 2.2 Your script must report a geoprocessing message telling which

datasets were projected.

 Must not contain any hard-coded values such as dataset names, path names,

or projection names.

 Must be made available as a script tool that can be easily run from

ArcToolbox by someone with no knowledge of scripting.

Successful completion of the above requirements is sufficient to earn 90% of the
credit on this project. The remaining 10% is reserved for "over and above" efforts
which could include, but are not limited to, the following:

 Your geoprocessing message of projected datasets contains commas between

the dataset names, with no extra "trailing" comma at the end.

 Side panel help documentation is provided for your script tool. This means

that when you open the tool dialog and click Show Help, instructions for
each parameter appear on the side. The ArcGIS Desktop Help can teach you

how to do this.

 Your script exhibits exceptional error handling or other best practices.

 Your script tool uses relative paths to the .py file and is easily deployed

without having to "re-wire" the toolbox to the script. See A structure for

sharing tools [21] for ideas.

You are not required to handle datum transformations in this script. It is assumed
that each dataset in the folder uses the same datum, although the datasets may be
in different projections. Handling transformations would require an additional
parameter in the Project tool and would make your script more complicated than
you would probably like for this assignment.

Sample data

The Lesson 2 data [1] folder contains a set of vector shapefiles for you to work with
when completing this project (delete any subfolders in your Lesson 2 data folder—
you may have one called PracticeData—before beginning this project). These
shapefiles were obtained from the Washington State Department of Transportation
GeoData Distribution Catalog [22], and they represent various geographic features
around Washington state. For the purpose of this project, I have put these datasets
in various projections. These projections share the same datum (NAD 83) so that
you do not have to deal with datum transformations.

The datasets and their original projections are:

 CityBoundaries and StateRoutes -

NAD_1983_StatePlane_Washington_South_FIPS_4602

 CountyLines - NAD_1983_UTM_Zone_10N

 Ferries - USA_Contiguous_Lambert_Conformal_Conic

 PopulatedPlaces - GCS_NorthAmerican_1983

Deliverables

Deliverables for this project are as follows:


 The source .py file containing your script

 The .tbx file containing your script tool

 A short writeup (about 300 words) describing what you learned during this

project and how you approached the problem. You should include which

requirements you met, or failed to meet, including "over and above" efforts.

Tips

The following tips can help improve your chances of success with this project:

 Do not use the Esri Batch Project tool in this project. In essence, you're

required to make your own variation of a batch project tool in this project by

running the Project tool inside a loop. Your tool will be easier to use because

it's customized to the task at hand.

 There are a lot of ways to insert "_projected" in the name of a dataset, but
you might find it useful to start by temporarily removing ".shp" and adding
it back on later. To remove ".shp" you can use syntax like this:

rootName = ""
if fc.endswith(".shp"):
    rootName = fc[:-4]

In the above code, fc is your shapefile name with .shp included, and fc[:-4]
removes the last four characters. If your script were to handle all types of file
extensions you would avoid hard-coding a number like -4, but your script is
not required to handle any extensions other than .shp.
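
The note above mentions that a more general script would avoid hard-coding a number like -4. One possible way to do that, offered here only as a sketch and not as the approach the project expects, is the os.path.splitext function from Python's standard library, which splits a file name at its last dot. The dataset names in the list are used purely for illustration.

```python
import os

# Dataset names used only for illustration
datasets = ["CityBoundaries.shp", "Ferries.shp"]

roots = []
for fc in datasets:
    # splitext returns a (root, extension) pair, e.g. ("CityBoundaries", ".shp")
    rootName, extension = os.path.splitext(fc)
    roots.append(rootName)
    print(rootName + " | " + extension)
```

Because the extension is returned separately, it can be added back later without knowing its length in advance.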

 To check if a dataset is already in the target projection, you will need to

obtain a Spatial Reference object for each dataset (the dataset to be

projected and the target dataset). You will then need to compare the spatial

reference names of these two datasets. Be sure to compare the Name
property of the spatial references; do not compare the spatial reference

objects themselves. This is because you can have two spatial reference

objects that are different entities (and are thus "not equal"), but have the

same name property.

You should end up with a line similar to this:

if fcSR.Name != targetSR.Name:

where fcSR is the spatial reference of the feature class to be projected and
targetSR is the target spatial reference obtained from the target projection
shapefile.
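
To see why comparing names is different from comparing objects, consider the following sketch. It does not use arcpy at all: FakeSpatialReference is a hypothetical stand-in class that I made up so the example runs anywhere, and it imitates only a Name property. It illustrates the general Python behavior described above, in which two separately created objects are "not equal" even when their names match.

```python
class FakeSpatialReference(object):
    """Hypothetical stand-in for a spatial reference object; not part of arcpy."""
    def __init__(self, name):
        self.Name = name


# Two separate objects that happen to carry the same name
fcSR = FakeSpatialReference("NAD_1983_UTM_Zone_10N")
targetSR = FakeSpatialReference("NAD_1983_UTM_Zone_10N")

objects_equal = (fcSR == targetSR)            # compares the objects themselves
names_equal = (fcSR.Name == targetSR.Name)    # compares the name text

print(objects_equal)
print(names_equal)
```

The first comparison is False and the second is True, which is why the tip recommends testing the Name property.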

 If you want to show all the messages from each run of the Project tool, add

the line: arcpy.AddMessage(arcpy.GetMessages()) immediately after the line

where you run the Project tool. Each time the loop runs, it will add the

messages from the current run of the Project tool into the results window.

It's been my experience that if you wait to add this line until the end of your

script, you only get the messages from the last run of the tool, so it's

important to put the line inside the loop. Remember that while you are first

writing your script you can use print statements to debug, then switch to

arcpy.AddMessage() when you have verified that your script works and you

are ready to make a script tool.

 If you need extra help with making the script tool, refer back to Lesson 1.7.1

and also read Zandbergen 13.1 - 13.10 where he goes in depth about making

your own tools.

 If, after all your best efforts, you ran out of time and could not meet one of

the requirements, comment out the code that is not working (using a # sign
at the beginning of each line) and send the code anyway. Then explain in

your brief writeup which section is not working and what troubles you

encountered. If your commented code shows that you were heading down

the right track, you may be awarded partial credit.

Author(s) and/or Instructor(s): Sterling Quinn, John A. Dutton e-Education
Institute, College of Earth and Mineral Sciences, The Pennsylvania State
University;
Jim Detwiler, John A. Dutton e-Education Institute, College of Earth and Mineral
Sciences, The Pennsylvania State University;
Frank Hardisty, John A. Dutton e-Education Institute, College of Earth and
Mineral Sciences, The Pennsylvania State University;
James O'Brien, John A. Dutton e-Education Institute, College of Earth and Mineral
Sciences, The Pennsylvania State University

Penn State Professional Masters Degree in GIS: Winner of the 2009 Sloan
Consortium award for Most Outstanding Online Program

© 1999-2012 The Pennsylvania State University. Except where otherwise noted,
this courseware module is licensed under the Creative Commons
Attribution-Non-Commercial-Share-Alike 3.0 License and is freely available through
Penn State's College of Earth and Mineral Sciences' Open Educational Resources
Initiative.

Please address questions and comments about this resource to the site editor.

Source URL: https://www.e-education.psu.edu/geog485/node/45

Links:
[1] https://www.e-education.psu.edu/drupal6/files/geog485py/data/Lesson2.zip
[2] http://docs.python.org/tutorial/datastructures.html
[3] http://help.arcgis.com/en/arcgisdesktop/10.0/help/002z/002z00000011000000.htm
[4] http://help.arcgis.com/en/arcgisdesktop/10.0/help/002z/002z0000000p000000.htm
[5] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//00r90000009v000000.htm
[6] http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=An_overview_of_writing_geoprocessing_scripts
[7] http://help.arcgis.com/en/arcgisdesktop/10.0/help/000v/000v000000v7000000.htm
[8] http://help.arcgis.com/en/arcgisdesktop/10.0/help/002t/002t0000000z000000.htm
[9] http://www.python.org/doc/
[10] http://docs.python.org/library
[11] http://forums.arcgis.com/forums/117-Python
[12] http://forums.arcgis.com/forums/31-Geoprocessing
[13] https://www.e-education.psu.edu/geog485/stackexchange.com
[14] http://gis.stackexchange.com/
[15] https://www.e-education.psu.edu/geog485/../L02_Prac1.html
[16] https://www.e-education.psu.edu/geog485/../L02_Prac2.html
[17] https://www.e-education.psu.edu/geog485/../L02_Prac3.html
[18] https://www.e-education.psu.edu/geog485/../L02_Prac4.html
[19] https://www.e-education.psu.edu/drupal6/files/geog485py/data/Lesson2PracticeExercise.zip
[20] https://www.e-education.psu.edu/geog485/../L02_Prac5.html
[21] http://help.arcgis.com/en/arcgisdesktop/10.0/help/0057/005700000004000000.htm
[22] http://www.wsdot.wa.gov/mapsdata/GeoDataCatalog/default.htm

Lesson 3: GIS data access and manipulation with Python
An essential part of a GIS is the data that represents both the geometry
(coordinates) of geographic features and the attributes of those features. This
combination of features and attributes is what makes GIS go beyond just
"mapping." Much of your work as a GIS analyst involves adding, modifying, and
deleting features and their attributes from the GIS.

Beyond maintaining the data, you also need to know how to query and select the
data that is most important to your projects. Sometimes you'll want to query a
dataset to find only the records that match certain criteria (for example,
single-family homes constructed before 1980) and calculate some statistics based on only
the selected records (for example, percentage of those homes that experienced
termite infestation).

All of the above tasks of maintaining, querying, and summarizing data can become
tedious and error-prone if performed manually. Python scripting is often a faster
and more accurate way to read and write large amounts of data. There are already
many tools for data selection and management in ArcToolbox. Any of these can be
used in a Python script. For more customized scenarios where you want to read
through a table yourself and modify records one-by-one, the Geoprocessor
programming model contains special objects, called cursors, that you can use to
examine each record in a table. You'll quickly see how the looping logic that you
learned in Lesson 2 becomes useful when you are cycling through tables using
cursors.

Using a script to work with your data introduces some other subtle advantages over
manual data entry. For example, in a script you can add checks to ensure that the
data entered conforms to a certain format. You can also chain together multiple
steps of selection logic that would be time-consuming to perform in ArcMap.

This lesson explains ways to read and write GIS data using Python. We'll start off
by looking at how you can create and open datasets within a script. Then we'll
practice reading and writing data using both geoprocessing tools and cursor
objects. Although this is most applicable to vector datasets, we'll also look at some
ways you can manipulate rasters with Python. Once you're familiar with these
concepts, Project 3 will give you a chance to practice what you've learned.

Lesson 3 checklist
Lesson 3 explains how to read and manipulate both vector and raster data with
Python. To complete Lesson 3 you are required to do the following:
1. Work through the course lesson materials.

2. Read Zandbergen chapter 7.1 - 7.3 and all of chapter 9. In the online lesson

pages I have inserted instructions about when it is most appropriate to read

each of these chapters.

3. Complete Project 3 and submit your zipped deliverables to the Project 3

drop box.

4. Complete the Lesson 3 Quiz.

5. Read the Final Project proposal assignment and begin working on your

proposal, which is due after the first week of Lesson 4.

3.1 Data storage and retrieval in ArcGIS
Before getting into the details of how to read and modify these attributes, it's
helpful to review how geographic datasets are stored in ArcGIS. You need to know
this so you can open datasets in your scripts, and on occasion, create new datasets.

Geodatabases

Over the years, Esri has developed various ways of storing spatial data. They
encourage you to put your data in geodatabases, which are organizational
structures for storing datasets and defining relationships between those datasets.
Different flavors of geodatabase are offered for storing different magnitudes of data.

 Personal geodatabases are a small, nearly deprecated form of geodatabase

that stores data on the local file system. The data is held in a Microsoft Access

database, which limits how much data can be stored in the geodatabase.

 File geodatabases are a newer way of storing data on the local file system.

The data is stored in a proprietary format developed by Esri. A file

geodatabase can hold more data than a personal geodatabase: up to

terabytes.

 ArcSDE geodatabases or "enterprise geodatabases" store data on a central

server in a relational database management system (RDBMS) such as SQL

Server, Oracle, or PostgreSQL. These are large databases designed for

serving data not just to one computer, but to an entire enterprise. Since

working with an RDBMS can be a job in itself, Esri has developed ArcSDE as

"middleware" that allows you to configure and read your datasets in

ArcCatalog or ArcMap without touching the RDBMS software.

For actions where ArcSDE is required but where it would be too heavy-

handed to purchase and configure an enterprise RDBMS, Esri has developed

a smaller "workgroup" version of ArcSDE that works with the free database

SQL Server Express. This can be configured directly from ArcCatalog or the

Catalog window in ArcMap.

In recent years, Esri has also promoted a new feature called query layers [1],

which allow you to pull data directly out of an RDBMS using SQL queries,

with no ArcSDE involved.

A single vector dataset within a geodatabase is called a feature class. Feature
classes can be optionally organized in feature datasets. Raster datasets can also be
stored in geodatabases.

Standalone datasets

Although geodatabases are essential for long-term data storage and organization,
it's sometimes convenient to access datasets in a "standalone" format on the local
file system. Esri's shapefile is probably the most ubiquitous standalone vector data
format (it even has its own Wikipedia article [2]). A shapefile actually consists of
several files that work together to store vector geometries and attributes. The files
all have the same root name, but use different extensions. You can zip the
participating files together and easily e-mail them or post them in a folder for
download. In the Esri file browsers in ArcCatalog or ArcMap, the shapefiles just
appear as one file.
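
To make the idea of participating files more concrete, here is a small sketch that groups a folder listing by root name. The file names are made up for illustration. The extensions .shp, .shx, .dbf, and .prj are common shapefile parts, and a real shapefile may include others.

```python
import os

# Made-up folder listing; each shapefile is really several files with one root name
fileNames = ["Cities.shp", "Cities.shx", "Cities.dbf", "Cities.prj",
             "Roads.shp", "Roads.shx", "Roads.dbf"]

# Group the extensions found for each root name
parts = {}
for fileName in fileNames:
    root, extension = os.path.splitext(fileName)
    parts.setdefault(root, []).append(extension)

for root in sorted(parts):
    print(root + ": " + " ".join(parts[root]))
```

Seven files on disk turn out to be just two datasets, which is how the Esri file browsers present them.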

Note: Sometimes in Esri documentation shapefiles are also referred to as "feature
classes." When you see the term "feature class," consider it to mean a vector dataset
that can be used in ArcGIS.

Another type of standalone dataset dating back to the early days of ArcGIS is the
ArcInfo coverage. Like the shapefile, the coverage consists of several files that work
together. Coverages are becoming more and more rare, but you might encounter
them if your organization has used (or still uses!) ArcInfo Workstation.

Raster datasets are also often stored in standalone format instead of being loaded
into a geodatabase. A raster dataset can be a single file, such as a JPEG or a TIFF,
or, like a shapefile, it can consist of multiple files that work together.

Providing paths in Python scripts

Often in a script you'll need to provide the path to a dataset. Knowing the syntax for
specifying the path is sometimes a challenge because of the many different ways of
storing data listed above. For example, below is an example of what a file
geodatabase looks like if you just browse the file system of Windows Explorer. How
do you specify the path to the dataset you need? This same challenge could occur
with a shapefile, which, although more intuitively named, actually has three or
more participating files.

Figure 3.1 A file geodatabase as viewed via the file system of Windows Explorer.

The safest way to get the paths you need is to browse to the dataset in ArcCatalog
and take the path that appears in the Location toolbar. Here's what the same file
geodatabase would look like in ArcCatalog. The circled path shows how you would
refer to a feature class within the geodatabase.

Figure 3.2 The same file geodatabase, shown in ArcCatalog.

Below is an example of how you could access the feature class in a Python script
using this path. This is similar to one of the examples in Lesson 1.
import arcpy
featureClass = "C:\\Data\\USA\\USA.gdb\\Cities"
desc = arcpy.Describe(featureClass)
spatialRef = desc.SpatialReference
print spatialRef.Name

Remember that the backslash (\) is a reserved character in Python, so you'll need to
use either the double backslash (\\) or forward slash (/) in the path. Another
technique you can use for paths is the raw string, which allows you to put
backslashes and other reserved characters in your string as long as you put "r"
before your quotation marks.

featureClass = r"C:\Data\USA\USA.gdb\Cities"
. . .
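
If you'd like to convince yourself that these spellings describe the same path, a quick check like the following can help. It is plain Python and needs no arcpy. The last path, C:\new_data, is a hypothetical one that I chose to show what goes wrong when a backslash is left unescaped.

```python
# Three spellings of the same Windows path used in the example above
escaped = "C:\\Data\\USA\\USA.gdb\\Cities"   # double backslashes
raw = r"C:\Data\USA\USA.gdb\Cities"          # raw string
forward = "C:/Data/USA/USA.gdb/Cities"       # forward slashes

# The escaped and raw versions contain exactly the same characters
print(escaped == raw)

# The forward-slash version differs only in which slash character it uses
print(forward.replace("/", "\\") == raw)

# Hypothetical path: in a normal string, the backslash followed by n
# is read as a single newline character, silently corrupting the path
risky = "C:\new_data"
print("\n" in risky)
```

All three lines print True, and the last one is the reason to prefer one of the three safe spellings.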

Workspaces

The Esri geoprocessing framework often uses the notion of a workspace to denote
the folder or geodatabase where you're currently working. When you specify a
workspace in your script, you don't have to list the full path to every dataset. When
you run a tool, the geoprocessor sees the feature class name and assumes that it
resides in the workspace you specified.

Workspaces are especially useful for batch processing, when you perform the same
action on many datasets in the workspace. For example, you may want to clip all
the feature classes in a folder to the boundary of your county. The workflow for this
is:

1. Define a workspace.

2. Create a list of feature classes in the workspace.

3. Define a clip feature.

4. Configure a loop to run on each feature class in the list.

5. Inside the loop, run the Clip tool.

Here's some code that clips each feature class in a file geodatabase to the Alabama
state boundary, then places the output in a different file geodatabase. Note how the
five lines of code after import arcpy correspond to the five steps listed above.

import arcpy

arcpy.env.workspace = "C:\\Data\\USA\\USA.gdb"
featureClassList = arcpy.ListFeatureClasses()
clipFeature = "C:\\Data\\Alabama\\Alabama.gdb\\StateBoundary"
for featureClass in featureClassList:
    arcpy.Clip_analysis(featureClass, clipFeature,
                        "C:\\Data\\Alabama\\Alabama.gdb\\" + featureClass)

In the above example, the method arcpy.ListFeatureClasses() was the key to
making the list. This method looks through a workspace and makes a Python list of
each feature class in that workspace. Once you have this list, you can easily
configure a for loop to act on each item.

Notice that you designated the path to the workspace using the location of the file
geodatabase "C:\\Data\\USA\\USA.gdb". If you were working with shapefiles, you
would just use the path to the containing folder as the workspace.

If you were working with ArcSDE, you would use the path to the .sde connection
file when creating your workspace. This is a file that is created when you connect to
ArcSDE in ArcCatalog, and is placed in your local profile directory. We won't be
accessing ArcSDE data in this course, but if you do this at work, remember that you
can use the Location toolbar in ArcCatalog to help you understand the paths to
datasets in ArcSDE.
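
In the clipping script earlier in this section, each output path was built by gluing a feature class name onto a folder string that ends in a double backslash. As an alternative, here is a sketch that lets the standard library insert the separator. The feature class names are made up and stand in for the list that arcpy.ListFeatureClasses() would return. I use the ntpath module so the result is the same on any operating system; on Windows, os.path.join behaves identically.

```python
import ntpath  # the Windows flavor of os.path

outWorkspace = r"C:\Data\Alabama\Alabama.gdb"

# Made-up names standing in for the list from arcpy.ListFeatureClasses()
featureClassList = ["Cities", "Roads"]

outPaths = []
for featureClass in featureClassList:
    # join adds the backslash between the workspace and the name
    outPaths.append(ntpath.join(outWorkspace, featureClass))

for outPath in outPaths:
    print(outPath)
```

This avoids having to remember the trailing separator on the workspace string.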

3.2 Reading vector attribute data


Now that you know how to open a dataset, let's go a little bit deeper and start
examining some individual data records. This section of the lesson discusses how to
read and search data tables. These tables often provide the attributes for vector
features, but they can also stand alone in some cases. The next section will cover
how to write data to tables. At the end of the lesson, we'll look at rasters.

As we work with the data, it will be helpful for you to follow along, cutting and
pasting the example code into practice scripts. Throughout the lesson you'll
encounter exercises that you can do to practice what you just learned. You're not
required to turn in these exercises, but if you complete them you will have a greater
familiarity with the code that will be helpful when you begin working on this week's
project. It's impossible to read a book or a lesson, then sit down and write perfect
code. Much of what you learn comes through trial and error and learning from
mistakes. Thus, it's wise to write code often as you complete the lesson.

3.2.1 Accessing data fields


Before we get too deep into vector data access, it's going to be helpful to quickly
review how the vector data is stored in the software. Vector features in ArcGIS
feature classes (using this term to include shapefiles) are stored in a table. The
table has rows (records) and columns (fields).

Fields in the table


Fields in the table store the geometry and attribute information for the features.

There are two fields in the table that you cannot delete. One of the fields (usually
called SHAPE) contains the geometry information for the features. This includes
the coordinates of each vertex in the feature and allows the feature to be drawn on
the screen. The geometry is stored in binary format; if you were to see it printed on
the screen, it wouldn't make any sense to you. However, you can read and work
with geometries using objects that are provided with arcpy.

The other field included in every feature class is an object ID field (OBJECTID or
FID). This contains a unique number, or identifier for each record that is used by
ArcGIS to keep track of features. The object ID helps avoid confusion when
working with data. Sometimes records have the same attributes. For example, both
Los Angeles and San Francisco could have a STATE attribute of 'California,' or a
USA cities dataset could contain multiple cities with the NAME attribute of
'Portland;' however, the OBJECTID field can never have the same value for two
records.

The rest of the fields contain attribute information that describes the feature. These
attributes are usually stored as numbers or text.

Discovering field names

When you write a script, you'll need to provide the names of the particular fields
you want to read and write. You can get a Python list of field names using
arcpy.ListFields().

# Reads the fields in a feature class

import arcpy

featureClass = "C:\\Data\\Alabama\\Alabama.gdb\\Cities"
fieldList = arcpy.ListFields(featureClass)
# Loop through each field in the list and print the name
for field in fieldList:
    print field.name

The above would yield a list of the fields in the Cities feature class in a file
geodatabase named Alabama. If you ran this script in PythonWin (try it with one of
your own feature classes!) you would see something like the following in the
Interactive Window.

>>> OBJECTID
Shape
UIDENT
POPCLASS
NAME
CAPITAL
STATEABB
COUNTRY
Notice the two special fields we already talked about: OBJECTID, which holds the
unique identifying number for each record, and Shape, which holds the geometry
for the record. Additionally, this feature class has fields that hold the name
(NAME), the state (STATEABB), whether or not the city is a capital (CAPITAL),
and so on.

The arcpy module treats each field as an object, so each field has properties that describe it.
That's why you can print field.name. The help reference topic Using fields and
indexes [3] lists all the properties that you can read from a field. These include
AliasName, Length, Type, Scale, Precision, and others.

Properties of a field are read-only, meaning that you can find out what the field
properties are, but you cannot change those properties in a script using the Field
object. If you wanted to change the scale and precision of a field, for instance, you
would have to programmatically add a new field.
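
Because each field is an object, you can use its properties to filter the list rather than just print it. The sketch below uses a namedtuple as a stand-in for the objects that arcpy.ListFields() returns, so that it runs without ArcGIS. The field lengths are invented, and only the name, type, and length properties are imitated.

```python
from collections import namedtuple

# Hypothetical stand-in for the field objects returned by arcpy.ListFields()
Field = namedtuple("Field", ["name", "type", "length"])

# Invented sample values loosely modeled on the Cities field list above
fieldList = [
    Field("OBJECTID", "OID", 4),
    Field("Shape", "Geometry", 0),
    Field("NAME", "String", 50),
    Field("STATEABB", "String", 2),
]

# Keep only the names of the text fields
textFields = [field.name for field in fieldList if field.type == "String"]
print(textFields)
```

With a real feature class, the same loop should work on the list returned by arcpy.ListFields(featureClass), assuming ArcGIS is installed.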

3.2.2 Reading through records


Now that you know how to traverse the table horizontally, reading the fields that
are available, let's examine how to read up and down through the table records.

The search cursor

The arcpy module contains some objects called cursors that allow you to move
through records in a table. Cursors are not unique to ArcGIS scripting; in fact, if
you've worked in ArcObjects before, this concept of a cursor is probably familiar to
you. The first cursor we'll look at is the search cursor, since it's designed for simple
reading of data. The common workflow is:

1. Create the search cursor. This is done through the method
arcpy.SearchCursor(). This method takes several parameters in which you
specify which dataset and, optionally, which specific rows you want to read.

2. Call SearchCursor.next() to read the first row.

3. Start a loop that will exit when there are no more rows available to read.

4. Do something with the values in the current row.

5. Call SearchCursor.next() to move on to the next row. Because you created a
loop, this puts you back at the previous step if there is another row available
to be read. If there are no more rows, the loop condition is not met and the
loop terminates.

When you first try to understand cursors, it may help to visualize the attribute table
with an arrow pointing at the "current row." When the cursor is first created, that
arrow is pointing just above the first row in the table. The first time the next()
method is called, the arrow moves down to the first row (and returns a reference to
that row). Each time next() is called, the arrow moves down one row. If next() is
called when the arrow is pointing at the last row, the special Python value None is
returned.
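
The arrow picture can be imitated in plain Python. In the sketch below, FakeCursor is a hypothetical stand-in that I made up for illustration; it is not an arcpy class. It only copies the behavior described above: next() hands back each row in turn and then None.

```python
# FakeCursor is a hypothetical stand-in for a search cursor (not an arcpy class)
class FakeCursor(object):
    def __init__(self, records):
        self._records = list(records)
        self._position = -1      # the "arrow" starts just above the first row

    def next(self):
        """Move the arrow down one row; return that row, or None past the end."""
        if self._position + 1 >= len(self._records):
            return None
        self._position += 1
        return self._records[self._position]


rows = FakeCursor(["Birmingham", "Mobile", "Montgomery"])
names = []

row = rows.next()                # the arrow moves to the first row
while row:                       # None ends the loop
    names.append(row)
    row = rows.next()

print(names)
```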

Here's a very simple example of a search cursor that reads through a point dataset
of cities and prints the name of each.

# Prints the name of each city in a feature class

import arcpy

featureClass = "C:\\Data\\Alabama\\Alabama.gdb\\Cities"

rows = arcpy.SearchCursor(featureClass)
row = rows.next()

while row:
    print row.NAME
    row = rows.next()

The last five lines of the above script correspond to the five steps in the above
workflow. Cursors can be tricky to understand at first, so let's look at those lines
more closely. Below are the five lines again with comments so you can see exactly
what's happening:

# Create the search cursor
rows = arcpy.SearchCursor(featureClass)

# Call SearchCursor.next() to read the first row
row = rows.next()

# Start a loop that will exit when there are no more rows available
while row:

    # Do something with the values in the current row
    print row.NAME

    # Call SearchCursor.next() to move on to the next row
    row = rows.next()

Notice a few other important things before moving on:


• The loop condition "while row:" is a simple Boolean way of specifying
whether the loop should continue. If a row object exists, the statement
evaluates to true and the loop continues. If a row object doesn't exist, the
statement evaluates to false and the loop terminates.

• You can read a field value as a property of a row. For example, row.NAME
gave you the value in the NAME field. If your table had a POPULATION
field, you could use row.POPULATION to get the population.

• The names "rows" and "row" are just variable names that represent the
SearchCursor and Row objects, respectively. We could name these anything.
The Esri examples tend to name them rows and row, and we'll do the same.
However, if you needed to use two search cursors at the same time, you'd
have to come up with some additional names.

Here's another example where something more complex is done with the row
values. This script finds the average population for counties in a dataset. To find
the average, you need to divide the total population by the number of counties. The
code below loops through each record and keeps a running total of the population
and the number of records counted. Once all the records have been read, only one
line of division is necessary to find the average.

# Finds the average population in a counties dataset

import arcpy

featureClass = "C:\\Data\\Iowa\\Counties.shp"

rows = arcpy.SearchCursor(featureClass)
row = rows.next()

average = 0
totalPopulation = 0
recordsCounted = 0

# Loop through each row and keep running total of population
# and records counted.
while row:
    totalPopulation += row.POP2008
    recordsCounted += 1
    row = rows.next()

average = totalPopulation / recordsCounted

print "Average population for a county is " + str(average)

Although the above script is longer than the first one, it's still following the general
pattern of creating a search cursor, advancing to the first row, doing something
with the row, and repeating the process until there are no records left.

Reading values when the field name is a variable

In the previous script, the population of a record was referenced as row.POP2008,
where the population field name is POP2008. This is a pretty easy way to get a field
value, but what happens if you get data for 2009 in a field named POP2009 and
you want to run the script again? What if you have many, long scripts that always
reference the population field this way? You would have to carefully search each
script for row.POP2008 and replace it with row.POP2009. This could be tedious
and error-prone.

You can make your scripts more versatile by using variables to represent field
names. You could declare a variable, such as populationField, to reference the
population field name, whether it is POP2008, POP2009, or simply
POPULATION. The Python interpreter isn't going to recognize
row.populationField, so you need to use Row.getValue() instead and pass in the
variable as a parameter.
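
If it's not clear why row.populationField fails, this plain-Python sketch may help. FakeRow is a hypothetical stand-in for a Row object, and the county values are made up. The point is that the dot syntax looks for a field literally named "populationField", while getValue() looks up whatever name the variable holds.

```python
# FakeRow is a hypothetical stand-in for an arcpy Row object
class FakeRow(object):
    def __init__(self, **values):
        for name, value in values.items():
            setattr(self, name, value)

    def getValue(self, fieldName):
        # Look up the attribute named by the string stored in fieldName
        return getattr(self, fieldName)


row = FakeRow(NAME="Sample County", POP2008=1000)   # made-up values
populationField = "POP2008"

hardCoded = row.POP2008                        # field name typed into the code
fromVariable = row.getValue(populationField)   # field name held in a variable

# The dot syntax does not substitute the variable's contents
try:
    row.populationField
    lookupFailed = False
except AttributeError:
    lookupFailed = True

print(hardCoded)
print(fromVariable)
print(lookupFailed)
```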

The script below uses a variable name to get the population for each record. Two
things change from the script above: a variable named populationField is created
near the top, and the method call row.getValue(populationField) retrieves the
population of each record.

# Finds the average population in a counties dataset

import arcpy

featureClass = "C:\\Data\\Iowa\\Counties.shp"
populationField = "POP2008"

rows = arcpy.SearchCursor(featureClass)
row = rows.next()

average = 0
totalPopulation = 0
recordsCounted = 0

# Loop through each row and keep running total of population
# and records counted.
while row:
    totalPopulation += row.getValue(populationField)
    recordsCounted += 1
    row = rows.next()

average = totalPopulation / recordsCounted

print "Average population for a county is " + str(average)

To update the above script, you would just have to set populationField =
"POP2009" near the top of the script. This is certainly easier than searching
through the body of the script for row.POP2008; however, you can go one step
further and allow the user to enter any field name that he or she wants as an
argument when running the script.

Remember in Lesson 1 how you learned that arcpy.GetParameterAsText() allows
the user of the script to supply a value for the variable? Using that technique for
both the feature class path and the population field name makes your script very
flexible. Notice that the code below contains no hard-coded path names, field
names, or numbers besides 0 and 1. This means you could run the script with any
feature class containing any name for its population field without modifying the
code. In fact, you could use code similar to this to find the average of any numeric
field, such as square mileage, or number of homeowners.

# Finds the average population in a counties dataset

import arcpy

featureClass = arcpy.GetParameterAsText(0)
populationField = arcpy.GetParameterAsText(1)

rows = arcpy.SearchCursor(featureClass)
row = rows.next()

average = 0
totalPopulation = 0
recordsCounted = 0

# Loop through each row and keep running total of population
# and records counted.
while row:
    totalPopulation += row.getValue(populationField)
    recordsCounted += 1
    row = rows.next()

average = totalPopulation / recordsCounted

print "Average population for a county is " + str(average)

Here's how you could run the above script in PythonWin by supplying the path
name and population field as the arguments.

Figure 3.3 Running the above script in PythonWin.

Using a for loop with a cursor

Although the above examples use a while loop in conjunction with the next()
method to advance the cursor, it's often easier to iterate through each record using
a for loop. This became possible with ArcGIS 10. Here's how the above sample
could be modified to use a for loop. Notice the syntax for row in rows.

# Finds the average population in a counties dataset

import arcpy

featureClass = "C:\\Data\\Iowa\\Counties.shp"
populationField = "POP2008"

rows = arcpy.SearchCursor(featureClass)

average = 0
totalPopulation = 0
recordsCounted = 0

# Loop through each row and keep running total of population
# and records counted.
for row in rows:
    totalPopulation += row.getValue(populationField)
    recordsCounted += 1

average = totalPopulation / recordsCounted

print "Average population for a county is " + str(average)

In this example, the next() method is not even required because it is implied by the
for loop that the script will iterate through every record. The object named row is
declared when the for loop is declared.
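
The idea that a for loop "implies" the call to next can be seen with any Python sequence. The sketch below uses a plain list instead of a cursor, so it is only an analogy: Python's general iterator protocol signals the end with a StopIteration error, whereas the cursor's own next() method described earlier returns None.

```python
records = ["Adair", "Adams", "Allamakee"]

# Roughly what a for loop does behind the scenes:
# keep asking for the next item until none remain
iterator = iter(records)
manual = []
while True:
    try:
        manual.append(next(iterator))
    except StopIteration:
        break

# The compact form that does the same work
automatic = []
for record in records:
    automatic.append(record)

print(manual == automatic)
```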

This syntax is more compact than using a while loop, and you are welcome to
experiment with it and use it in your projects.

The Geog 485 lesson examples have not all been converted to use this technique.
There is some benefit to seeing how the next() method works, especially if you ever
work with ArcGIS 9.3.x Python scripts or if you use cursors in ArcObjects (which
has conceptually similar methods for advancing a cursor row by row). However,
once you get accustomed to using a for loop to traverse a table, it's unlikely you'll
want to go back to using while loops.
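
Whichever loop style you prefer, the running-total logic from the average population scripts can be packaged as a reusable function. The sketch below is only an illustration that runs without ArcGIS: plain dictionaries stand in for rows, and the function name average_of_field, the SQMI field, and all the values are made up. It starts the total as a floating-point number so the division is not truncated to a whole number in Python 2, and it returns None instead of dividing by zero when there are no records.

```python
def average_of_field(records, fieldName):
    """Return the mean of fieldName over records, or None if there are none."""
    total = 0.0      # float, so the division below is not integer division
    count = 0
    for record in records:
        total += record[fieldName]
        count += 1
    if count == 0:
        return None  # avoid dividing by zero on an empty table
    return total / count


# Made-up records standing in for rows of a counties table
counties = [
    {"NAME": "A", "POP2008": 1000, "SQMI": 500.0},
    {"NAME": "B", "POP2008": 2000, "SQMI": 700.0},
    {"NAME": "C", "POP2008": 4500, "SQMI": 600.0},
]

averagePopulation = average_of_field(counties, "POP2008")
averageArea = average_of_field(counties, "SQMI")

print("Average population for a county is " + str(averagePopulation))
```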

The arcpy data access module (ArcGIS 10.1)

If you're using ArcGIS 10.1, you can use the above code for search cursors, or you
can use a new data access module that was introduced into arcpy. These data access
functions are prefixed with arcpy.da and give you faster performance along with
more robust behavior when crashes or errors are encountered with the cursor.

The data access module arcpy.da allows you to create cursor objects, just like arcpy,
but you create them a little differently. Take a close look at the following example
code, which repeats the scenario above to calculate the average population of a
county.

# Finds the average population in a counties dataset

import arcpy

featureClass = "C:\\Data\\Iowa\\Counties.shp"
populationField = "POP2008"

average = 0
totalPopulation = 0
recordsCounted = 0

with arcpy.da.SearchCursor(featureClass, (populationField,)) as cursor:
    for row in cursor:
        totalPopulation += row[0]
        recordsCounted += 1

average = totalPopulation / recordsCounted

print "Average population for a county is " + str(average)

This example uses the same basic structure as the previous one, with a few
important changes. One thing you probably noticed is that the cursor is created
using a "with" statement. Although the explanation of "with" is somewhat
technical, the key thing to understand is that it allows your cursor to exit the
dataset gracefully, whether it crashes or completes its work successfully. This is a
big issue with cursors, which can sometimes maintain locks on data if they are not
exited properly.
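
If you're curious how "with" manages a graceful exit, here is a rough sketch in plain Python. FakeLockingCursor is a hypothetical class I made up for illustration; I'm not claiming this is how arcpy.da is implemented internally. It only shows the general mechanism: the cleanup code in __exit__ runs whether the block finishes normally or crashes.

```python
# FakeLockingCursor is a hypothetical stand-in, not an arcpy class
class FakeLockingCursor(object):
    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("lock acquired")
        return self

    def __exit__(self, excType, excValue, traceback):
        self.log.append("lock released")   # runs on success and on error
        return False                       # do not hide the error


log = []
try:
    with FakeLockingCursor(log) as cursor:
        log.append("reading rows")
        raise RuntimeError("simulated crash while reading")
except RuntimeError:
    log.append("error handled")

print(log)
```

Notice that "lock released" is recorded before the error is handled, even though the code inside the "with" block crashed.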

The "with" statement requires that you indent all the code beneath it. After you
create the cursor in your "with" statement, you'll initiate a for loop to run through
all the rows in the table. This requires additional indentation.

Notice that this "with" statement creates a SearchCursor object, and declares that it
will be named "cursor" in any subsequent code. The search cursors you create with
arcpy.da have some different initialization parameters from the search cursors you
create with arcpy. The biggest difference is that when you create a cursor with
arcpy.da, you have to supply a tuple of field names that will be returned by the
cursor. Remember that a tuple is a Python data structure much like a list, except it
is enclosed in parentheses and its contents cannot be modified.

Supplying this tuple speeds up the work of the cursor because it does not have to
deal with the potentially dozens of fields included in your dataset. In the example
above, the tuple contains just one field, populationField. A tuple with just one item
in it contains a comma after the item, therefore our tuple above looks like this:
(populationField,). If the tuple were to have multiple items in it, we might see
something like: (populationField, nameField).
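
The trailing comma is easy to forget, so here is a quick check you can run in any Python interpreter. The variable names are just for illustration.

```python
populationField = "POP2008"
nameField = "NAME"

oneField = (populationField,)               # a tuple holding one string
notATuple = (populationField)               # just the string; parentheses alone do nothing
twoFields = (populationField, nameField)    # a tuple holding two strings

print(type(oneField).__name__)
print(type(notATuple).__name__)
print(len(twoFields))
```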

Notice that with arcpy.da, you use row objects like with arcpy; however, you do not
use the getValue method to retrieve values out of the rows. Instead, you use the
index position of the field name in the tuple you submitted when you created the
object. Since the above example submits only one item in the tuple, then the index
position of "populationField" within that tuple is 0 (remember that we start
counting from 0 in Python). Therefore, you can use row[0] to get the value of
populationField for a particular row.
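
When the tuple holds several fields, keeping track of index positions by hand gets harder. The sketch below uses plain tuples with made-up values to show two ways you might handle this; it is a suggestion, not something the cursor requires.

```python
fields = ("NAME", "POP2008", "STATEABB")
row = ("Sample City", 1000, "AL")    # made-up values, in the same order as fields

# Ask the tuple for the position instead of counting by hand
populationIndex = fields.index("POP2008")
population = row[populationIndex]

# Or pair each field name with its value so you can look values up by name
valuesByName = dict(zip(fields, row))

print(populationIndex)
print(population)
print(valuesByName["STATEABB"])
```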

If you don't understand all this right away, keep an eye out for the other arcpy.da
examples that I've included throughout Lesson 3. Once you see a few different
examples, you'll start to understand the pattern, and then you can experiment with
it in your own code. If you're using ArcGIS 10.1, it's worth it to learn the arcpy.da
functions. Your code will be faster, more compact, and more robust.

3.2.3 Retrieving records using an attribute query

The previous examples used the SearchCursor object to read through each record
in a dataset. You can get more specific with the search cursor by instructing it to
retrieve just the subset of records whose attributes comply with some criteria, for
example, "only records with a population greater than 10000," or "all records
beginning with the letters P – Z."

For review, this is how you construct a search cursor to operate on every record in a
dataset:

rows = arcpy.SearchCursor(featureClass)

If you want the search cursor to retrieve only a subset of the records based on some
criteria, you can supply a SQL expression as the second argument in the
constructor (the constructor is the method that creates the SearchCursor). For
example:

rows = arcpy.SearchCursor(featureClass, '"POP2008" > 100000')


The above example uses the SQL expression "POP2008" > 100000 to
retrieve only the records whose population is greater than 100000. SQL stands for
"Structured Query Language" and is a special syntax used for querying datasets. If
you are not already intimately familiar with SQL, please take a few minutes right
now to read Building a query expression [4] in the ArcGIS Desktop Help. This topic
is a simple introduction to SQL in the context of ArcGIS.

SQL expressions can contain a combination of criteria allowing you to pinpoint a
very focused subset of records. The complexity of your query is limited only by your
available data. For example, you could use a SQL expression to find only states with
a population density over 100 people per square mile that begin with the letter M
and were settled after 1850.
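
You can experiment with combined criteria without ArcGIS by using Python's built-in sqlite3 module. In the sketch below, an in-memory SQLite table stands in for a feature class attribute table, and the field names and state records are made up. Keep in mind that SQL dialects differ between data sources, so treat this as practice with AND and LIKE rather than a guarantee of how a geodatabase will behave.

```python
import sqlite3

# An in-memory SQLite table stands in for an attribute table.
# Field names and values are made up for this illustration.
connection = sqlite3.connect(":memory:")
connection.execute(
    'CREATE TABLE states ("NAME" TEXT, "DENSITY" REAL, "SETTLED" INTEGER)')
connection.executemany(
    "INSERT INTO states VALUES (?, ?, ?)",
    [
        ("Mapleland", 150.0, 1860),    # meets all three criteria
        ("Marshland", 40.0, 1870),     # density too low
        ("Midvale", 300.0, 1820),      # settled too early
        ("Northfield", 200.0, 1900),   # does not begin with M
    ])

# Three criteria joined with AND
whereClause = ('"DENSITY" > 100 AND "NAME" LIKE ' + "'M%'" +
               ' AND "SETTLED" > 1850')

cursor = connection.execute("SELECT NAME FROM states WHERE " + whereClause)
matches = [record[0] for record in cursor]
connection.close()

print(whereClause)
print(matches)
```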

Note that the SQL expression you supply for a search cursor is for attribute queries,
not spatial queries. You could not use a SQL expression to select records that fall
"west of the Mississippi River," or "inside the boundary of Canada" unless you had
previously added and populated some attribute stating whether that condition was
true (for example, REGION = 'Western' or CANADIAN = True). Later in this lesson
we'll talk about how to make spatial queries using the Select Layer By Location
geoprocessing tool.

Once you retrieve the subset of records, you can follow the same pattern of iterating
through them using SearchCursor.next() within a loop.

A note about 10.1

If you're using the arcpy.da data access functions in ArcGIS 10.1, you pass in the
SQL expression as the third parameter when you create the search cursor, like this:

with arcpy.da.SearchCursor(featureClass, ("POP2008",), '"POP2008" > 100000') as cursor:
    for row in cursor:
        print str(row[0])

The problem with quotes

When you include a SQL expression in your SearchCursor constructor, you must
supply it as a string. This is where things can get tricky with quotation marks. SQL
requires single and double quotes in specific places, but you also need to enclose
the entire expression with quotes because it is a string. How do you keep from
getting confused?

You may have noticed that in the above example the SQL expression is enclosed in
single quotes, not double quotes: '"POP2008" > 100000'. In Python you
can use either single quotes or double quotes to enclose a string. Because I knew
the double quotes were required to surround the field name in the SQL statement, I
used single quotes to surround the entire string. This is not just to keep things easy
to read; if the whole expression were wrapped in double quotes, the Python interpreter
would treat the double quote before POP2008 as the end of the string. Therefore, it was
not an option to use ""POP2008" > 100000".
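
For completeness, Python has a few equivalent ways to write a string that contains double quotes. The short sketch below shows three of them producing the identical expression; which one you use is mostly a matter of readability.

```python
# Three ways to write the same string in Python
a = '"POP2008" > 100000'         # single quotes outside, double quotes inside
b = "\"POP2008\" > 100000"       # double quotes outside, inner quotes escaped with \
c = '''"POP2008" > 100000'''     # triple-quoted string

print(a)
print(a == b == c)
```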

The situation gets a bit more difficult when your SQL expression has to use both
single and double quotes, for instance, when you query for a string variable.
Suppose your script allows the user to enter the ID of a parcel and you need to find
it with a search cursor. Some of the parcel IDs include letters and others don't,
therefore you need to always treat the parcel ID as a string. Your SQL expression
would probably look like this: "PARCEL" = 'A2003KSW'.

Because your expression starts with double quotes and ends with single quotes,
which style of quotes do you use to enclose the entire expression? In this case, you
cannot simply enclose the entire expression in one style of quotes; you need to
break up the expression into separate strings. Take a close look at this example:

ID = arcpy.GetParameterAsText(0)
whereClause = '"Parcel"' + " = '" + str(ID) + "'"
rows = arcpy.SearchCursor(featureClass, whereClause)

In the code above, the whereClause, or the SQL expression, is created in
manageable chunks that don't mix single and double quotes. If the piece of the
expression contains double quotes, such as "Parcel" , it is enclosed in single
quotes. If the piece of the expression contains single quotes, such as =' or just ', it
is enclosed in double quotes. This type of situation is where it may be helpful to
temporarily include a print statement in your code or use the debugging tools to
make sure your whereClause is constructed correctly.
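
If you build this kind of expression often, you might wrap the chunks in a small function and print the result to check it. The function name build_text_query below is my own invention. Doubling any single quote inside the value is the usual SQL convention for escaping it, but I'm assuming your data source follows that convention.

```python
def build_text_query(fieldName, value):
    """Return a clause like "PARCEL" = 'A2003KSW' for a text field."""
    # Double any single quote in the value (assumed SQL escaping convention)
    safeValue = str(value).replace("'", "''")
    return '"' + fieldName + '"' + " = '" + safeValue + "'"


whereClause = build_text_query("PARCEL", "A2003KSW")
ownerClause = build_text_query("OWNER", "O'Brien")

# Print the clauses to confirm the quotes ended up where you expect
print(whereClause)
print(ownerClause)
```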

Field delimiters

In the examples above, field names are surrounded with double quotes (for
example, "STATE_NAME"). This is correct syntax for shapefiles and file
geodatabases, which are the only data types we'll use in this course. If you use
personal geodatabases in your daily work, there are different ways to delimit the
field name. If you're interested in the correct syntax for different data types, or
ways to make your script flexible for any data type, take a look at the topic SQL
reference for query expressions used in ArcGIS [5] in the ArcGIS Desktop Help.
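
To give you a rough idea of what "flexible for any data type" could mean, here is a simplified sketch. The helper add_delimiters is hypothetical and only distinguishes personal geodatabases (square brackets) from everything else (double quotes) by looking for ".mdb" in the path, which is a crude assumption. I believe arcpy provides its own function for this purpose, arcpy.AddFieldDelimiters(), and that would be the better choice in a real script.

```python
def add_delimiters(dataPath, fieldName):
    """Hypothetical helper: wrap fieldName in delimiters suited to dataPath."""
    if ".mdb" in dataPath.lower():
        return "[" + fieldName + "]"     # personal geodatabase style
    return '"' + fieldName + '"'         # shapefile or file geodatabase style


shapefileField = add_delimiters("C:\\Data\\Iowa\\Counties.shp", "POP2008")
personalGdbField = add_delimiters("C:\\Data\\Parcels.mdb\\Lots", "OWNER")

print(shapefileField + " > 100000")
print(personalGdbField + " = 'Trisha Stevens'")
```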

3.2.4 Retrieving records using a spatial query

Applying a SQL expression to the search cursor is only useful for attribute queries,
not spatial queries. For example, you can easily open a search cursor on all counties
named "Lincoln" using a SQL expression, but finding all counties that touch or
include the Mississippi River requires a different approach. To get a subset of
records based on spatial criteria, you need to use the geoprocessing tool Select
Layer By Location.

Note: A few relational databases such as SQL Server 2008 expose spatial data
types that can be spatially queried with SQL. Support for these spatial types in
ArcGIS is still maturing, and in this course we will assume that the way to make a
spatial query is through Select Layer By Location. Since we are not using ArcSDE,
this is actually true.

Here's where you need to know a little bit about how ArcGIS works with layers and
selections. Suppose you want to select all states whose boundaries touch Wyoming.
In most cases you won't need to create an entirely new feature class to hold those
particular states; you probably only need to maintain those particular state records
in the computer's memory for a short time while you update some attribute.
ArcGIS uses the concept of feature layers to represent in-memory sets of records
from a feature class.

The Make Feature Layer tool creates a feature layer from some or all of the records
in a feature class. You can apply a SQL expression when you run Make Feature
Layer to narrow down the records included in the feature layer based on attributes.
You can subsequently use Select Layer By Location to narrow down the records in
the feature layer based on some spatial criteria.

Opening a search cursor on Wyoming and all states bordering it would take four
steps:

1. Use Make Feature Layer to make a feature layer of all US States. Let's call
this the All States layer.

2. Use Make Feature Layer to create a second feature layer of just Wyoming.
(To get Wyoming alone, you would apply an SQL expression when making
the feature layer.) Let's call this the Selection State layer.

3. Use Select Layer By Location to narrow down the All States layer (the layer
you created in Step 1) to just those states that touch the Selection State layer.

4. Open a search cursor on the All States layer. The cursor will include only
Wyoming and the states that touch it because there is a selection applied to
the All States layer. Remember that the feature layer is just a set of records
held in memory. Even if you called it the All States layer, it no longer
includes all states once you apply a selection.

Below is some code that applies the above steps.

# Selects all states whose boundaries touch
# a user-supplied state

import arcpy

# Get the US States layer, state, and state name field
usaLayer = "C:\\Data\\USA\\USA.gdb\\StateBoundaries"
state = "Wyoming"
nameField = "NAME"

try:
    # Make a feature layer with all the US States
    arcpy.MakeFeatureLayer_management(usaLayer, "AllStatesLayer")

    # Make a feature layer containing only the state of interest
    arcpy.MakeFeatureLayer_management(usaLayer,
                                      "SelectionStateLayer",
                                      '"' + str(nameField) + '" =' + "'" + str(state) + "'")

    # Apply a selection to the US States layer
    arcpy.SelectLayerByLocation_management("AllStatesLayer",
                                           "BOUNDARY_TOUCHES",
                                           "SelectionStateLayer")

    # Open a search cursor on the US States layer
    rows = arcpy.SearchCursor("AllStatesLayer")
    row = rows.next()

    # Print the name of all the states in the selection
    while row:
        print row.getValue(nameField)
        row = rows.next()

    # Clean up cursor and row objects
    del row
    del rows

except:
    print arcpy.GetMessages()

# Clean up feature layers
arcpy.Delete_management("AllStatesLayer")
arcpy.Delete_management("SelectionStateLayer")

You can choose from many spatial operators when running SelectLayerByLocation.
The code above uses "BOUNDARY_TOUCHES". Other available relationships are
"INTERSECT", "WITHIN_A_DISTANCE" (may save you a buffering step),
"CONTAINS", "CONTAINED_BY", and others.

Once you open the search cursor on your selected records, you can perform
whatever action you want on them. The code above just prints the state name, but
more likely you'll want to summarize or update attribute values. You'll learn how to
write attribute values later in this lesson.

Cleaning up feature layers and cursors

Notice the above code example deletes the feature layers using the Delete tool and
the cursors using the del command.

Feature layers and cursors can maintain locks on your data, preventing other
applications from using the data until your script is done. Arcpy is supposed to
clean up cursors and feature layers at the end of the script, but it's a good idea to
delete them yourself in case this doesn't happen or in case there is a crash. In the
case above, the except block will catch a crash, then the script will continue and run
the Delete tool.

The above example using arcpy.da in ArcGIS 10.1

If you're using the arcpy data access module (arcpy.da), the above example could be
written as follows:

# Selects all states whose boundaries touch
# a user-supplied state

import arcpy

# Get the US States layer, state, and state name field
usaLayer = "D:\\Data\\USA\\USA.gdb\\StateBoundaries"
state = "Wyoming"
nameField = "NAME"

try:
    # Make a feature layer with all the US States
    arcpy.MakeFeatureLayer_management(usaLayer, "AllStatesLayer")

    # Make a feature layer containing only the state of interest
    arcpy.MakeFeatureLayer_management(usaLayer,
                                      "SelectionStateLayer",
                                      '"' + str(nameField) + '" =' + "'" + str(state) + "'")

    # Apply a selection to the US States layer
    arcpy.SelectLayerByLocation_management("AllStatesLayer",
                                           "BOUNDARY_TOUCHES",
                                           "SelectionStateLayer")

    # Open a search cursor on the US States layer
    with arcpy.da.SearchCursor("AllStatesLayer", (nameField,)) as cursor:
        for row in cursor:
            # Print the name of all the states in the selection
            print row[0]

except:
    print arcpy.GetMessages()

# Clean up feature layers
arcpy.Delete_management("AllStatesLayer")
arcpy.Delete_management("SelectionStateLayer")

You might have noticed that this sample is a little briefer. You don't have to
delete the cursor because the "with" statement cleans it up for you. There is no use
of getValue; rather, the Row object "row" returns only one field ("NAME"), which is
accessed using its index position in the tuple of fields. Since there's only one field,
that index is 0, and the syntax looks like this: row[0].

To keep things short, I've written the example using a "for" loop. Remember that
you can also use a for loop with the original arcpy cursors in ArcGIS 10.0.

Required reading

Before you move on, examine the following tool reference pages. You can ignore the
Command Line Syntax section, but pay particular attention to the Usage Tips and
the Script Examples.

• Make Feature Layer [6]

• Select Layer By Location [7]

3.3 Writing vector attribute data


In the same way that you use cursors to read vector attribute data, you use cursors
to write data as well. Two types of cursors are supplied for writing data:

• Update cursor - This cursor edits values in existing records or deletes
records

• Insert cursor - This cursor inserts new records

In the following sections you'll learn about both of these cursors and get some tips
for using them.

Required reading
The ArcGIS Desktop Help has some explanation of cursors. Get familiar with this
help now, as it will prepare you for the next sections of the lesson. You'll also find it
helpful to return to the code examples while working on Project 3:

Accessing data using cursors [8]

Also follow the three links in the table at the beginning of the above topic. These
briefly explain the InsertCursor [9], SearchCursor [10], and UpdateCursor [11] and
provide a code example for each. You've already worked with SearchCursor, but
closely examine the code examples for all three cursor types and see if you can
determine what is happening in each.

3.3.1 Updating existing records


Use the update cursor to modify existing records in a dataset. Here are the general
steps for using the update cursor:

1. Create the update cursor by calling arcpy.UpdateCursor(). You can
optionally pass in an SQL expression as an argument to this method. This is
a good way to narrow down the rows you want to edit if you are not
interested in modifying every row in the table.

2. Advance the cursor to the first row by calling UpdateCursor.next().

3. Modify the field values in the row that need updating (see tips below).

4. Call UpdateCursor.updateRow() to finalize the edit.

5. Advance the cursor to the next row.

Modifying field values

When you create an UpdateCursor and advance it to a row, you can then modify
field values. There are two ways you can modify a value:

• Use the syntax Row.<the field name> = <the new value>. For example:
row.OWNER = "Trisha Stevens".

• Use Row.setValue(<field name variable>, <new value>). For example:
row.setValue(ownerField, "Trisha Stevens").

Using the second way, Row.setValue(), is especially useful if the field name is a
variable, and is a good way to avoid hard-coding field names into your script.
Row.setValue() is similar to Row.getValue() that you use with search cursors, but
it's important to remember that with Row.setValue(), you have to supply two
arguments: the field to update, and the new value for that field.

Example

The script below performs a "search and replace" operation on an attribute table.
For example, suppose you have a dataset representing local businesses, including
banks. One of the banks was recently bought out by another bank. You need to find
every instance of the old bank name and replace it with the new name. This script
could perform that task automatically.

#Simple search and replace script

import arcpy

# Retrieve input parameters: the feature class, the field affected by
# the search and replace, the search term, and the replace term.
fc = arcpy.GetParameterAsText(0)
affectedField = arcpy.GetParameterAsText(1)
oldValue = arcpy.GetParameterAsText(2)
newValue = arcpy.GetParameterAsText(3)

# Create the SQL expression for the update cursor. Here this is
# done on a separate line for readability.
queryString = '"' + affectedField + '" = ' + "'" + oldValue + "'"

# Create the update cursor and advance the cursor to the first row
rows = arcpy.UpdateCursor(fc, queryString)
row = rows.next()

# Perform the update and move to the next row as long as there are
# rows left
while row:
    row.setValue(affectedField, newValue)
    rows.updateRow(row)
    row = rows.next()

# Delete the cursors to remove any data locks
del row, rows

Notice that this script is relatively flexible because it gets all the parameters as text
and uses Row.setValue() instead of hard-coding the field name. However, this
script can only be run on string fields because of the way the query string is set
up. Notice that the old value is put in quotes, like this: "'" + oldValue + "'".
Handling other types of fields, such as integers, would have made the example
longer.
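
To show roughly what handling other field types might involve, here is a sketch of a query builder that adds the single quotes only for text fields. The function name build_query and its isText argument are hypothetical; in a real script you might decide isText by checking whether the field's type property from arcpy.ListFields() is "String", but that step is left out so the sketch runs without ArcGIS.

```python
def build_query(fieldName, value, isText):
    """Build a simple equality expression, quoting the value only for text."""
    delimitedField = '"' + fieldName + '"'
    if isText:
        return delimitedField + " = '" + str(value) + "'"
    return delimitedField + " = " + str(value)


textQuery = build_query("BANK", "Old Bank", True)      # made-up field and value
numberQuery = build_query("FLOORS", 3, False)          # made-up field and value

print(textQuery)
print(numberQuery)
```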

Dataset locking

ArcGIS sometimes places locks on datasets to avoid the possibility of editing
conflicts between two users. When you use insert and update cursors, a lock can be
placed on the dataset. Symptoms of a lock can include not being able to view the
attribute table for a dataset in ArcCatalog, and other errors.

You can remove the possibility of locks affecting your work by deleting your cursors
when you are done using them. Use the built-in del statement to do this. You can
even delete multiple objects on the same line. Notice that our search and replace
example also deletes the row just to be safe:

# Delete the cursor to remove any data locks


del row, rows

If you forget to delete your cursor, the script may maintain a lock on your data even
when the script has finished executing. If you think that a lock from your script is
affecting your dataset (by preventing you from viewing it, making it look like all
rows have been deleted, and so on), you must close PythonWin to remove the lock.
If you think that ArcGIS has a lock on your data, check to see if ArcMap or
ArcCatalog are using the data in any way. This could possibly occur through having
an open edit session on the data, having the data open in the Preview tab in
ArcCatalog, or having the layer in the table of contents in an open map document
(MXD).

For the Esri explanation of how locking works, you can review the section "Cursors
and locking" in the topic Accessing data using cursors [8] in the ArcGIS Desktop
Help.
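
One general Python technique for making sure cleanup happens even after a crash is a finally block. The sketch below is plain Python with a simulated failure, not arcpy code, and the function name process is made up. It only demonstrates that the finally block runs in both the success and failure cases; in a real script, that is where you might put del row, rows.

```python
cleanedUp = []

def process(datasetName, shouldFail):
    """Pretend to edit a dataset; always record the cleanup step."""
    try:
        if shouldFail:
            raise RuntimeError("simulated failure")
        return "processed " + datasetName
    except RuntimeError as error:
        return "failed: " + str(error)
    finally:
        # Runs whether or not an error occurred; a real script
        # might delete its cursor and row objects here
        cleanedUp.append(datasetName)


first = process("Businesses", False)
second = process("Parcels", True)

print(first)
print(second)
print(cleanedUp)
```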

Updating records using arcpy.da in ArcGIS 10.1

When you use the arcpy data access module to update records, you do not use the
setValue method. Instead, you just use an = sign to set the value in the row object.
Take a look at how the above "search and replace" example would look using
arcpy.da:

#Simple search and replace script

import arcpy

# Retrieve input parameters: the feature class, the field affected by
# the search and replace, the search term, and the replace term.
fc = arcpy.GetParameterAsText(0)
affectedField = arcpy.GetParameterAsText(1)
oldValue = arcpy.GetParameterAsText(2)
newValue = arcpy.GetParameterAsText(3)

# Create the SQL expression for the update cursor. Here this is
# done on a separate line for readability.
queryString = '"' + affectedField + '" = ' + "'" + oldValue + "'"

# Create the update cursor and update each row returned by the SQL
# expression
with arcpy.da.UpdateCursor(fc, (affectedField,), queryString) as cursor:
    for row in cursor:
        row[0] = newValue
        cursor.updateRow(row)

Here it's critical to understand the tuple of affected fields that you pass in when you
create the update cursor. In this example, there is only one affected field (which we
named affectedField), so its index position is 0 in the tuple. Therefore, you set that
field value using row[0] = newValue. Cursor cleanup is not required at the end of
the script because this is accomplished through the "with" statement.
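
When the tuple holds more than one field, counting index positions by hand gets error-prone. A small lookup can help. This is a sketch in plain Python with no arcpy involved: the field names CITY and STATUS and the row values are made up, and the list named row only stands in for a row handed to you by the cursor.

```python
# Hypothetical field names; the order here is the order the cursor would use
fields = ("CITY", "STATUS")

# Build a lookup from field name to index position in each row
field_index = dict((name, position) for position, name in enumerate(fields))

# Stand-in for a row returned by the cursor
row = ["Yakima", "Open"]

# Set a value by field name instead of by a hard-coded number
row[field_index["STATUS"]] = "Closed"
print(row)
```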

3.3.2 Inserting new records


When adding a new record to a table, you must use the insert cursor. Here's the
workflow for insert cursors:

• Create the insert cursor.
• Create a new row by calling InsertCursor.newRow().
• Set attributes on the row. This can include assigning its geometry, or shape.
• Call InsertCursor.insertRow() to commit the row to the dataset.

As with the update cursor, you can avoid data locking problems by deleting the
insert cursor when you've finished using it.

Insert cursors differ from search and update cursors in that you cannot provide an
SQL expression when you create the insert cursor. This makes sense because an
insert cursor is only concerned with adding records to the table. It does not need to
"know" about the existing records or any subset thereof.

Example

The example below uses an insert cursor to create one new point in the dataset and
assign it one attribute: a string description. This script could potentially be used
behind a public-facing 311 [12] application, in which members of the public can
click a point on a Web map and type a description of an incident that needs to be
resolved by the municipality, such as a broken streetlight.

# Adds a point and an accompanying description
import arcpy

# Retrieve input parameters
inX = arcpy.GetParameterAsText(0)
inY = arcpy.GetParameterAsText(1)
inDescription = arcpy.GetParameterAsText(2)

# These parameters are hard-coded. User can't change them.
incidentsFC = "C:/Data/Yakima/Incidents.shp"
descriptionField = "DESCR"

# Create point
inPoint = arcpy.Point(inX, inY)

# Create the insert cursor and a new empty row
rowInserter = arcpy.InsertCursor(incidentsFC)
newIncident = rowInserter.newRow()

# Populate attributes of new row
newIncident.SHAPE = inPoint
newIncident.setValue(descriptionField, inDescription)

# Insert the new row into the shapefile
rowInserter.insertRow(newIncident)

# Clean up the cursor
del rowInserter

In the above example, the insert cursor is called rowInserter and the row is called
newIncident. Take a moment to ensure that you know exactly where the following
things are happening in the code:

• The creation of the insert cursor
• The creation of the new row through the newRow() method
• The assigning of geometry and attributes to the new row
• The insertion of the row through the insertRow() method

Besides creating the insert cursor, the idea of creating geometry (the point) may be
new to you. In this example, arcpy creates a new Point object whose X and Y
coordinates we can assign right at the time the point is created. Our script gets
those original X and Y coordinate values as input parameters. If this script really
were powering an interactive 311 application, the X and Y values could be derived
from a point a user clicked on the Web map.

Once you create the geometry, you have to write it to the special field that the
dataset uses to hold geometry. This field is usually called SHAPE, and for simplicity
the field name SHAPE is hard-coded in the lesson examples. If you need your code
to be bullet-proof or to work with many types of databases, you can
programmatically determine the name of the geometry field by calling
arcpy.Describe() on the feature class, then reading the shapeFieldName property.

Notice that the description attribute is assigned using Row.setValue(), the same
method you used with the update cursor. Since both types of cursors are used for
writing data, setValue() can be used with both.

Inserting records using arcpy.da in ArcGIS 10.1

The arcpy data access module in ArcGIS 10.1 contains insert cursors. You can use
them with a "with" statement like you used the other cursors:

# Adds a point and an accompanying description
import arcpy

# Retrieve input parameters
inX = arcpy.GetParameterAsText(0)
inY = arcpy.GetParameterAsText(1)
inDescription = arcpy.GetParameterAsText(2)

# These parameters are hard-coded. User can't change them.
incidentsFC = "C:/Data/Yakima/Incidents.shp"
descriptionField = "DESCR"

# Make a tuple of fields to update
fieldsToUpdate = ("SHAPE@XY", descriptionField)

# Create the insert cursor
with arcpy.da.InsertCursor(incidentsFC, fieldsToUpdate) as cursor:
    # Insert the row providing a tuple of affected attributes.
    # The coordinates arrive as text, so convert them to numbers first.
    cursor.insertRow(((float(inX), float(inY)), inDescription))

When you insert a row using arcpy.da, you provide a tuple of values to insert. The
order of these values must match the order of the fields in the tuple you provided
when you created the cursor.

Another thing you might have noticed is that the string "SHAPE@XY" is used to
specify the SHAPE field. You might expect that this would just be "SHAPE," but
arcpy.da provides a list of "tokens" that you can use if the field will be specified in a
certain way. In our case, it would be very convenient just to provide the X and Y
values of the points using a tuple of coordinates. It turns out that the token
"SHAPE@XY" allows you to do just that. See the help topic for InsertCursor [13] to
learn about other tokens you can use.

Putting this all together, the example creates a tuple of affected fields:
("SHAPE@XY", descriptionField), where descriptionField holds "DESCR". When the
row is inserted, the values for these items are provided in the same order:
cursor.insertRow(((float(inX), float(inY)), inDescription)).
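
If you would rather not rely on remembering the order, you can let the tuple of fields drive it. The sketch below is plain Python and does not touch arcpy: make_row is a hypothetical helper, and the coordinates and description are made-up sample values.

```python
def make_row(fields, values_by_field):
    """Return a tuple of values ordered the same way as the fields tuple."""
    return tuple(values_by_field[name] for name in fields)


fields = ("SHAPE@XY", "DESCR")

# The dictionary can list the values in any order
values = {"DESCR": "Broken streetlight", "SHAPE@XY": (-120.5, 46.6)}

row = make_row(fields, values)
print(row)
```

The resulting tuple is what you would hand to insertRow(). If the dictionary is missing one of the fields, the helper fails with a KeyError before anything is written.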

Readings
Take a few minutes to read Zandbergen 7.1 - 7.3 to reinforce your learning about
cursors.

3.4 Working with rasters


So far in this lesson, your scripts have only read and edited vector datasets. This
work largely consists of cycling through tables of records and reading and writing
values to certain fields. Raster data is very different, and consists only of a series of
cells, each with its own value. So how do you access and manipulate raster data
using Python?

It's unlikely that you will ever need to cycle through a raster cell by cell on your own
using Python, and that technique is outside the scope of this course. Instead, you'll
most often use predefined tools to read and manipulate rasters. These tools have
been designed to operate on various types of rasters and perform the cell-by-cell
computations so that you don't have to.

In ArcGIS, most of the tools you'll use when working with rasters are in either the
Data Management > Raster toolset or the Spatial Analyst toolbox. These
tools can reproject, clip, mosaic, and reclassify rasters. They can calculate slope,
hillshade, and aspect rasters from DEMs.

The Spatial Analyst toolbox also contains tools for performing map algebra on
rasters. Multiplying or adding many rasters together using map algebra is
important for GIS site selection scenarios. For example, you may be trying to find
the best location for a new restaurant and you have seven criteria that must be met.
If you can create a boolean raster (containing 1 for suitable, 0 for unsuitable) for
each criterion, you can use map algebra to multiply the rasters and determine which
cells receive a score of 1, meeting all the criteria. (Alternatively you could add the
rasters together and determine which areas received a value of 7.) Other courses in
the Penn State GIS certificate program walk through these types of scenarios in
more detail.
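
To see why multiplying and adding give the same answer, it can help to try the arithmetic on a toy example. The sketch below uses small nested lists as stand-ins for rasters. It is not how ArcGIS stores or processes rasters, and the three criteria and their cell values are made up.

```python
def combine_rasters(rasters, start, operation):
    """Combine same-sized rasters cell by cell using the given operation."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    result = [[start] * cols for _ in range(rows)]
    for raster in rasters:
        for r in range(rows):
            for c in range(cols):
                result[r][c] = operation(result[r][c], raster[r][c])
    return result


# Three made-up boolean rasters (1 = suitable, 0 = unsuitable)
near_road = [[1, 1, 0], [1, 0, 1]]
zoned = [[1, 0, 0], [1, 1, 1]]
flat = [[1, 1, 1], [0, 1, 1]]
criteria = [near_road, zoned, flat]

product = combine_rasters(criteria, 1, lambda a, b: a * b)
total = combine_rasters(criteria, 0, lambda a, b: a + b)

print(product)
print(total)
```

A cell has a product of 1 only where the total equals the number of criteria, which is why either approach finds the same suitable cells.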

The tricky part of map algebra is constructing the expression, which is a string
stating what the map algebra operation is supposed to do. ArcGIS Desktop contains
interfaces for constructing an expression for one-time runs of the tool. But what if
you want to run the analysis several times, or with different datasets? It's
challenging even in ModelBuilder to build a flexible expression into the map
algebra tools. With Python, you can manipulate the expression as much as you
need.

Example

Examine the following example, which takes in a minimum and maximum
elevation value as parameters, then does some map algebra with those values. The
expression isolates areas where the elevation is greater than the minimum
parameter and less than the maximum parameter. Cells that satisfy the expression
are given a value of 1 by the software, and cells that do not satisfy the expression
are given a value of 0.

But what if you don't want those 0 values cluttering your raster? This script gets rid
of the 0's by running the Reclassify tool with a very simple remap table stating that
input raster values of 1 should remain 1. Because 0 is left out of the remap table, it
gets reclassified as NoData:

# This script takes a DEM, a minimum elevation,
# and a maximum elevation. It outputs a new
# raster showing only areas that fall between
# the min and the max

import arcpy
from arcpy.sa import *
arcpy.env.overwriteOutput = True
arcpy.env.workspace = "C:/Data/Elevation"

# Get parameters of min and max elevations
inMin = arcpy.GetParameterAsText(0)
inMax = arcpy.GetParameterAsText(1)

arcpy.CheckOutExtension("Spatial")

# Perform the map algebra and make a temporary raster
inDem = Raster("foxlake")
tempRaster = (inDem > int(inMin)) & (inDem < int(inMax))

# Set up remap table and call Reclassify, leaving all values not 1 as
# NODATA
remap = RemapValue([[1,1]])
remappedRaster = Reclassify(tempRaster, "Value", remap, "NODATA")

# Save the reclassified raster to disk
remappedRaster.save("foxlake_recl")

arcpy.CheckInExtension("Spatial")

Read the example above carefully, as many times as necessary for you to
understand what is occurring in each line. Notice the following things:

• There is one intermediate raster (in other words, not the final output) that
  you don't want to have cluttering the output directory. This is referred to as
  tempRaster in the script. You'll see this temporary raster appear in
  ArcCatalog after you run the script, but it goes away after you close
  PythonWin.
• Notice the expression contains > and < signs, as well as the & operator. You
  have to enclose each comparison in parentheses because Python evaluates the
  & operator before the > and < comparisons.
• Because you used arcpy.GetParameterAsText() to get the input parameters,
  you have to cast the input to an integer before you can do map algebra with
  it. If we just used inMin, the software would see "3000," for example, and
  would try to interpret that as a string. To do the numerical comparison, we
  have to use int(inMin). Then the software sees the number 3000 instead of
  the string "3000."
• Map algebra can perform many types of math and operations on rasters, not
  limited to "greater than" or "less than." For example, you can use map
  algebra to find the cosine of a raster.
• If you're working at a site where a license manager restricts the Spatial
  Analyst extension to a certain number of users, you must check out the
  extension in your script, then check it back in. Notice the calls to
  arcpy.CheckOutExtension() and arcpy.CheckInExtension(). You can pass in
  other extensions besides "Spatial" as arguments to these methods.
• Notice that the script doesn't call these spatial analyst functions using arcpy.
  Instead, it imports functions from the spatial analyst module (from
  arcpy.sa import *) and calls the functions directly. For example, we
  don't see arcpy.Reclassify(); instead, we just call Reclassify()
  directly. This can be confusing for beginners (and old pros) so be sure to
  check the Esri samples closely for each tool you plan to run. Follow the
  syntax in the samples and you'll usually be safe.
• See the Remap classes [14] topic in the help to understand how the remap
  table in this example was created. Whenever you run Reclassify, you have to
  create a remap table stating how the old values should be reclassified to new
  values. This example has about the simplest remap table possible, but if you
  want a more complex remap table you'll need to study the documentation.
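
If you'd like to experiment with the logic of the script without ArcGIS, the following sketch mimics it using a nested list in place of the DEM. It is only an illustration: select_band is a hypothetical function, None stands in for NoData, and the elevation values are made up. The last few lines show what can go wrong when the parentheses around each comparison are left out.

```python
NODATA = None  # stand-in for a NoData cell


def select_band(dem, min_text, max_text):
    """Keep 1 where min < elevation < max; everything else becomes NODATA."""
    # Parameters arrive as text, so cast them before comparing
    low, high = int(min_text), int(max_text)
    # Equivalent of the map algebra step: 1 where both tests pass, else 0
    banded = [[int((cell > low) & (cell < high)) for cell in row] for row in dem]
    # Equivalent of the reclassify step: only 1 is in the "remap table"
    return [[1 if cell == 1 else NODATA for cell in row] for row in banded]


dem = [[2950, 3010, 3200], [3499, 3500, 3100]]
result = select_band(dem, "3000", "3500")
print(result)

# Without parentheses, Python evaluates 2 & 1 first and gets a different answer
without_parens = 1 > 2 & 1 < 5
with_parens = (1 > 2) & (1 < 5)
print(without_parens)
print(with_parens)
```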

Rasters and file extensions

The above example script doesn't use any file extensions for the rasters. This is
because the rasters use the Esri GRID format, which doesn't use extensions. If you
have rasters in another format, such as .jpg, you will need to add the correct file
extension. If you're unsure of the syntax to use when providing a raster file name,
highlight the raster in ArcCatalog and note how the path appears in the Location
bar.

If you look at rasters such as an Esri GRID in Windows Explorer, you may see that
they actually consist of several supporting files with different extensions,
sometimes even contained in a series of folders. Don't try to guess one of the files to
reference; instead, use ArcCatalog to get the path to the raster. When you use this
path, the supporting files and folders will work together automatically.

Figure 3.4 When using ArcCatalog for the path to a raster, the supporting files
and folders work together automatically.

Readings
Zandbergen chapter 9 covers a lot of additional functions you can perform with
rasters and has some good code examples. You don't have to understand everything
in this chapter, but it might give you some good ideas for your final project.

Lesson 3 Practice Exercises


Introduction
Lessons 3 and 4 contain two practice exercises each (now referred to as A and B)
that are longer than the previous practice exercises and are designed to prepare you
specifically for the projects. You should make your best attempt at each practice
exercise before looking at the solution. If you get stuck, study the solution until you
understand it.

Don't spend so much time on the practice exercises that you neglect Project 3.
However, successfully completing the practice exercises will make Project 3 much
easier and quicker for you.

Data for Lesson 3 practice exercises

The data for the Lesson 3 practice exercises is very simple and, like some of the
Project 2 practice exercise data, was derived from Washington State Department of
Transportation [15] datasets. Download the data here [16].

Using the discussion forums

Using the discussion forums is a great way to work towards figuring out the
practice exercises. You are welcome to post blocks of code on the forums relating to
these exercises.

When completing the actual Project 3, avoid posting blocks of code longer than a
few lines. If you have a question about your Project 3 code, please e-mail the
instructor, or you can post general questions to the forums that don't contain more
than a few lines of code.

Getting ready

If the practice exercises look daunting to you, you might start by practicing with
your cursors a little bit using the sample data:

• Try to loop through the CityBoundaries and print the name of each city.
• Try using an SQL expression with a search cursor to print the OBJECTIDs of
  all the park and rides in Chelan county (notice there is a County field that
  you could put in your SQL expression).
• Use an update cursor to find the park and ride with OBJECTID 336. Assign
  it a ZIP code of 98512.

You can post thoughts on the above challenges on the forums.

Lesson 3 Practice Exercise A


In this practice exercise you will programmatically select features by location and
update a field for the selected features. You'll also use your selection to perform a
calculation.

In your Lesson3PracticeExerciseA folder you have a Washington geodatabase with
two feature classes:

• CityBoundaries - Contains polygon boundaries of cities in Washington over
  10 square kilometers in size.
• ParkAndRide - Contains point features representing park and ride facilities
  where commuters can leave their cars.

The objective

You want to find out which cities contain park and ride facilities and what
percentage of cities have at least one facility.

• The CityBoundaries feature class has a field "HasParkAndRide," which is set
  to "False" by default. Your job is to mark this field "True" for every city
  containing at least one park and ride facility within its boundaries.
• Your script should also calculate the percentage of cities that have a park
  and ride facility and print this figure for the user.

You do not have to make a script tool for this assignment. You can hard-code the
variable values. Try to group the hard-coded string variables at the beginning of the
script.

For the purposes of these practice exercises, assume that each point in the
ParkAndRide dataset represents one valid park and ride (ignore the value in the
TYPE field).

Tips

You can jump into the assignment at this point, or read the following tips to give
you some guidance.

• Make two feature layers: "CitiesLayer" and "ParkAndRideLayer."
• Use SelectLayerByLocation with a relationship type of "CONTAINS" to
  narrow down your cities feature layer list to only the cities that contain park
  and rides.
• Create an update cursor for your now narrowed-down "CitiesLayer" and
  loop through each record, setting the HasParkAndRide field to "True."
• To calculate the percentage of cities with park and rides, you'll need to know
  the total number of cities. You can use the GetCount [17] tool to get a total
  without writing a loop. Beware that you may have to monkey around with
  the output a bit to get it in a format you can use. See the example in the Esri
  documentation that converts the GetCount result to an integer.
• Similarly, you may have to play around with your Python math a little to get
  a nice percentage figure. Don't get too hung up on this part.
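
The last two tips come down to two small pieces of Python that you can try without any GIS data. In the sketch below, the count arrives as text (as tool output often does) and the numbers 108 and 27 are made up. percent_of_total is a hypothetical helper. Converting to float matters in Python 2, where dividing one integer by another throws away the fraction.

```python
def percent_of_total(part, total):
    """Return part as a percentage of total, or None if total is zero."""
    if total == 0:
        return None
    # Convert to float so that integer division does not truncate the result
    return (float(part) / total) * 100


# Made-up values: the total arrives as text and must be cast to an integer
count_text = "108"
num_cities = int(count_text)
cities_with_facility = 27

percent = percent_of_total(cities_with_facility, num_cities)
print(str(round(percent, 1)) + " percent of cities have a park and ride.")
```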

Lesson 3 Practice Exercise A Solution

Below is one possible solution to Practice Exercise A, with comments. Although this
example is coded using a "while" loop, it could also be written using a "for" loop. If
you find a more efficient way to code a solution, please share it through the
discussion forums.

# This script determines the percentage of cities in the
# state with park and ride facilities

import arcpy
arcpy.env.overwriteOutput = True

cityBoundaries = "C:\\Data\\Lesson3PracticeExerciseA\\Washington.gdb\\CityBoundaries"
parkAndRide = "C:\\Data\\Lesson3PracticeExerciseA\\Washington.gdb\\ParkAndRide"
parkAndRideField = "HasParkAndRide"
citiesWithParkAndRide = 0

try:
    # Make a feature layer of all the park and ride facilities
    arcpy.MakeFeatureLayer_management(parkAndRide, "ParkAndRideLayer")

    # Make a feature layer of all the cities polygons
    arcpy.MakeFeatureLayer_management(cityBoundaries, "CitiesLayer")

except:
    print "Could not create feature layers"

try:
    # Narrow down the cities layer to only the cities that contain a park
    # and ride
    arcpy.SelectLayerByLocation_management("CitiesLayer", "CONTAINS",
                                           "ParkAndRideLayer")

    # Create an update cursor
    selectedCities = arcpy.UpdateCursor("CitiesLayer")
    city = selectedCities.next()

    # Loop through all the cities with park and rides
    while city:
        # Mark the HasParkAndRide field as "TRUE"
        city.setValue(parkAndRideField, "TRUE")
        selectedCities.updateRow(city)

        # Add 1 to your tally of cities with park and rides
        citiesWithParkAndRide += 1

        # Repeat above process with the next city
        city = selectedCities.next()

# Delete the feature layers even if there is an exception (error) raised
finally:
    arcpy.Delete_management("ParkAndRideLayer")
    arcpy.Delete_management("CitiesLayer")

    # Clean up update cursor
    del selectedCities

# Count the total number of cities (this tool saves you a loop)
numCitiesCount = arcpy.GetCount_management(cityBoundaries)
numCities = int(numCitiesCount.getOutput(0))

# Calculate the percentage and print it for the user
percentCitiesWithParkAndRide = ((1.0 * citiesWithParkAndRide) / numCities) * 100

print str(percentCitiesWithParkAndRide) + " percent of cities have a park and ride."

Alternate solution using the arcpy data access module in ArcGIS 10.1

The following solution shows how you could take advantage of the arcpy data
access module in ArcGIS 10.1 to solve this problem. The general approach is the
same as the other solution above.

# This script determines the percentage of cities in the
# state with park and ride facilities

import arcpy
arcpy.env.overwriteOutput = True

cityBoundaries = "D:\\Data\\Geog485\\Lesson3PracticeExerciseA\\Washington.gdb\\CityBoundaries"
parkAndRide = "D:\\Data\\Geog485\\Lesson3PracticeExerciseA\\Washington.gdb\\ParkAndRide"
parkAndRideField = "HasParkAndRide"
citiesWithParkAndRide = 0

try:
    # Make a feature layer of all the park and ride facilities
    arcpy.MakeFeatureLayer_management(parkAndRide, "ParkAndRideLayer")

    # Make a feature layer of all the cities polygons
    arcpy.MakeFeatureLayer_management(cityBoundaries, "CitiesLayer")

except:
    print "Could not create feature layers"

try:
    # Narrow down the cities layer to only the cities that contain a park
    # and ride
    arcpy.SelectLayerByLocation_management("CitiesLayer", "CONTAINS",
                                           "ParkAndRideLayer")

    # Create an update cursor and loop through the selected records
    with arcpy.da.UpdateCursor("CitiesLayer", (parkAndRideField,)) as cursor:
        for row in cursor:
            # Set the park and ride field to TRUE and keep a tally
            row[0] = "True"
            cursor.updateRow(row)
            citiesWithParkAndRide += 1

# Delete the feature layers even if there is an exception (error) raised
finally:
    arcpy.Delete_management("ParkAndRideLayer")
    arcpy.Delete_management("CitiesLayer")

# Count the total number of cities (this tool saves you a loop)
numCitiesCount = arcpy.GetCount_management(cityBoundaries)
numCities = int(numCitiesCount.getOutput(0))

# Calculate the percentage and print it for the user
percentCitiesWithParkAndRide = ((1.0 * citiesWithParkAndRide) / numCities) * 100

print str(percentCitiesWithParkAndRide) + " percent of cities have a park and ride."

Lesson 3 Practice Exercise B


If you look in your Lesson3PracticeExerciseB folder, you'll notice the data is exactly
the same as for Practice Exercise A...except this time the field is
"HasTwoParkAndRides."

The objective

In Practice Exercise B your assignment is to find which cities have at least two park
and rides within their boundaries.

• Mark the "HasTwoParkAndRides" field as "True" for all cities that have at
  least two park and rides within their boundaries.
• Calculate the percentage of cities that have at least two park and rides within
  their boundaries and print this for the user.

Tips

This simple modification in requirements is a game changer. Following is one way
you can approach the task. Notice that it is very different from what you did in
Practice Exercise A:

• Create an update cursor for the cities and start a loop that will examine each
  city.
• Make a feature layer with all the park and ride facilities.
• Make a feature layer for just the current city. You'll have to make an SQL
  query expression in order to do this. Remember that an UpdateCursor can
  get values, so you can use it to get the name of the current city.
• Use SelectLayerByLocation to find all the park and rides CONTAINED_BY
  the current city. Your result will be a narrowed-down park and ride feature
  layer. This is different from Practice Exercise A where you narrowed down
  the cities feature layer.
• Open a search cursor on your now narrowed-down park and ride layer. Your
  code might look something like: selectedParkAndRideRows =
  arcpy.SearchCursor("ParkAndRideLayer"). This search cursor is searching
  through all the park and rides contained by the one city boundary. If there
  were no park and rides, calling next() on your search cursor would return
  nothing. One way to see if there are two park and rides is to call next() twice
  within an if statement (if there's one found, check for a second). If you get a
  result, you know there are at least two park and rides and you can use your
  update cursor to mark the row "True." You can then go on to the next city.
  Another approach is to run the GetCount tool to find out how many features
  were selected, then check if the result is 2 or greater.
• Be sure to delete your feature layers before you loop on to the next city. For
  example: arcpy.Delete_management("ParkAndRideLayer")
• Keep a tally for every row you mark "True" and calculate the percentage as
  you did in Practice Exercise A.
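
The "call next() twice" idea in the tips above works for any number of items, not just two. The sketch below shows it in plain Python: has_at_least is a hypothetical helper, and the lists of made-up names stand in for the rows of a narrowed-down park and ride layer. It stops looking as soon as it has seen enough items, so it never walks through the whole list.

```python
from itertools import islice


def has_at_least(items, minimum):
    """Return True if the iterable yields at least `minimum` items."""
    # islice stops after `minimum` items, so the rest are never read
    found = sum(1 for _ in islice(items, minimum))
    return found >= minimum


# Made-up stand-ins for the park and rides selected in three different cities
print(has_at_least(iter(["Lot A", "Lot B", "Lot C"]), 2))
print(has_at_least(iter(["Lot A"]), 2))
print(has_at_least(iter([]), 2))
```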

Lesson 3 Practice Exercise B Solution

Below is one possible solution to Practice Exercise B, with comments. If you find a
more efficient way to code a solution, please share it through the discussion
forums.

# This script determines the percentage of cities with two park
# and ride facilities

import arcpy
arcpy.env.overwriteOutput = True

cityBoundaries = "C:\\Data\\Lesson3PracticeExerciseB\\Washington.gdb\\CityBoundaries"
parkAndRide = "C:\\Data\\Lesson3PracticeExerciseB\\Washington.gdb\\ParkAndRide"
parkAndRideField = "HasTwoParkAndRides"
cityIDStringField = "CI_FIPS"
citiesWithTwoParkAndRides = 0
numCities = 0

# Make a feature layer of all the park and ride facilities
arcpy.MakeFeatureLayer_management(parkAndRide, "ParkAndRideLayer")

# Start looping through each city
cityRows = arcpy.UpdateCursor(cityBoundaries)
city = cityRows.next()

while city:

    # Create a query string for the current city
    cityIDString = city.getValue(cityIDStringField)
    queryString = '"' + cityIDStringField + '" = ' + "'" + cityIDString + "'"

    # Make a feature layer of just the current city polygon
    arcpy.MakeFeatureLayer_management(cityBoundaries, "CurrentCityLayer",
                                      queryString)

    # Narrow down the park and ride layer by selecting only the park and
    # rides in the current city
    arcpy.SelectLayerByLocation_management("ParkAndRideLayer",
                                           "CONTAINED_BY", "CurrentCityLayer")

    try:
        # Try to get the first park and ride in the list
        selectedParkAndRideRows = arcpy.SearchCursor("ParkAndRideLayer")
        firstParkAndRide = selectedParkAndRideRows.next()

        # If a first park and ride was found, look for a second one
        if firstParkAndRide:
            secondParkAndRide = selectedParkAndRideRows.next()

            # Mark the park and ride field TRUE if a second park and ride
            # was found
            if secondParkAndRide:
                city.setValue(parkAndRideField, "TRUE")

                # Don't forget to call updateRow
                cityRows.updateRow(city)

                # Add 1 to your tally of cities with two park and rides
                citiesWithTwoParkAndRides += 1

        # Get ready to repeat above process with the next city
        numCities += 1
        city = cityRows.next()

    finally:
        # Delete feature layer to get ready for next run of loop
        arcpy.Delete_management("CurrentCityLayer")

# Clean up update cursor and feature layer containing all park and rides
del cityRows
arcpy.Delete_management("ParkAndRideLayer")

# Calculate and report the percentage of cities with two park and rides
percentCitiesWithParkAndRide = ((1.0 * citiesWithTwoParkAndRides) / numCities) * 100

print str(percentCitiesWithParkAndRide) + " percent of cities have two park and rides."

Below is an explanatory video of the solution. Note that the video was recorded
showing a slightly less efficient technique of making the "ParkAndRideLayer"
feature layer each time the loop runs. Since recording this video, I have discovered
that you only have to create ParkAndRideLayer once (before the loop, as shown
above), then the SelectLayerByLocation just performs a new selection on it each
time the loop runs.

Alternate solution using the arcpy data access module in ArcGIS 10.1

The following alternate solution uses the arcpy data access module from ArcGIS
10.1. It also takes the approach of running the GetCount tool to figure out whether
two or more park and rides were selected for each city. Also, as an improvement on
the above solution, it creates the ParkAndRideLayer just once, before the loop runs.

# This script determines the percentage of cities with two park
# and ride facilities

import arcpy
arcpy.env.overwriteOutput = True

cityBoundaries = "D:\\Data\\Geog485\\Lesson3PracticeExerciseB\\Washington.gdb\\CityBoundaries"
parkAndRide = "D:\\Data\\Geog485\\Lesson3PracticeExerciseB\\Washington.gdb\\ParkAndRide"
parkAndRideField = "HasTwoParkAndRides"
cityIDStringField = "CI_FIPS"
citiesWithTwoParkAndRides = 0
numCities = 0

# Make a feature layer of all the park and ride facilities
arcpy.MakeFeatureLayer_management(parkAndRide, "ParkAndRideLayer")

# Make an update cursor and loop through each city
with arcpy.da.UpdateCursor(cityBoundaries,
                           (cityIDStringField, parkAndRideField)) as cityRows:
    for city in cityRows:
        # Create a query string for the current city
        cityIDString = city[0]
        queryString = '"' + cityIDStringField + '" = ' + "'" + cityIDString + "'"

        # Make a feature layer of just the current city polygon
        arcpy.MakeFeatureLayer_management(cityBoundaries,
                                          "CurrentCityLayer", queryString)

        try:
            # Narrow down the park and ride layer by selecting only the
            # park and rides in the current city
            arcpy.SelectLayerByLocation_management("ParkAndRideLayer",
                                                   "CONTAINED_BY",
                                                   "CurrentCityLayer")

            # Count the number of park and rides selected
            selectedCount = arcpy.GetCount_management("ParkAndRideLayer")
            numSelectedParkAndRides = int(selectedCount.getOutput(0))

            # If more than one park and ride found, update the row to TRUE
            if numSelectedParkAndRides >= 2:
                city[1] = "TRUE"

                # Don't forget to call updateRow
                cityRows.updateRow(city)

                # Add 1 to your tally of cities with two park and rides
                citiesWithTwoParkAndRides += 1

        finally:
            # Delete current cities layer to prepare for next run of loop
            arcpy.Delete_management("CurrentCityLayer")
            numCities += 1

# Clean up park and ride feature layer
arcpy.Delete_management("ParkAndRideLayer")

# Calculate and report the percentage of cities with two park and rides.
# The report is only printed when at least one city was found.
if numCities != 0:
    percentCitiesWithParkAndRide = ((1.0 * citiesWithTwoParkAndRides) / numCities) * 100
    print str(percentCitiesWithParkAndRide) + " percent of cities have two park and rides."
else:
    print "Error with input dataset. No cities found."

Project 3: Aggregating graffiti incidents

You've just been hired as a GIS developer in a large city and are receiving your first
Python assignment. Your co-workers in the GIS department are very excited
because they've just deployed the city's first "citizen participation" Web map. The
map is a community effort to target graffiti: anyone on the Web can go to the map
and place a point to report a graffiti incident.

The application has already been wildly successful in its first few months of
operation and your department has amassed a large amount of point data showing
graffiti incidents. However, the police chief is now interested in seeing an
aggregation of this data by patrol zones. The goal is to set a priority on each zone
and allot more resources to fighting graffiti in the high priority zones.

Download the data for this project [18]

Your task

You have a point feature class of graffiti incidents and a polygon feature class of
patrol zones with some empty attributes already created for you. You must write a
script that updates the attributes of the patrol zones with:

• The number of graffiti incidents falling within the patrol zone. This is an
  integer that goes in the INCIDENTS field.
• The priority ranking for the patrol zone. This is a string that goes in the
  PRIORITY field. You will derive this string using some simple math that
  compares the number of incidents in the zone with the area of the zone.

Patrol zone priority rankings


You will calculate a priority ranking for each zone by dividing the number of graffiti
incidents in the zone by the area of the zone. Your script should then examine the
result and assign the appropriate priority ranking (PRIORITY). These are the
priority rankings:

• TOP CONCERN: 15 or more incidents per square mile

• HIGH CONCERN: At least 12 but less than 15 incidents per square mile

• SOME CONCERN: At least 6 but less than 12 incidents per square mile

• LOW CONCERN: Fewer than 6 incidents per square mile

Deliverables

The deliverables for this project are:

• Your Python script (.py file) that performs the above tasks.

• A screenshot of your patrol zones attribute table after running the script.

• A short writeup (about 300 words) describing what you learned and how
  you approached the problem. If you included any "over and above" efforts,
  please describe these here so the graders know to look for them.

Successful delivery of the above requirements is sufficient to earn 90% on the
project. The remaining 10% is reserved for efforts that go "over and above" the
minimum requirements. This could include (but is not limited to) useful code
comments, an especially insightful writeup explaining some lesson learned during
the coding, a map symbolizing the resulting zone priorities, or a list of other real-
world problems a similar script could solve.

Challenges

In this project, you need to manage an update cursor and perform repeated
SelectLayerByLocation operations in order to figure out how many incidents each
zone contains. You then need to use the number of incidents to calculate the
incidents per square mile, and make a decision about which priority to assign.

Approach

The approach you take for finding the number of incidents per zone should be very
similar to what you did in Lesson 3 Practice Exercise B to find the number of park
and rides inside a city boundary. This time, instead of counting whether two items
were found, you will need to get a count on the entire number of incidents selected.
This is easily done using the GetCount tool. I encourage you to spend some time
studying Practice Exercise B and its accompanying solution.

There are many ways to approach the script, and the majority of available credit
will be awarded just for solving the problem. More points will be awarded for
scripts that solve the problem in the most efficient way possible. For example, if
you can get a job done with one loop instead of two, this is more efficient. If you
loop through 10 objects instead of 100 and accomplish the same thing, this is more
efficient. Look for ways to economize what your script is doing. It is a wonderful
feeling to delete unnecessary code.

Hints

• Do the practice exercises. It is worth your time to study their solutions to the
  point where you understand everything that is happening.

• Do not procrastinate. If you are new to programming, plan to spend the
  majority of the allotted lesson time working on the project.

• Before you begin working, manually make a copy of the patrol zones and
  place it in the same file geodatabase (name it something like
  PatrolZones_Backup). Your script is going to modify the patrol zones, and as
  you test, you need a convenient way to restore the data to its original state.
  After each test run, you can make a new copy of the dataset from the backup.
  If you fail to do this, you'll have to download the data and unzip it again
  each time you re-run the script.

• Not only do the cursors place locks on your data, but the feature layers do as
  well. You can call arcpy.Delete_management("TheFeatureLayer") to delete
  the feature layers; however, at the time of this writing I had still not
  determined how to get rid of all the locks. Be prepared to close PythonWin
  and/or ArcCatalog or ArcMap if you're having trouble restoring your original
  zones dataset after a test run. Save often.

• Use the debugging toolbar to help you. Set up watches on your variables and
  observe how your variable values change as you step through each line of
  your code. This is what full-time programmers do for a vast chunk of
  their time.

• Polygon feature classes in a file geodatabase always have an area field called
  SHAPE_Area. Use this to get the initial area, which for this dataset will be
  in square meters. Divide (/) the number of square meters by 2589988.11 to
  get square miles.

• If you get stuck or burned out, you may want to take a break and read
  through the lesson again, front to back. The lesson material will take on a
  whole new meaning once you have actually tried to write some code using
  the described techniques, and you may read something that will help you get
  past your brick wall. Also, the ArcGIS Help topics linked to in the lesson are
  very helpful for this assignment, especially the ones about making feature
  layers and selecting data.
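
The unit conversion in the SHAPE_Area hint can be sketched as follows. This is only an illustration of the arithmetic, not part of a project solution: the area and incident count are made-up values for a single hypothetical zone, and the variable names are assumptions.

```python
# Number of square meters in one square mile (from the hint above)
SQUARE_METERS_PER_SQUARE_MILE = 2589988.11

# Hypothetical values for one patrol zone (not taken from the project data)
shapeArea = 7769964.33    # SHAPE_Area value, in square meters
incidentCount = 30        # graffiti incidents counted in the zone

# Convert the area to square miles, then compute incidents per square mile
areaSquareMiles = shapeArea / SQUARE_METERS_PER_SQUARE_MILE
incidentsPerSquareMile = incidentCount / areaSquareMiles

print(areaSquareMiles)
print(incidentsPerSquareMile)
```

Because the area is a floating-point number, the division gives a decimal result even when the incident count is an integer.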

Note: Although I did have a pre-college job painting over graffiti in this geographic
area, the data for this project is completely fabricated and does not represent actual
graffiti incidents or police patrol zones.

Author(s) and/or Instructor(s): Sterling Quinn, John A. Dutton e-Education
Institute, College of Earth and Mineral Sciences, The Pennsylvania State
University;
Jim Detwiler, John A. Dutton e-Education Institute, College of Earth and Mineral
Sciences, The Pennsylvania State University;
Frank Hardisty, John A. Dutton e-Education Institute, College of Earth and
Mineral Sciences, The Pennsylvania State University;
James O'Brien, John A. Dutton e-Education Institute, College of Earth and Mineral
Sciences, The Pennsylvania State University
Penn State Professional Masters Degree in GIS: Winner of the 2009 Sloan
Consortium award for Most Outstanding Online Program

© 1999-2012 The Pennsylvania State University. Except where otherwise noted,
this courseware module is licensed under the Creative Commons Attribution-Non-
Commercial-Share-Alike 3.0 License and is freely available through Penn State's
College of Earth and Mineral Sciences' Open Educational Resources Initiative.

Please address questions and comments about this resource to the site editor.

Source URL: https://www.e-education.psu.edu/geog485/node/59

Links:
[1] http://help.arcgis.com/en/arcgisdesktop/10.0/help/00s5/00s50000000n000000.htm
[2] http://en.wikipedia.org/wiki/Shapefile
[3] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Using_fields_and_indexes/002z00000019000000/
[4] http://help.arcgis.com/en/arcgisdesktop/10.0/help/00s5/00s50000002t000000.htm
[5] http://help.arcgis.com/en/arcgisdesktop/10.0/help/00s5/00s500000033000000.htm
[6] http://help.arcgis.com/en/arcgisdesktop/10.0/help/0017/00170000006p000000.htm
[7] http://help.arcgis.com/en/arcgisdesktop/10.0/help/0017/001700000072000000.htm
[8] http://help.arcgis.com/en/arcgisdesktop/10.0/help/002z/002z0000001q000000.htm
[9] http://help.arcgis.com/en/arcgisdesktop/10.0/help/000v/000v00000038000000.htm
[10] http://help.arcgis.com/en/arcgisdesktop/10.0/help/000v/000v00000039000000.htm
[11] http://help.arcgis.com/en/arcgisdesktop/10.0/help/000v/000v0000003m000000.htm
[12] http://en.wikipedia.org/wiki/3-1-1
[13] http://resources.arcgis.com/en/help/main/10.1/018w/018w0000000t000000.htm
[14] http://help.arcgis.com/en/arcgisdesktop/10.0/help/005m/005m0000001s000000.htm
[15] http://www.wsdot.wa.gov/mapsdata/GeoDataCatalog/default.htm
[16] https://www.e-education.psu.edu/drupal6/files/geog485py/data/Lesson3PracticeExercises.zip
[17] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//0017000000n7000000.htm
[18] https://www.e-education.psu.edu/drupal6/files/geog485py/data/Project3.zip

Final Project proposal assignment
At some point during this course you've hopefully felt "the lightbulb go on"
regarding how you might apply the lesson material to your own tasks in the GIS
workplace. To conclude this course, you will be expected to complete an individual
project that uses Python automation to make some GIS task easier, faster, or more
accurate.

The project goal is up to you, but it is preferably one that relates to your current
field of work or a field in which you have a personal interest. Since you're defining
the requirements from the beginning, there is no "over and above" credit factored
into this project grade. The number of lines of code you write is not as important as
the problem you solved. However, we encourage you to propose a project that
meets or even slightly exceeds your relative level of experience with programming.

You will have two weeks at the end of the quarter to dedicate completely toward the
project and the Review Quiz. This is your chance to apply what you've learned
about Python to a problem that really interests you.

One week into Lesson 4 you are required to submit a project proposal to the Final
Project Proposal Drop Box in ANGEL. This proposal must clearly explain:

1. The task you intend to accomplish using Python.

2. How your proposed solution will make the task easier, faster, and/or more
   accurate. Also explain why your task could not simply be accomplished using
   the "out-of-the-box" tools from Esri, or why your script gives a particular
   advantage over those tools.

3. The deliverables you will submit for the project. A well-documented script
   tool is highly encouraged. If the script requires data, describe how the
   instructors will be able to evaluate your script. Possible solutions are to zip a
   sample dataset for the instructors, demonstrate your script during an Adobe
   Connect session, or make the script flexible enough that it could be used
   with any dataset.


The proposal will contribute toward 10% of your Final Project grade, and will be
used to help grade the rest of your project. Your proposal must be approved by the
instructors before you move forward with coding the project. We may also offer
some guidance on how to approach your particular task, and we'll provide thoughts
on whether you are taking on too much or too little work to be successful.

As you work on your project, you're encouraged to seek help from all resources
discussed in this class, including existing code samples and scripts on the Internet.
If you re-use any long sections of code that you found on the Internet, please
thoroughly explain in your project writeup how you found it, tested it, and
extracted only the parts you needed.

Project ideas

If you're having trouble thinking up a project, you can derive a proposal from one
of the suggestions here. You may have to spend a little bit of time acquiring or
making up some test datasets to fit these project ideas. I also suggest that you read
through the Lesson 4 material before selecting a project, just so you have a better
idea of what types of things are possible with Python.

• Compare dataset statistics: Make a tool or script that takes two feature
  classes as input, along with a field name. The tool should check whether the
  field is numeric and exists in both feature classes. If both these conditions
  are met, the tool should calculate statistics for that field for both feature
  classes and report the difference. Statistics could be sum, average, standard
  deviation (if you are feeling brave), etc.

• Compare existence of features in two datasets: Make a tool or script
  that reads two feature classes based on a key field (such as OBJECTID). The
  tool should figure out which features only appear in one of the feature
  classes and write them to a third feature class. As a variation on this, the tool
  could figure out which features appear in both feature classes and write
  them to a third feature class. You could even allow the tool user to set a
  parameter to determine this.

• Calculate and compare areas: Make a tool or script that tallies the areas
  of all geometries in a feature class, or subsets of geometries based on a query,
  and reports the difference. For example, this tool might compare "Acres of
  privately owned wetlands in 2008" and "Acres of privately owned wetlands
  in 2009."

• Find and replace: Make a tool flexible enough to search for any term in
  any field in a feature class and replace it with another user-provided term.
  Ensure in your code that users cannot modify the critical OBJECTID or
  SHAPE fields. Also ensure that partial strings are supported, such that if the
  search term is found anywhere within a string, it will be replaced while
  leaving the rest of the string intact.

• Parse KML, XML, or JSON and write to a feature class: Make a tool
  or script that reads a KML file, or an XML or JSON response from a Web
  service, and writes the geometries to a feature class. (You'll get some
  exposure to reading text-based files in Lesson 4.)

• Concatenate name fields: Write a tool or script that takes a feature class
  as input, as well as "First name," "Middle name," and "Last name"
  parameters that represent fields in the feature class. Your tool should add a
  new field that contains, for each record, the first, middle, and last names
  separated by one space. Your tool should intuitively handle blank records
  and records that have no middle name.

• Process rasters to meet an organizational need: Write a tool or script
  that takes a raw elevation dataset (such as a DEM), clips it to your study
  area, creates both hillshade and slope rasters, and projects them into your
  organization's most commonly used projection. Expose the study area
  feature class to the end user as a parameter.

• Parse raw textual data and write to feature class: Find some data
  available on the Internet that has lat/lon locations but is in text-based
  format with no Esri feature class available (for example, weather station
  readings or GPS tracks). If you need to, you can copy the HTML out of the
  Web page and paste it in a .txt file to help you get started. Read the .txt file
  and write the data to a new feature class. (You'll get some exposure to
  reading text-based files in Lesson 4.)

• Make an MXD repair tool: Make a tool that takes an old and new
  workspace path as inputs and then repairs all the broken data links in an
  MXD. (You can do this using the arcpy.mapping module described in Lesson
  4.)

• Make a "map book": Make a tool that opens a series of MXDs, data
  frames, or map extents and constructs a multi-page PDF from them. (You
  can do this using the arcpy.mapping module described in Lesson 4.)


Source URL: https://www.e-education.psu.edu/geog485/node/149


Lesson 4: Practical Python for the GIS analyst
Lesson 4 contains a variety of subjects to help you use Python more effectively as a
GIS analyst. The sections of this lesson will reinforce what you've learned already,
while introducing some new concepts that will help take your automation to the
next level.

You'll learn how to modularize a section of code to make it usable in multiple
places. You'll learn how to use new Python modules, such as os, to open and read
files; then you'll transfer the information in those files into geographic datasets that
can be read by ArcGIS. Finally, you'll learn how to use your operating system to
automatically run Python scripts at any time of day.

Lesson 4 checklist
Lesson 4 explores some more advanced Python concepts, including reading and
parsing text. To complete Lesson 4, do the following:

1. One week into the lesson, submit your Final Project proposal [1] to the
   instructors using the ANGEL e-mail system. For the exact due date, see the
   Calendar tab in ANGEL.

2. Work through the course lesson materials.

3. Read Zandbergen chapters 7.6, 8.1 - 8.6, 10, and 12.1 - 12.5. In the online
   lesson pages I have inserted instructions about when it is most appropriate
   to read each of these chapters.

4. Complete Project 4 and submit your zipped deliverables to the Project 4
   drop box.

5. Complete the Lesson 4 Quiz.

4.1 Functions and modules


One of the fundamentals of programming that we did not previously cover is
functions. To start this lesson, we'll talk about functions and how you can use them
to your benefit as you begin writing longer scripts.

A function contains one focused piece of functionality in a reusable section of code.
The idea is that you write the function once, then use, or call, it throughout your
code whenever you need to. You can put a group of related functions in a module so
you can use them in many different scripts. When used appropriately, functions
eliminate code repetition and make the main body of your script shorter and more
readable.

Functions exist in many programming languages, and each has its own way of defining a
function. In Python, you define a function using the def statement. Each line in the
function that follows the def is indented. Here's a simple function that reads the
radius of a circle and reports the circle's approximate area. (Remember that the
area is equal to pi [3.14159...] multiplied by the square [** 2] of the radius.)

>>> def findArea(radius):
...     area = 3.14159 * radius ** 2
...     return area
...
>>> findArea(3)
28.27431

Notice from the above example that functions can take parameters, or arguments.
When you call the above function, you supply the radius of the circle in
parentheses. The function returns the area (notice the return statement, which is
new to you).

Thus, to find the area of a circle with a radius of 3 inches, you could make the
function call findArea(3) and get the return value 28.27431 (square inches).

It's common to assign the returned value to a variable and use it later in your code.
For example, you could add these lines in the Interactive Window:

>>> aLargerCircle = findArea(4)
>>> print aLargerCircle
50.26544

A function is not required to return any value. For example, you may have a
function that takes the path of a text file as a parameter, reads the first line of the
file, and prints that line to the Interactive Window. Since all the printing logic is
performed inside the function, there is really no return value.
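
As a minimal sketch of such a function, the code below prints the first line of a text file and returns nothing. The function name printFirstLine and the sample file are hypothetical, made up for this illustration. print is written with parentheses so the sketch runs in both Python 2.7 and Python 3.

```python
import os
import tempfile

def printFirstLine(filePath):
    # Open the file, read only its first line, and print it
    textFile = open(filePath, "r")
    firstLine = textFile.readline()
    textFile.close()
    print(firstLine.rstrip("\n"))
    # There is no return statement, so the function returns None

# Build a small throwaway file so the example is self-contained
samplePath = os.path.join(tempfile.mkdtemp(), "sample.txt")
sampleFile = open(samplePath, "w")
sampleFile.write("first line\nsecond line\n")
sampleFile.close()

# Call the function; there is nothing useful to capture from it
result = printFirstLine(samplePath)
print(result)
```

If you do assign the result of such a function to a variable, as in the last two lines, the variable simply holds Python's special value None.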

Neither is a function required to take a parameter. For example, you might write a
function that retrieves or calculates some static value. Try this in the Interactive
Window:

>>> def getCurrentPresident():
...     return "Barack Obama"
...
>>> president = getCurrentPresident()
>>> print president
Barack Obama

The function getCurrentPresident() doesn't take any user-supplied parameters. Its
only "purpose in life" is to return the name of the current president. It cannot be
asked to do anything else.

Modules

You may be wondering what advantage you gain by putting the above
getCurrentPresident() logic in a function. Why couldn't you just define a string
currentPresident and set it equal to "Barack Obama"? The big reason is reusability.

Suppose you maintain 20 different scripts, each of which works with the name of
the current President in some way. You know that the name of the current
President will eventually change. Therefore, you could put this function in what's
known as a module file and reference that file inside your 20 different scripts.
When the name of the President changes, you don't have to open 20 scripts and
change them. Instead, you just open the module file and make the change once.

You may remember that you've already worked with some of Python's built-in
modules. The Hi Ho! Cherry O example in Lesson 2 imported the random module
so that the script could generate a random number for the spinner result. This
spared you the effort of writing or pasting any random number generating code
into your script.

You've also probably gotten used to the pattern of importing the arcpy site package
at the beginning of your scripts. A site package can contain numerous modules. In
the case of arcpy, these modules include Esri functions for geoprocessing.

As you use Python in your GIS work, you'll probably write functions that are useful
in many types of scripts. These functions might convert a coordinate from one
projection to another, or create a polygon from a list of coordinates. These
functions are perfect candidates for modules. If you ever want to improve on your
code, you can make the change once in your module instead of finding each script
where you duplicated the code.

Creating a module

To create a module, create a new script in PythonWin and save it with the standard
.py extension; but instead of writing start-to-finish scripting logic, just write some
functions. Here's what a simple module file might look like. This module only
contains one function, which adds a set of points to a feature class given a Python
list of coordinates.

# This module is saved as practiceModule1.py

# The function below creates points from a list of coordinates
# Example list: [[-113,23],[-120,36],[-116,-2]]

def createPoints(coordinateList, featureClass):

    # Import arcpy and create an insert cursor
    import arcpy
    rowInserter = arcpy.InsertCursor(featureClass)

    # Loop through each coordinate in the list
    for coordinate in coordinateList:

        # Grab a set of coordinates from the list and
        # assign them to a point object
        x = float(coordinate[0])
        y = float(coordinate[1])
        pointGeometry = arcpy.Point(x,y)

        # Use the insert cursor to put the point object
        # in the feature class
        newPoint = rowInserter.newRow()
        newPoint.Shape = pointGeometry
        rowInserter.insertRow(newPoint)

    # Delete the insert cursor
    del rowInserter

(Note that if you're using ArcGIS 10.1 with the data access module arcpy.da, you
could write it like the following:)

def createPoints(coordinateList, featureClass):

    # Import arcpy and create an insert cursor
    import arcpy

    with arcpy.da.InsertCursor(featureClass, ("SHAPE@",)) as rowInserter:

        # Loop through each coordinate in the list and make a point
        for coordinate in coordinateList:
            point = arcpy.Point(coordinate[0],coordinate[1])
            rowInserter.insertRow((point,))

The above function createPoints could be useful in various scripts, so it's very
appropriate for putting in a module. Notice that this script has to work with insert
cursors and point objects, so it requires arcpy. It's legal to import a site package or
module within a module.

Also notice that arcpy is imported within the function, not at the very top of the
module like you are accustomed to seeing. This is done for performance reasons.
You may add more functions to this module later that do not require arcpy. You
should only do the work of importing arcpy when necessary, that is, if a function is
called that requires it.

The arcpy site package is only available inside the scope of this function. If other
functions in your practice module were called, the arcpy module would not be
available to those functions. Scope applies also to variables that you create in this
function, such as rowInserter and pointGeometry. Note that, unlike in some other
languages, a loop in Python does not create a scope of its own: pointGeometry is
first assigned inside the for loop, but it remains valid for the rest of this
particular function. If you tried to use either variable outside the function, it
would be out of scope and unavailable.
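
Here is a small sketch of how Python handles the scope of variables created in a function. It does not use arcpy; the function sumCoordinates and its variable names are hypothetical, invented only to illustrate the idea. In Python, a name assigned inside a loop stays available for the rest of the function, but not outside the function.

```python
def sumCoordinates(coordinateList):
    # total and pointSum are local to this function
    total = 0
    for coordinate in coordinateList:
        pointSum = coordinate[0] + coordinate[1]
        total += pointSum
    # pointSum was assigned inside the loop but can still be read here,
    # because the loop does not create a scope of its own
    lastPointSum = pointSum
    return total, lastPointSum

grandTotal, lastSum = sumCoordinates([[1, 2], [3, 4]])
print(grandTotal)
print(lastSum)

# Outside the function, the local name pointSum does not exist
try:
    pointSum
    visibleOutside = True
except NameError:
    visibleOutside = False
print(visibleOutside)
```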

Using a module

So how could you use the above module in a script? Imagine that the module above
is saved on its own as practiceModule1.py. Below is an example of a separate script
that imports practiceModule1.

# This script is saved as add_my_points.py

# Import the module containing a function we want to call
import practiceModule1

# Define point list and shapefile to edit
myWorldLocations = [[-123.9,47.0],[-118.2,34.1],[-112.7,40.2],[-63.2,-38.7]]
myWorldFeatureClass = "c:\\Data\\WorldPoints.shp"

# Call the createPoints function from practiceModule1
practiceModule1.createPoints(myWorldLocations, myWorldFeatureClass)

The above script is simple and easy to read because you didn't have to include all
the logic for creating the points. That is taken care of by the createPoints function
in the module you imported, practiceModule1. Notice that to call a function from a
module, you need to use the syntax module.function().
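
The same module.function() pattern applies to Python's built-in modules. As a quick sketch that you can run without ArcGIS, the lines below call the randint function from the random module. The variable name spinnerResult and the range of 1 to 4 are arbitrary choices for this illustration.

```python
# Import the module containing a function we want to call
import random

# random is the module; randint() is a function inside it
spinnerResult = random.randint(1, 4)
print(spinnerResult)
```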

Readings

To reinforce the material in this section, read Zandbergen 12.1 - 12.5, which talks
about creating Python functions and modules.

Practice

Before moving ahead, get some practice in PythonWin by trying to write the
following functions. These functions are not graded, but the experience of writing
them will help you in Project 4. Use the course forums to help each other.

• A function that returns the perimeter of a square given the length of one
  side.

• A function that takes a path to a feature class as a parameter and returns a
  Python list of the fields in that feature class. Practice calling the function and
  printing the list. However, do not print the list within the function.

• A function that returns the Euclidean distance between any two coordinates.
  The coordinates can be supplied as parameters in the form (x1, y1, x2, y2).
  For example, if your coordinates were (312088, 60271) and (312606,
  59468), your function call might look like this: findDistance(312088, 60271,
  312606, 59468). Use the Pythagorean formula A ** 2 + B ** 2 = C ** 2. For
  an extra challenge, see if you can handle negative coordinates.

The best practice is to put your functions inside a module and see if you can
successfully call them from a separate script. If you try to step through your code
using the debugger, you'll notice that the debugger helpfully moves back and forth
between the script and the module whenever you call a function in the module.

4.2 Reading and parsing text


One of the best ways to increase your effectiveness as a GIS programmer is to learn
how to manipulate text-based information. In Lesson 3, we talked about how to
read data in ArcGIS's native formats, such as feature classes. But often GIS data is
collected and shared in more "raw" formats such as an Excel spreadsheet in CSV
(comma-separated value) format, a list of coordinates in a text file, or an XML [2]
response received through a Web service.

When faced with these files, you should first understand if your GIS software
already comes with a tool or script that can read or convert the data to a format it
can use. If no tool or script exists, you'll need to do some programmatic work to
read the file and separate out the pieces of text that you really need. This is called
parsing the text.

For example, a Web service may return you many lines of XML describing all the
readings at a weather station, when all you're really interested in are the
coordinates of the weather station and the annual average temperature. Parsing the
response involves writing some code to read through the lines and tags in the XML
and isolating only those three values.

When you parse, you cycle through lines of text, treating them as strings, and pull
out the useful information from those strings. In an XML file, for example, you may
know that the information you want falls inside a particular tag, such as
<AvgTemp>46</AvgTemp>. One approach to getting the value 46 might be to
search for the line containing the substring "AvgTemp," then take all the characters
that fall between the first > coming from the left of the string and the first < coming
from the right.
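
The approach just described can be sketched with Python's string methods find() and rfind(). This is only one possible way to do it, and it assumes the whole tag sits on a single line exactly as shown; the variable names are made up for the illustration.

```python
line = "<AvgTemp>46</AvgTemp>"

if "AvgTemp" in line:
    # Position just past the first > counting from the left
    start = line.find(">") + 1
    # Position of the first < counting from the right
    end = line.rfind("<")
    # Slice out the characters in between and convert them to a number
    avgTemp = int(line[start:end])
    print(avgTemp)
```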

In another case, you might know that the values you want fall after the second and
third commas in a line of comma-separated values. You can split up the line based
on comma locations and put all the segments of the string in a Python list. You can
then take the third and fourth items in the list to get the values you want.
(Remember the third and fourth items would come after the second and third
commas, respectively.)
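
Here is a short sketch of that idea using the string split() method. The line of text is invented for this illustration (a hypothetical station ID, latitude, longitude, temperature, and year), so treat the values and variable names as assumptions.

```python
# A made-up line of comma-separated values
line = "KPIT,40.35,-79.93,52,2012"

# Split the line at each comma; the result is a Python list of strings
segments = line.split(",")

# The third and fourth items are at index positions 2 and 3
thirdItem = segments[2]
fourthItem = segments[3]

print(segments)
print(thirdItem)
print(fourthItem)
```

Notice that split() always gives you strings. If you need numbers, you have to convert the items yourself, for example with float().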

In both cases, the keys to effective parsing are to know how to read lines in a file
and know your string manipulation methods. It's helpful to know how to read a
string, search for values in a string, split up a string based on some delimiter, and
extract particular segments of a string.

Sometimes you can import helper modules, or libraries, into your code to make it
easier to parse certain types of text. In the XML example, it may be easier to import
xml.dom (described here in Chapter 1 of the book Python & XML [3]), which puts
all the XML elements in the file into a series of lists. Searching through those lists is
easier than repeatedly searching for the < and > characters. If you're preparing for
a big parsing project with XML or some other type of well-known format, it may be
worth your while to investigate whether there's a third-party library that can help
you.
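
As a rough sketch of what a helper library can do for you, the code below uses xml.dom.minidom, a lightweight implementation of xml.dom that ships with Python. The XML string is a made-up weather station response, so its tag names and values are assumptions for this illustration only.

```python
from xml.dom import minidom

# A made-up XML response describing one weather station
xmlText = "<Station><Lat>40.79</Lat><Lon>-77.86</Lon><AvgTemp>46</AvgTemp></Station>"

# Parse the text into a document object
document = minidom.parseString(xmlText)

# Get a list of all elements named AvgTemp
avgTempElements = document.getElementsByTagName("AvgTemp")

# The text between the tags is stored in the element's first child node
avgTemp = avgTempElements[0].firstChild.data
print(avgTemp)
```

The library finds the tag for you, so there is no need to search for the < and > characters yourself.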

There are an infinite number of parsing scenarios that you can encounter. This
lesson will attempt to teach you the general approach by walking through just one
example. In your final project for this course, you may choose to explore parsing
other types of files.

Introducing the GPS track parsing example

This example reads a text file collected from a GPS unit. The lines in the file
represent readings taken from the GPS unit as the user traveled along a path. In
this section of the lesson, you'll learn one way to parse out the coordinates from
each reading. The next section of the lesson uses a variation of this example to
show how you could write the user's track to a polyline feature class.

The file for this example is called gps_track.txt and it looks something like the text
string shown below. (Please note, line breaks have been added to the file shown
below to ensure that the text fits within the page margins. Click on this link to the
gps_track.txt file [4] to see what the text file actually looks like.)

type,ident,lat,long,y_proj,x_proj,new_seg,display,color,altitude,depth,te
mp,time,model,filename,ltime
TRACK,ACTIVE LOG,40.78966141,-
77.85948515,4627251.76270444,1779451.21349775,True,False,
255,358.228393554688,0,0,2008/06/11-14:08:30,eTrex Venture,
,2008/06/11 09:08:30
TRACK,ACTIVE LOG,40.78963995,-
77.85954952,4627248.40489401,1779446.18060893,False,False,
255,358.228393554688,0,0,2008/06/11-14:09:43,eTrex Venture,
,2008/06/11 09:09:43
TRACK,ACTIVE LOG,40.78961849,-
77.85957098,4627245.69008772,1779444.78476531,False,False,
255,357.747802734375,0,0,2008/06/11-14:09:44,eTrex Venture,
,2008/06/11 09:09:44
TRACK,ACTIVE LOG,40.78953266,-
77.85965681,4627234.83213242,1779439.20202706,False,False,
255,353.421875,0,0,2008/06/11-14:10:18,eTrex Venture, ,2008/06/11
09:10:18
TRACK,ACTIVE LOG,40.78957558,-
77.85972118,4627238.65402635,1779432.89982442,False,False,
255,356.786376953125,0,0,2008/06/11-14:11:57,eTrex Venture,
,2008/06/11 09:11:57
TRACK,ACTIVE LOG,40.78968287,-
77.85976410,4627249.97592111,1779427.14663093,False,False,
255,354.383178710938,0,0,2008/06/11-14:12:18,eTrex Venture,
,2008/06/11 09:12:18
TRACK,ACTIVE LOG,40.78979015,-
77.85961390,4627264.19055204,1779437.76243578,False,False,
255,351.499145507813,0,0,2008/06/11-14:12:50,eTrex Venture,
,2008/06/11 09:12:50
etc. ...

Notice that the file starts with a header line, explaining the meaning of the values
contained in the readings from the GPS unit. Each subsequent line contains one
reading. The goal for this example is to create a Python list containing the X,Y
coordinates from each reading. Specifically, the script should be able to read the
above file and print a text string like the one shown below.

[['-77.85948515', '40.78966141'], ['-77.85954952', '40.78963995'],
['-77.85957098', '40.78961849'], etc.]

Approach for parsing the GPS track

Before you start parsing a file, it's helpful to outline what you're going to do and
break up the task into manageable chunks. Here's some pseudocode for the
approach we'll take in this example:

1. Open the file.

2. Read the header line.

3. Loop through the header line to find the index positions of the "lat" and
   "long" values.

4. Read the rest of the lines.

5. Split each line into a list of values, using the comma as a delimiter.

6. Find the values in the list that correspond to the lat and long coordinates
   and write them to a new list.

Opening the file

The first thing the script needs to do is open the file. Python contains a built-in
open() [5] method for doing this. The parameters for this method are the path to
the file and the mode in which you want to open the file (read, write, etc.). In this
example, "r" stands for read-only mode.

gpsTrack = open("C:\\Data\\GPS\\gps_track.txt", "r")
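A file opened this way stays open until something closes it. As a minimal sketch (not part of the lesson's script), Python's with statement closes the file for you when the indented block ends. The temporary file and its two lines of sample data are assumptions made so the example can run without C:\Data\GPS\gps_track.txt on disk.

```python
import os
import tempfile

# Write a tiny stand-in for gps_track.txt (sample data is illustrative only)
sampleText = "type,ident,lat,long\nTRACK,ACTIVE LOG,40.78966141,-77.85948515\n"
samplePath = os.path.join(tempfile.mkdtemp(), "gps_track_sample.txt")
with open(samplePath, "w") as sampleFile:
    sampleFile.write(sampleText)

# Open in read-only mode; the file is closed automatically after the block
with open(samplePath, "r") as gpsTrack:
    headerLine = gpsTrack.readline()

print(headerLine)
print(gpsTrack.closed)
```

If you open the file without with, as the lesson's script does, you can call gpsTrack.close() once you have finished reading.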

Reading the header line

Opening the file with the open() method gets you a file object (called gpsTrack in
our case). You can read the first line by calling the file.readline() [6] method, like
this:

headerLine = gpsTrack.readline()

This returns the string


"type,ident,lat,long,y_proj,x_proj,new_seg,display,color,altitude,depth,temp,time,
model,filename,ltime".

Looping through the header line to find the index positions of the
"lat" and "long" values

You need to search through this string and find the position of "lat" and "long." If
you start counting comma-separated values in this string beginning from zero, it's
easy to see that "lat" is at index position 2 and "long" is at index position 3.
However, it's a good practice not to hard-code numbers like 2 and 3 into your
script. Hard-coded numbers other than 0 or 1 are sometimes derided as magic
numbers, suggesting that if you're not the programmer, you might have to use
magic to know where the numbers came from!

Avoiding magic numbers gives you greater flexibility. If you wanted to re-use this
script with a file in which "lat" and "long" were in different positions, you wouldn't
have to modify your code. Even if "lat" and "long" went by some other name, it
would be easier to find and change a string in your script instead of finding and
changing a "magic number".

So how can you programmatically determine that "lat" is at index 2 and "long" is at
index 3? Below is one way that uses the string.split() [7] method. This method puts
each "item" in the line into a list. The parameter you pass to the split method
determines the delimiter, or the character that determines a new list item. In our
case, it's the comma:

valueList = headerLine.split(",")

The above method call returns: ['type', 'ident', 'lat', 'long', 'y_proj', 'x_proj',
'new_seg', 'display', 'color', 'altitude', 'depth', 'temp', 'time', 'model', 'filename',
'ltime']. The key is that now you can cycle through this list and discover the position
of "lat" and "long." To do this, you could write a loop that searched through the list
for "lat" and "long," but a quicker way is to use the helper method index() that gets
you the index position of any item in the list:

latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")

After running the above lines, latValueIndex is equal to 2 and lonValueIndex is
equal to 3. With those variables set, you're now ready to start reading the rest of the
lines in the file.
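One detail worth knowing: readline() keeps the newline character at the end of the line, so the last item in the split list is really 'ltime\n'. That doesn't affect "lat" or "long", but it would matter if you ever looked up the final column by name. The sketch below uses a hard-coded header string in place of the file and strips the line before splitting it.

```python
# Header as readline() would return it, including the trailing newline
headerLine = ("type,ident,lat,long,y_proj,x_proj,new_seg,display,color,"
              "altitude,depth,temp,time,model,filename,ltime\n")

# Without strip(), the last item keeps the newline character
rawList = headerLine.split(",")
print(repr(rawList[-1]))

# strip() removes leading and trailing whitespace before the split
valueList = headerLine.strip().split(",")
latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")
ltimeIndex = valueList.index("ltime")

print(latValueIndex)
print(lonValueIndex)
print(ltimeIndex)
```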

Processing the rest of the lines in the file

When you have an open text file, you can always call file.readline() to go to the next
line. In our case, we know we're going to use all the rest of the lines in the file, so
it's more efficient to call file.readlines() [8] to read them all at once. (This might
not be efficient with an extremely long file.) The readlines() method returns a list of
all the remaining lines in the file.
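If the file is very long, one alternative is to loop over the file object itself, which hands you one line at a time instead of building a list of every line in memory. Here is a minimal sketch that uses an in-memory io.StringIO object, holding two made-up readings, as a stand-in for the open file:

```python
import io

# In-memory stand-in for the open file (sample readings are illustrative)
gpsTrack = io.StringIO(
    u"type,ident,lat,long\n"
    u"TRACK,ACTIVE LOG,40.78966141,-77.85948515\n"
    u"TRACK,ACTIVE LOG,40.78963995,-77.85954952\n"
)

# Read the header first, exactly as before
headerLine = gpsTrack.readline()

# Looping over the file object reads one line per iteration,
# continuing from wherever the previous read left off
lineCount = 0
for line in gpsTrack:
    lineCount += 1

print(lineCount)
```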

Now you can cycle through each GPS reading and split it up based on commas the
same way you split up the header. You specifically need to pull out the values in
index positions 2 and 3 of your list (represented by latValueIndex and
lonValueIndex, respectively) and write those to a new list (coordList).

# Read lines in the file and append to coordinate list
coordList = []

for line in gpsTrack.readlines():
    segmentedLine = line.split(",")
    coordList.append([segmentedLine[lonValueIndex],
                      segmentedLine[latValueIndex]])

print coordList

Note a few important things about the above code:

• coordList actually contains a bunch of small lists within a big list. Each small
  list is a coordinate pair representing the x (longitude) and y (latitude)
  location of one GPS reading.

• The list.append() method is used to add items to coordList. Notice again that
  you can append a list itself (representing the coordinate pair) using this
  method.
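Notice also that the values in coordList are strings, because they were read from text. If you need numbers (for example, to do arithmetic on the coordinates), one option is to convert each value with float(). Here's a short sketch using the first two pairs from the sample file; the numericCoordList name is just for this illustration.

```python
# Coordinate pairs as produced by the parsing loop (strings)
coordList = [['-77.85948515', '40.78966141'],
             ['-77.85954952', '40.78963995']]

# Build a new list in which each value is a floating-point number
numericCoordList = []
for pair in coordList:
    numericCoordList.append([float(pair[0]), float(pair[1])])

print(numericCoordList)
```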

Full code for the example

Here's the full code for the example. Feel free to download the text file [4] and try it
out on your computer.

# Reads a GPS-produced text file and writes the lat and long values
# to a list of coordinates
gpsTrack = open("C:\\Data\\GPS\\gps_track.txt", "r")

# Figure out position of lat and long in the header
headerLine = gpsTrack.readline()
valueList = headerLine.split(",")

latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")

# Read lines in the file and append to coordinate list
coordList = []

for line in gpsTrack.readlines():
    segmentedLine = line.split(",")
    coordList.append([segmentedLine[lonValueIndex],
                      segmentedLine[latValueIndex]])

print coordList

Applications of this script

You might be asking at this point, "What good does this list of coordinates do for
me?" Admittedly, the data is still very "raw." It cannot be read directly in this state
by a GIS. However, having the coordinates in a Python list makes them easy to get
into other formats that can be visualized. For example, these coordinates could be
written to points in a feature class, or vertices in a polyline or polygon feature class.
The list of points could also be sent to a Web service for reverse geocoding, or
finding the address associated with each point. The points could also be plotted on
top of a Web map using programming tools like the Google Maps API. Or, if you
were feeling really ambitious, you might use Python to write a new file in KML
format, which could be viewed in 3D in Google Earth.
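As a rough sketch of that last idea, the function below builds a very small KML document containing a single LineString from a list of coordinate pairs. The coordsToKml name and the placemark name are made up for this illustration, and the output is about the simplest KML that can hold a line; a real file would likely need styling and more metadata.

```python
def coordsToKml(coordList, trackName="GPS track"):
    """Return a minimal KML string holding one LineString."""
    # KML lists each vertex as "lon,lat", with vertices separated by spaces
    coordText = " ".join(pair[0] + "," + pair[1] for pair in coordList)
    kmlLines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<kml xmlns="http://www.opengis.net/kml/2.2">',
        '  <Placemark>',
        '    <name>' + trackName + '</name>',
        '    <LineString>',
        '      <coordinates>' + coordText + '</coordinates>',
        '    </LineString>',
        '  </Placemark>',
        '</kml>',
    ]
    return "\n".join(kmlLines)

# Coordinate pairs in the same [longitude, latitude] order as coordList
coordList = [['-77.85948515', '40.78966141'],
             ['-77.85954952', '40.78963995']]
kmlText = coordsToKml(coordList)
print(kmlText)
```

Because coordList already stores longitude before latitude, the pairs can be joined in the order KML expects without any rearranging.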

Summary

Parsing any piece of text requires you to be familiar with file opening and reading
methods, the structure of the text you're going to parse, and string manipulation
methods. In the preceding example, we parsed a simple text file, extracting
coordinates collected by a handheld GPS unit. We used the string.split() method to
break up each GPS reading and find the latitude and longitude values. In the next
section of the lesson, you'll learn how you could do more with this information by
writing the coordinates to a polyline dataset.

As you use Python in your GIS work, you could encounter a variety of parsing tasks.
As you approach these, don't be afraid to seek help from Internet examples, code
reference topics such as the ones linked to in this lesson, and your textbook.

Readings

It's worth your time to read Zandbergen 7.6, which talks about parsing text files.
Any examples you can pick up with text parsing will help you when you encounter a
new file that you need to read. You'll have this experience in the practice exercises
and projects this week.

4.3 Writing geometries


As you parse out geographic information from "raw" sources such as text files, you
may want to convert it to a format that is native to your GIS. This section of the
lesson discusses how to write vector geometries to ArcGIS feature classes. We'll
read through the same GPS-produced text file from the previous section, but this
time we'll add the extra step of writing each coordinate to a polyline shapefile.

You've already had some experience writing point geometries when we learned
about insert cursors. To review, you use arcpy.Point() to create a Point object, then
you use an insert cursor to assign it to the geometry field of the feature class (called
"shape" for shapefiles).

# Create point
inPoint = arcpy.Point(-121.34, 47.1)

# newRow originates from an insert cursor
newRow.shape = inPoint

For polylines and polygons, you create multiple Point objects that you add to an
Array object. Then you make a Polyline or Polygon object using the array. With
polygons it's a good practice to make the end vertex the same as the start vertex if
possible.

The code below creates an empty array and adds three points using the Array.add()
method. Then the array is used to create a Polyline object.

The first parameter you pass in when creating a polyline is the array containing the
points for the polyline. The second parameter is a spatial reference of the
coordinates, which you should always pass in to ensure that the precision of your
data is maintained.

# Make a new empty array
array = arcpy.Array()

# Make some points
point1 = arcpy.Point(-121.34,47.1)
point2 = arcpy.Point(-121.29,47.32)
point3 = arcpy.Point(-121.31,47.02)

# Put the points in the array
array.add(point1)
array.add(point2)
array.add(point3)

# Make a polyline out of the now-complete array
# (spatialRef is assumed to hold a spatial reference object)
polyline = arcpy.Polyline(array, spatialRef)

# Put the polyline in the feature class
# (newRow originates from an insert cursor)
newRow.shape = polyline

Of course, you usually won't create points manually in your code like this with
hard-coded coordinates. It's more likely that you'll parse out the coordinates from a
file or capture them from some external source, such as a series of mouse clicks on
the screen.
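Before moving on, here's a small sketch related to the earlier advice about polygons. It uses plain Python lists (no arcpy) and repeats the first vertex at the end of a coordinate list when it isn't already there. The closeRing name and the sample coordinates are assumptions for this example; you would still turn each pair into a Point and add it to an Array as described above.

```python
def closeRing(coords):
    """Return a copy of coords whose last vertex equals its first vertex."""
    ring = list(coords)
    # Only add the start vertex if the ring is not already closed
    if ring and ring[0] != ring[-1]:
        ring.append(ring[0])
    return ring

triangle = [[-121.34, 47.1], [-121.29, 47.32], [-121.31, 47.02]]
closedTriangle = closeRing(triangle)
print(closedTriangle)
```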

Creating a polyline from a GPS track

Here's how you could parse out coordinates from a GPS-created text file like the
one in the previous section of the lesson. This code reads all the points captured by
the GPS and adds them to one long polyline. The polyline is then written to an
empty, pre-existing polyline shapefile with a geographic coordinate system named
tracklines.shp. If you didn't have a shapefile already on disk, you could use the
Create Feature Class tool to create one with your script.

# Reads a GPS-produced text file and writes the lat and long values
# to an already-created polyline shapefile
import arcpy

# Hard-coded variables for GPS track text file and feature class
gpsTrack = open("C:\\Data\\GPS\\gps_track.txt", "r")
polylineFC = "C:\\Data\\GPS\\tracklines.shp"
spatialRef = arcpy.Describe(polylineFC).spatialReference

# Figure out position of lat and long in the header
headerLine = gpsTrack.readline()
valueList = headerLine.split(",")

latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")

# Create an array to store the points for the polyline
vertexArray = arcpy.Array()

# Read each line in the file
for line in gpsTrack.readlines():
    segmentedLine = line.split(",")

    # Get the lat/lon values of the current GPS reading
    latValue = segmentedLine[latValueIndex]
    lonValue = segmentedLine[lonValueIndex]

    # Create a point and add it to the array
    vertex = arcpy.Point(lonValue, latValue)
    vertexArray.add(vertex)

# Create an insert cursor
cursor = arcpy.InsertCursor(polylineFC)
feature = cursor.newRow()

# Put the array in a polyline and write it to the feature class
polyline = arcpy.Polyline(vertexArray, spatialRef)
feature.shape = polyline

cursor.insertRow(feature)

del cursor

The above script starts out the same as the one in the previous section of the lesson.
First, it parses the header line of the file to determine the position of the latitude
and longitude coordinates in each reading. But then, notice that an array is created
to hold the points for the polyline:

vertexArray = arcpy.Array()

After that, a loop is initiated that reads each line and creates a point object from the
latitude and longitude values. At the end of the loop, the point is added to the array.

for line in gpsTrack.readlines():
    segmentedLine = line.split(",")

    # Get the lat/lon values of the current GPS reading
    latValue = segmentedLine[latValueIndex]
    lonValue = segmentedLine[lonValueIndex]

    # Create a point and add it to the array
    vertex = arcpy.Point(lonValue, latValue)
    vertexArray.add(vertex)

Once all the lines have been read, the loop exits and an insert cursor is created. The
cursor is used to create a new row. Then a Polyline object is created and assigned to
the shape field, thereby giving the row some geometry.

# Create an insert cursor
cursor = arcpy.InsertCursor(polylineFC)
feature = cursor.newRow()

# Put the array in a polyline and write it to the feature class
polyline = arcpy.Polyline(vertexArray, spatialRef)
feature.shape = polyline

cursor.insertRow(feature)

del cursor

Remember that the cursor places a lock on your dataset, so this script doesn't
create the cursor until absolutely necessary (in other words, after the loop). After
the row is inserted, the cursor is deleted to remove the lock.

Extending the example for multiple polylines

Just for fun, suppose your GPS allows you to mark the start and stop of different
tracks. How would you handle this in the code? You can download this modified
text file with multiple tracks [9] if you want to try out the following example.

Notice that in the GPS text file, there is an entry new_seg:

type,ident,lat,long,y_proj,x_proj,new_seg,display,color,altitude,depth,te
mp,time,model,filename,ltime

new_seg is a boolean property that determines whether the reading begins a new
track. If new_seg = true, you need to write the existing polyline to the shapefile and
start creating a new one. Take a close look at this code example and notice how it
differs from the previous one in order to handle multiple polylines:

# Reads a GPS-produced text file and writes the lat and long values
# to an already-created polyline shapefile. Handles multiple polylines.

# Function to add a completed single part polyline to the feature class
def addPolyline(cursor, array, sr):
    feature = cursor.newRow()
    polyline = arcpy.Polyline(array, sr)
    feature.shape = polyline
    cursor.insertRow(feature)
    array.removeAll()

# Main script body
import arcpy

# Hard-coded variables for GPS track text file and feature class
gpsTrack = open("C://Data//GPS//gps_track_multiple.txt", "r")
polylineFC = "C://Data//GPS//tracklines_sept25.shp"
spatialRef = arcpy.Describe(polylineFC).spatialReference

# Figure out position of lat, long, and new_seg in the header
headerLine = gpsTrack.readline()
valueList = headerLine.split(",")

latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")
newTrackIndex = valueList.index("new_seg")

# Create the insert cursor and an array to hold the vertices
cursor = arcpy.InsertCursor(polylineFC)
vertexArray = arcpy.Array()

# Read each line and split it
for line in gpsTrack.readlines():
    segmentedLine = line.split(",")
    isNew = segmentedLine[newTrackIndex].upper()

    # If starting a new line, write the completed
    # line to the feature class
    if isNew == "TRUE":

        # This check is needed to handle the first GPS entry
        if vertexArray.count > 0:
            addPolyline(cursor, vertexArray, spatialRef)

    # Get the lat/lon values of the current GPS reading
    latValue = segmentedLine[latValueIndex]
    lonValue = segmentedLine[lonValueIndex]

    vertex = arcpy.Point(lonValue, latValue)
    vertexArray.add(vertex)

# Add the final polyline to the shapefile
addPolyline(cursor, vertexArray, spatialRef)

del cursor

The first thing you should notice is that this script uses a function. The addPolyline
function adds a polyline to a feature class, given three parameters: (1) an existing
insert cursor, (2) an array, and (3) a spatial reference. This function cuts down on
repeated code and makes the script more readable.

Here's a look at the addPolyline function:

def addPolyline(cursor, array, sr):
    feature = cursor.newRow()
    polyline = arcpy.Polyline(array, sr)
    feature.shape = polyline
    cursor.insertRow(feature)
    array.removeAll()

Notice it's okay to use arcpy in the above function, since it is going inside the body
of a script that imports arcpy. However, you want to avoid using variables in the
function that are not defined within the function or passed in as parameters.

The addPolyline function is called twice in the script: once within the loop, which
we would expect, and once at the end to make sure the final polyline is added to the
shapefile. This is where writing a function cuts down on repeated code.

As you read each line of the text file, how do you determine whether it begins a new
track? First of all, notice that we've added one more value to look for in this script:

newTrackIndex = valueList.index("new_seg")

The variable newTrackIndex shows us which position in the line is held by the
boolean new_seg property that tells us whether a new polyline is beginning. If you
have sharp eyes, you'll notice we check for this later in the code:

segmentedLine = line.split(",")
isNew = segmentedLine[newTrackIndex].upper()

# If starting a new line, write the completed
# line to the feature class
if isNew == "TRUE":

In the above code, the upper() method converts the string into all upper-case, so we
don't have to worry about whether the line says "true," "True," or "TRUE." But
there's another situation we have to handle: What about the first line of the file?
This line should read "true," but we can't add the existing polyline to the file at that
time, because there isn't one yet. Notice that a second check is performed to make
sure there are more than zero points in the array before the array is written to the
shapefile:

# Need this > 0 check to handle the first track
if vertexArray.count > 0:
    addPolyline(cursor, vertexArray, spatialRef)

The above code checks to make sure there's at least one point in the array, then it
calls the addPolyline function, passing in the cursor, the array, and the spatial
reference.
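You can try out the same logic without ArcGIS. The sketch below uses plain Python lists in place of the Array and the feature class, so the tracks list, the vertexList name, and the sample lines are all stand-ins made up for this illustration. It shows how the new_seg check inside the loop and the final write after the loop work together to split the readings into separate tracks.

```python
# Sample header and readings (illustrative data only)
header = "lat,long,new_seg"
lines = ["40.1,-77.1,True",
         "40.2,-77.2,False",
         "40.3,-77.3,True",
         "40.4,-77.4,False",
         "40.5,-77.5,False"]

valueList = header.split(",")
latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")
newTrackIndex = valueList.index("new_seg")

tracks = []       # stands in for the feature class
vertexList = []   # stands in for the arcpy Array

for line in lines:
    segmentedLine = line.split(",")
    isNew = segmentedLine[newTrackIndex].strip().upper()

    # A new track begins: store the finished one, if there is one
    if isNew == "TRUE":
        if len(vertexList) > 0:
            tracks.append(vertexList)
            vertexList = []

    # Every reading, new track or not, contributes a vertex
    vertexList.append([segmentedLine[lonValueIndex],
                       segmentedLine[latValueIndex]])

# Store the final track, which no later "True" reading would trigger
tracks.append(vertexList)

print(len(tracks))
```

The strip() call is there because new_seg is the last value on each sample line, so in a real file it would pick up the trailing newline.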

Here's another question to consider: How did we know that the Array object has a
count property that tells us how many items are in it? This comes from the ArcGIS
Desktop Help topic describing the Array class [10]. In this section of the help there
are topics describing each class in arcpy, and you'll come here often if you work
with ArcGIS geometries in Python.

In the Array topic linked above, find the Properties table and notice
that Array has a read-only count property. If we were working with a Python list,
we could use len(vertexArray), but in our case vertexArray is an Array object that is
native to the ArcGIS geoprocessing programming model. This means it is a
specialized object designed by Esri, and you can only learn its methods and
properties by examining the documentation. Bookmark these pages!

The GPS parsing example using ArcGIS 10.1 and the arcpy data access
module

For reference only, below is how you could write the above script using ArcGIS 10.1
and the data access module arcpy.da. This example handles multiple polylines in
the file. The syntax ("SHAPE@",) is a tuple with one item, indicating that just the
SHAPE field will be updated using the insert cursor.

# Reads a GPS-produced text file and writes the lat and long values
# to an already-created polyline shapefile. Handles multiple polylines.

# Function to add a polyline
def addPolyline(cursor, array, sr):
    polyline = arcpy.Polyline(array, sr)
    cursor.insertRow((polyline,))
    array.removeAll()

# Main script body
import arcpy

# Hard-coded variables for GPS track text file and feature class
gpsTrack = open("D://Data//GPS//gps_track_multiple.txt", "r")
polylineFC = "D://Data//GPS//tracklines_sept25.shp"
spatialRef = arcpy.SpatialReference("WGS 1984")

# Figure out position of lat, long, and new_seg in the header
headerLine = gpsTrack.readline()
valueList = headerLine.split(",")

latValueIndex = valueList.index("lat")
lonValueIndex = valueList.index("long")
newTrackIndex = valueList.index("new_seg")

# Open an insert cursor on the shape field and create an array for the vertices
with arcpy.da.InsertCursor(polylineFC, ("SHAPE@",)) as cursor:
    vertexArray = arcpy.Array()

    # Read each line and split it
    for line in gpsTrack.readlines():
        segmentedLine = line.split(",")
        isNew = segmentedLine[newTrackIndex].upper()

        # If starting a new line, write the completed
        # line to the feature class
        if isNew == "TRUE":

            # This check is needed to handle the first GPS entry
            if vertexArray.count > 0:
                addPolyline(cursor, vertexArray, spatialRef)

        # Get the lat/lon values of the current GPS reading
        latValue = segmentedLine[latValueIndex]
        lonValue = segmentedLine[lonValueIndex]

        vertex = arcpy.Point(lonValue, latValue)
        vertexArray.add(vertex)

    # Add the final polyline to the shapefile
    addPolyline(cursor, vertexArray, spatialRef)
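A note on the ("SHAPE@",) and (polyline,) syntax used above: the trailing comma is what makes each one a tuple. Without the comma, the parentheses are only grouping and Python sees the plain value inside them. This short sketch, with variable names invented for the illustration, shows the difference:

```python
withComma = ("SHAPE@",)
withoutComma = ("SHAPE@")

print(type(withComma).__name__)     # tuple
print(type(withoutComma).__name__)  # str
print(len(withComma))               # 1
```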

Summary

You can write geometries to ArcGIS feature classes using a combination of
geometry objects included with ArcGIS. The common workflow is to create Point
objects, which you add to an Array object. You can use the Array object plus a
spatial reference to create Polyline and Polygon objects. You then use an insert
cursor to assign the resulting geometry to the feature class's geometry field
(usually called "shape").

You may be wondering how you might create a multi-part feature (such as the state
of Hawaii containing multiple islands), or a polygon with a "hole" in it. There are
special rules for ordering and nesting Points and Arrays to create these types of
geometries. These are covered in the course textbook, which brings us to...

Readings

Read Zandbergen 8.1 - 8.6, which contains a good summary of how to read and
write Esri geometries.

4.4 Automation with batch files and scheduled tasks

In this course, we've talked about the benefits of automating your work through
Python scripts. It's nice to be able to run several geoprocessing tools in a row
without manually traversing the Esri toolboxes, but what's so automatic about
launching PythonWin, opening your script, and clicking the Run button? In this
section of the lesson, we'll take automation one step further by discussing how you
can make your scripts run automatically.

Scripts and your operating system

Most of the time we've run scripts in this course, it's been through PythonWin.
However, your operating system (Windows) can also run scripts directly. Maybe
you've tried to double-click a .py file to run a script. As long as Windows
understands that .py files represent a Python script and that it should use the
Python interpreter to run the script, the script will launch immediately.

When you try to launch a script automatically by double-clicking it, it's possible
you'll get a message saying Windows doesn't know which program to use to open
your file. If this happens to you, use the Browse button on the error dialog box to
browse to the Python executable, most likely located in
C:\Python26\ArcGIS10.0\Python.exe. Make sure "Always use the selected
program to open this kind of file" is checked and click OK. Windows now
understands that .py files should be run using Python.

Double-clicking a .py file gives your operating system the simple command to run
that Python script. You can alternatively tell your operating system to run a script
using the Windows command line interface. This environment just gives you a
blank window with a blinking cursor and allows you to type the path to a script or
program, followed by a list of parameters. It's a clean, minimalist way to run a
script. In Windows XP, you can open the command line by clicking Start > Run
and typing cmd. In Windows Vista or Windows 7, just type cmd in the Search box.

The command line

Advanced use of the command line is outside the scope of this course. For now, it's
sufficient to say that you can run a script from the command line by typing the path
of the Python executable, followed by the full path to the script, like this:

C:\Python26\ArcGIS10.0\Python.exe C:\WCGIS\Geog485\Lesson1\Project1.py

If the script takes parameters, you must also type each argument separated by a
space. Remember that arguments are the values you supply for the script's
parameters. Here's an example of a command that runs a script with two
arguments, both strings that represent pathnames. Notice that you should use the
regular \ in your paths when providing arguments from the command line (not / or
\\ as you would use in PythonWin).

C:\Python26\ArcGIS10.0\Python.exe C:\WCGIS\Geog485\Lesson2\Project2.py C:\WCGIS\Geog485\Lesson2\ C:\WCGIS\Geog485\Lesson2\CityBoundaries.shp
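Inside the script, one standard way to pick up these values is the sys.argv list, in which item 0 is the script path and the remaining items are the arguments in the order you typed them. The sketch below is a hypothetical stand-in for a script like Project2.py: the readArguments function and its variable names are assumptions, and your own scripts may read their parameters differently (for example, through arcpy).

```python
import sys

def readArguments(argv):
    """Return (workspace, featureClass) from a command-line argument list."""
    # argv[0] is the script itself, so two arguments means three items
    if len(argv) < 3:
        raise ValueError("Expected two arguments: a folder and a shapefile path")
    workspace = argv[1]
    featureClass = argv[2]
    return workspace, featureClass

# Simulate the argument list that the command above would produce.
# In a real script you would call readArguments(sys.argv) instead.
exampleArgv = ["C:\\WCGIS\\Geog485\\Lesson2\\Project2.py",
               "C:\\WCGIS\\Geog485\\Lesson2\\",
               "C:\\WCGIS\\Geog485\\Lesson2\\CityBoundaries.shp"]
workspace, featureClass = readArguments(exampleArgv)

print(workspace)
print(featureClass)
```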

If the script executes successfully, you often won't see anything except a new
command prompt (remember, this is minimalist!). If your script is designed to
print a message, you should see the message. If your script is designed to modify
files or data, you can check those files or data (perhaps using ArcCatalog) to make
sure the script ran correctly.

You'll also see messages if your script fails. Sometimes these are the same messages
you would see in the Python Interactive Window. At other times, the messages are
more helpful than what you would see in PythonWin, making the command line
another useful tool for debugging. Unfortunately, at some times the messages are
less helpful.

Batch files

Why is the command line so important in a discussion about automation? After all,
it still takes work to open the command line and type the commands. The beautiful
thing about commands is that they, too, can be scripted. You can list multiple
commands in a simple text-based file, called a batch file. Running the batch file
runs all the commands in it.

Here's an example of a simple batch file that runs the two scripts above. To make
this batch file, you could put the text below inside an empty Notepad file and save it
with a .bat extension. Remember that this is not Python; it's command syntax:

@ECHO OFF
REM Runs both my project scripts

C:\Python26\ArcGIS10.0\Python.exe C:\WCGIS\Geog485\Lesson1\Project1.py
ECHO Ran project 1
C:\Python26\ArcGIS10.0\Python.exe C:\WCGIS\Geog485\Lesson2\Project2.py C:\WCGIS\Geog485\Lesson2\ C:\WCGIS\Geog485\Lesson2\CityBoundaries.shp
ECHO Ran project 2
PAUSE

Here are some notes about the above batch file, starting from the top:

• @ECHO OFF prevents all the lines in your batch file from being printed to
  the command line window, or console, when you run the file. It's standard
  procedure to use this as the first line of your batch file, unless you really
  want to see which line of the file is executing (perhaps for debugging
  purposes).

• REM is how you put a comment in your batch file, the same way # denotes a
  comment in Python.

• You put commands in your batch file using the same syntax you used from
  the command line.

• ECHO prints something to the console. This can be useful for debugging,
  especially when you've used @ECHO OFF because you don't want to see
  every line of your batch file printed to the console.

• PAUSE gives a "Press any key to continue..." prompt. If you don't put this at
  the end of your batch file, the console will immediately close after the file is
  done executing. When you're writing and debugging the batch file, it's useful
  to put PAUSE at the end so you can see any error messages that were
  printed when running the file. Once your batch file is tested and working
  correctly, you can remove the PAUSE.

Batch files can contain variables, loops, comments, and conditional logic, all of
which are beyond the scope of this lesson. However, if you'll be writing and running
many scripts for your organization, it's worthwhile to spend some time learning
more about batch files. Fortunately, batch files have been around for a long time
(they are older than Windows itself), so there's an abundance of good information
available on the Internet to help you.

Scheduling tasks

At this point we've come pretty close to reaching true automation, but there's still
that need to launch the Python script or the batch file, either by double-clicking it,
invoking it from the command line, or otherwise telling the operating system to run
it. To truly automate the running of scripts and batch files, you can use an
operating system utility such as Windows Task Scheduler.

Task Scheduler is one of those items hidden in Windows System Tools that you
may not have paid any attention to before. It's a relatively simple program that
allows you to schedule your scripts and batch files to run on a regular basis. This is
helpful if the task needs to run often enough that it would be burdensome to launch
the batch file manually; but it's even more helpful if the task takes some of your
computing resources and you want to run it during the night or weekend to
minimize impact on others who may be using the computer.

Here's a real-world scenario where Task Scheduler (or a comparable utility if you're
running on a Mac, Linux, or UNIX) is very important: Fast Web maps tend to use a
server-side cache of pregenerated map images, or tiles, so that the server doesn't
have to draw the map each time someone navigates to an area. A Web map
administrator who has ArcGIS Server can run the tool Manage Map Server Cache
Tiles to make the tiles before he or she deploys the Web map. After deployment, the
server quickly sends the appropriate tiles to people as they navigate the Web map.
So far so good.

As the source GIS data for the map changes, however, the cache tiles become out of
date. They are just images and do not know how to update themselves
automatically. The cache needs to be updated periodically, but cache tile creation is
a time consuming and CPU-intensive operation. For this reason, many server
administrators use Task Scheduler to update the cache. This usually involves
writing a script or batch file that runs Manage Map Server Cache Tiles and other
caching tools, then scheduling that script to run on nights or weekends when it
would be least disruptive to users of the Web map.

Inside Windows Task Scheduler

Let's take a quick look inside Windows Task Scheduler. The instructions below are
for Windows Vista (and probably Windows 7). Other versions of Windows have a
very similar Task Scheduler, and with some adaptation you can also use the
instructions below to understand how to schedule a task.

1. Open Task Scheduler by navigating the Windows Start menu to All
   Programs > Accessories > System Tools > Task Scheduler.

2. Click Create Basic Task. This walks you through a simple wizard to set up
   the task. You can configure advanced options on the task later.

3. Give your task a Name that will be easily remembered and optionally, a
   Description. Then click Next.

4. Choose how often you want the task to run. For this example, choose Daily.
   Then click Next.

5. Choose a Start time and a recurrence frequency. If you want, choose a time
   a few minutes ahead of the current time, so you can see what it looks like
   when a task runs. Then click Next.

6. Choose Start a program, then click Next.

7. Here's the moment of truth where you specify which script or batch file you
   want to run. Click Browse and navigate to one of the Python scripts you've
   written during this course. It's going to be easiest here if you pick a script
   that doesn't take any arguments, such as your project 1 script that makes
   contour lines from hard-coded datasets, but if you are feeling brave you can
   also add arguments in this panel of the wizard. Then click Next.

8. Review the information about your task, then click Finish.

9. Notice that your task now appears in the list in Task Scheduler. You can
   highlight the task to see its properties, or right-click the task and click
   Properties to actually set those properties. You can use the advanced
   properties to get your script to run even more frequently than daily, for
   example, every 15 minutes.

10. Wait for your scheduled time to occur, or if you don't want to wait, right-click
    the task and click Run. Either way, you'll see a console window appear
    when the script begins and disappear once the script has finished. (If you're
    running a Python script and you don't want the console window to disappear
    at the end, you can put a line at the end of the script such as lastline =
    raw_input(">"). This stops the script until the user presses Enter. Once
    you're comfortable with the script running on a regular basis, you'll probably
    want to remove this line to keep open console windows from cluttering your
    screen. After all, the idea of a scheduled task is that it happens in the
    background without bothering you.)


Figure 4.1 The Windows Task Scheduler.
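
Because a scheduled task runs without anyone watching the console, it can help to have the script write its progress to a log file rather than pausing for input. The following is only a minimal sketch of that idea using Python's standard logging module; the function names and the logger name are hypothetical and not part of the course scripts.

```python
import logging

def configure_task_log(log_path):
    """Send log messages to a file, since nobody watches a scheduled task's console."""
    logger = logging.getLogger("scheduled_task")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

def run_task(logger, work):
    """Run the function 'work' and record whether it succeeded."""
    logger.info("Task started")
    try:
        work()
        logger.info("Task finished")
        return True
    except Exception as error:
        logger.error("Task failed: %s", error)
        return False
```

In a real script, 'work' would be a function containing your geoprocessing code, and the log path would be a file of your choosing. After each scheduled run you could open the log to see when the task ran and whether it failed.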

Summary

To make your scripts run automatically, you use Windows Task Scheduler to create
a task that the operating system runs at regular intervals. The task can point at
either a .py file (for a single script), or a .bat file (for multiple scripts). Using
scheduled tasks, you can achieve full automation of your GIS processes.

4.5 Running any tool in the box


Sooner or later, you're going to have to include a geoprocessing tool in your script
that you have never run before. It's possible that you've never even heard of the tool
or run it from its GUI, let alone a script.
In other cases, you may know the tool very well, but your Python may be rusty, or
you may not be sure how to construct all the necessary parameters.

The approach for both of these situations is the same. Here are some suggested
steps for running any tool in the ArcGIS toolboxes using Python:

1. Find the tool reference documentation. We've seen this already during the
course. Each tool has its own topic in the Geoprocessing tool reference [11]
section of the ArcGIS Help. Open that topic and read it before you do
anything else. Read the "Usage" section at the beginning to make sure that
it's the right tool for you and that you are about to employ it correctly.

2. Examine the parameters. Scroll down to the "Syntax" section of the topic
and read which parameters the tool accepts. Note which parameters are
required and which are optional, and decide which parameters your script is
going to supply.

3. In your Python script, create variables for each parameter. Note that each
parameter in the "Syntax" section of the topic has a data type listed. If the
data type for a certain parameter is listed as "String," you need to create a
Python string variable for that parameter.

Sometimes the translation from data type to Python variable is not direct.
For example, sometimes the tool reference will say that the required variable
is a "Feature Class." What this really means for your Python script is that
you need to create a string variable containing the path to a feature class.

Another example is if the tool reference says that the required data type is a
"Long." What this means in Python is that you need to create a numerical
variable (as opposed to a string) for that particular parameter.

If you have doubts about how to create your variable to match the required
data type, scroll down to the "Code Sample" in the tool reference topic. Try
to find the place where the example script defines the variable you're having
trouble with. Copy the patterns that you see in the example script and
usually you'll be okay.

Most of the commonly used tools have excellent example scripts, but others
are hit or miss. If your tool of interest doesn't have a good example script,
you may be able to find something on the Esri forums, ArcScripts [12], or a
well-phrased Google search.

4. Run the tool...with error handling. You can first run your script without
try/except blocks and watch the Interactive Window to catch any basic errors.
If you're still not getting anything helpful, the next resort is to add
try/except blocks and put print arcpy.GetMessages() in the except block.
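
Steps 3 and 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not a prescribed pattern: run_tool is a hypothetical helper, the paths are made up, and the tool and message functions are passed in as arguments so that the pattern can be tried even on a machine without ArcGIS.

```python
def run_tool(tool, parameters, get_messages):
    """Run a geoprocessing tool; return (succeeded, messages)."""
    try:
        tool(*parameters)
        succeeded = True
    except Exception:
        succeeded = False
    # The messages are useful whether the tool succeeded or failed
    return succeeded, get_messages()

# Step 3: one variable per parameter, matching the listed data types.
inFeatures = "C:\\Data\\roads.shp"          # "Feature Class" -> string path (hypothetical)
outFeatures = "C:\\Data\\roads_buffer.shp"  # hypothetical output path
bufferDistance = "100 Meters"               # linear unit given as a string

# Step 4: with ArcGIS installed, the call might look like this:
# import arcpy
# ok, messages = run_tool(arcpy.Buffer_analysis,
#                         [inFeatures, outFeatures, bufferDistance],
#                         arcpy.GetMessages)
# print messages
```

The Buffer tool is used here only because it is familiar; for a tool you have never run, the variable names and types would come from the "Syntax" section of that tool's reference topic.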

In Project 4 you'll get a chance to practice these skills to run a tool you previously
haven't worked with in a script.

4.6 Working with map documents


To this point, we've talked about automating geoprocessing tools, updating GIS
data, and reading text files. However, we've not covered anything about working
with an Esri map document. There are many tasks that can be performed on a map
document that are well-suited for automation. These include:

• Finding and replacing text in a map or series of maps. For example, a
copyright notice for 2010 becomes 2011.

• Repairing layers that are referencing data sources using the wrong paths.
For example, your map was sitting on a computer where all the data was in
C:\data and now it is on a computer where all the data is in
D:\myfolder\mydata.

• Printing a series of maps or data frames.

• Exporting a series of maps to PDF and joining them to create a "map book."

• Making a series of maps available to others on ArcGIS Server.

Esri map documents are binary files, meaning they can't be easily read and parsed
using the techniques we covered earlier in this lesson. Until very recently the only
way to automate anything with a map document was to use ArcObjects, which is
somewhat challenging for beginners and requires using a language other than
Python. With the release of ArcGIS 10.0, Esri added a Python module for
automating common tasks with map documents.

The arcpy.mapping module

arcpy.mapping is a module you can use in your scripts to work with map
documents. Please take a detour at this point to read the Esri overview of
arcpy.mapping, which is found in the topic Geoprocessing scripts for map
document management and output [13].

The most important object in this module is MapDocument. This tells your script
which map you'll be working with. You can get a MapDocument by referencing a
path, like this:

mxd = arcpy.mapping.MapDocument(r"C:\data\Alabama\UtilityNetwork.mxd")

Notice the use of r in the line above to denote a raw string literal. In other
words, if you include r right before you begin your string, the single
backslash \ is treated as an ordinary character rather than as the start of an
escape sequence. I've done it here because you'll see it in a lot of the Esri
examples with arcpy.mapping.
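
To see what the r prefix changes, you can compare a few strings in plain Python. The sketch below uses the path from the line above plus a made-up path, C:\temp\new.mxd, chosen because it contains the \t and \n escape sequences.

```python
# A raw string and a string with doubled backslashes hold the same text
rawPath = r"C:\data\Alabama\UtilityNetwork.mxd"
escapedPath = "C:\\data\\Alabama\\UtilityNetwork.mxd"
samePath = (rawPath == escapedPath)

# Without the r, "\t" is read as a tab and "\n" as a new line,
# so this (hypothetical) path is silently corrupted
riskyPath = "C:\temp\new.mxd"
containsTab = "\t" in riskyPath
containsNewline = "\n" in riskyPath
```

Here samePath, containsTab, and containsNewline all evaluate to True, which is why a path written without r or doubled backslashes can fail in ways that are hard to spot.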

Instead of directly using a string path, you could alternatively put a variable
holding the path. This would be useful if you were iterating through all the map
documents in a folder using a loop, or if you previously obtained the path in your
script using something like arcpy.GetParameterAsText().
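
As a rough sketch of the looping idea, the function below collects the paths of the map documents in a folder using only the standard os module. The function name and the folder in the comments are hypothetical; the arcpy.mapping lines are left as comments because they require ArcGIS.

```python
import os

def find_map_documents(folder):
    """Return the full paths of all .mxd files directly inside folder."""
    paths = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".mxd"):
            paths.append(os.path.join(folder, name))
    return paths

# With ArcGIS installed, each path could then be opened in turn:
# import arcpy
# for mxdPath in find_map_documents(r"C:\data\Alabama"):
#     mxd = arcpy.mapping.MapDocument(mxdPath)
#     # ...work with the map document here...
#     del mxd
```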

It can be convenient to work with arcpy.mapping in the Python window in ArcMap.
In this case, you do not have to put the path to the MXD. There's a special keyword
"CURRENT" that you can use to get a reference to the currently-open MXD.

mxd = arcpy.mapping.MapDocument("CURRENT")

Once you get a MapDocument, then you do something with it. Most of the
functions in arcpy.mapping take a MapDocument object as a parameter. Let's look
at this first script from the Esri help topic linked above and scrutinize what is going
on. I've added comments to each line.

# Create a MapDocument object referencing the MXD you want to update
mxd = arcpy.mapping.MapDocument(r"C:\GIS\TownCenter_2009.mxd")

# Loop through each text element in the map document
for textElement in arcpy.mapping.ListLayoutElements(mxd, "TEXT_ELEMENT"):

    # Check if the text element contains the out of date text
    if textElement.text == "GIS Services Division 2009":

        # If out of date text is found, replace it with the new text
        textElement.text = "GIS Services Division 2010"

# Export the updated map to a PDF
arcpy.mapping.ExportToPDF(mxd, r"C:\GIS\TownCenterUpdate_2010.pdf")

# Clean up the MapDocument object by deleting it
del mxd

The first line in the above example gets a MapDocument object referencing
C:\GIS\TownCenter_2009.mxd. The example then employs two functions from
arcpy.mapping. The first is ListLayoutElements. Notice that the parameters for this
function are a MapDocument and the type of layout element you want to get back,
in this case, "TEXT_ELEMENT". (Examine the documentation for List Layout
Elements [14] to understand the other types of elements you can get back.)

The function returns a Python list of TextElement [15] objects representing all the
text elements in the map document. You know what to do if you want to
manipulate every item in a Python list. In this case, the example uses a for loop to
check the TextElement.text property of each element. This property is readable and
writeable, meaning if you want to set some new text, you can do so by simply using
the equals sign assignment operator, as in
textElement.text = "GIS Services Division 2010".
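
The Esri example only changes an element whose entire text matches. If the out-of-date text is just part of a longer string (a year inside a copyright notice, for instance), a variation like the following sketch could be used instead. The function name replace_text is hypothetical, and it works on any objects that have a text property, which is what ListLayoutElements returns for text elements.

```python
def replace_text(elements, oldText, newText):
    """Replace oldText wherever it occurs inside an element's text.
    Returns the number of elements changed."""
    changed = 0
    for element in elements:
        if oldText in element.text:
            element.text = element.text.replace(oldText, newText)
            changed += 1
    return changed

# With ArcGIS installed and a MapDocument named mxd:
# elements = arcpy.mapping.ListLayoutElements(mxd, "TEXT_ELEMENT")
# replace_text(elements, "2009", "2010")
```

Be careful with short search strings in this approach, since every occurrence inside every text element is replaced.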

The ExportToPDF function is very simple in this script. It takes a MapDocument
and the path of the output PDF as parameters. If you look at the documentation for
ExportToPDF [16], you'll notice a lot of other optional parameters for exporting
PDFs, such as whether to embed fonts, that are just left as defaults in this example.

Learning arcpy.mapping

The best way to learn arcpy.mapping is to try to use it. Because of its simple, "one-
line-fix" nature, it's a good place to practice your Python. It's also a good way to get
used to the Python window in ArcMap, because you can immediately see the results
of your actions.

Although there is no arcpy.mapping component to this lesson's project, you're
welcome to use it in your final project. If you've already submitted your final
project proposal, you can amend it to use arcpy.mapping by e-mailing and
obtaining approval from the instructors. If you use arcpy.mapping in your final
project, you should attempt to incorporate several of the functions or mix it with
other Python functionality you've learned, making something more complex than
the "one line fix" type of script I mentioned above.

By now you'll probably have experienced the reality that your code does not always
run as expected on the first try. Before you start running arcpy.mapping commands
on your production MXDs, I suggest making backup copies.

Here are a few additional places where you can find excellent help on learning
arcpy.mapping:

• Zandbergen chapter 10. I recommend that you at least skim this chapter to
see the types of examples that are included.

• The Arcpy Mapping module book [17] in the ArcGIS Desktop Help

• Video from the 2010 Esri Developer Summit: Python scripting for map
automation in ArcGIS 10 [18]

• GeoChalkboard blog post: Introducing the ArcPy.Mapping Module In
ArcGIS 10 [19]

4.7 Limitations of Python scripting with ArcGIS

In this course you've learned the basics of programming and have seen how Python
can automate any GIS function that can be performed with the ArcGIS toolboxes.
There's a lot of power available to you through scripting, and hopefully you're
starting to get ideas about how you can apply that in your work outside this course.

To conclude this lesson, however, it's important to talk about what's not available
through Python scripting in ArcGIS.

Limits with fine-grained access to the "guts" of ArcGIS

Python interaction with ArcGIS is mainly limited to reading and writing
data, editing the properties of map documents, and running the tools that are
included with ArcGIS. Although the ArcGIS tools are useful, they are somewhat
of a black box, meaning you put things in and get things out without knowing or
being concerned about what is happening inside. If you want a greater degree of
control over how ArcGIS is manipulating your data, you need to work with
ArcObjects.

ArcObjects can be thought of as "the building blocks" of ArcGIS. In fact, an analogy
with the children's Lego building bricks works well to describe ArcObjects:
Programming with ArcObjects is akin to having an enormous selection of Legos of
all shapes and sizes, whereas Python scripting is like working with a kit containing
some large prefabricated pieces that make it much easier to construct a particular
final product.

Because of the sheer amount of functionality and objects available to you,
ArcObjects is more challenging to learn than simple Python scripting. Usually, an
equivalent task takes many more lines of code to write in ArcObjects than in a
Python script. However, when you use ArcObjects you have much greater control
over what happens in your program. You can take a small piece of functionality and
use it without the overhead of a tool or all the other parameters that come with a
tool.

Limits with user interface customization at ArcGIS 10.0

In this course we have done nothing with customizing ArcMap to add special
buttons, toolbars, and so on that trigger our programs. Our foray into user interface
design has been limited to making a script tool and toolbox. Although script tools
are useful, there are times when you want to take the functionality out of the
toolbox and put it directly into ArcMap as a button on a toolbar. You may want that
button to launch a new window with text boxes, labels, and buttons that you design
yourself.

In ArcGIS 10.0 if you want to put custom functionality or programs directly into
ArcMap, you need to use Visual Basic for Applications (VBA), C++, or a .NET
language (VB.NET or C#) working with ArcObjects. The functionality may be as
simple as putting some custom actions behind a button (zoom to a certain
bookmark, for example), or you may open a full-blown program you develop with
multiple forms, options, and menus. The aforementioned languages have IDEs in
which you can design custom user interfaces with text boxes, labels, buttons, and so
on.

Geog 489, another elective course in the GIS certificate program, covers GIS
customization using ArcObjects.

New Python add-in functionality at ArcGIS 10.1

To allow a greater degree of interactivity between the ArcMap user interface and
Python scripts, ArcGIS 10.1 introduces the concept of a Python add-in. These allow
you to attach Python logic to a limited set of actions you perform in ArcMap, such
as zooming the map, opening a new map document, or clicking a button on a
custom toolbar. For example, you might create an add-in that automatically adds a
particular set of layers any time someone pushes a certain button on your toolbar.

With Python add-ins, you get access to a number of user interface elements to use
as a front end to your Python scripts, including toolbars, buttons, menus, combo
boxes, and basic file browsing and Yes/No confirmation dialog boxes. There's also a
set of common events that you can detect and respond to in your code, such as the
map opening, the map extent changing, or the spatial reference changing. Although
this is far from the full realm of ArcObjects and .NET customization possibilities, it
gives a lot more possibilities than were available in previous versions of ArcGIS.

The nice thing about add-ins is that they are easily shareable. You download the
Python Add-In Wizard from Esri, and it helps you prepare and package up your
add-in into a .esriaddin file. Other people with ArcGIS can then install the add-in
from the .esriaddin file.

Working with Python add-ins is currently not included in the scope of this course,
but you can learn all about them in the help book ArcGIS Desktop Python add-ins
[20]. After reading this material and getting a basic understanding of what's
required to create add-ins, you're welcome to incorporate them into your final
project if you have ArcGIS 10.1 and you are confident that you can work somewhat
independently to test and create the add-ins. If you have struggled in the course, I
recommend that you wait until after completing Geog 485 to further explore add-
ins, so that you can give them the necessary amount of time and testing.

Lesson 4 Practice Exercises


Introduction
These practice exercises will give you some more experience applying the Lesson 4
concepts. They are designed to prepare you for some of the techniques you'll need
to use in your Project 4 script.

Download the data for the practice exercises [21]

Both exercises involve opening a file and parsing text. In Practice Exercise A, you'll
read some coordinate points and make a polygon from those points. In Practice
Exercise B, you'll work with dictionaries to manage information that you parse
from the text file.

Example solutions are provided for both practice exercises. You'll get the most
value out of the exercises if you make your best attempt to complete them on your
own before looking at the solutions. In any case, the patterns shown in the solution
code can help you approach Project 4.

Lesson 4 Practice Exercise A


This practice exercise is designed to give you some experience writing geometries
to a shapefile. You have been provided two things:

• A text file MysteryStatePoints.txt containing the coordinates of a state
boundary.

• An empty polygon shapefile that uses a geographic coordinate system.

The objective
Your job is to write a script that reads the text file and creates a state boundary
polygon out of the coordinates. When you successfully complete this exercise, you
should be able to preview the shapefile in ArcCatalog and see the state boundary.

Tips

If you're up for the challenge of this script, go ahead and start coding. But if you're
not sure how to get started, here are some tips:

• This script will differ from some of the examples you've seen. There is no
header line for the file, and there is only one line of text to read. This should
actually make the file easier to process.

• Another difference is that the items of interest are separated by a | character.
Remember that when you call the split() method, you can pass in any
delimiter. Previously we have used a comma (",") but you can use the | just
as easily ("|").

• Before you start looping through the coordinates, create an Array object to
hold all the points in your polygon.

• Loop through each coordinate and create a Point object from the coordinate
pair. Then add the Point object to your Array object.

• Once you start looping through the coordinates, you'll be dealing with
coordinate pairs such as -109.05,31.33. You need to split this again (this
time using a comma delimiter) in order to isolate the X and Y values.

• Once you're done looping, create an insert cursor on your shapefile. Create
a new row, make a Polygon from your Array, and assign it to the row's SHAPE
field.

Lesson 4 Practice Exercise A Solution

Here's one way you could approach Lesson 4 Practice Exercise A. If you have a
different or more efficient solution, please share in the forums. Note that the video
is several quarters old and shows a slightly different way of creating an array, using
the CreateObject method. Also in the video, a Polygon object is not created; the
array is assigned directly to the SHAPE field. Although both techniques work, I
recommend that you continue creating your geometry objects directly from arcpy
like we have been doing in this lesson, and as shown in the code sample below.

# Reads coordinates from a text file and writes a polygon

import arcpy

shapefile = "C:\\Data\\Lesson4PracticeExerciseA\\MysteryState.shp"
pointFilePath = "C:\\Data\\Lesson4PracticeExerciseA\\MysteryStatePoints.txt"
spatialRef = arcpy.Describe(shapefile).spatialReference

# Open the file, read the first (and only) line, then close the file
pointFile = open(pointFilePath, "r")
lineOfText = pointFile.readline()
pointFile.close()

# Make a Python list out of the coordinates. | is the delimiter
coordinatePairList = lineOfText.split("|")

# This Array object will hold a clockwise "ring" of Point
# objects, thereby making a polygon.
polygonArray = arcpy.Array()

# Loop through each coordinate pair and make a Point object
for coordinatePair in coordinatePairList:
    # Split the coordinate pair by comma. This gives you
    # a list with two items: the X and Y coordinates
    coordinates = coordinatePair.split(",")

    # Create a point, converting the X and Y text values to numbers
    currentPoint = arcpy.Point(float(coordinates[0]), float(coordinates[1]))

    # Add the newly-created Point to your Array
    polygonArray.add(currentPoint)

# Create an insert cursor and apply your array to a polygon object
cursor = arcpy.InsertCursor(shapefile)
row = cursor.newRow()
polygon = arcpy.Polygon(polygonArray, spatialRef)

# Insert the polygon as a row
row.SHAPE = polygon
cursor.insertRow(row)

# Release locks by deleting cursors
del row
del cursor
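
The text-parsing half of this script does not depend on arcpy, so it can be tried out on its own before any geometry is written. The sketch below is one way to do that; parse_coordinates is a hypothetical helper, and apart from the pair -109.05,31.33 mentioned in the tips, the sample values are made up rather than taken from MysteryStatePoints.txt.

```python
def parse_coordinates(lineOfText):
    """Turn 'x,y|x,y|...' into a list of (x, y) tuples of floats."""
    points = []
    for coordinatePair in lineOfText.strip().split("|"):
        if not coordinatePair:
            continue  # skip empty pieces, e.g. from a trailing "|"
        x, y = coordinatePair.split(",")
        points.append((float(x), float(y)))
    return points

# Made-up sample line in the same format as the exercise file
sample = "-109.05,31.33|-109.05,37.00|-114.05,37.00\n"
parsedPoints = parse_coordinates(sample)
```

Each tuple in parsedPoints could then be used to create a Point object and added to the Array, as in the solution above.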

Alternate solution using the arcpy data access module in ArcGIS 10.1
Here's an example of how you might solve this practice exercise using the arcpy
data access module in ArcGIS 10.1. In this case the "SHAPE@" token is used to
assign the geometry to a row. The syntax ("SHAPE@",) is a tuple with one item
indicating that just the SHAPE field will be updated.

# Reads coordinates from a text file and writes a polygon

import arcpy

shapefile = "D:\\data\\Geog485\\Lesson4PracticeExerciseA\\MysteryState.shp"
pointFilePath = "D:\\data\\Geog485\\Lesson4PracticeExerciseA\\MysteryStatePoints.txt"
spatialRef = arcpy.Describe(shapefile).spatialReference

# Open the file, read the first (and only) line, then close the file
pointFile = open(pointFilePath, "r")
lineOfText = pointFile.readline()
pointFile.close()

# Make a Python list out of the coordinates. | is the delimiter
coordinatePairList = lineOfText.split("|")

# This Array object will hold a clockwise "ring" of Point
# objects, thereby making a polygon.
polygonArray = arcpy.Array()

# Loop through each coordinate pair and make a Point object
for coordinatePair in coordinatePairList:
    # Split the coordinate pair by comma. This gives you
    # a list with two items: the X and Y coordinates
    coordinates = coordinatePair.split(",")

    # Create a point, converting the X and Y text values to numbers
    currentPoint = arcpy.Point(float(coordinates[0]), float(coordinates[1]))

    # Add the newly-created Point to your Array
    polygonArray.add(currentPoint)

# Create a Polygon from your Array
polygon = arcpy.Polygon(polygonArray, spatialRef)

# Create an insert cursor and apply the Polygon to a new row
with arcpy.da.InsertCursor(shapefile, ("SHAPE@",)) as cursor:
    cursor.insertRow((polygon,))

Lesson 4 Practice Exercise B


This practice exercise does not do any geoprocessing or GIS, but it will help you get
some experience working with functions and dictionaries. The latter will be
especially helpful as you work on Project 4.

The objective
You've been given a text file of (completely fabricated) soccer scores from some of
the most popular teams in Buenos Aires. Write a script that reads through the
scores and prints each team name, followed by the maximum number of goals that
team scored in a game, for example:

River: 5
Racing: 4
etc.

Keep in mind that the maximum number of goals scored might have come during a
loss.

You are encouraged to use dictionaries to complete this exercise. This is probably
the most efficient way to solve the problem. You'll also be able to write at least one
function that will cut down on repeated code.

I have purposefully kept this text file short to make things simple to debug. This is
an excellent exercise in using the debugger, especially to watch your dictionary as
you step through each line of code.

Tips

If you want a challenge, go ahead and start coding. Otherwise, here are some tips
that can help you get started:

• Your approach should be to read through each line and split it using a space
delimiter (" ").

• Create variables for all your items of interest, including winner,
winnerGoals, loser, and loserGoals, and assign them appropriate values
based on what you parsed out of the line of text.

• Review chapter 6.8 on dictionaries in the Zandbergen text. You want to
make a dictionary that has a key for each team, and an associated value that
represents the team's maximum number of goals. If you looked at the
dictionary in the debugger it would look something like {'River': 5,
'Racing': 4, etc.}. Convert the goals you read from the file to numbers with
int() so that they are compared numerically rather than as text.

• You can write a function that takes in three things: the key (team name), the
number of goals, and the dictionary name. This function should then check
if the key has an entry in the dictionary. If not, a key should be added and its
value set to the current number of goals. If a key is found, you should
perform a check to see if the current number of goals is higher than the
value associated with that key. If so, you should set a new value. Notice how
many "ifs" appear in the preceding sentences.

• Some of the lines of text end with the new line character "\n". This can
happen with some text files that come out of Notepad. You can get rid of this
with the rstrip() method: line = line.rstrip("\n").

Lesson 4 Practice Exercise B Solution

This practice exercise is a little trickier than previous exercises. If you were not able
to code a solution, study the following solution carefully and make sure you know
the purpose of each line of code.

# Reads through a text file of soccer (football)
# scores and reports the highest number of goals
# in one game for each team

# ***** DEFINE FUNCTIONS *****

# This function checks if the number of goals scored
# is higher than the team's previous max.
def checkGoals(team, goals, dictionary):
    # Check if the team has a key in the dictionary
    if team in dictionary:
        # If a key was found, check goals against team's current max
        if goals > dictionary[team]:
            dictionary[team] = goals
        else:
            pass
    # If no key found, add one with current number of goals
    else:
        dictionary[team] = goals

# ***** BEGIN SCRIPT BODY *****

# Open the text file of scores
scoresFilePath = "C:\\Data\\Lesson4PracticeExerciseB\\Scores.txt"
scoresFile = open(scoresFilePath)

# Read the header line and get the important field indices
headerLine = scoresFile.readline()
headerLine = headerLine.rstrip("\n") # Remove "new line" character
segmentedHeaderLine = headerLine.split(" ")

winnerIndex = segmentedHeaderLine.index("Winner")
winnerGoalsIndex = segmentedHeaderLine.index("WG")
loserIndex = segmentedHeaderLine.index("Loser")
loserGoalsIndex = segmentedHeaderLine.index("LG")

# Create an empty dictionary. Each key will be a team name.
# Each value will be the maximum number of goals for that team.
maxGoalsDictionary = {}

# Loop through each line of the file
for line in scoresFile.readlines():
    line = line.rstrip("\n") # Remove "new line" character
    segmentedLine = line.split(" ")

    # Create variables for all items of interest in the line of text.
    # The goals are converted to integers so they compare as numbers
    winner = segmentedLine[winnerIndex]
    winnerGoals = int(segmentedLine[winnerGoalsIndex])
    loser = segmentedLine[loserIndex]
    loserGoals = int(segmentedLine[loserGoalsIndex])

    # Check the winning number of goals against the team's max
    checkGoals(winner, winnerGoals, maxGoalsDictionary)

    # Also check the losing number of goals against the team's max
    checkGoals(loser, loserGoals, maxGoalsDictionary)

scoresFile.close()

# Print the results
for key in maxGoalsDictionary:
    print key + ": " + str(maxGoalsDictionary[key])
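
If you are comfortable with the logic above, the nested "ifs" in the function can be collapsed using the dictionary's get() method, which returns a default value when the key is missing. The following is just one possible variation, and the team readings in it are hypothetical, not the contents of Scores.txt. It also shows why goals are best compared as numbers rather than as text.

```python
def check_goals(team, goals, dictionary):
    """Store goals for team if it beats (or starts) the team's maximum."""
    # get() returns the second argument when the team has no key yet
    dictionary[team] = max(goals, dictionary.get(team, goals))

# Text compares character by character, so "10" sorts before "9"
stringComparison = "10" > "9"
numberComparison = int("10") > int("9")

maxGoals = {}
# Hypothetical (team, goals) readings as they might come out of a text file
readings = [("River", "2"), ("Racing", "4"), ("River", "10"), ("River", "9")]
for team, goals in readings:
    check_goals(team, int(goals), maxGoals)
```

After the loop, maxGoals holds 10 for River and 4 for Racing. Had the goals been left as strings, River's maximum would have been reported as "9", because stringComparison is False.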

Lesson 4 Practice Exercise C


This exercise gives you some more practice using dictionaries. This time, you’ll be
reading some values from a pre-built dictionary. Each key in the dictionary is a type
of animal. The corresponding value for each key is a Python list containing
different types of that animal. Your task is to print the average length of the name
of each animal type. For example, if you were looking at the key “birds”, there are
three types of birds in the list (“Robin”, “Canary”, and “Bluebird”) and the average
length of those strings is 6.33 (in other words, (5 + 6 + 8)/3).

You’ll start with the code below that builds the dictionary. Copy and paste this into
an empty script and start writing your code below it. Your dictionary name will be
animals:

# function to load dictionary
def BuildDictionary():

    #create lists
    dogList = ["Dalmatian", "German Shepherd"]
    catList = ["American Shorthair"]
    birdList = ["Robin", "Canary", "Bluebird"]

    #use dict() constructor to create dictionary and add keys and values
    return dict([('dogs', dogList), ('cats', catList), ('birds', birdList)])

# Call the function and assign the result to the variable 'animals'.
animals = BuildDictionary()

# New code to print the average length of the animal names for each
# animal type (dogs, cats, and birds) should be inserted after this line.

Tips

If you're up for the challenge of this script, go ahead and start coding. But if you're
not sure how to get started, here are some tips:

• You can retrieve values by calling MyDictionary[key] where
MyDictionary is a dictionary and key is a valid key.

• You can retrieve the set of keys by calling MyDictionary.keys() and
retrieve the associated values by calling MyDictionary.values().

• You can find the length of a string by using the function len, for
example, len(MyString).

• There are many ways to solve this problem; the answer gives two.

Lesson 4 Practice Exercise C Solution

The point of this practice exercise was to help you understand how to handle
nested data structures, like lists, as values in dictionaries. This may seem confusing
if you are not used to nested data structures; however, they are often the most
straightforward way of handling complex data. You may need to use a similar
technique to solve the lesson 4 assignment.

Two ways of solving the problem are shown below. Both accomplish the same
thing.

# Name: dictionaries.py
# Description: Solves sample problem using dictionaries.
# Author: Frank Hardisty

# function to load dictionary
def BuildDictionary():

    #create lists
    dogList = ["Dalmatian", "German Shepherd"]
    catList = ["American Shorthair"]
    birdList = ["Robin", "Canary", "Bluebird"]

    #use dict() constructor to create dictionary and add keys and values
    return dict([('dogs', dogList), ('cats', catList), ('birds', birdList)])

# call the function and assign the result to the variable 'animals'
animals = BuildDictionary()

# find average length of names for different animal types two different ways

# define a floating point variable to hold totals
total = 0.0

# first approach: using the known keys
dList = animals['dogs']
for item in dList:
    total = total + len(item)
total = total / len(dList)
print 'dogs: ' + str(total)

total = 0.0
cList = animals['cats']
for item in cList:
    total = total + len(item)
total = total / len(cList)
print 'cats: ' + str(total)

total = 0.0
bList = animals['birds']
for item in bList:
    total = total + len(item)
total = total / len(bList)
print 'birds: ' + str(total)

# second approach: iterating over the dictionary's keys
for key in animals.keys():
    total = 0.0
    animalList = animals[key]
    for item in animalList:
        total = total + len(item)
    total = total / len(animalList)
    print key + ": " + str(total)
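
A third variation, sketched here only as an illustration, moves the repeated averaging logic into a function and loops over the keys and values together with items(). The function name average_length and the averages dictionary are hypothetical additions; the animal lists are the ones from the exercise.

```python
def average_length(names):
    """Return the average number of characters in a list of strings."""
    return sum(len(name) for name in names) / float(len(names))

animals = {"dogs": ["Dalmatian", "German Shepherd"],
           "cats": ["American Shorthair"],
           "birds": ["Robin", "Canary", "Bluebird"]}

# items() gives each key together with its list, so no second lookup is needed
averages = {}
for animalType, names in animals.items():
    averages[animalType] = average_length(names)
```

Storing the results in a dictionary rather than printing them right away makes the averages available for later use, such as finding the animal type with the longest names.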

Project 4: Parsing rhinoceros sightings

In this project, you're working for a wildlife conservation group that is tracking
rhinos in the African savannah. Your field workers' software resources and GIS
expertise are limited, but you have managed to obtain an Excel spreadsheet
showing the positions of several rhinos over time [22]. Each record in the
spreadsheet shows the latitude/longitude coordinate of a rhino along with the
rhino's name (these rhinos are well known to your field workers).

You want to write a script that will turn the readings in the spreadsheet into a
vector dataset that you can place on a map. This will be a polyline dataset showing
the tracks the rhinos followed over the time the data was collected.

Please carefully read all the following instructions before beginning the project.

Deliverables

This project has the following deliverables:

1. Your plan of attack for this programming problem, written in pseudocode in
any text editor. This should consist only of short, focused steps describing
what you are going to do to solve the problem. This is a separate deliverable
from your customary project writeup.

2. A Python script that reads the data from the spreadsheet and creates, from
scratch, a polyline shapefile with n polylines, n being the number of rhinos
in the spreadsheet. Each polyline should represent a rhino's track
chronologically from the beginning of the spreadsheet to the end of the
spreadsheet. Each polyline should also have a text attribute containing the
rhino's name. The shapefile should use the WGS 1984 geographic coordinate
system.

3. A short writeup (~300 words) explaining what you learned during this
project and which requirements you met, or failed to meet. Also describe any
"over and above" efforts here so that the graders can look for them.

Successful delivery of the above requirements is sufficient to earn 90% on the
project. The remaining 10% is reserved for efforts that go "over and above" the
minimum requirements. This could include (but is not limited to) useful code
comments, an insightful writeup explaining some lesson learned during the coding,
a batch file that could be used to automate the script, creation of the feature class in
a file geodatabase instead of a shapefile, or the breaking out of repetitive code into
functions and/or modules.

Challenges

You may already see several immediate challenges in this task:

• The data is in a format (XLSX) that you cannot easily parse. Your first step
must be to open the file manually in Excel and save it in a comma-delimited
format that you can easily read with a script. Choose the option
CSV (comma-delimited) (*.csv).

If you are so inclined, you can attempt to download and use a Python library
that works directly with XLSX files. Be aware that you will have less
comprehensive "technical support" from your fellow students if you take this
route.

• The rhinos in the spreadsheet appear in no guaranteed order, and not all the
rhinos appear at the beginning of the spreadsheet. As you parse each line,
you must determine which rhino the reading belongs to and update that
rhino's polyline track accordingly. You are not allowed to sort the
Rhino column in Excel before you export to the CSV file. Your
script must be "smart" enough to work with an unsorted
spreadsheet in the order that the records appear.

• You do not immediately know how many rhinos are in the file or even what
their names are. Although you could visually comb the spreadsheet for this
information and hard-code each rhino's name, your script is required to
handle all the rhino names programmatically. The idea is that you should be
able to run this script on a different file, possibly containing more rhinos,
without having to make many manual adjustments.

• You have not previously created a feature class programmatically. You must
find and run ArcGIS geoprocessing tools that will create an empty polyline
shapefile with a text field for storing the rhino's name. You must also assign
the WGS 1984 geographic coordinate system as the spatial reference for this
shapefile.
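
To make the third challenge more concrete, here is a minimal sketch of discovering the rhino names programmatically instead of hard-coding them. Everything specific in it is an assumption made for illustration: the sample rows, the column layout, the rhino names, and the function name find_rhino_names are all invented, and your exported CSV may be laid out differently. The sketch looks up the name column from the header row, then collects each distinct name in the order it first appears.

```python
import csv

# Hypothetical sample standing in for the exported CSV file. The column
# names, their order, and the rhino names are invented for illustration.
SAMPLE_ROWS = [
    "Observer,X,Y,Rhino,Comments",
    "Ben,26.99391,-19.10447,Bo,",
    "Ben,27.00071,-19.10720,Tulip,Resting",
    "Ann,26.99670,-19.10520,Bo,",
    "Ann,27.00150,-19.10900,Patches,",
    "Ben,27.00260,-19.10980,Tulip,",
]


def find_rhino_names(lines, name_field="Rhino"):
    """Return the distinct rhino names in order of first appearance."""
    reader = csv.reader(lines)
    header = next(reader)
    # Look the column up by name instead of hard-coding its position.
    name_index = header.index(name_field)
    names = []
    for row in reader:
        name = row[name_index]
        if name not in names:
            names.append(name)
    return names


rhino_names = find_rhino_names(SAMPLE_ROWS)
print(rhino_names)
```

Because csv.reader accepts any sequence of lines, the same function should also work when you pass it an open file object instead of the sample list. Running it on a file with more rhinos requires no change to the code.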

Hints

• Before you start writing code, write a plan of attack describing the logic your
script will use to accomplish this task. Break up the original task into small,
focused chunks. You can write this in Word or even Notepad. Your objective
is not to write fancy prose, but rather short, terse statements of what your
code will do: in other words, pseudocode. Here's an example of some
pseudocode that might appear in your file:

...
Read the next line.
Split the line.
Determine the rhino referenced in this line.
Determine if the dictionary has a key for the rhino.
If no key exists, create a new array object.
Create a new point object.
Assign the X reading to the X coordinate of the point.
Assign the Y reading to the Y coordinate of the point.
Add the point to the array.
Add the array to the dictionary using the rhino name as the key.
...

If you do a good job writing your pseudocode, you'll find that each line
translates into about one line of code. Writing your script then becomes a
matter of translating from English to code. You may also find it helpful to
sketch out a diagram of the workflow and logical branches in your script.

• You will have a much easier time with this assignment if you first create the
array objects representing each rhino track, then use an insert cursor to add
the arrays once they are completed. Not only is this easier to code, it's better
for performance to open the insert cursor only once near the end of the
script.

• A Python dictionary is an excellent structure for storing a rhino name
coupled with the rhino's array of observed locations. A dictionary is similar
to a list, but it stores items in key-value pairs. For example, a key could be a
string representing the rhino name, and that key's corresponding value
could be an array object containing all the points where the rhino was
observed. You can retrieve any value based on its key, and you can also
check whether a key exists using a simple if key in dictionary: check.

We have not worked with dictionaries much in this course, but your
Zandbergen text has an excellent section about them and there are abundant
Python dictionary examples on the Internet.

You can alternatively use lists to keep track of the information, but this will
probably take more code. Using dictionaries I was able to write this script in
under 60 lines (including comments and whitespace). If you find yourself
getting confused or writing a lot of code with lists, try switching to
dictionaries.

• To create your shapefile programmatically, use the CreateFeatureClass tool.
The ArcGIS Desktop Help has several examples of how to use this tool. If
you can't figure this part out, I suggest you create the feature class manually
and work on writing the rest of the script. You can then return to this part at
the end if you have time.

• In order to get the shapefile in WGS 1984, you'll need to create a spatial
reference object that you can assign to the shapefile at the time you create it.
I recommend using the SpatialReference.CreateFromFile() method and
pointing at the appropriate .prj file in C:\Program Files\ArcGIS\Coordinate
Systems\. Be warned that if you do not correctly apply the spatial reference,
your polyline precision could be diluted.
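
The dictionary hint and the sample pseudocode above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: ordinary lists and (x, y) tuples stand in for the ArcGIS array and point objects so that the logic can be followed without ArcGIS, and the sample lines, the field order (rhino name, X, Y), the rhino names, and the function name build_tracks are all invented. Your own script will need to create real geometry objects and read the fields in whatever order your CSV file uses.

```python
# Plain lists and (x, y) tuples stand in for the ArcGIS array and point
# objects. The sample lines and their field order are assumptions.
SAMPLE_LINES = [
    "Bo,26.99,-19.10",
    "Tulip,27.00,-19.11",
    "Bo,26.98,-19.12",
]


def build_tracks(lines):
    """Group readings by rhino name, keeping them in file order."""
    tracks = {}
    for line in lines:
        # Split the line and determine the rhino it refers to.
        name, x, y = line.strip().split(",")
        # If no key exists yet, start a new track for this rhino.
        if name not in tracks:
            tracks[name] = []
        # Create the point and add it to the rhino's track.
        tracks[name].append((float(x), float(y)))
    return tracks


tracks = build_tracks(SAMPLE_LINES)
for name in sorted(tracks):
    print(name + ": " + str(len(tracks[name])) + " points")
```

Notice that the input never needs to be sorted. Each reading is appended to whichever track its name points to, so the points for each rhino stay in the order in which they appear in the file.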

If you do things right, your polylines should look like this (points are included only
for reference):

Note: Although I have placed the data in an African context (who heard of rhinos
wandering New York City?) it is completely fabricated and does not resemble the
path of any actual rhino, living or dead. If you exhibit a stellar performance on this
project, you may choose the option of having a rhino named after you in a future
offering of this course!

Author(s) and/or Instructor(s): Sterling Quinn, John A. Dutton e-Education
Institute, College of Earth and Mineral Sciences, The Pennsylvania State
University;
Jim Detwiler, John A. Dutton e-Education Institute, College of Earth and Mineral
Sciences, The Pennsylvania State University;
Frank Hardisty, John A. Dutton e-Education Institute, College of Earth and
Mineral Sciences, The Pennsylvania State University;
James O'Brien, John A. Dutton e-Education Institute, College of Earth and Mineral
Sciences, The Pennsylvania State University

© 1999-2012 The Pennsylvania State University. Except where otherwise noted,
this courseware module is licensed under the Creative Commons Attribution-Non-
Commercial-Share-Alike 3.0 License and is freely available through Penn State's
College of Earth and Mineral Sciences' Open Educational Resources Initiative.


Source URL: https://www.e-education.psu.edu/geog485/node/139

Links:
[1] https://www.e-education.psu.edu/geog485/?q=node/149
[2] http://www.w3schools.com/XML/xml_whatis.asp
[3] http://oreilly.com/catalog/pythonxml/chapter/ch01.html
[4] https://www.e-education.psu.edu/drupal6/files/geog485py/data/gps_track.txt
[5] http://docs.python.org/library/functions.html#open
[6] http://docs.python.org/library/stdtypes.html?highlight=readline#file.readline
[7] http://docs.python.org/library/string.html?highlight=split#string.split
[8] http://docs.python.org/library/stdtypes.html?highlight=readline#file.readlines
[9] https://www.e-education.psu.edu/drupal6/files/geog485py/data/gps_track_multiple.txt
[10] http://help.arcgis.com/en/arcgisdesktop/10.0/help/000v/000v0000005r000000.htm
[11] http://help.arcgis.com/en/arcgisdesktop/10.0/help/002t/002t0000000z000000.htm
[12] http://arcscripts.esri.com/
[13] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Geoprocessing_scripts_for_map_document_management_and_output/00s300000032000000/
[14] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/ListLayoutElements/00s30000003w000000/
[15] http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/TextElement/00s30000000m000000/
[16] http://help.arcgis.com/en/arcgisdesktop/10.0/help/00s3/00s300000027000000.htm
[17] http://help.arcgis.com/en/arcgisdesktop/10.0/help/00s3/00s300000032000000.htm
[18] http://proceedings.esri.com/library/userconf/devsummit10/tech/tech_56.html
[19] http://geochalkboard.wordpress.com/2010/08/02/introducing-the-arcpy-mapping-module-in-arcgis-10/
[20] http://resources.arcgis.com/en/help/main/10.1/014p/014p00000025000000.htm
[21] https://www.e-education.psu.edu/drupal6/files/geog485py/data/Lesson4PracticeExercises.zip
[22] https://www.e-education.psu.edu/drupal6/files/geog485py/data/RhinoObservations.xlsx
