
The Hang Seng University of Hong Kong

Department of Mathematics and Statistics


AMS2640 Statistical Computing in Practice

Study Notes

Lesson 1 – Introduction to Basic Data Management in Excel


✓ Microsoft Excel Shortcuts and Formats
✓ Microsoft Excel Formulas
✓ Microsoft Excel Data Management Skills
✓ Microsoft Excel Functions
✓ Import Dataset into Microsoft Excel

Excel supports many operations, for example selecting a range of data, copying and
pasting, filling a series, and using functions for calculation. These operations can seem
hard and time-consuming; however, shortcuts and hotkeys can make working in Excel
more efficient.

✓ Microsoft Excel Shortcuts and Formats


• Select a range
In Excel, we usually need to select a range and then do something to it (e.g. enter a formula,
change its format, clear its contents, and so on). It is easy if the whole range appears on the
screen, but it could be a bit trickier if you cannot see the whole range.

To select a range
o Dragging with your mouse
(1) Click on one corner of the range
(2) Drag to the opposite corner

o Shift key
(1) Click on the uppermost and leftmost cell of the range
(2) Hold down the Shift key AND click on the lowermost and rightmost cell of the
range

o Ctrl + Shift key
(1) Click on the uppermost and leftmost cell of the range
(2) Hold down Ctrl + Shift AND press the Down and Right arrow keys; the selection
extends to the last non-blank cell in each direction

Excel Example 1.1:

9 9 9
1 3 8
8 1 10
7 5 4
5 1 1
5 10 7

• Select more than one range


Sometimes, we need to select more than one range, for example, to format more than one
range as currency. The quickest way is to select all ranges at once and then format them all
at once.

To select more than one range


o Ctrl key
(1) Select the first range
(2) Press the Ctrl key AND select the second range
(3) Press the Ctrl key AND select the third range, and so on

Excel Example 1.2:

10 13 7 9
1 9 12 7
5 4 1 1
6 12
15 2

13 4 10

To select the ranges B2:C4 and E2:F6, click on B2, hold down the Shift key and click on C4 (so
now the first range is selected), hold down the Ctrl key and click on E2, and finally hold down
the Shift key and click on F6.

• Insert/delete row(s) or column(s)


The rows we insert are inserted above the first row we selected. For example, if we select
rows 8 through 11 and then insert, four blank rows will be inserted between the old rows 7
and 8.

To insert one or more blank rows


o Right-click menu
(1) Click on a row number and drag down as many rows as you want to insert
(2) Right click and select Insert in the menu

o Keyboard shortcut: Alt + I


(1) Click on a row number and drag down as many rows as you want to insert
(2) Press Alt + I and then R

Excel Example 1.3:


Insert blank rows for the data for Feb, Apr, and May.
Note: The Alt + I then R shortcut does not work in the embedded Excel spreadsheet below.

Month Price Units sold Revenue


Jan $3.00 100 $300.00
Mar $3.25 50 $162.50
Jun $3.50 200 $700.00

To insert one or more blank columns


Columns can be inserted with the two methods mentioned above, except that the
keyboard shortcut is Alt + I and then C. The columns we insert are inserted on the left
of the first column we selected.

Excel Example 1.4:
Insert blank columns for sales representatives Ben, Donald, and Eva so that the sales
representatives are in alphabetical order from left to right.
Note: The Alt + I then C shortcut does not work in the embedded Excel spreadsheet below.

Sales representative Alan Catherine Felix


Commission rate 5.4% 6.5% 4.3%
Sales $15,000 $12,000 $17,000
Commission $810 $780 $731

To delete one or more rows/columns


o Right-click menu
(1) Select the row(s)/column(s) you want to delete by clicking the row/column number
(2) Right click and select Delete in the menu

o Keyboard shortcut: Alt + E and then D


(1) Select the row(s)/column(s) you want to delete by clicking the row/column number
(2) Press Alt + E and then D

Note: Deleting a row/column is not the same as clearing the contents of a row/column.
Clearing the contents makes the cells blank. Deleting a row/column gets rid of the
row/column completely.

Excel Example 1.5:


The company no longer carries products K322 and R543, so get rid of those two rows.
Note: The Alt + E then D shortcut does not work in the embedded Excel spreadsheet below.

Product Code Units sold Unit price


J645 148 $15.00
K322 278 $17.50
L254 384 $25.00
M332 13 $30.50
R543 247 $22.40
S654 315 $35.00

• Scientific notation
Try calculating 123456789 x 123456789 in Excel. The answer will be displayed as
1.52416E+16 (or ###### if the column is not wide enough). But what does
1.52416E+16 mean?
It means 1.52416 x 10^16 = 15,241,600,000,000,000 (the displayed value is rounded).
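
As a cross-check outside Excel, the same notation can be reproduced in Python. This is only a minimal sketch; the variable names are illustrative, and the `E` format code is used here as a stand-in for Excel's scientific display.

```python
# Reproduce the scientific-notation display of a large product.
product = 123456789 * 123456789      # exact integer arithmetic in Python

# Five digits after the decimal point, matching 1.52416E+16.
displayed = format(product, ".5E")
print(displayed)                     # 1.52416E+16

# Expand the displayed (rounded) value back into ordinary notation.
expanded = format(float(displayed), ",.0f")
print(expanded)                      # 15,241,600,000,000,000
```

The expanded number differs from the exact product because the display keeps only six significant digits.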

Excel Example 1.6:
Calculate 0.00123 x 0.00123 in Cell A2
Write down the answer with and without using the scientific notation.

1.52416E+16

• Decimal places
What if we want to show more/less decimal places for the answer?

Showing more decimal places


o use button “Increase Decimal”

Showing fewer decimal places


o use button “Decrease Decimal”

• Some basic keyboard shortcuts - cut, copy, paste, undo, redo
Cut, copy, paste, undo and redo are very typical operations. There is a faster way to get
these jobs done - keyboard shortcuts.

Some common keyboard shortcuts are listed below.

Function                                          Shortcut key

Save Excel file                                   Ctrl + S
Undo                                              Ctrl + Z
Redo                                              Ctrl + Y
Cut                                               Ctrl + X
Copy                                              Ctrl + C
Paste                                             Ctrl + V
Go to the top left corner on the worksheet        Ctrl + Home
Go to the bottom right corner on the worksheet    Ctrl + End
Edit cell                                         F2
Lock cell / row / column                          F4
Write a new line in a cell                        Alt + Enter
Select whole column                               Ctrl + Space
Select whole row                                  Shift + Space
Hide a column                                     Ctrl + 0
Hide a row                                        Ctrl + 9
Show/Hide Formulas                                Ctrl + ~
Extend formula downward                           Ctrl + D
Extend formula to the right                       Ctrl + R
Insert cell                                       Ctrl + + (that is, Ctrl + Shift + =)
Delete cell                                       Ctrl + -
Find and Replace                                  Shift + F5
Mouse Right Click                                 Shift + F10
Switch between different worksheets               Ctrl + PgUp / Ctrl + PgDn
Sum up a range                                    Alt + =
Insert Current Time                               Ctrl + Shift + ;
Insert Current Date                               Ctrl + ;

✓ Microsoft Excel Formulas
• Copy and paste formulas
A frequent task is to enter a formula in one cell and copy it down a column or across a row.
There are several very efficient ways to do this.

To copy and paste a formula


o Keyboard shortcut: Ctrl + Enter
(1) Select all the cells where you want to enter the same formula
(2) Type the formula in the first cell
(3) Then, press Ctrl + Enter instead of just Enter

Excel Example 1.7:


A B AxB
6 2
9 1
4 9
7 6
2 3
8 5
3 9

(More examples:
http://chandoo.org/wp/2012/01/09/how-to-enter-same-data-into-multiple-cells/)

o Use the “fill handle”


(1) Enter the formula in the uppermost or leftmost cell of the intended range
(2) Place the cursor at the lower right corner of the cell to see the “fill handle” (the
cursor will become a black plus sign)
(3) Drag this “fill handle” down or across to copy

Excel Example 1.8:

A B A+B
10 7
6 9
4 9
5 1
2 8
8 10
10 7

o Double click the “fill handle”


(1) Enter the formula in the top or left-hand cell of the intended range.
(2) Double-click on the fill handle.

This method uses Excel’s built-in intelligence, but it only works in certain situations:
we need an adjacent column of data to indicate how far to fill. In the example above,
the formula entered in cell C2 can be copied to C3:C8 by double-clicking the fill
handle. However, if you enter the formula in column D while column C is still empty,
there is no adjacent data and double-clicking the fill handle would not work. In the
spreadsheet above, copy the formula in C2 down through C8 by double-clicking the
“fill handle”.

• Absolute reference and relative reference


When we do calculations with some formulas, we sometimes want some parts of the
formula to stay fixed (absolute) and others to change relative to the cell position. Here, we
need to use absolute reference and relative reference.

Absolute and relative references are indicated in formulas by the presence or absence of
dollar signs. Note that the dollar signs are relevant only when a formula is copied or filled
to other cells; they have no inherent effect on the formula. For example, in cell B1, the
formulas =5*A1 and =5*$A$1 produce the same result. Their difference shows up only if
cell B1 is copied elsewhere.

You may type the dollar sign(s) into the formula. An alternative way to do it is the F4 key.
In fact, pressing the F4 key repeatedly cycles through the possibilities: A1 (neither column
A nor row 1 fixed), then $A$1 (both column A and row 1 fixed), then A$1 (only row 1
fixed), then $A1 (only column A fixed), and back again to A1.
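
To make the copying rule concrete, here is a minimal Python sketch of how references change when a formula is copied by a given number of rows and columns. It is not how Excel is implemented; the helper names (`copy_formula`, `col_to_num`, `num_to_col`) are hypothetical, and the regular expression only handles simple A1-style references (a function name such as LOG10 would be misread as a cell).

```python
import re

# Optional "$", column letters, optional "$", row digits.
REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def col_to_num(col):
    """Convert column letters to a number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def num_to_col(n):
    """Convert a column number back to letters: 1 -> A, 27 -> AA."""
    letters = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

def copy_formula(formula, d_rows, d_cols):
    """Shift every relative part of each reference; parts with "$" stay fixed."""
    def shift(match):
        col_abs, col, row_abs, row = match.groups()
        if not col_abs:
            col = num_to_col(col_to_num(col) + d_cols)
        if not row_abs:
            row = str(int(row) + d_rows)
        return f"{col_abs}{col}{row_abs}{row}"
    return REF.sub(shift, formula)

# Copying one column to the right (for example from B7 to C7).
print(copy_formula("=$B$2+$B$3*B6", 0, 1))   # =$B$2+$B$3*C6
# Copying one row down and one column right (for example from C6 to D7).
print(copy_formula("=$B6*C$5", 1, 1))        # =$B7*D$5
```

Only the parts of a reference without a dollar sign move, which is why mixed references such as $B6 and C$5 are useful for two-way tables.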

Excel Example 1.9:


Enter the formula =$B$2+$B$3*B6 in cell B7 and copy across to E7.
(Scroll to the right to see the correct answer)

Fixed cost $50


Variable cost $2

Month Jan Feb Mar Apr


Units produced 224 194 228 258
Total cost

Excel Example 1.10:


Enter the formula =$B6*C$5 in cell C6 that can be copied for C6:F10.
(Scroll to the right to see the correct answer)

Table of revenues for various unit prices and units sold

Units sold
50 100 150 200
Unit price $3.25
$3.50
$3.75
$4.00
$4.25

• Trace precedents and dependents
Checking formulas for accuracy or finding the source of an error may be difficult if a
formula uses precedent or dependent cells:
Precedent cells — cells that are referred to by a formula in another cell. For example, if
cell D10 contains the formula =B5, then cell B5 is a precedent to cell D10.
Dependent cells — these cells contain formulas that refer to other cells. For example, if
cell D10 contains the formula =B5, cell D10 is a dependent of cell B5.
To assist you in checking your formulas, you can use the Trace Precedents and Trace
Dependents commands to graphically display and trace the relationships between these
cells and formulas with tracer arrows.
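
The two definitions can be illustrated with a small Python sketch that derives precedents and dependents from formula text. The worksheet contents in `formulas` are hypothetical (only D10 = B5 comes from the text above), and the reference pattern is a simplification that covers plain A1-style references only.

```python
import re
from collections import defaultdict

# Hypothetical worksheet: cell address -> formula text (or a plain value).
formulas = {
    "B5": "100",
    "D10": "=B5",
    "E10": "=D10*2+B5",
}

CELL_REF = re.compile(r"\$?([A-Z]+)\$?(\d+)")

def precedents(cell):
    """Cells that the formula in `cell` refers to directly."""
    text = formulas.get(cell, "")
    if not text.startswith("="):
        return set()
    return {col + row for col, row in CELL_REF.findall(text)}

def dependents_map():
    """Invert the precedent relation: cell -> cells whose formulas use it."""
    result = defaultdict(set)
    for cell in formulas:
        for source in precedents(cell):
            result[source].add(cell)
    return result

deps = dependents_map()
print(sorted(precedents("D10")))   # ['B5']
print(sorted(deps["B5"]))          # ['D10', 'E10']
```

Each tracer arrow in Excel corresponds to one pair in this relation: it points from a precedent cell to a dependent cell.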

If Excel beeps when you click Trace Dependents or Trace Precedents, Excel has either
traced all levels of the formula, or you are attempting to trace an item that is
untraceable. The following items on worksheets that can be referenced by
formulas are not traceable using the auditing tools:
i. References to text boxes, embedded charts, or pictures on worksheets.
ii. PivotTable reports.
iii. References to named constants.
iv. Formulas located in another workbook that refer to the active cell if the other
workbook is closed.
Excel Example 1.11 (same dataset as example 1.10):
Enter the formula as in Example 1.10 and trace the precedents and dependents.

Table of revenues for various unit prices and units sold

Units sold
50 100 150 200
Unit price $3.25
$3.50
$3.75
$4.00
$4.25

✓ Microsoft Excel Data Management Skills
• Fill a series
Suppose we would like to have an index in a column. You could type in the numbers
1, 2, 3, etc. one by one. That might be acceptable for 1 – 20 only, but it would be
extremely time-consuming to fill 1 – 1000 into cells A1 – A1000. In such a
case, the following method is helpful.

To fill a column range with a series


o Fill button
(1) Enter the first value in the first cell (for example, 1 in cell A1).
(2) Select the starting cell (A1). Under the Home tab, click Fill button and select Series.
A dialog box will pop up.
(3) Choose the correct options (Rows or Columns, Type, Date unit (if any))
(4) Choose the Step Value (i.e. the difference between each value)
(5) Input Stop Value (i.e. the last value you want) and click OK

As you can see from the dialog box, some other options (e.g. Growth, AutoFill, etc.) are
available as well. You could experiment with them and see what the effects are.
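
The effect of the Step Value and Stop Value can be sketched in Python. This is an illustration under simplifying assumptions (increasing series only); `fill_series` is a hypothetical helper, and its `growth` flag imitates the Growth type by multiplying by the step instead of adding it.

```python
def fill_series(start, step, stop, growth=False):
    """Return the values of an increasing series from start up to stop."""
    if step <= 0 or (growth and (step <= 1 or start <= 0)):
        raise ValueError("this sketch only supports increasing series")
    values = []
    current = start
    while current <= stop:
        values.append(current)
        # Linear: add the step. Growth: multiply by the step.
        current = current * step if growth else current + step
    return values

days_a = fill_series(1, 1, 25)               # 1, 2, ..., 25
days_d = fill_series(26, 1, 50)              # 26, 27, ..., 50
doubling = fill_series(1, 2, 100, growth=True)

print(days_a[:3], days_a[-1])                # [1, 2, 3] 25
print(doubling)                              # [1, 2, 4, 8, 16, 32, 64]
```

The series stops at the last value that does not exceed the Stop Value, so a stop of 100 with doubling ends at 64.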

Excel Example 1.12:


The series of days should go from 1 to 25 in column A; and it should go from 26 to 50 in
column D.

Day Sales Day Sales


$227 $167
$157 $107
$143 $255
$129 $113
$102 $186
$116 $124
$269 $271
$111 $288
$210 $273

• Sort
Suppose we have a dataset with several variables (columns), and we want to sort it
according to one or more of them. Sorting data is an integral part of data analysis.
You can sort data by text (A to Z or Z to A), numbers (smallest to largest or largest to
smallest), and dates and times (oldest to newest and newest to oldest) in one or more
columns. You can also sort by a custom list you create (such as Large, Medium, and Small)
or by format, including cell color, font color, or icon set.

To sort by a variable
o Data → Sort
(1) To sort in ascending order, click the Sort A to Z button; to sort in descending
order, click the Sort Z to A button.
(2) If you sort by cell color, font color, or icon set, select On Top for a column sort,
and On Left for a row sort. There is no default cell color, font color, or icon sort
order. You must define the order that you want for each sort operation.

Excel Example 1.13:


The following data set is sorted by model year. Try adding mpg as a second sort level.

model year  mpg  cylinders  horsepower  weight  origin  car name
70          18   8          130         3504    1       chevrolet chevelle malibu
70          15   8          165         3693    1       buick skylark 320
70          18   8          150         3436    1       plymouth satellite
70          16   8          150         3433    1       amc rebel sst
70          17   8          140         3449    1       ford torino
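
A two-level sort can be mimicked in Python by sorting on a tuple of keys. The sketch below uses the five rows listed above; the tuple layout and variable names are choices made for this illustration, not part of the Excel workflow.

```python
# (model year, mpg, cylinders, horsepower, weight, origin, car name)
rows = [
    (70, 18, 8, 130, 3504, 1, "chevrolet chevelle malibu"),
    (70, 15, 8, 165, 3693, 1, "buick skylark 320"),
    (70, 18, 8, 150, 3436, 1, "plymouth satellite"),
    (70, 16, 8, 150, 3433, 1, "amc rebel sst"),
    (70, 17, 8, 140, 3449, 1, "ford torino"),
]

# First level: model year (index 0). Second level: mpg (index 1).
sorted_rows = sorted(rows, key=lambda r: (r[0], r[1]))

for row in sorted_rows:
    print(row[0], row[1], row[6])

names = [row[6] for row in sorted_rows]
```

Rows that tie on both keys (the two cars with 18 mpg) keep their original relative order here, because Python's sort is stable.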

• Filter
Filter your Excel data if you only want to display records that meet certain criteria.

To filter variables
o Data → Filter
(1) Click any single cell inside a data set.
(2) On the Data tab, in the Sort & Filter group, click Filter.

Excel Example 1.14 (same dataset as example 1.13):


Find the observations with missing horsepower entries using filter.
model year  mpg  cylinders  horsepower  weight  origin  car name
70          18   8          130         3504    1       chevrolet chevelle malibu
70          15   8          165         3693    1       buick skylark 320
70          18   8          150         3436    1       plymouth satellite
70          16   8          150         3433    1       amc rebel sst
70          17   8          140         3449    1       ford torino
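
Filtering amounts to keeping only the rows that satisfy a condition. The following Python sketch applies two conditions to the five rows above; `filter_rows` is a hypothetical helper, and a missing entry is represented as `None`, which is an assumption of this sketch (none of the five rows shown has a missing horsepower).

```python
cars = [
    {"mpg": 18, "horsepower": 130, "car name": "chevrolet chevelle malibu"},
    {"mpg": 15, "horsepower": 165, "car name": "buick skylark 320"},
    {"mpg": 18, "horsepower": 150, "car name": "plymouth satellite"},
    {"mpg": 16, "horsepower": 150, "car name": "amc rebel sst"},
    {"mpg": 17, "horsepower": 140, "car name": "ford torino"},
]

def filter_rows(rows, predicate):
    """Keep the rows for which predicate(row) is true; the source list is untouched."""
    return [row for row in rows if predicate(row)]

# Number filter: horsepower of at least 150.
strong = filter_rows(cars, lambda r: r["horsepower"] is not None and r["horsepower"] >= 150)
# Blank filter: rows whose horsepower entry is missing.
missing = filter_rows(cars, lambda r: r["horsepower"] is None)

print([r["car name"] for r in strong])
print(len(missing))
```

As with Excel's filter, the original data is not changed; only the displayed subset differs.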

• Data tools
When working with a dataset, Excel provides many useful data tools to help manage it
quickly. We will introduce the following common ones, but keep in mind that there are
many more tools available in Excel for you to explore.
To separate the contents of one Excel cell into separate columns, you can use the Convert
Text to Columns Wizard.
Flash Fill automatically fills your data when it senses a pattern. For example, you can use
Flash Fill to separate first and last names from a single column or combine first and last
names from two different columns.

Sometimes duplicate data is useful, sometimes it just makes it harder to understand your
data. Use conditional formatting to find and highlight duplicate data. That way you can
review the duplicates and decide if you want to remove them.
Data validation in Excel lets you control the data that can be entered in a cell. You can
restrict the user to entering only a specified range of numbers, text, or dates. You can also
use data validation to create an Excel drop-down list.

To split content of one cell


o Data → Data Tools → Text to Columns
(1) Select the cell or column that contains the text you want to split.
(2) In the Convert Text to Columns Wizard, select Delimited → Next.
(3) Select the Delimiters for your data. For example, Comma and Space. You can
see a preview of your data in the Data preview window. Select Next.
(4) Select the Column data format or use what Excel chose for you.
(5) Select the Destination, which is where you want the split data to appear on your
worksheet. Select Finish.
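
Splitting on a delimiter can be sketched in Python as follows. The helper `text_to_columns` is hypothetical, and padding short rows with empty strings is an assumption made so that every row has the same number of columns.

```python
def text_to_columns(values, delimiter=" "):
    """Split each text value on the delimiter and pad rows to equal width."""
    rows = [value.split(delimiter) for value in values]
    width = max(len(row) for row in rows)
    return [row + [""] * (width - len(row)) for row in rows]

car_names = [
    "chevrolet chevelle malibu",
    "buick skylark 320",
    "plymouth satellite",
    "amc rebel sst",
    "ford torino",
]

columns = text_to_columns(car_names)
for row in columns:
    print(row)

# The first column now holds only the brand.
brands = [row[0] for row in columns]
```

Choosing the delimiter is the key decision: with a space, a three-word name fills three columns, while a two-word name leaves the third column empty.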

To automatically fill data when Excel senses a pattern


o Data → Data Tools → Flash Fill
(1) Enter the pattern in the next column, and press ENTER.
(2) Start typing the next value of the pattern in the row below. Excel will sense the
pattern you provide and show you a preview of the rest of the column filled in with
your combined text.
(3) To accept the preview, press ENTER.
Note: If Flash Fill doesn't generate the preview, it might not be turned on. You can
go to Data → Flash Fill to run it manually, or press Ctrl+E. To turn Flash Fill on, go
to Tools → Options → Advanced → Editing Options → check the Automatically
Flash Fill box.

To remove duplicate data


o Data → Data Tools → Remove Duplicates
(1) Select the range of cells that has duplicate values you want to remove.
(2) Click Remove Duplicates, and then, under Columns, check or uncheck the
columns where you want to remove the duplicates.
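
The role of the checked columns can be sketched in Python: two rows count as duplicates when they agree on every checked column. `remove_duplicates` is a hypothetical helper, and keeping the first occurrence of each combination is the assumption used in this sketch.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of the key columns."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

# (model year, mpg, cylinders, car name) taken from the car dataset in these notes.
rows = [
    (70, 18, 8, "chevrolet chevelle malibu"),
    (70, 15, 8, "buick skylark 320"),
    (70, 18, 8, "plymouth satellite"),
    (70, 16, 8, "amc rebel sst"),
    (70, 17, 8, "ford torino"),
]

by_mpg = remove_duplicates(rows, key_columns=[1])      # only mpg checked
by_name = remove_duplicates(rows, key_columns=[3])     # only car name checked

print(len(by_mpg))    # 4
print(len(by_name))   # 5
```

Counting the rows that remain gives the number of unique entries for the checked columns.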

To highlight invalid inputs


o Data → Data Tools → Data Validation
(1) On the Settings tab, under Allow, select an option. Then set the other required
values, based on what you chose for Allow and Data.
(2) Select the Ignore blank checkbox if you want to ignore blank spaces.
(3) If you want to add a Title and message for your rule, select the Input Message
tab, and then type a title and input message. Select the Show input message when
cell is selected checkbox to display the message when the user selects or hovers
over the selected cell(s).
(4) Finally, click Data → Data Validation → Circle Invalid Data
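
The idea behind a validation rule and Circle Invalid Data can be sketched in Python: a rule defines what is allowed, and every entry that breaks it is flagged. The helper name `circle_invalid` and the bounds used below are assumptions for illustration; the horsepower values come from the car dataset in these notes.

```python
def circle_invalid(values, minimum, maximum):
    """Return the positions of entries that are not whole numbers in [minimum, maximum]."""
    invalid = []
    for position, value in enumerate(values):
        is_valid = isinstance(value, int) and minimum <= value <= maximum
        if not is_valid:
            invalid.append(position)
    return invalid

horsepower = [130, 165, 150, 150, 140]

# Rule: whole number between 0 and 199, so 200 or above is invalid.
print(circle_invalid(horsepower, 0, 199))   # []
# A stricter, purely illustrative rule: at most 149.
print(circle_invalid(horsepower, 0, 149))   # [1, 2, 3]
```

Existing entries are only flagged, not changed, which matches the purpose of circling invalid data for review.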

Excel Example 1.15 (same dataset as example 1.13):
Split the single-column variable car name into multiple columns using the Convert Text to
Columns Wizard. Then, split only the car brand name into a new column using Flash Fill.
Check the number of unique entries under the variable car name using Remove Duplicates.
Finally, define horsepower of 200 or above as invalid and circle such entries using Data
Validation.

model year  mpg  cylinders  horsepower  weight  origin  car name
70          18   8          130         3504    1       chevrolet chevelle malibu
70          15   8          165         3693    1       buick skylark 320
70          18   8          150         3436    1       plymouth satellite
70          16   8          150         3433    1       amc rebel sst
70          17   8          140         3449    1       ford torino

• Outline: Group and ungroup variables


If you have a list of data that you want to group and summarize, you can create an outline of
up to eight levels, one for each group. Each inner level, represented by a higher number in
the outline symbols, displays detail data for the preceding outer level, represented by a
lower number in the outline symbols.
To group/ungroup data in a list
o Data → Outline → Group
(1) Select the rows or columns you wish to group/ungroup.
(2) In the Group/Ungroup dialog box, select Rows or Columns and click OK.

• Freeze and split


To keep an area of a worksheet visible while you scroll to another area of the worksheet, go
to the View tab, where you can Freeze Panes to lock specific rows and columns in place, or
you can Split panes to create separate windows of the same worksheet.
To freeze columns and rows
o View → Freeze Panes
(1) Select the cell below the rows and to the right of the columns you want to keep
visible when you scroll. Select Freeze Panes.

You can view two areas of a sheet by splitting it into panes. When you split a sheet into separate
panes, you can scroll in both panes independently. By splitting the worksheet, you can scroll
down in the lower pane and still see the top rows in the upper pane.
To Split a sheet into panes
o View → Window → Split
(1) Select below the row where you want the split, or the column to the right of
where you want the split.
(2) On the View tab, in the Window group, click Split. To remove the split panes,
click Split again.

Excel Example 1.16 (same dataset as example 1.13):


Freeze the first row or split the dataset to view. Group variables cylinders and horsepower.

model year  mpg  cylinders  horsepower  weight  origin  car name
70          18   8          130         3504    1       chevrolet chevelle malibu
70          15   8          165         3693    1       buick skylark 320
70          18   8          150         3436    1       plymouth satellite
70          16   8          150         3433    1       amc rebel sst
70          17   8          140         3449    1       ford torino

✓ Microsoft Excel Functions
• IF function
There are many useful functions in Excel. Below are some basic ones that everyone should
know. Some other functions are given in the file Lesson 1B – Some Useful Functions in
Excel.xlsx in the course website as well.

Note: We capitalize the names of these functions just for emphasis. However, they are
NOT case sensitive in Excel.

=IF function
Enter the formula =IF(condition,expression1,expression2), where condition is any
condition that is either true or false, expression1 is the return value if the condition is
true, and expression2 is the return value if the condition is false

A simple example is =IF(A1<5,10,"NA"). Note that if either of the expressions is a
label (i.e. not a numerical value), it must be enclosed in double quotes.
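
The logic of =IF(A1<5,10,"NA") can be mirrored in Python. This is a minimal sketch; `excel_if` is a hypothetical helper, and the sample values for A1 are chosen only to show both branches.

```python
def excel_if(condition, value_if_true, value_if_false):
    """Return value_if_true when the condition holds, otherwise value_if_false."""
    return value_if_true if condition else value_if_false

# Try a value below 5, exactly 5, and above 5 in place of cell A1.
for a1 in (3, 5, 8):
    print(a1, excel_if(a1 < 5, 10, "NA"))
```

Only a value strictly below 5 returns 10; a value of exactly 5 already falls into the "NA" branch because the condition uses < rather than <=.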

Excel Example 1.17:


Enter appropriate IF formulas in column C.
(Scroll to the right to see the correct answer.)

For each product, if the end inventory is more than or equal to 50 units, enough units
are ordered; otherwise, no.

Product   End inventory   Enough Order placed (Yes or No)?


1 100
2 50
3 20
4 70

The IF function is useful; yet, it can become very complex. For example, IF functions are
sometimes nested if there are more than two possibilities. Excel 2003 and earlier allow up to
seven levels of nested IF functions; Excel 2007 and later allow up to 64.
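
Nesting handles three outcomes by placing a second test inside the "false" branch of the first. The Python sketch below mirrors a formula of the form =IF(A2>0,"Positive",IF(A2<0,"Negative","Zero")); the cell address A2 and the helper name `sign_label` are assumptions for illustration.

```python
def sign_label(value):
    """Classify a number the way a two-level nested IF would."""
    if value > 0:              # outer IF: first test
        return "Positive"
    if value < 0:              # inner IF: only reached when the first test fails
        return "Negative"
    return "Zero"              # neither test is true

values = [2, 1, 0, -5, -8]
labels = [sign_label(v) for v in values]
print(labels)
```

With n possible outcomes, n - 1 tests are needed, so three outcomes require one level of nesting.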

Excel Example 1.18:


Check whether the values are positive, negative or zero using nested IF formulas
(Scroll to the right to see the correct answer.)

Value Positive or Negative or Zero?


2
1
0
-5
-8

• Common errors
If you get an error from the Excel IF function (or from IFS, its multi-condition
counterpart), it is likely to be one of the following:

#N/A      Occurs if none of the supplied logical_tests evaluates to TRUE (this case
          arises with IFS; a plain IF returns its value-if-false result instead).
#VALUE!   Occurs if one or more of the supplied logical_tests returns any value
          other than TRUE or FALSE.
#NAME?    Occurs if you are using an older version of Excel that does not
          support the function you are using.

• Other commonly used functions


In fact, there are many Excel functions. Below are some commonly used ones we will explore
in class. You are highly recommended to check out more Excel functions for future use.
Excel Example 1.19:

Function                                           Formula          Input

return the left few characters                     =left()          abcdefg
return the right few characters                    =right()         abcdefg
return the middle few characters                   =mid()           abcdefg
return the index of the character position
where the target phrase is found                   =find()          abcdefg
return the length of text                          =len()           abcdefg
substitute a character by another                  =substitute()    abcdefg
convert all letters to upper cases                 =upper()         abcdefg
convert all letters to lower cases                 =lower()         ABCDEFG
return sum of the array                            =sum()           1 2 3
return sum of square of the array                  =sumsq()         1 2 3
return sum of product of two arrays                =sumproduct()    1 2 3
return mean value                                  =average()       1 2 3
return sd                                          =stdev()         1 2 3
return min. value                                  =min()           1 2 3
return max. value                                  =max()           1 2 3
return the cell index from the array which
match the given value                              =match()         a b c
return the cell value from an array with the
given index                                        =index()         a b c
return other column value with the first column    =vlookup()       1 a
matched with the given value                                        2 b
(particularly useful for merging data)                              3 c
return other row value with the first row          =hlookup()       1 2 3
matched with the given value                                        a b c
(particularly useful for appending data)
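
For readers who want to check their answers, the Python sketch below computes what these functions return for the inputs in the table. The arguments (for example 3 characters for LEFT, or looking up the key 2) are assumptions chosen for this illustration, since the table leaves the parentheses empty. Excel positions count from 1, while Python indexes from 0, hence the +1 and -1 adjustments.

```python
import statistics

text = "abcdefg"
numbers = [1, 2, 3]
letters = ["a", "b", "c"]
v_table = [(1, "a"), (2, "b"), (3, "c")]    # (key, value) rows for VLOOKUP
h_table = [[1, 2, 3], ["a", "b", "c"]]      # first row holds the keys for HLOOKUP

results = {
    'LEFT(text,3)': text[:3],
    'RIGHT(text,3)': text[-3:],
    'MID(text,2,3)': text[2 - 1:2 - 1 + 3],
    'FIND("cd",text)': text.index("cd") + 1,
    'LEN(text)': len(text),
    'SUBSTITUTE(text,"c","X")': text.replace("c", "X"),
    'UPPER(text)': text.upper(),
    'LOWER("ABCDEFG")': "ABCDEFG".lower(),
    'SUM(numbers)': sum(numbers),
    'SUMSQ(numbers)': sum(x * x for x in numbers),
    'SUMPRODUCT(numbers,numbers)': sum(x * y for x, y in zip(numbers, numbers)),
    'AVERAGE(numbers)': statistics.mean(numbers),
    'STDEV(numbers)': statistics.stdev(numbers),   # sample standard deviation
    'MIN(numbers)': min(numbers),
    'MAX(numbers)': max(numbers),
    'MATCH("b",letters,0)': letters.index("b") + 1,
    'INDEX(letters,2)': letters[2 - 1],
    'VLOOKUP(2,v_table,2,FALSE)': next(value for key, value in v_table if key == 2),
    'HLOOKUP(3,h_table,2,FALSE)': h_table[1][h_table[0].index(3)],
}

for formula, value in results.items():
    print(f"{formula:30} -> {value}")
```

The lookup lines show why VLOOKUP suits merging: a key found in the first column of one table retrieves the matching value from another column.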

✓ Import Dataset into Microsoft Excel
There are two ways to import data from a text file with Excel: you can open it in Excel, or you
can import it as an external data range. To export data from Excel to a text file, use the Save As
command and change the file type from the drop-down menu. You can import or export up to
1,048,576 rows and 16,384 columns. There are two commonly used text file formats:
o Comma separated values text files (.csv), in which the comma character (,) typically
separates each field of text.
o Delimited text files (.txt), in which the TAB character (ASCII character code 009)
typically separates each field of text; sometimes the comma character (,) is used instead.
You can change the separator character that is used in both delimited and .csv text files. This
may be necessary to make sure that the import or export operation works the way that you want
it to.

• Import a text file by opening it in Excel


You can open a text file that you created in another program as an Excel workbook by using the
Open command. Opening a text file in Excel does not change the format of the file — you can
see this in the Excel title bar, where the name of the file retains the text file name extension (for
example, .txt or .csv).
To import a text file
o Import Text Wizard
(1) Go to File > Open and browse to the location that contains the text file.
(2) Select Text Files in the file type dropdown list in the Open dialog box.
(3) Locate and double-click the text file that you want to open.
If the file is a .csv file, Excel automatically opens the text file and displays the data in a new
workbook.
If the file is a text file (.txt), Excel starts the Import Text Wizard. When you are done with the
steps, click Finish to complete the import operation.
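
The difference between the two file formats comes down to the separator character, which the following Python sketch makes explicit using the standard `csv` module. The sample contents are made up for this illustration (the actual course files are not reproduced here), and `read_delimited` is a hypothetical helper.

```python
import csv
import io

# Made-up sample contents: the same small table, once comma separated, once tab separated.
csv_text = "name,score\nAnn,90\nBob,85\n"
tab_text = csv_text.replace(",", "\t")

def read_delimited(text, delimiter):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

rows_from_csv = read_delimited(csv_text, ",")
rows_from_tab = read_delimited(tab_text, "\t")

print(rows_from_csv)
print(rows_from_csv == rows_from_tab)
```

To read a real file instead of an in-memory string, the same `csv.reader` call can be given a file opened with `open(path, newline="")`. All fields arrive as text, which is comparable to choosing the column data format in the Import Text Wizard.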

Excel Example 1.20:
Import the datasets Car_Evaluation.csv, Car_Evaluation_Tab.txt, Car_Evaluation_Comma.txt,
and Car_Evaluation_Fix.txt into Excel.
