36C L2
Data Structures
Spring 2024 - Instructor Siena Saltzen
Administrative
https://canvas.ucdavis.edu/courses/877554/assignments/syllabus
● No discussion/OH the first week.
● This week I am deciding the homework format, and I will update Canvas when applicable. It will either be ~8 HWs with 2 drops or ~5 HWs with one drop.
● There will be a getting-started survey, worth 2 points, up this evening.
● And a practice homework assignment, up this weekend.
Let’s do Lists
An array is a series of elements of the same type placed in contiguous memory
locations that can be individually referenced by adding an index to a unique identifier.
Suppose arr is an array of ten ints starting at address 0x1000:
int = 4 bytes
size of array = 40 bytes of memory
So *(arr) will return the contents of memory at 0x1000, which happens to be the same
as arr[0].
When you do arr + i, the compiler is smart enough to know how many bytes to add to
arr because it knows the type of the array. Since it is int, it will add 4 * i.
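A minimal sketch of that pointer arithmetic in C++ (the array values here are illustrative, borrowed from the example on the next slide):

#include <iostream>

int main() {
    // Ten ints, 4 bytes each: 40 bytes of contiguous memory.
    int arr[10] = {10, 14, 65, 85, 96, 12, 35, 74, 69, 0};

    // *(arr) dereferences the address of the first element: same as arr[0].
    std::cout << *(arr) << " == " << arr[0] << '\n';     // 10 == 10

    // arr + i advances the address by i * sizeof(int) = 4 * i bytes,
    // so *(arr + i) is exactly arr[i].
    int i = 3;
    std::cout << *(arr + i) << " == " << arr[i] << '\n'; // 85 == 85
}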
Okay Cool, so what?
Given array A = [10, 14, 65, 85, 96, 12, 35, 74, 69]
After inserting 23 at the “beginning”, the array will look like this:
[23, 10, 14, 65, 85, 96, 12, 35, 74, 69]
https://www.tutorialspoint.com/cplusplus-program-to-add-an-element-in-the-array-at-the-beginning
Adding a Thing
So, we have an array A with nine elements in it. We are going to insert another element, 23, at the
beginning of array A.
To insert an element at the beginning, we must shift all the elements to the right by one place;
the first slot is then empty, and we put the new element in that position.
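One possible C++ version of that shift-and-insert (the slide's own code isn't reproduced here; this is a sketch):

#include <iostream>

int main() {
    // Capacity 10 leaves room for one more element.
    int A[10] = {10, 14, 65, 85, 96, 12, 35, 74, 69};
    int n = 9;           // current number of elements
    int newElement = 23;

    // Shift every element one slot to the right, starting from the back
    // so nothing is overwritten before it has been copied.
    for (int i = n; i > 0; --i) {
        A[i] = A[i - 1];
    }
    A[0] = newElement;   // the first slot is now free
    ++n;

    for (int i = 0; i < n; ++i) {
        std::cout << A[i] << ' ';
    }
    std::cout << '\n';   // 23 10 14 65 85 96 12 35 74 69
}

Note that the loop touches all n elements, which is exactly why this insertion costs O(n).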
Okay Great
Could you use vectors? Yes. Not the point.
Linked List: Linked lists are less rigid in their storage structure and
elements are usually not stored in contiguous locations, hence they
need to be stored with additional tags giving a reference to the next
element.
Let’s Look at Linked Lists
https://www.explainxkcd.com/wiki/index.php/2483:_Linked_List_Interview_Problem
For now, let's start with singly linked lists
A singly linked list is a linear data structure in which the elements are
not stored in contiguous memory locations and each element is
connected only to its next element using a pointer.
How to Insert a Node at the Front/Beginning of LL
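A minimal sketch of front insertion in C++ (node values chosen to match the search examples coming up; we also leak the nodes, which is fine for a demo):

#include <iostream>

// Minimal singly linked list node: the data plus a pointer to the next node.
struct Node {
    char data;
    Node* next;
};

// Inserting at the front is O(1): point the new node at the old head
// and make it the new head. No shifting required.
Node* push_front(Node* head, char value) {
    return new Node{value, head};
}

int main() {
    Node* head = nullptr;
    // Builds A -> B -> C -> D -> E (each push lands in front of the last).
    for (char c : {'E', 'D', 'C', 'B', 'A'}) {
        head = push_front(head, c);
    }
    for (Node* p = head; p != nullptr; p = p->next) {
        std::cout << p->data << ' ';
    }
    std::cout << '\n';   // A B C D E
}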
We will come back to LLs in more detail and work through the other operations;
however, I find that a general understanding of these two structures helps lay the
foundation for ALGORITHM ANALYSIS.
List Representations
You may not have seen the first representation before, but you’ve
certainly seen the second representation. The list with the square
brackets represents a series of contiguous locations in memory, each
containing a pointer to its respective list element (we could call them
“objects”).
Given this list and a character, how do you determine if the
character is in the list?
1. Start with the first element of the list, look at
the value there,
a. if it’s what I’m looking for then success
2. Else move to the next element of the list and
a. do it again.
3. If I run out of list
a. then failure.
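That recipe as a C++ sketch over a contiguous array (a hypothetical helper, not code from the slides):

#include <cstddef>

// Linear search: returns the index of target, or -1 on failure.
int linear_search(const char* list, std::size_t n, char target) {
    for (std::size_t i = 0; i < n; ++i) {   // step 2: move to the next element
        if (list[i] == target) {
            return static_cast<int>(i);     // step 1a: success
        }
    }
    return -1;                              // step 3a: ran out of list, failure
}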
What do you think the average number of comparisons would be? Something like n/2?
Linked List
Now let’s look at a different linear collection of items. This is a singly linked list (sometimes called
a one-way linked list).
You can create this kind of list in C++, we just did! But it is definitely not what Python's built-in list type gives you by default.
It is, on the other hand, how fundamental lists are represented in other programming
languages, such as Lisp, Scheme, and Haskell.
Linked lists are more flexible and adaptable and are best suited for situations where the size of
the collection is not known.
Linked List Cont.
In this list model, the pairs of squares represent small memory blocks, each
containing two pointers, one to the corresponding list object and one to the
next item (i.e., the next memory block) in the list.
What to do?
Now, given this new list and a character, how do you determine if the
character is in the list? We don’t expect you to know how to do this…yet.
So here’s a pseudo-code function that might work:
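The slide's pseudo-code isn't reproduced here, but one plausible version, reusing the Node struct sketched earlier, is:

// Walk the chain of next pointers until we find the target
// or fall off the end of the list.
bool contains(const Node* head, char target) {
    for (const Node* p = head; p != nullptr; p = p->next) {
        if (p->data == target) {
            return true;    // success
        }
    }
    return false;           // ran out of list: failure
}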
How many comparisons now will be required to find B?
How many comparisons to find E?
What happens when the list is bigger? Generalize again…
Two different data structures, with associated algorithms for
finding something in the structure.
What about finding a given word? A given number?
• If you have run out of list (i.e. empty list) then failure
• Look at the item in the middle of the list
• If that's the target then success
• Otherwise, does the target come before or after what's at the
middle?
• Based on the answer to that question, the sublist to the left
or right of the middle becomes the new list to search
• Go back to the top and do it again
Binary Search!
Binary Search
(And if you’re not comfortable with logarithms, start brushing up. We’ll be
using them in this class.)
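A sketch of that recipe in C++ over a sorted array (each pass discards half of what remains, hence the logarithms):

#include <vector>

// Binary search over a sorted contiguous list.
// Returns the index of target, or -1 if it is not present.
int binary_search(const std::vector<int>& a, int target) {
    int lo = 0;
    int hi = static_cast<int>(a.size()) - 1;
    while (lo <= hi) {                  // some sublist remains
        int mid = lo + (hi - lo) / 2;   // one-step jump to the middle
        if (a[mid] == target) {
            return mid;                 // success
        } else if (target < a[mid]) {
            hi = mid - 1;               // search the left sublist
        } else {
            lo = mid + 1;               // search the right sublist
        }
    }
    return -1;                          // ran out of list: failure
}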
Big O
Do we gain the same benefit by sorting the elements of this list? (Bonus Point)
What would be the algorithm for performing a binary style of search on
this list?
Let’s pose the question a different way: is there a simple one-step way
to get at the middle element of this list?
Do you see a problem now with getting to the middle of this list in one
simple step?
With the original Python list, finding the midpoint of the unsearched
remainder of the list was simple arithmetic.
Here it requires traversing half the links, and if we get to the middle how
do we go left? And wouldn’t we have already searched to the left anyway?
This isn’t promising.
Just for fun! Let's think about how to get to the middle of the list.
Can we get to the center of this list with some more thinking? (+0.1 bonus points)
*(You can use a different calculation to determine where the split falls in even-length lists.)
Reminder
Unlike arrays, linked list elements are not stored at a contiguous location;
the elements are linked using pointers. Each node in the linked list will have
two things, one is the data and the other is a reference to the next element.
The last node will have a null in the reference.
Pointers! (python style)
Tortoise and Hare
● Why?
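Here's the trick as a C++ sketch (again reusing the earlier Node struct): the hare advances two links per step while the tortoise advances one, so when the hare runs off the end, the tortoise is sitting at the middle.

const Node* middle(const Node* head) {
    const Node* slow = head;   // tortoise
    const Node* fast = head;   // hare
    while (fast != nullptr && fast->next != nullptr) {
        slow = slow->next;
        fast = fast->next->next;
    }
    // For even-length lists this lands on the second of the two middles;
    // a slightly different loop condition picks the first instead.
    return slow;
}

Note this is still one O(n) traversal; it saves a second pass (no need to count the length first), but it doesn't give us binary search's one-step jump to the middle.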
Let’s compare the sorted Python list to the sorted singly linked list.
Now which one is better? Why?
Binary Search Array vs LL
*I tend to use array/list interchangeably when talking about Python. Please interrupt me if
there is any confusion.
LL
Which is “better” depends on the context: how will you be using the data structure
and its associated algorithms?
Realistically, if you want both (and you probably do, plus deletions) there are better
choices. More about these in the future.
Questions like “which data structure is better?” or “which algorithm is
better?” may not have absolute answers.
“It depends...” may be the beginning of many such analyses.
When you try to answer these questions, you may have to wrestle with
trade-offs, just like the trade-off in the previous example:
the linked list representation trades space (i.e., uses extra memory) to
gain time (i.e., faster insertion and deletion).
Time-space trade-offs are common.
Why
We just talked about why LLs are kinda lame, right? So why are we studying the classic
algorithms when there are better options out there?
Modern programming languages usually provide abstractions that manage sequential data at
the memory level, providing access to that data via arrays, linked lists, hybrids of the two, or
other approaches, so the programmer doesn't necessarily need to care one way or another.
Knowing the underlying concepts is still useful, however, when creating fast-running code that
scales well to large data, avoiding (e.g.) traversing a list over and over again, or performing
particularly inefficient operations.
They are useful! - like prepackaged intelligence in a can - you don't have to work hard to come
up with your own solution.
Also… The interviewers still love this stuff.
O of What
So, at this point we’ve talked about complexity a little bit, and showed how to start
generalizing some runtimes.
If you read your textbook (or paid attention yesterday), you've seen things like O(n)
and O(1).
What does O(n) mean? O(1)?
They're how we compare algorithms regardless of operating system/hardware.
In computer science, the time complexity is the computational complexity that describes the amount of time it takes to run an
algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm,
supposing that each elementary operation takes a fixed amount of time to perform.
https://en.wikipedia.org/wiki/Time_complexity
Background
Computational complexity is a field of computer science that analyzes
algorithms based on the amount of resources required to run them. The
amount of required resources varies based on the input size, so the
complexity is generally expressed as a function of n, where n is the size of
the input.
It is important to note that when analyzing an algorithm we can consider both
time complexity and space complexity. Space complexity is basically the
amount of memory space required to solve a problem in relation to the input
size. Even though space complexity is important when analyzing an
algorithm, here we will focus only on time complexity.
Best Worst Average
Usually, when describing the time complexity of an algorithm, we are talking about the worst-case.
Big-O notation, sometimes called “asymptotic notation”, is a mathematical
notation that describes the limiting behavior of a function when the argument
tends towards a particular value or infinity.
This is independent of the actual time that the algorithm may take. For example, I
could rewrite the linear search to print out each value, ask the user what they think
of each item, and then raise the item to the 100th power. This would still be an O(n)
function, as the growth rate is still linearly proportional to the input…
https://towardsdatascience.com/understanding-time-complexity-with-python-examples-2bda6e8158a7
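A sketch of that deliberately noisy rewrite (hypothetical code, not from the slides):

#include <cmath>
#include <iostream>

// Same linear scan, but with far more work per element. The constant
// factor balloons, yet every element still gets a fixed amount of work,
// so the growth rate is unchanged: O(n).
int noisy_search(const int* list, int n, int target) {
    for (int i = 0; i < n; ++i) {
        std::cout << "considering " << list[i] << '\n';
        double p = std::pow(list[i], 100.0);  // expensive, but constant per element
        (void)p;  // (prompting the user would likewise add only a constant per element)
        if (list[i] == target) return i;
    }
    return -1;
}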
Common Complexities and Their Names
https://www.explainxkcd.com/wiki/index.php/399:_Travelling_Salesman_Problem
Concrete Example Time
Timing and Fibonacci
How long does this take? A second? A
minute?
And what is this Fibonacci thing you’re
talking about?
An algorithm for computing the nth
Fibonacci number.
Why Fibonacci numbers? No special
significance...it's just a program that's
easy to analyze.
Where did this Fibonacci thing come
from? I'm glad you asked...
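The lecture's own code isn't shown in these notes, but a typical iterative version looks like this (counting the loop's elementary operations is how a runtime like the 4n - 5 used later can be derived; the exact constants depend on what you count):

// Iterative nth Fibonacci number: constant work per loop iteration,
// so the runtime grows linearly with n.
long long fib(int n) {
    if (n <= 1) return n;      // fib(0) = 0, fib(1) = 1
    long long prev = 0, curr = 1;
    for (int i = 2; i <= n; ++i) {
        long long next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}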
Bunny Math
What if the item is the first in the list? The last in the list? Not in the list?
Which run time should we report?
There are different kinds of analysis we could report:
• Best Case
• Worst Case
• Average Case
• Amortized
• and so on...

Amortized: In accounting, amortization refers to expensing the acquisition cost minus the residual value of
intangible assets in a systematic manner over their estimated "useful economic lives" so as to
reflect their consumption, expiry, and obsolescence, or other decline in value as a result of use.
In computer science and algorithms, amortized analysis is a technique used to estimate the
average time complexity of an algorithm over a sequence of operations, rather than the
worst-case complexity of individual operations.
For example, for a dynamic array that doubles in size when needed, normal asymptotic analysis
would only conclude that adding an item to it costs O(n), because it might need to grow and
copy all elements to the new array. Amortized analysis takes into account that in order to have
to grow, n/2 items must have been added without causing a grow since the previous grow, so
adding an item really only takes O(1) (the cost of O(n) is amortized over n/2 actions).
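You can watch the growth happen with std::vector (a sketch; the growth factor is implementation-defined, commonly 1.5x or 2x rather than always exactly doubling):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last_capacity = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);                      // usually O(1)...
        if (v.capacity() != last_capacity) { // ...but occasionally a grow-and-copy
            std::cout << "grew to capacity " << v.capacity()
                      << " at size " << v.size() << '\n';
            last_capacity = v.capacity();
        }
    }
}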
Big-O notation
Going back to the iterative Fibonacci example, we calculated the function for the run
time to be 4n - 5. Then we argued that the 4 and the 5 are noise for our purposes.
One way to say all that is “The running time for the Fibonacci algorithm finding the
nth Fibonacci number is on the order of n.”
In Big-O notation, that’s just T(n) is O(n)
T(n) is our shorthand for the runtime of the function being analyzed.
The O in O(n) means “order of magnitude”, so Big-O notation is clearly, and
intentionally, not precise. It’s a formal notation for the approximation of the time (or
space) requirements of running algorithms.
Formal Definition
T(n) is O(f(n)) if there are two positive constants, n0 and c, such that
c·f(n) >= T(n) for all n > n0
really means in a practical sense:
If you want to show that T(n) is O(f(n)) then find two positive constants, n0 and c,
and a function f(n) that satisfy the constraints above. For example...
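For instance, a sketch using the T(n) = 4n - 5 runtime from the iterative Fibonacci example:

T(n) = 4n - 5 and f(n) = n
Pick c = 4 and n0 = 1.
Then c·f(n) = 4n >= 4n - 5 = T(n) for all n > n0,
so T(n) is O(n).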
Big-O arithmetic
* Desmos
Hmmm, let’s try 1.
1 <= 1 + 5/4
For the first section: for each problem in code, find T(n), the same way we
did for the Fibonacci and search examples. Then determine and
prove the Big-O bound by providing c and n0.
It is possible to find values for n0 and c where it all appears to work, but when you
then try out larger values of n, the inequality no longer holds. In that case, your proof
isn't a proof.
So in this class, make sure that, once you have found a working n0 and c, you try some
larger values of n. There's an example of how you might be deceived coming up.
The Big-O definition simply(?) says that there is a point n0
such that for all values of n that are past this point, T(n) is
bounded by some multiple of f(n). Thus, if the running time
T(n) of an algorithm is O(n²), we are guaranteeing that at
some point we can bound the running time by a quadratic
function (a function whose high-order term involves n²).
Big-O says there’s a function that is an upper bound to the worst-case performance
for the algorithm.
Note however that if T(n) is linear and not quadratic, you
could still say that the running time is O(n²).
It's technically correct because the inequality holds. However, O(n) would
be the more precise claim because it's a tighter upper bound.
Desmos Example
Big-O is for expressing how run time or memory requirements grow as a function of the
problem size. Your book* has a nice table listing commonly-encountered rates.
O(1) Constant
O(log n) Logarithmic
O(n) Linear
O(n log n) Log-linear
O(n²) Quadratic
O(n³) Cubic
O(nᵏ) Polynomial – k is constant
O(2ⁿ) Exponential
O(n!) Factorial