Tree Data Structure Math Simplified

Pro Members Only

This lesson is available if you have an active subscription.

Alternatively, some member might be able to gift it to you.

If you already have a subscription, you can sign in.

Tree Data Structure Math Simplified

Subscription Required

You must have an active subscription to access this content.
If you already have a subscription, you can sign in.

javascript
typescript
react
playwright

Enjoy free content straight from your inbox 💌

No spam, unsubscribe at any time.

Transcript

00:00

Before we start digging deeper into tree data structures and algorithms, we need to cover some basic maths around boundary trees and tree data structures in general. This will help us appreciate the advantages of using a tree data structure in an algorithm. So let's go Now before I scare you away with terms like trees and graphs, we've actually already looked at a graph on this particular channel. When we looked at the link list data structure, a link list is actually an example of a graph data structure. A graph is nothing more than a set of nodes pointing to other nodes.

00:32

In the case of a link list, we have a head node pointing to the next node via its next member and so on and so forth till we eventually arrive at an null value. Now a graph is a very general data structure and it includes pretty much any data structure. We have nodes pointing to other nodes. So by this definition, the link list data structure that we looked at is a graph. Now a tree data structure is just a specialization of the graph with a few restrictions, and you could argue that the link list data structure is a tree as well.

01:04

But let's look at a more general representation of what people imagine when they think of a tree. Fundamentally, the restrictions we impose on a graph for it to be considered a tree is that it must not contain any cycles. There is normally a node called the root, and from this root node we can actually make our path to any other nodes within the tree. Now, in order to better analyze the mathematics around tree data structures, we will start off with a simpler tree than this one. There are two things that we will simplify. First, we will only consider a balanced tree.

01:38

The tree that we are looking at right now is imbalanced. This is because we've added a few leaf nodes that are way off on one side without filling up the nodes that are further up within the tree. The other simplification that we will make is that we will consider only a pinery tree. This tree that we have over here is not a binary tree as there are nodes that contain more than two children and a binary tree. Each node has only two children. So here is the reference completely balanced binary tree that we will use to analyze tree data structures. Now before we jump into any formulas,

02:11

let's first analyze this tree a bit more to understand how the number of nodes grow within each level of the tree. Now just from visual observation of the tree, you can see that at level zero we have one node resulting in total count of one. Within level one, we have two nodes within that level, along with one node from the previous level, resulting in a total of three nodes. Similarly, at level two we have four with four plus three giving us a total of seven. And at level three we have eight nodes, which combined with the previous seven result in a total of 15.

02:45

Now we could continue, but this should be enough for us to understand a general pattern. Now the first thing that we can notice is that at each double, the number of nodes essentially double. So we start off at level zero with one node, and then in level one we get two. In level two we get four and so on. And this makes perfect sense because after all, this is a boundary tree and each node in the previous level essentially has two children resulting in double the number of nodes. And now the other observation that we can make that is perhaps not as obvious if you haven't experienced it before, is that the total number

03:18

of nodes in the previous level is actually one minus the number of nodes in the current level. We can see this in the relationship between two and one, four and three and eight and seven. Now, armed with these observations, we can write the formula for the total at any given level as to the power L, that is the current number of nodes plus two to the power L minus one. That is the number of nodes that we get from the previous total. As an example, consider level three where we have eight current nodes, which is simply two to the power three combined with seven nodes, which is two

03:53

to the power three minus one from the previous level, resulting in a total number of nodes of 15. Now we can combine the two two to the power Ls to get two to the power L plus one. Now we've been working with the zero based index for the level, but we can take this l plus one and come up with a new concept known as the height of a tree, which is simply going to be a plus one. We just start our index at value one. So with that, we have a nice relationship between the height of the tree versus the number of items that it can accommodate.

04:25

Now why is this relationship useful? If you look at the tree data structure quite commonly what we will be doing is you'll be searching only between the height of the tree instead of going through all of the nodes within the tree. So if N is the number of nodes in the tree and itches the height, instead of having to look at n items, we would only need to look at etch items. So let's figure out the equation for etch. Right now we know the equation for N to be two to the power etch minus one. We simply flip the equation around to bring etch on the left hand side, move minus one onto the right hand side, and then take log on both sides to get just etch.

05:00

On the left hand side. This gives us edge is equal to log two of n plus one. Now, when we talk about algorithms, we rarely deal with exact equations and we normally talk about big annotation because after all, we don't take into consideration the differences in the different processes, which if you don't take quantum computing into account is probably going to be a constant factor difference. So RH is equal to log two of n plus one is the same as O of H is equal to log of n. Note that big O does not care about that plus one or the base of that log.

05:31

Now that's it for the analysis of a perfectly balanced binary tree, but let's go back and modify some of our assumptions to make this even more general. For more three data structures, the first thing that we will look at is a constant factor of imbalance. For example, this particular tree is twice as heavy on the left hand side versus the right hand side, which means that in the worst case scenario, it's going to be twice as high than our perfectly, perfectly balanced tree That we looked at in our previous example. Now if you can make the guarantee that it's never going to exceed twice

06:03

as heavy imbalance on one side versus the other, then as far as big O algorithmic analysis is concerned, this tree is perfectly balanced as big O does not care about constant factors like two, maintaining a maximum worst case imbalance essentially makes trees balanced as far as pego is concerned. And the other key assumption that we made for our tree is that it is a binary tree that is each node has only two children. That is not an assumption that is true for many tree data structures out there. For example, here is a quarter

06:36

where each node has four children, and the only thing that will change in our analysis is that instead of log two, we will end up with some form of log four. Now for those that are familiar with log functions, the difference between two log functions that have a different base is actually just a simple factor difference. For example, the difference between log two versus log four is this factor log two of four. And as far as a big O is concerned once more, it really doesn't care about a constant factor and therefore doesn't really care about log basis. And it'll be the same as log of N that we saw previously.

07:13

So to summarize, if you solve a problem using a tree data structure that is partially balanced and you only need towards the height of the tree as far as your runtime is going to be concerned, it's going to be approximately log of the number of items in the tree. And if anybody ask you whether this is a great thing, you can throw a few numbers. For example, you can say that the log 10 of 10 is one, and the log 10 of a thousand, which has three zeros is going to be three. Similarly, the log 10 of a million is going to be six and log 10 of a billion is going to be nine. So you can see that it is increasing extremely slowly.

07:47

And the same is true for other log bases as well because as we've discussed the difference between two log functions that have a different base, it's a constant factor. So log two of a billion is still going to be approximately around 30.