Tree Data Structure Math Simplified

Pro Members Only

This lesson is available if you have an active subscription.

Alternatively, some member might be able to gift it to you.

If you already have a subscription, you can sign in.

Tree Data Structure Math Simplified

Subscription Required

You must have an active subscription to access this content.
If you already have a subscription, you can sign in.


00:00 Before we start digging deeper into tree data structures

00:02 and algorithms, we need to cover some basic maths

00:05 around boundary trees and tree data structures in general.

00:08 This will help us appreciate the advantages

00:10 of using a tree data structure in an algorithm.

00:13 So let's go Now

00:16 before I scare you away with terms like trees

00:18 and graphs, we've actually already looked at a graph

00:21 on this particular channel.

00:22 When we looked at the link list data structure,

00:25 a link list is actually an example

00:27 of a graph data structure.

00:28 A graph is nothing more than a set

00:30 of nodes pointing to other nodes.

00:32 In the case of a link list, we have a head node pointing

00:35 to the next node via its next member and so on

00:38 and so forth till we eventually arrive at an null value.

00:42 Now a graph is a very general data structure

00:44 and it includes pretty much any data structure.

00:46 We have nodes pointing to other nodes.

00:50 So by this definition, the link list data structure

00:53 that we looked at is a graph.

00:55 Now a tree data structure is just a specialization

00:57 of the graph with a few restrictions,

01:00 and you could argue that the link list data structure

01:02 is a tree as well.

01:04 But let's look at a more general representation of

01:07 what people imagine when they think of a tree.

01:09 Fundamentally, the restrictions we impose on a graph for it

01:12 to be considered a tree is

01:14 that it must not contain any cycles.

01:17 There is normally a node called the root,

01:20 and from this root node we can actually make our path

01:23 to any other nodes within the tree.

01:26 Now, in order to better analyze the mathematics

01:28 around tree data structures, we will start off

01:30 with a simpler tree than this one.

01:32 There are two things that we will simplify.

01:35 First, we will only consider a balanced tree.

01:38 The tree that we are looking at right now is imbalanced.

01:41 This is because we've added a few leaf nodes

01:43 that are way off on one side without filling up the nodes

01:46 that are further up within the tree.

01:48 The other simplification that we will make is

01:50 that we will consider only a pinery tree.

01:53 This tree that we have over here is not a binary tree

01:56 as there are nodes that contain more than two

01:58 children and a binary tree.

02:00 Each node has only two children.

02:02 So here is the reference completely balanced binary tree

02:05 that we will use to analyze tree data structures.

02:09 Now before we jump into any formulas,

02:11 let's first analyze this tree a bit more to understand

02:14 how the number of nodes grow within each level of the tree.

02:18 Now just from visual observation of the tree, you can see

02:21 that at level zero we have one node resulting in total

02:24 count of one.

02:25 Within level one, we have two nodes within that level, along

02:29 with one node from the previous level, resulting in a total

02:32 of three nodes.

02:34 Similarly, at level two we have four

02:36 with four plus three giving us a total of seven.

02:39 And at level three we have eight nodes, which combined

02:42 with the previous seven result in a total of 15.

02:45 Now we could continue, but this should be enough for us

02:47 to understand a general pattern.

02:50 Now the first thing that we can notice is

02:52 that at each double, the number of nodes essentially double.

02:55 So we start off at level zero with one node,

02:58 and then in level one we get two.

03:00 In level two we get four and so on.

03:02 And this makes perfect sense because

03:04 after all, this is a boundary tree

03:06 and each node in the previous level essentially has two

03:09 children resulting in double the number of nodes.

03:11 And now the other observation that we can make

03:13 that is perhaps not as obvious if you haven't experienced it

03:16 before, is that the total number

03:18 of nodes in the previous level is actually one minus the

03:22 number of nodes in the current level.

03:24 We can see this in the relationship between two

03:26 and one, four and three and eight and seven.

03:30 Now, armed with these observations, we can write the formula

03:33 for the total at any given level as to the power L,

03:37 that is the current number of nodes plus two

03:39 to the power L minus one.

03:40 That is the number of nodes

03:42 that we get from the previous total.

03:44 As an example, consider level three

03:46 where we have eight current nodes, which is simply two

03:49 to the power three combined with seven nodes, which is two

03:53 to the power three minus one from the previous level,

03:56 resulting in a total number of nodes of 15.

03:59 Now we can combine the two two to the power Ls to get two

04:03 to the power L plus one.

04:05 Now we've been working with the zero based index

04:07 for the level, but we can take this l plus one

04:09 and come up with a new concept known as the height

04:12 of a tree, which is simply going to be a plus one.

04:15 We just start our index at value one.

04:18 So with that, we have a nice relationship between the height

04:21 of the tree versus the number of items

04:23 that it can accommodate.

04:25 Now why is this relationship useful?

04:27 If you look at the tree data structure quite commonly

04:29 what we will be doing is you'll be searching only

04:32 between the height of the tree instead of going through all

04:34 of the nodes within the tree.

04:36 So if N is the number of nodes in the tree

04:38 and itches the height, instead of having to look at n items,

04:41 we would only need to look at etch items.

04:44 So let's figure out the equation for etch.

04:46 Right now we know the equation for N to be two

04:48 to the power etch minus one.

04:51 We simply flip the equation around

04:52 to bring etch on the left hand side,

04:54 move minus one onto the right hand side,

04:56 and then take log on both sides to get just etch.

05:00 On the left hand side. This gives us edge is equal

05:03 to log two of n plus one.

05:05 Now, when we talk about algorithms, we rarely deal

05:07 with exact equations

05:08 and we normally talk about big annotation because

05:11 after all, we don't take into consideration the differences

05:14 in the different processes,

05:15 which if you don't take quantum computing into account is

05:18 probably going to be a constant factor difference.

05:21 So RH is equal to log two of n plus one is the same as O

05:24 of H is equal to log of n.

05:27 Note that big O does not care about that plus one

05:30 or the base of that log.

05:31 Now that's it for the analysis

05:33 of a perfectly balanced binary tree, but let's go back

05:36 and modify some of our assumptions

05:37 to make this even more general.

05:39 For more three data structures, the first thing

05:42 that we will look at is a constant factor of imbalance.

05:46 For example, this particular tree is twice

05:48 as heavy on the left hand side versus the right hand side,

05:50 which means that in the worst case scenario, it's going

05:53 to be twice as high than our perfectly,

05:55 perfectly balanced tree

05:56 That we looked at in our previous example.

05:59 Now if you can make the guarantee that it's never going

06:01 to exceed twice

06:03 as heavy imbalance on one side versus the other, then as far

06:07 as big O algorithmic analysis is concerned,

06:09 this tree is perfectly balanced

06:11 as big O does not care about constant factors like two,

06:16 maintaining a maximum worst case imbalance essentially makes

06:19 trees balanced as far as pego is concerned.

06:23 And the other key assumption that we made for our tree is

06:25 that it is a binary tree

06:27 that is each node has only two children.

06:30 That is not an assumption that is true

06:32 for many tree data structures out there.

06:34 For example, here is a quarter

06:36 where each node has four children,

06:39 and the only thing that will change in our analysis is

06:41 that instead of log two, we will end up

06:43 with some form of log four.

06:46 Now for those that are familiar with log functions,

06:48 the difference between two log functions

06:50 that have a different base is actually just a

06:53 simple factor difference.

06:54 For example, the difference

06:55 between log two versus log four is this factor

06:58 log two of four.

07:00 And as far as a big O is concerned once more,

07:03 it really doesn't care about a constant factor and

07:05 therefore doesn't really care about log basis.

07:07 And it'll be the same as log of N that we saw previously.

07:13 So to summarize, if you solve a problem using a tree data

07:16 structure that is partially balanced

07:18 and you only need towards the height of the tree as far

07:21 as your runtime is going to be concerned, it's going

07:23 to be approximately log of the number of items in the tree.

07:27 And if anybody ask you whether this is a great thing,

07:29 you can throw a few numbers.

07:31 For example, you can say that the log 10 of 10 is one,

07:34 and the log 10 of a thousand,

07:36 which has three zeros is going to be three.

07:38 Similarly, the log 10 of a million is going to be six

07:42 and log 10 of a billion is going to be nine.

07:44 So you can see that it is increasing extremely slowly.

07:47 And the same is true for other log bases as well

07:50 because as we've discussed the difference

07:52 between two log functions that have a different base,

07:55 it's a constant factor.

07:56 So log two of a billion is still going to be approximately

07:59 around 30.