Importance of Studying Algorithms
Introduction
Problem
Algorithm
The reference to "instructions" in the definition implies that there is something or someone
capable of understanding and following the instructions given. We call this a "computer",
keeping in mind that before the electronic computer was invented, the word "computer"
designated a person who performed numerical calculations. Today, of course,
computers are the electronic devices that have become essential in most
things that we do. Note, however, that even though the majority of algorithms are indeed
designed for eventual computer implementation, the notion of an algorithm does not
depend on such an assumption.
To illustrate the notion of an algorithm, we consider in this section three
different methods for solving the same problem: computing the greatest common divisor of
two integers. These examples will help us illustrate several important points:
The unambiguity required of each step of an algorithm can never be compromised.
The range of inputs for which an algorithm works must be specified carefully.
The same algorithm can be represented in different ways.
Several algorithms for solving the same problem may exist.
Algorithms for the same problem can be based on very different ideas and can solve the
problem at dramatically different speeds.
Recall that the greatest common divisor of two nonnegative integers m and n, not both zero,
denoted gcd(m, n), is defined as the largest integer that divides both m and n evenly. Euclid of
Alexandria outlined an algorithm for solving this problem in one of the volumes of his
Elements, most famous for its systematic exposition of geometry. In modern terms,
Euclid's algorithm is based on applying repeatedly the equality
gcd(m, n) = gcd(n, m mod n)
(where m mod n is the remainder of the division of m by n) until m mod n is equal to
zero. Since gcd(m, 0) = m (why?), the last value of m is also the greatest common
divisor of the initial m and n.
Here is a more structured description of this algorithm.
ALGORITHM Euclid(m, n)
// Computes gcd(m, n) by Euclid's algorithm
// Input: Two nonnegative integers m and n, not both zero
// Output: The greatest common divisor of m and n
while n ≠ 0 do
    r ← m mod n
    m ← n
    n ← r
return m
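The pseudocode above translates almost line for line into a programming language. The following is a minimal Python sketch; the function name `gcd_euclid` is an illustrative choice, not something the course defines, and the inputs are assumed to be nonnegative integers that are not both zero.

```python
def gcd_euclid(m: int, n: int) -> int:
    """Return gcd(m, n) for nonnegative integers m and n, not both zero."""
    while n != 0:
        # Replace the pair (m, n) by (n, m mod n); the gcd is unchanged.
        m, n = n, m % n
    return m


print(gcd_euclid(60, 24))  # 12
```

The tuple assignment plays the role of the temporary variable r in the pseudocode.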
How do we know that Euclid's algorithm eventually halts? This follows from
the observation that the second number of the pair gets smaller with each iteration and cannot
become negative. Indeed, the new value of n on the next iteration is m mod n, which is always
smaller than n. Hence, the value of the second number in the pair eventually becomes zero, and
the algorithm stops.
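The termination argument can be watched on a concrete input. The sketch below (the helper name `euclid_trace` is hypothetical) records every pair (m, n) the algorithm visits, so that the strictly decreasing second component is visible.

```python
def euclid_trace(m: int, n: int) -> list[tuple[int, int]]:
    """Return the successive pairs (m, n) visited by Euclid's algorithm."""
    pairs = [(m, n)]
    while n != 0:
        m, n = n, m % n
        pairs.append((m, n))
    return pairs


print(euclid_trace(60, 24))  # [(60, 24), (24, 12), (12, 0)]
```

The second components 24, 12, 0 decrease strictly and the loop stops as soon as zero is reached.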
As is the case with many other problems, there are several algorithms for computing the greatest
common divisor. Let us look at two other methods for this problem. The first is simply
based on the definition of the greatest common divisor of m and n as the largest integer that
divides both numbers evenly. Obviously, such a common divisor cannot be greater than the
smaller of these numbers, which we denote by p = min{m, n}. So we start by checking whether p
divides both m and n: if it does, p is the answer; if it does not, we decrease p by 1 and try again.
Note that, unlike Euclid's algorithm, this algorithm, in the form presented here,
does not work correctly when one of its two input numbers is zero. This example illustrates why
it is so important to specify the set of an algorithm's permitted inputs explicitly and
carefully.
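As a rough illustration of this second method, here is a Python sketch of checking consecutive integers downward from min{m, n}. The name `gcd_consecutive` and the explicit rejection of non-positive inputs are assumptions added for the example; the text only states that the method fails when an input is zero.

```python
def gcd_consecutive(m: int, n: int) -> int:
    """Return gcd(m, n) for positive integers by trying p = min(m, n), min(m, n) - 1, ..."""
    if m <= 0 or n <= 0:
        # With a zero input, p would start at 0 and "m mod p" would divide by zero.
        raise ValueError("both inputs must be positive")
    p = min(m, n)
    while m % p != 0 or n % p != 0:
        p -= 1  # p does not divide both numbers; try the next smaller candidate
    return p


print(gcd_consecutive(60, 24))  # 12
```

The loop always ends because p = 1 divides every integer.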
The third procedure for computing the greatest common divisor should be familiar to you from
middle school.
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
2 3 5 7 9 11 13 15 17 19 21 23
2 3 5 7 11 13 17 19 23
2 3 5 7 11 13 17 19 23
For this example, no more passes are needed after eliminating the multiples of 5, because
any further pass would only try to eliminate numbers already eliminated on previous
iterations. The remaining numbers on the list are the consecutive primes less than or
equal to 24.
In general, what is the largest number p whose multiples can still remain on the
list? Before answering this question, let us first note that if p is a number whose multiples
are being eliminated on the current pass, then the first multiple we should consider is p² because
all of its smaller multiples have already been eliminated on earlier passes. This observation
helps to avoid eliminating the same number more than once. Obviously, p² must not be greater than
n, and therefore p cannot exceed the integer part of √n, denoted ⌊√n⌋. We assume
in the following pseudocode that a function is available for computing ⌊√n⌋; alternatively,
we could check the inequality p · p ≤ n as the condition for continuing the loop.
ALGORITHM Sieve(n)
// Implements the sieve of Eratosthenes
// Input: An integer n ≥ 2
// Output: A vector L containing all the prime numbers less than or equal to n
for p ← 2 to n do A[p] ← p
for p ← 2 to ⌊√n⌋ do
    if A[p] ≠ 0    // p has not been eliminated on previous passes
        j ← p * p
        while j ≤ n do
            A[j] ← 0    // mark the element as eliminated
            j ← j + p
// copy the remaining elements of A to L, the vector of primes
i ← 0
for p ← 2 to n do
    if A[p] ≠ 0
        L[i] ← A[p]
        i ← i + 1
return L
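The sieve pseudocode can be sketched in Python as follows. The function name `sieve` is illustrative, and `math.isqrt` from the standard library stands in for the assumed function that computes ⌊√n⌋.

```python
import math


def sieve(n: int) -> list[int]:
    """Return all primes <= n (for n >= 2) using the sieve of Eratosthenes."""
    a = list(range(n + 1))  # a[p] == p means "p is still on the list"
    for p in range(2, math.isqrt(n) + 1):
        if a[p] != 0:  # p was not eliminated on an earlier pass
            for j in range(p * p, n + 1, p):
                a[j] = 0  # mark the multiple as eliminated
    # Copy the surviving elements, which are exactly the primes.
    return [a[p] for p in range(2, n + 1) if a[p] != 0]


print(sieve(24))  # [2, 3, 5, 7, 11, 13, 17, 19, 23]
```

The inner `range(p * p, n + 1, p)` reproduces the while loop that starts at p² and advances by p.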
So now we can incorporate the sieve of Eratosthenes into the middle-school procedure
to obtain a legitimate algorithm for computing the greatest common divisor of
two positive integers.
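The combination can be sketched as follows, under an explicit assumption: the middle-school procedure is taken here to mean "factor both numbers into primes and multiply the prime factors they have in common". The helper names `primes_up_to`, `prime_factors`, and `gcd_middle_school` are hypothetical and exist only for this illustration.

```python
import math


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes: all primes <= n (empty list when n < 2)."""
    if n < 2:
        return []
    alive = [True] * (n + 1)
    for p in range(2, math.isqrt(n) + 1):
        if alive[p]:
            for j in range(p * p, n + 1, p):
                alive[j] = False
    return [p for p in range(2, n + 1) if alive[p]]


def prime_factors(k: int, primes: list[int]) -> list[int]:
    """Return the prime factors of k >= 1 with repetition, e.g. 60 -> [2, 2, 3, 5]."""
    factors = []
    for p in primes:
        while k % p == 0:
            factors.append(p)
            k //= p
    return factors


def gcd_middle_school(m: int, n: int) -> int:
    """gcd of positive integers m and n as the product of their common prime factors."""
    primes = primes_up_to(max(m, n))
    remaining = prime_factors(n, primes)
    result = 1
    for p in prime_factors(m, primes):
        if p in remaining:  # p is a common factor that has not been used yet
            remaining.remove(p)
            result *= p
    return result


print(gcd_middle_school(60, 24))  # 12
```

Here 60 = 2 · 2 · 3 · 5 and 24 = 2 · 2 · 2 · 3, so the common factors are 2, 2, and 3, whose product is 12.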
Exercises 1.1
Problem 1. Do some research on al-Khorezmi (also al-Khwarizmi), the man from whose name
the word "algorithm" is derived. In particular, you should learn what the origins of the words
"algorithm" and "algebra" have in common.
Problem 2. Design an algorithm for computing ⌊√n⌋ for a positive integer n. Besides
assignments and comparisons, your algorithm may use only the four basic arithmetic
operations.
Problem 3. Prove the equality gcd(m, n) = gcd(n, m mod n) for every pair of positive integers m
and n.
Problem 4. What does Euclid's algorithm do for a pair of numbers in which the first is
smaller than the second? What is the maximum number of times this can happen during the
algorithm's execution on such inputs?
Problem 5. a) What is the smallest number of divisions made by Euclid's algorithm
among all inputs 1 ≤ m, n ≤ 10?
b) What is the largest number of divisions made by Euclid's algorithm among all
inputs 1 ≤ m, n ≤ 10?
Let us start by recalling an important point made in the introduction to this chapter: we can
consider algorithms to be procedural solutions to problems.
These solutions are not answers but specific instructions for obtaining answers.
It is this emphasis on precisely defined constructive procedures that distinguishes
computer science from other disciplines. In particular, this distinguishes it from theoretical
mathematics, whose practitioners are typically satisfied with just proving the existence of a
solution to a problem and, possibly, investigating the solution's properties.
We now list and briefly discuss a sequence of steps that can be followed in designing and
analyzing an algorithm (Figure 1.2).
Design an algorithm
From a practical perspective, the first thing you need to do before designing an
algorithm is to understand completely the problem to be solved. Read the problem's
description carefully and ask questions if you have any doubts about the problem,
work through a few small examples by hand, think about special cases, and ask questions again if
needed.
There are a few types of problems that arise quite often in computing applications.
We review them in the next section. If the problem you
want to solve is one of them, you may be able to use one of the known algorithms
to solve it. Of course, it helps to understand how such an algorithm works and to
know its strengths and weaknesses, especially if you have to choose among several available
algorithms. But often you will not find a readily available algorithm for
your problem. In that case, you will have to design your own, relying where possible on
existing algorithms and on the many algorithm design techniques that we
study in the rest of this course. The sequence of steps outlined in this section should help you
in this exciting but not always easy task.
An input to an algorithm specifies an instance of the problem the algorithm solves. It is very
important to specify exactly the set of instances the algorithm needs to handle. If you
fail to do this, your algorithm may work correctly for the majority of inputs
but fail on some boundary values. Remember that a correct algorithm is not one
that works most of the time, but one that works correctly for all legitimate inputs.
Do not skimp on this first step of the algorithmic problem-solving process;
if you do, you run the risk of unnecessary rework.
1.2.2 Check the capabilities of the computer resources
Once you completely understand the problem, you need to ascertain the
capabilities of the computing device your algorithm is intended for. The vast majority of algorithms
in use today are still destined to be programmed for machines closely resembling the
von Neumann machine, a computer architecture outlined by the prominent
Hungarian-American mathematician John von Neumann. The essence of this architecture is captured
by the so-called random-access machine (RAM). Its central assumption is that instructions are
executed one after another, one operation at a time. Accordingly, algorithms designed to be
executed on such machines are called sequential algorithms.
The central assumption of the RAM model does not hold for some newer computers that
can execute operations concurrently, i.e., in parallel. Algorithms that take advantage of
this capability are called parallel algorithms. Still, the study of techniques for the design and
analysis of algorithms under the RAM model will remain the cornerstone of algorithmics for
a long time to come.
Should you worry about the speed and amount of memory of the computer at your
disposal? If you are designing an algorithm as a scientific exercise, the answer
is a qualified no. As you will see in Section 2.1, most computer scientists
prefer to study algorithms in terms independent of the specification parameters
of a particular computer. If you are designing an algorithm as a practical tool, the answer
may depend on the problem you need to solve. Even the computers that we consider
slow today are almost unimaginably fast. Consequently, in most
situations you need not worry about a computer being too slow for the task. There are
important problems, however, that are very complex by nature, have to process huge volumes of
data, or deal with applications where time is critical. In such situations, it is imperative
to be aware of the speed and memory available on a particular computer system.
1.2.3 Choose between an approximate or exact solution
The next decision is to choose between solving the problem exactly and solving it
approximately. In the former case, an algorithm is called an exact algorithm; in the
latter case, an algorithm is called an approximation algorithm. Why would one opt for an
approximation algorithm? First, there are important problems that cannot be solved exactly
for most of their instances; examples include extracting
square roots, solving nonlinear equations, and evaluating definite integrals.
Second, the available algorithms for solving certain problems exactly can
be unacceptably slow because of the problem's intrinsic complexity. This happens, in particular,
for many problems involving a very large number of choices; you will see examples of
such difficult problems in Chapters 3 and 8. Third, an approximation algorithm can
be a part of a more sophisticated algorithm that solves a problem exactly.
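To make the square-root example concrete, here is a small sketch of an approximation algorithm. It uses Newton's iteration, which is a technique chosen for this illustration and not one named in the text; the function name `approx_sqrt` and the default `tolerance` are likewise assumptions.

```python
def approx_sqrt(x: float, tolerance: float = 1e-12) -> float:
    """Approximate the square root of x >= 0 by Newton's iteration."""
    if x < 0:
        raise ValueError("x must be nonnegative")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0  # any starting point >= sqrt(x) works
    # Stop when guess**2 is within a relative tolerance of x.
    while abs(guess * guess - x) > tolerance * x:
        guess = (guess + x / guess) / 2  # average the guess with x / guess
    return guess


print(abs(approx_sqrt(2.0) ** 2 - 2.0) < 1e-9)  # True
```

The result is never claimed to be the exact root; the algorithm only guarantees that the error stays below the chosen tolerance.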
Now, with all the components of algorithmic problem solving in place,
how do you design an algorithm to solve a given problem? This is the
main question this course seeks to answer by teaching you several general
design techniques. What is an algorithm design technique?
Check the table of contents of this course and you will see that the majority of its chapters are
devoted to individual design techniques. They distill a few key ideas that have proven
useful in designing algorithms. Learning these techniques is of utmost importance for
the following reasons:
First, they provide guidance for designing algorithms for new
problems, that is, problems for which there is no known satisfactory algorithm.
Therefore, to use the language of a famous proverb, learning such techniques is
akin to learning to fish as opposed to being given a fish caught by somebody else. It is not
true, of course, that each of these general techniques will necessarily be applicable to
every problem you may encounter. But taken together, they constitute a
powerful collection of tools that you will find quite handy in your studies and your work.
1.2.6 Methods of Describing Algorithms
Once you have designed an algorithm, you need to specify it in some fashion.
In Section 1.1, to give you an example, we described Euclid's algorithm
in words (in a free form and also in a step-by-step form) and in pseudocode. These
are the two options most widely used today for specifying algorithms.
Using a natural language has an obvious appeal; however, the inherent ambiguity of any
natural language makes a succinct and clear description of algorithms surprisingly difficult.
Nevertheless, being able to do this is an important skill that you should strive to develop
in the process of learning algorithms.
Pseudocode is a mixture of a natural language and programming-language-like
constructs. Pseudocode is usually more precise than natural language, and its use
often yields more succinct algorithm descriptions. Surprisingly,
computer scientists have never agreed on a single form of pseudocode, leaving each
author the latitude to define their own dialect. Fortunately, these dialects are so
close to each other that anyone familiar with a modern programming language
should be able to understand them all.
The dialect we have adopted in this course was chosen to cause as little difficulty as
possible for readers. For the sake of simplicity, we omit declarations of variables and
use indentation to show the scope of such statements as for, if, and while. We
use an arrow "←" for the assignment operation and two slashes "//" for comments.
In the early days of computing, the dominant vehicle for specifying algorithms
was the flowchart, a method of expressing an algorithm by a collection of connected
geometric shapes containing descriptions of the algorithm's steps. This representation
technique has proved to be inconvenient for all but very simple algorithms;
nowadays, it can be found only in old books on algorithms.
The state of the art of computing has not yet reached a point where an algorithm's description,
whether in natural language or pseudocode, can be fed into a computer directly. Instead, the
description needs to be converted into a computer program written in a particular
programming language. We can look at such a program as yet another
way of specifying the algorithm, although it is preferable to consider it as
the algorithm's implementation.
Once an algorithm has been specified, you have to prove its correctness. That is, you
have to prove that the algorithm yields the required result for every legitimate input in
a finite amount of time. For example, the correctness of Euclid's algorithm for computing the
greatest common divisor of two integers stems from the correctness of the equality
gcd(m, n) = gcd(n, m mod n) (which in turn needs a proof; see Problem 3 in Exercises 1.1),
the simple observation that the second number gets smaller on every iteration of
the algorithm, and the fact that the algorithm stops when the second number becomes zero.
For some algorithms, a proof of correctness is quite easy; for others, it can be
quite complex. A common technique for proving correctness is to
use mathematical induction, because an algorithm's iterations provide a natural sequence
of steps needed for such proofs. It is worth noting that although
tracing the algorithm's behavior for a few specific inputs can be a
very worthwhile activity, it cannot prove the algorithm's correctness conclusively.
But in order to show that an algorithm is incorrect, you need just one instance of its
input for which the algorithm fails. If the algorithm is found to be incorrect, you need to either
redesign it under the same decisions regarding the data structures, the design
technique, and so on, or, in the extreme case, reconsider one or more of those decisions.
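The asymmetry between testing and proof can be illustrated with a short experiment. The sketch below compares Euclid's algorithm with consecutive integer checking, implemented with no input check, over a small range of inputs and reports the first input on which they disagree or on which one of them crashes. All function names are hypothetical, and passing such a test would only be evidence of correctness, not a proof.

```python
def gcd_euclid(m, n):
    while n != 0:
        m, n = n, m % n
    return m


def gcd_consecutive_unchecked(m, n):
    """Consecutive integer checking with no check on its inputs."""
    p = min(m, n)
    while m % p != 0 or n % p != 0:
        p -= 1
    return p


def first_failure(limit):
    """First pair (m, n), not both zero, where the two disagree or one crashes."""
    for m in range(limit):
        for n in range(limit):
            if m == 0 and n == 0:
                continue  # gcd(0, 0) is outside the permitted inputs
            try:
                if gcd_consecutive_unchecked(m, n) != gcd_euclid(m, n):
                    return (m, n)
            except ZeroDivisionError:
                return (m, n)
    return None


print(first_failure(20))  # (0, 1)
```

A single failing input such as (0, 1) is enough to show that consecutive integer checking is incorrect on the input range that Euclid's algorithm accepts.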
The notion of correctness for approximation algorithms is less straightforward than it is for
exact algorithms. For an approximation algorithm, we usually would like to be able to show
that the error produced by the algorithm does not exceed a predefined limit.
We usually want our algorithms to possess several qualities. After correctness,
by far the most important is efficiency. In fact, there are two kinds of algorithm
efficiency: time efficiency and memory efficiency. Time efficiency indicates
how fast the algorithm runs. Memory efficiency indicates how much memory
the algorithm requires. A general framework and specific techniques for analyzing an
algorithm's efficiency are given in Chapter 2.
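One informal way to compare the time efficiency of two algorithms for the same problem is to count their basic steps on the same input. The sketch below counts remainder operations for Euclid's algorithm and candidate values for consecutive integer checking; the function names and the choice of what to count are assumptions made for illustration, not the framework of Chapter 2.

```python
def euclid_divisions(m: int, n: int) -> int:
    """Number of remainder operations Euclid's algorithm performs on (m, n)."""
    count = 0
    while n != 0:
        m, n = n, m % n
        count += 1
    return count


def consecutive_checks(m: int, n: int) -> int:
    """Number of candidates p tried by consecutive integer checking on positive m, n."""
    p = min(m, n)
    count = 1
    while m % p != 0 or n % p != 0:
        p -= 1
        count += 1
    return count


print(euclid_divisions(31415, 14142), consecutive_checks(31415, 14142))  # 10 14142
```

On this input the two numbers are relatively prime, so consecutive integer checking has to try every candidate down to 1, while Euclid's algorithm needs only ten divisions.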
Another desirable characteristic is simplicity. Unlike efficiency, which can be
precisely defined and investigated with mathematical rigor, simplicity, like beauty, is
to a considerable degree in the eye of the beholder. For example, most people
would agree that Euclid's algorithm is simpler than the middle-school procedure for computing
the greatest common divisor, but it is not clear whether Euclid's algorithm is simpler than
the consecutive integer checking algorithm. Still, simplicity is an important characteristic
to strive for. Why? Because simpler algorithms
are easier to understand and easier to program; consequently, the
resulting programs usually contain fewer bugs. There is also the undeniable
aesthetic appeal of simplicity. Unfortunately, simplicity cannot always be had together with
the other qualities, in which case a judicious compromise must be made.
Another desirable characteristic of an algorithm is generality. There are, in fact, two issues
here: the generality of the problem the algorithm solves and the range of inputs it accepts.
On the first issue, note that it is sometimes easier to design an algorithm for a problem posed
in more general terms. Consider, for example, the problem of determining whether two integers
are relatively prime. It is easier to design an algorithm for the more
general problem of computing the greatest common divisor of two integers and to solve the
original problem by checking whether the gcd is equal to 1. There are situations, however,
where designing a more general algorithm is unnecessary, difficult, or even impossible. For
example, it is unnecessary to sort a list of n numbers to find its median, which is its ⌈n/2⌉th
smallest element. To give another example, the standard formula for the roots of a quadratic
equation cannot be generalized to handle polynomials of arbitrary degree.
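The relatively-prime example shows how a general algorithm solves a more specific problem almost for free. A minimal sketch, using the standard library's `math.gcd` as the general algorithm (the function name `are_relatively_prime` is illustrative):

```python
import math


def are_relatively_prime(a: int, b: int) -> bool:
    """True when the only positive common divisor of a and b is 1."""
    return math.gcd(a, b) == 1


print(are_relatively_prime(35, 18))  # True
print(are_relatively_prime(35, 21))  # False
```

No separate algorithm for testing relative primality has to be designed; the test is a one-line check on the result of the gcd computation.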
As for the range of inputs, your main concern should be designing an algorithm
that can handle a range of inputs that is natural for the problem at hand. For example, excluding
integers equal to 1 as possible inputs for a greatest common divisor algorithm would be
quite unnatural. On the other hand, although the standard formula for the roots of a quadratic
equation holds for complex coefficients, we would normally not implement it at
this level of generality unless this capability is explicitly required.
If you are not satisfied with the algorithm's efficiency, simplicity, or generality, you must
go back and redesign the algorithm. In fact, even if your evaluation is positive, it is
still worth searching for other algorithmic solutions. Recall the three different algorithms
in the previous section for computing the greatest common divisor; generally,
you should not expect to get the best algorithm on the first try. At the very least,
you should try to fine-tune the algorithm you already have. For example, we made
several improvements in our implementation of the sieve of Eratosthenes compared with its
initial outline in Section 1.1. You will do well to keep in mind the following observation of
Antoine de Saint-Exupéry, the French writer, pilot, and aircraft designer: "A designer knows
he has arrived at perfection not when there is no longer anything to add, but when there is no
longer anything to take away."
1.2.9 Coding of algorithms
Let us also note that throughout this course, we assume that inputs to algorithms
belong to the specified sets and hence require no verification. When
you implement algorithms as programs to be used in
real applications, you should provide such verifications.
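As an example of such verification, the sketch below wraps Euclid's algorithm in the checks a real program might perform. The function name `checked_gcd`, the exception types, and the error messages are assumptions made for this illustration.

```python
def checked_gcd(m, n) -> int:
    """Euclid's algorithm preceded by verification of its inputs."""
    for value in (m, n):
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError("inputs must be integers")
        if value < 0:
            raise ValueError("inputs must be nonnegative")
    if m == 0 and n == 0:
        raise ValueError("inputs must not both be zero")
    while n != 0:
        m, n = n, m % n
    return m


print(checked_gcd(60, 24))  # 12
```

The checks mirror the input specification "two nonnegative integers, not both zero" given with the pseudocode in Section 1.1.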
Exercises 1.2
Problem 1. Old World puzzle. A shepherd is on a riverbank with a wolf, a goat, and
a head of cabbage. He must take all three across to the other side of the river
in a boat. The boat is too small to carry them all: on each crossing of the
river, he can take only one of the three with him. He cannot leave the goat with the cabbage,
or the wolf with the goat, alone on a bank. How should the shepherd get the
three across under these constraints? (Note: The shepherd is a vegetarian but does not like
cabbage, and therefore he can eat neither the goat nor the cabbage to help him solve the
problem. And it goes without saying that the wolf is a protected species.)
Problem 2. New World puzzle. There are four people who want to cross a bridge; they
all begin on the same side. You have 17 minutes to get them all across to the other side.
It is night, and they have one flashlight. A maximum of two people can
cross the bridge at one time. Any party that crosses, whether one or two people,
must have the flashlight with them. The flashlight must be walked back and forth; it
cannot be thrown, for example. Person 1 takes 1 minute to cross the bridge, person 2
takes 2 minutes, person 3 takes 5 minutes, and person 4 takes 10 minutes. A pair must
walk together at the pace of the slower person. For example, if person 1 and
person 4 cross first, 10 minutes will have elapsed when they reach the other side of
the bridge. If person 4 then brings the flashlight back, a total of 20 minutes will have passed
and you will have failed the mission.
Problem 3. Which of the following formulas can be considered an algorithm
for computing the area of a triangle whose side lengths are given positive numbers a, b,
and c?
a) S = √(p(p - a)(p - b)(p - c)), where p = (a + b + c)/2
b) S = (1/2) b c sin A, where A is the angle between sides b and c
alphabet, character strings, and larger records similar to those kept
by schools about their students, bookstores about their books, and businesses
about their employees. In the case of records, we need to choose a piece of
information to guide the sorting. For example, we can choose to sort student
records in alphabetical order of names, by registration number, or by overall
grade. Such a specially chosen piece of information is called a key.
Why would we want a sorted list? Sorting makes many questions
about the list easier to answer. The most important of them is searching: it is
for this reason that dictionaries, telephone directories, class lists, and so on
are sorted. You will see other examples of the usefulness of sorted lists in Section 6.1.
In a similar vein, sorting is used as an auxiliary step in several important algorithms in
other areas, such as geometric algorithms.
By now, computer scientists have discovered dozens of different sorting algorithms.
In fact, inventing a new sorting algorithm has been likened to reinventing the proverbial wheel.
I am nevertheless happy to report that the search for better sorting algorithms continues. This
perseverance is admirable in view of the following facts. On the one hand, there are a few good
sorting algorithms that sort an arbitrary vector of size n using about n log₂ n comparisons.
On the other hand, no algorithm that sorts by key comparisons (as opposed to, say, comparing
small pieces of keys) can do substantially better than that.
There is a reason for this embarrassment of algorithmic riches in the field of sorting. Although
some algorithms are indeed better than others, there is no algorithm that would be
the best solution in all situations. Some algorithms are simple but relatively
slow, while others are faster but more complex. Some work better on
randomly ordered inputs, while others do better on almost-sorted lists.
Some are suitable only for lists residing in fast memory, while others can
be adapted for sorting large files stored on disk, and so on.
Two properties of sorting algorithms deserve special mention. A sorting algorithm is
called stable if it preserves the relative order of any two equal elements of its input.
In other words, if an input list contains two equal elements in positions i and j where i < j,
then in the sorted list they have to be in positions i' and j', respectively, such that i' < j'.
This property can be desirable if, for example, we have a list of students sorted
alphabetically and we want to sort it according to student GPA: a stable algorithm will yield a
list in which students with the same GPA are still sorted alphabetically.
Generally speaking, algorithms that can exchange keys located far apart are not stable,
but they usually work faster.
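The GPA example can be tried directly in Python, whose built-in `sorted` is documented to be stable, including when `reverse=True` is used. The student names and grades below are invented for the illustration.

```python
students = [  # already in alphabetical order of names
    ("Ahmed", 3.5),
    ("Claire", 3.9),
    ("Karim", 3.5),
    ("Nadia", 3.9),
]

# Sort by GPA, highest first; the sort key ignores the names entirely.
by_gpa = sorted(students, key=lambda s: s[1], reverse=True)
print(by_gpa)
# [('Claire', 3.9), ('Nadia', 3.9), ('Ahmed', 3.5), ('Karim', 3.5)]
```

Within each GPA group the students remain in alphabetical order, which is exactly what stability promises.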
The second important property of a sorting algorithm is the amount of extra memory
the algorithm requires. A sorting algorithm is said to be in-place if it does not require
extra memory beyond that occupied by the list being sorted, except, possibly, for a few
memory units. There are important sorting algorithms that are in-place and others that
are not.
particular importance for real applications because they are essential for the
storage and retrieval of information in databases.
For searching, too, there is no single algorithm that fits all situations best.
Some algorithms work faster than others but require more memory; some
are very fast but applicable only to sorted vectors; and so on. Unlike with
sorting algorithms, there is no stability issue, but different issues arise.
Specifically, in applications where the underlying data may change frequently
relative to the number of searches, searching has to be considered in conjunction with two
other operations: insertion of an element into the data set and deletion of an element from it.
In such situations, data structures and algorithms should be chosen to strike a balance
among the requirements of each operation. Also, organizing very large data sets for
efficient searching poses special challenges with important implications for
real applications.
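The trade-off between generality and speed in searching can be sketched with two standard methods. Sequential search works on any list, while binary search (written here with the standard library's `bisect_left`) requires a sorted list. The function names and the sample data are illustrative.

```python
from bisect import bisect_left


def sequential_search(items, key):
    """Index of key in items (any order), or -1; may inspect every element."""
    for i, item in enumerate(items):
        if item == key:
            return i
    return -1


def binary_search(sorted_items, key):
    """Index of key in a sorted list, or -1; inspects about log2(n) elements."""
    i = bisect_left(sorted_items, key)
    if i < len(sorted_items) and sorted_items[i] == key:
        return i
    return -1


data = [14, 35, 47, 60, 81, 98]
print(sequential_search(data, 81), binary_search(data, 81))  # 4 4
print(binary_search(data, 50))                               # -1
```

If elements are inserted or deleted often, keeping the list sorted has its own cost, which is the balance the paragraph above refers to.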
Some graph problems are computationally very hard. The most well-known examples are the
traveling salesman problem and the graph-coloring problem. The
traveling salesman problem is the problem of finding the shortest tour through n
cities that visits every city exactly once. In addition to obvious applications
involving route planning, it arises in such modern applications as the manufacturing of
VLSI chips, X-ray crystallography, and genetic engineering. The graph-coloring
problem is the problem of assigning the smallest number of colors to the vertices
of a graph so that no two adjacent vertices are the same color. This problem arises
in several applications, such as event scheduling: if the events are
represented by vertices that are connected by an edge if and only if the corresponding
events cannot be scheduled at the same time, a solution to the
graph-coloring problem yields an optimal schedule.
Most numerical mathematical problems can be solved only approximately. Another principal
difficulty stems from the fact that such problems typically require manipulating
real numbers, which can be represented in a computer only approximately. Moreover, a
large number of arithmetic operations performed on approximately represented numbers
can lead to an accumulation of round-off errors to the point where they
drastically distort the output produced by a seemingly sound algorithm.
Many sophisticated algorithms have been developed over the years in this area, and they
continue to play a critical role in many scientific and engineering applications. But
over the past 25 years, the computing industry has shifted its focus to
business applications. These new applications primarily require algorithms
for the storage, retrieval, and transmission of information across networks, and for its
presentation to users. As a consequence of this revolutionary change, numerical
analysis has lost its formerly dominant position both in industry and in computer science
programs. Still, it is important for any beginner in computing to have at
least a rudimentary idea of numerical algorithms.
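The accumulation of round-off errors mentioned above can be observed in a few lines. The sketch below is in Python; the language choice and variable names are ours, not the text's. It adds 0.1 ten times. Because 0.1 has no exact binary floating-point representation, the naive sum is not exactly 1.0, whereas the standard library function math.fsum compensates for the intermediate errors.

```python
import math

# Add 0.1 ten times with ordinary floating-point addition.
naive_total = 0.0
for _ in range(10):
    naive_total += 0.1

# math.fsum keeps track of the intermediate rounding errors.
compensated_total = math.fsum([0.1] * 10)

print(naive_total == 1.0)        # False
print(compensated_total == 1.0)  # True
```

The error here is tiny, but the same effect, repeated over millions of operations, is what can distort the output of a seemingly sound numerical algorithm.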
Exercises 1.3
Problem 1. Consider the algorithm that sorts a vector by counting, for each of its
elements, the number of elements that are smaller than it and then uses this information to
place the element in its final position in the sorted vector:
a) Apply this algorithm to sort the list 60, 35, 81, 98, 14, 47.
b) Is this algorithm stable?
c) Is it in-place (internal)?
Problem 2. Name the search algorithms that you already know. Give a
succinct description of each of these algorithms in plain language. (If you do not know any such
algorithms, take the opportunity to design one.)
Problem 4. The bridges of Königsberg. The Königsberg bridge puzzle is universally
accepted as the problem that gave rise to graph theory. It was solved by the great
Swiss mathematician Leonhard Euler (1707-1783). The problem asked whether one could,
in a single walk, cross each of the seven bridges of the city of Königsberg exactly
once and return to the starting point. Below is a sketch of the river with its islands and its seven bridges.
Find a Hamiltonian circuit, that is, a path that visits every vertex of the graph exactly
once before returning to the starting point, for this graph.
Problem 6. Consider the following map:
[Map with six regions labeled a, b, c, d, e, f]
a) Explain how the graph-coloring problem can be used to color
the map so that no two adjacent regions have the same color.
b) Use your answer to part (a) to color the map with the smallest number of
colors.
Problem 7. Design an algorithm for the following problem: given a set of n
points in the Cartesian plane, determine whether all of them lie on the same circle.
1.4 Fundamental data structures
Since the vast majority of algorithms of interest operate on data, particular ways of
organizing data play a critical role in the design and analysis of algorithms.
A data structure can be defined as a particular scheme for organizing related
data items. The nature of the data items is dictated by the problem at hand; they
can range from elementary data types to data structures. There are only a few
data structures that have proved to be particularly important for computer
algorithms. Since you are probably familiar with all or almost all of these data structures,
only a quick review is provided here.
The most important linear data structures are vectors (arrays) and linked lists.
A vector is a sequence of n elements of the same data type that are stored in
consecutive cells of the computer's memory and made accessible by specifying the value of their
index in the vector (Figure 1.3).
In most cases, the index is an integer between 0 and n-1 or between 1 and n. Some
programming languages allow an index that can vary between two e
The vector and the linked list are the two principal choices for representing a more
abstract data structure called a linear list, or simply a list. A list is a finite sequence of data elements,
i.e., a collection of data elements arranged in a certain linear order. The basic operations
performed on this data structure are searching for, inserting, and deleting an
element.
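The three basic list operations can be illustrated with a singly linked list. The following is a minimal sketch in Python; the class and method names (Node, LinkedList, insert_front, search, delete) are our own illustrative choices, not notation from the text.

```python
class Node:
    """One cell of a singly linked list: a value and a link to the next cell."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    def __init__(self):
        self.head = None

    def insert_front(self, value):
        """Insert a new element at the beginning of the list."""
        self.head = Node(value, self.head)

    def search(self, value):
        """Return True if some node holds the given value."""
        node = self.head
        while node is not None:
            if node.value == value:
                return True
            node = node.next
        return False

    def delete(self, value):
        """Unlink the first node holding value; return True if one was found."""
        prev, node = None, self.head
        while node is not None:
            if node.value == value:
                if prev is None:
                    self.head = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False

    def to_list(self):
        """Collect the values in list order (for display only)."""
        values, node = [], self.head
        while node is not None:
            values.append(node.value)
            node = node.next
        return values


items = LinkedList()
for x in (3, 2, 1):
    items.insert_front(x)    # the list is now 1 -> 2 -> 3
items.delete(2)
print(items.to_list())       # [1, 3]
```

Note that searching and deleting by value walk the links one node at a time, whereas a vector gives direct access to any position by its index.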
Two special types of lists, stacks and queues, are particularly important.
DEFINITION.
A queue is a list in which all insertions are made at the end of the list (the rear) and all
deletions at the beginning of the list (the front). Consequently, a queue operates in a
"first-come-first-served" or "first-in-first-out" (FIFO) fashion. Queues
also have important applications, including several algorithms for graph
problems.
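The FIFO behavior can be seen in a short sketch. It uses Python's collections.deque, whose append and popleft methods serve here as the enqueue and dequeue operations; the element names are arbitrary.

```python
from collections import deque

queue = deque()
queue.append("job1")      # enqueue at the rear
queue.append("job2")
queue.append("job3")

first = queue.popleft()   # dequeue from the front
second = queue.popleft()

print(first, second, list(queue))   # job1 job2 ['job3']
```

The elements leave the queue in exactly the order in which they entered it.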
largest element and the addition of a new element. Of course, a priority queue must be
implemented in such a way that these last two operations produce another priority queue.
A straightforward implementation of this structure can be based on either an unsorted vector
or a sorted vector, but neither of these options yields the most efficient solution. A better
implementation of a priority queue is based on an ingenious data structure called the
heap.
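As a rough illustration of a heap-based priority queue, the sketch below uses Python's standard heapq module. It assumes the operations of interest are finding the largest element, deleting it, and adding a new element. Since heapq maintains a min-heap, priorities are stored negated so that the largest one comes out first; this negation trick is our choice for the example, not something prescribed by the text.

```python
import heapq

heap = []
for priority in (4, 9, 1, 7):
    heapq.heappush(heap, -priority)   # store negated values: heapq is a min-heap

largest = -heap[0]                    # find the largest element
removed = -heapq.heappop(heap)        # delete the largest element
heapq.heappush(heap, -8)              # add a new element
next_largest = -heap[0]

print(largest, removed, next_largest)   # 9 9 8
```

After each push or pop, the underlying list is still a valid heap, which is the requirement that the operations "produce another priority queue".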
1.4.2 Graphs
DEFINITION. A graph is a pair G = (V, E), where V is a non-empty finite set of objects
called vertices or nodes and E is a set of pairs of vertices called edges (or arcs). If these
pairs of vertices are unordered, that is, if the pair (u, v) is identical to
the pair (v, u), we say that the vertices u and v are adjacent and that they are connected by the
undirected edge (u, v). The vertices u and v are called the endpoints of the edge (u, v), and u and v
are said to be incident to this edge; the edge (u, v) is likewise said to be incident to its endpoints.
If the pair of vertices (u, v) is not the same as the pair (v, u), we say that the edge (u, v) is directed
from the vertex u, called its tail, to the vertex v, called its head. We also say that the edge (u, v)
leaves u and enters v. A graph in which all edges are directed is called a directed
graph. Directed graphs are also called digraphs.
It is convenient to label the vertices of a graph or a digraph with letters, integers,
or, if the application calls for it, character strings (Figure 1.6). The graph in Figure
1.6a has six vertices and seven edges:
$V = \{a, b, c, d, e, f\}$, $E = \{(a,c), (a,d), (b,c), (b,f), (c,e), (d,e), (e,f)\}$.
The digraph in Figure 1.6b has six vertices and eight directed edges:
$V = \{a, b, c, d, e, f\}$, $E = \{(a,c), (b,c), (b,f), (c,e), (d,a), (d,e), (e,c), (e,f)\}$.
FIGURE 1.6– (a) Undirected graph. (b) Digraph.
Our definition of a graph does not prohibit loops, i.e., edges connecting vertices to themselves.
Unless stated otherwise, we will consider graphs without loops. Since our definition
disallows multiple edges between the same vertices of an undirected graph, we have the following
inequality for the number of edges |E| possible in an undirected graph with |V| vertices and
no loops (the upper bound is the number of unordered pairs of distinct vertices):
$0 \le |E| \le |V|(|V| - 1)/2$.
A graph in which every vertex is connected to every other vertex by an edge is said to be complete.
A standard notation for the complete graph with |V| vertices is $K_{|V|}$. A graph in
which only a few of the possible edges are missing is said to be dense. A graph with few edges relative to
the number of its vertices is said to be sparse. Whether we are working with a dense or a sparse graph
can influence the way it is represented and, consequently, the running time of the algorithm being
designed or used.
Graph representation. Graphs for computer algorithms can be
represented in two principal ways: the adjacency matrix and adjacency lists. The adjacency
matrix of a graph with n vertices is an n-by-n Boolean matrix with one row and one
column for each of the graph's vertices, in which A[i,j] = 1 if there is an edge from the i-th vertex
to the j-th vertex, and A[i,j] = 0 otherwise. For example, the adjacency matrix of the graph in Figure 1.6a
is given in Figure 1.7a. Note that the adjacency matrix of an undirected graph is always symmetric,
that is, $A[i,j] = A[j,i]$ for all $0 \le i, j \le n - 1$ (why?).
The adjacency lists of a graph or a digraph are a collection of linked lists,
one for each vertex, that contain all the vertices adjacent to the list's vertex (i.e., all the
vertices connected to it by an edge). Usually, such lists start with a header
identifying the vertex for which the list is compiled. For example, Figure 1.7b represents the
graph of Figure 1.6a by its adjacency lists. To put it another way, the adjacency lists
indicate the columns of the adjacency matrix that, for a given vertex, contain 1's.
      a  b  c  d  e  f
  a   0  0  1  1  0  0          a -> c -> d
  b   0  0  1  0  0  1          b -> c -> f
  c   1  1  0  0  1  0          c -> a -> b -> e
  d   1  0  0  0  1  0          d -> a -> e
  e   0  0  1  1  0  1          e -> c -> d -> f
  f   0  1  0  0  1  0          f -> b -> e
            (a)                        (b)
FIGURE 1.7– (a) Adjacency matrix and (b) adjacency lists of the graph in Figure 1.6a
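Both representations can be built directly from the vertex and edge sets given for the graph in Figure 1.6a. The sketch below is in Python and uses ordinary Python lists in place of linked lists, which is a simplification on our part; the variable names are also ours.

```python
vertices = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "f"),
         ("c", "e"), ("d", "e"), ("e", "f")]

index = {v: i for i, v in enumerate(vertices)}   # vertex name -> row/column number
n = len(vertices)

matrix = [[0] * n for _ in range(n)]             # adjacency matrix, initially all zeros
adj = {v: [] for v in vertices}                  # adjacency lists, initially empty

for u, v in edges:
    # The graph is undirected, so every edge is recorded in both directions.
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1
    adj[u].append(v)
    adj[v].append(u)

for v in vertices:
    adj[v].sort()

is_symmetric = all(matrix[i][j] == matrix[j][i]
                   for i in range(n) for j in range(n))

print(matrix[index["a"]])   # [0, 0, 1, 1, 0, 0]
print(adj["e"])             # ['c', 'd', 'f']
print(is_symmetric)         # True
```

Recording each undirected edge in both directions is what makes the matrix symmetric.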
If the graph is sparse, the adjacency-list representation may use less
space than the adjacency-matrix representation, despite the extra memory
used by the pointers of the linked lists; the situation is exactly the opposite
for dense graphs. In general, the choice of the more convenient representation depends on the
nature of the problem, the algorithm used to solve it, and possibly the type of input
graph (dense or sparse).
Weighted graphs. A weighted graph (or weighted digraph) is a graph (or digraph) in which a
number is assigned to each edge. These numbers are called weights or costs. Interest in
such graphs is motivated by numerous real-world applications, such as finding the shortest path
between two points in a transportation or telecommunications network or the traveling salesman
problem mentioned above.
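Edges and weights of the graph: a-b 5, a-c 1, b-c 7, b-d 4, c-d 2
            (a)

      a  b  c  d
  a   ∞  5  1  ∞          a -> (b, 5) -> (c, 1)
  b   5  ∞  7  4          b -> (a, 5) -> (c, 7) -> (d, 4)
  c   1  7  ∞  2          c -> (a, 1) -> (b, 7) -> (d, 2)
  d   ∞  4  2  ∞          d -> (b, 4) -> (c, 2)
         (b)                        (c)
FIGURE 1.8– (a) Weighted graph. (b) Its adjacency matrix. (c) Its adjacency lists.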
The two principal representations of graphs can be easily adapted to
accommodate weighted graphs. If a weighted graph is represented by its adjacency
matrix, then the element A[i,j] will simply contain the weight of the edge from the i-th vertex to the
j-th vertex if such an edge exists and a special symbol, for example ∞, otherwise. Such a matrix is
called the weight matrix or cost matrix. The adjacency lists of a weighted graph
must include in their nodes not only the names of the adjacent vertices but also the weights
of the corresponding edges.
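A weighted graph can be stored in either form with only small changes to the unweighted case. The sketch below, in Python, uses the edge weights that can be read from the adjacency lists of Figure 1.8 and takes math.inf as the "special symbol" for a missing edge; that choice, like the variable names, is an assumption made for the example.

```python
import math

vertices = ["a", "b", "c", "d"]
weighted_edges = [("a", "b", 5), ("a", "c", 1), ("b", "c", 7),
                  ("b", "d", 4), ("c", "d", 2)]

index = {v: i for i, v in enumerate(vertices)}
n = len(vertices)

# Weight matrix: math.inf marks "no edge".
weight = [[math.inf] * n for _ in range(n)]
# Adjacency lists: each entry is a (neighbor, weight) pair.
adj = {v: [] for v in vertices}

for u, v, w in weighted_edges:
    weight[index[u]][index[v]] = w
    weight[index[v]][index[u]] = w
    adj[u].append((v, w))
    adj[v].append((u, w))

print(weight[index["b"]])   # [5, inf, 7, 4]
print(adj["c"])             # [('a', 1), ('b', 7), ('d', 2)]
```

Using infinity for missing edges is convenient in shortest-path computations, because a missing edge then never looks cheaper than a real one.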
Paths and cycles. Among the many interesting properties of graphs, two are important for
a large number of applications: connectivity and acyclicity. Both are based on the notion of
a path. A path from vertex u to vertex v of a graph G can be defined as a sequence of
adjacent vertices (connected by an edge) that starts with u and ends with v. If all the
vertices of a path are distinct, the path is called simple. The length of a path is the
total number of vertices in the sequence defining the path minus one, which is the same as the
number of edges in the path.
A directed path is a sequence of vertices in which every consecutive pair of vertices is connected by
an edge directed from the vertex listed first to the vertex listed next.
A graph is said to be connected if there exists a path from u to v for every pair of its vertices u and v.
Informally, this property means that if we build a model of a connected graph by
connecting balls representing the vertices of the graph with strings representing the edges, we
will get a single piece. If a graph is not connected, such a model will consist of several connected
pieces, which are called the connected components of the graph. Formally, a connected component is
a maximal (not extendable by the inclusion of an extra vertex) connected subgraph of a given graph.
For example, the graphs of Figures 1.6a and 1.8a are connected, while the graph of Figure 1.9 has two
connected components with the vertices {a, b, c, d, e} and {f, g, h, i}, respectively.
FIGURE 1.9– Graph that is not connected.
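Connected components can be found by repeatedly exploring everything reachable from a vertex that has not yet been visited. The sketch below does this in Python with a breadth-first traversal. The edge list is hypothetical: it was chosen only to reproduce the two vertex sets {a, b, c, d, e} and {f, g, h, i} mentioned in the text, since the actual edges of Figure 1.9 are not available here.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph as sorted vertex lists."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Explore everything reachable from start.
        seen.add(start)
        component = []
        frontier = deque([start])
        while frontier:
            u = frontier.popleft()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    frontier.append(v)
        components.append(sorted(component))
    return components


# Hypothetical edges (not taken from Figure 1.9).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"),
         ("f", "g"), ("g", "h"), ("h", "i")]
adj = {v: [] for v in "abcdefghi"}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

components = connected_components(adj)
print(components)   # [['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i']]
```

A graph is connected exactly when this procedure returns a single component.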
1.4.3 Trees
A tree (or, more precisely, a free tree) is a connected acyclic graph (Figure 1.10a).
A graph that has no cycles but is not necessarily connected is called a forest (Figure
1.10b).
Trees have several important properties that other graphs do not have. In particular, the
number of edges in a tree is always one less than the number of its vertices:
$|E| = |V| - 1$.
As the graph of Figure 1.9 shows, this property is necessary but not sufficient for
a graph to be a tree. However, for connected graphs it is sufficient and thus provides a
convenient way to determine whether a connected graph has a cycle.
FIGURE 1.10– (a) Tree. (b) Forest.
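The edge-count property suggests a simple test: a graph is a tree exactly when it is connected and has |V| - 1 edges. The following Python sketch implements this test under the assumptions that the graph is undirected, has at least one vertex, and contains no loops or multiple edges; the function names and the two small example graphs are ours.

```python
def is_connected(vertices, edges):
    """Check connectivity of an undirected graph with a depth-first traversal."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    start = vertices[0]
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(vertices)


def is_tree(vertices, edges):
    """A graph is a tree iff it has |V| - 1 edges and is connected."""
    return len(edges) == len(vertices) - 1 and is_connected(vertices, edges)


chain = (["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
triangle_plus_isolated = (["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("a", "c")])

print(is_tree(*chain))                   # True
print(is_tree(*triangle_plus_isolated))  # False
```

The second example has exactly |V| - 1 edges but is not connected, which illustrates why the edge count alone is not sufficient.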
Rooted trees. Another important property of trees is the fact that for every two vertices in
a tree, there exists exactly one simple path from one of these vertices to the other. This
property makes it possible to choose an arbitrary vertex in a free tree and consider it as the
root. A rooted tree is usually depicted by placing its root at the top (level 0 of
the tree), the vertices adjacent to the root below it (level 1), the vertices two edges away from
the root below that (level 2), and so on. Figure 1.11 presents such a
transformation of a free tree into a rooted tree.
Rooted trees play an important role in computer science, a more important one than
free trees do. In fact, for the sake of brevity, they are often referred to simply as trees.
The obvious applications of trees are for describing hierarchies, from file
directories to company organizational charts. There are many less obvious applications, such as
the implementation of dictionaries, the efficient storage of large data sets, and
data encoding. Trees are also useful for analyzing recursive algorithms. To finish
this far-from-complete list of tree applications, we should mention the
state-space trees that underlie two important algorithm design
techniques: backtracking and branch-and-bound.
For each vertex v of a tree T, all the vertices on the simple path from the root to this
vertex are called ancestors of v. The vertex itself is usually considered its
own ancestor; the set of ancestors that excludes the vertex itself is referred to as the set of
proper ancestors. If (u, v) is the last edge of the simple path from the root to a vertex v
(and u ≠ v), u is called the parent of v and v is called a child of u; vertices that have the same
parent are called siblings. A vertex with no children is called a leaf; a vertex with at
least one child is called parental. All the vertices for which a vertex v is an ancestor are
called descendants of v; the proper descendants exclude the vertex v itself. All the
descendants of a vertex v, with all the edges connecting them, form the subtree of T rooted at
this vertex. Thus, for the tree in Figure 1.11b, the root of the tree is a; the vertices d, g, f, h,
and i are leaves, while the vertices a, b, e, and c are parental; the vertices of the subtree
rooted at b are {b, c, g, h, i}.
The depth of a vertex v is the length of the simple path from the root to v. The
height of a tree is the length of the longest simple path from the root to a leaf. Thus, if
we count the levels of a tree starting with 0 for the root level, the
depth of a vertex is simply its level in the tree, and the height of the tree is the
maximum level of its vertices.
FIGURE 1.11– (a) Free tree. (b) Its transformation into a rooted tree.
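The notions of depth, height, leaf, and ancestor can be made concrete with a small rooted tree. The Python sketch below stores the tree as a dictionary of child lists. The particular tree (root a with children b, d, e, and so on) is our reading of Figure 1.11b based on the surrounding text and should be treated as an assumption; the function and variable names are also ours.

```python
# Rooted tree as "vertex -> list of children" (assumed shape of Figure 1.11b).
children = {
    "a": ["b", "d", "e"],
    "b": ["c", "g"],
    "c": ["h", "i"],
    "e": ["f"],
}


def compute_depths(root):
    """Depth of every vertex = number of edges on the path from the root."""
    depth = {root: 0}
    stack = [root]
    while stack:
        u = stack.pop()
        for child in children.get(u, []):
            depth[child] = depth[u] + 1
            stack.append(child)
    return depth


depth = compute_depths("a")
height = max(depth.values())                          # maximum level of any vertex
leaves = sorted(v for v in depth if not children.get(v))
parent = {c: p for p, kids in children.items() for c in kids}


def ancestors(v):
    """Vertices on the path from v up to the root, v itself included."""
    chain = [v]
    while v in parent:
        v = parent[v]
        chain.append(v)
    return chain


print(height)           # 3
print(leaves)           # ['d', 'f', 'g', 'h', 'i']
print(ancestors("h"))   # ['h', 'c', 'b', 'a']
```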
Ordered trees. An ordered tree is a rooted tree in which all the children of each
vertex are ordered. It is convenient to assume that in the diagram of a tree, all the children
are arranged from left to right. A binary tree can be defined as an ordered tree
in which each vertex has no more than two children and each child is designated as either
a left child or a right child of its parent. The subtree rooted at the left
(right) child of a vertex is called the left (right) subtree of that vertex. An example of a
binary tree is given in Figure 1.12a.
In Figure 1.12b, numbers are assigned to the vertices of the binary tree in Figure 1.12a.
Note that the number assigned to each parental vertex is greater than all the numbers in
its left subtree and smaller than all the numbers in its right subtree. Such trees
are called binary search trees. Binary trees and binary search trees
have a wide variety of applications in computer science. In particular, binary search
trees can be generalized to more general types of search trees called multiway
search trees, which are essential for the efficient storage of very large files
on disk.
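The ordering property of a binary search tree is what makes searching it efficient: at each vertex, one comparison decides whether to continue in the left or the right subtree. The following is a minimal Python sketch with insertion and search; the class and function names and the sample keys are chosen for illustration only.

```python
class BSTNode:
    """A vertex of a binary search tree with links to its two children."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the tree rooted at root and return the (possibly new) root."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root                      # duplicate keys are ignored


def search(root, key):
    """Follow a single root-to-leaf path, guided by comparisons."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def inorder(root):
    """Left subtree, vertex, right subtree: yields the keys in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


root = None
for key in (9, 5, 12, 1, 7, 10, 4):
    root = insert(root, key)

print(inorder(root))                       # [1, 4, 5, 7, 9, 10, 12]
print(search(root, 7), search(root, 8))    # True False
```

Because a search follows one path from the root downward, its cost is bounded by the height of the tree.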
As we will see later, the efficiency of the most important algorithms for binary search
trees and their extensions depends on the height of the tree. Therefore, the
following inequalities for the height h of a binary tree with n nodes are especially
important for the analysis of such algorithms:
$\lfloor \log_2 n \rfloor \le h \le n - 1$.
FIGURE 1.13– Standard implementation of the binary search tree in Figure 1.12b
It is not difficult to see that this representation effectively transforms an ordered tree into
a binary tree, said to be associated with the ordered tree. We obtain this representation by
"rotating" the pointers about 45° clockwise (Figure 1.14b).
FIGURE 1.14– (a) First child-next sibling representation of the tree in Figure 1.11b. (b)
Its representation in the form of a binary tree.
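In a first child-next sibling representation, every vertex keeps just two links: one to its first child and one to its next sibling. Reading the first link as "left" and the second as "right" gives the associated binary tree. The Python sketch below derives both links from a dictionary of ordered child lists. The tree used is our assumed reading of Figure 1.11b, and the function name is hypothetical.

```python
# Ordered rooted tree as "vertex -> list of children" (assumed shape of Figure 1.11b).
children = {
    "a": ["b", "d", "e"],
    "b": ["c", "g"],
    "c": ["h", "i"],
    "e": ["f"],
}


def to_first_child_next_sibling(children, root):
    """Return two dictionaries: first_child[v] and next_sibling[v] (None if absent)."""
    first_child = {}
    next_sibling = {root: None}          # the root has no siblings
    stack = [root]
    while stack:
        u = stack.pop()
        kids = children.get(u, [])
        first_child[u] = kids[0] if kids else None
        # Chain the children of u together from left to right.
        for left, right in zip(kids, kids[1:]):
            next_sibling[left] = right
        if kids:
            next_sibling[kids[-1]] = None
        stack.extend(kids)
    return first_child, next_sibling


first_child, next_sibling = to_first_child_next_sibling(children, "a")
print(first_child["a"], next_sibling["b"], next_sibling["d"])   # b d e
print(first_child["d"], next_sibling["e"])                      # None None
```

Each vertex needs only two links regardless of how many children it has, which is the main appeal of this representation.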
Exercises 1.4
Problem 1. Describe how one can implement each of the following operations on a
vector so that the time it takes does not depend on the size n of the vector.
a) Delete the i-th element of a vector ($1 \le i \le n$).
b) Delete the i-th element of a sorted vector (the remaining vector must, of course, remain
sorted).
Problem 2. If you have to solve the search problem for a list of numbers, how
can you take advantage of the fact that the list is known to be sorted? Give separate answers
for:
a) lists represented as vectors.
b) lists represented as linked lists.
Problem 3. a) Show the contents of the stack after each operation in the following sequence,
starting with the empty stack:
push(a), push(b), pop, push(c), push(d), pop
b) Show the contents of the queue after each operation in the following sequence,
starting with the empty queue:
enqueue(a), enqueue(b), dequeue, enqueue(c), enqueue(d), dequeue
Problem 4. a) Let A be the adjacency matrix of an undirected graph. Explain what property
of the matrix indicates that:
i. The graph is complete.
ii. The graph has a loop, that is, an edge connecting a vertex to itself.
iii. The graph has an isolated vertex, that is, a vertex with no incident edge.
b) Answer the same questions for the adjacency-list representation.
Problem 5. Give a complete description of an algorithm that transforms a free tree into a
tree rooted at a given vertex of the free tree.
Problem 6. Indicate how the abstract data type priority queue can be implemented
as:
a) an (unsorted) vector
b) a sorted vector
c) a binary search tree.