What is Machine Learning?
The area of Machine Learning deals with the design of programs that can learn rules from data, adapt to changes, and improve performance with experience. In addition to being one of the initial dreams of Computer Science, Machine Learning has become crucial as computers are expected to solve increasingly complex problems and become more integrated into our daily lives.
Writing a computer program is a bit like writing down instructions for an extremely literal child who just happens to be millions of times faster than you. Yet many of the problems we now want computers to solve are no longer tasks we know how to explicitly tell a computer how to do. These include identifying faces in images, autonomous driving in the desert, finding relevant documents in a database (or throwing out irrelevant ones, such as spam email), finding patterns in large volumes of scientific data, and adjusting internal parameters of systems to optimize performance. That is, we may ourselves be good at identifying people in photographs, but we do not know how to directly tell a computer how to do it. Instead, methods that take labeled training data (images labeled by who is in them, or email messages labeled by whether or not they are spam) and then learn appropriate rules from the data seem to be the best approaches to solving these problems. Furthermore, we need systems that can adapt to changing conditions, that can be user-friendly by adapting to the needs of their individual users, and that can improve performance over time.
What is Machine Learning Theory?
Machine Learning Theory, also known as Computational Learning Theory, aims to understand the fundamental principles of learning as a computational process. This field seeks to understand at a precise mathematical level what capabilities and information are fundamentally needed to learn different kinds of tasks successfully, and to understand the basic algorithmic principles involved in getting computers to learn from data and to improve performance with feedback. The goals of this theory are both to aid in the design of better automated learning methods and to understand fundamental issues in the learning process itself.
Machine Learning Theory draws elements from both the Theory of Computation and Statistics and involves tasks such as:
• Creating mathematical models that capture key aspects of machine learning, in which one can analyze the inherent ease or difficulty of different types of learning problems.
• Proving guarantees for algorithms (under what conditions will they succeed, how much data and computation time are needed) and developing machine learning algorithms that provably meet desired criteria.
• Mathematically analyzing general issues, such as: “Why is Occam’s Razor a good idea?”, “When can one be confident about predictions made from limited data?”, “How much power does active participation add over passive observation for learning?”, and “What kinds of methods can learn even in the presence of large quantities of distracting information?”.
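As a concrete illustration of the kind of guarantee mentioned above, a classical result for a finite class H of candidate rules states that any rule consistent with roughly (1/ε)(ln|H| + ln(1/δ)) independent labeled examples has true error at most ε with probability at least 1 − δ. The sketch below simply evaluates that bound; the function name and example numbers are illustrative, not from the text:

```python
import math

def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Classical sample-complexity bound for a finite hypothesis class:
    a learner that outputs any hypothesis consistent with
    m >= (1/epsilon) * (ln|H| + ln(1/delta)) i.i.d. examples has
    true error at most epsilon with probability at least 1 - delta."""
    m = (math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon
    return math.ceil(m)

# Illustrative numbers: a class of 2^20 boolean rules,
# target error 5%, failure probability 1%.
print(pac_sample_bound(2**20, 0.05, 0.01))
```

Note that the bound grows only logarithmically in the number of candidate rules, which is why even very large rule classes can be learnable from modest amounts of data.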
A Few Highlights
Consider the general principle of “Occam’s razor”: that simple explanations should be preferred to complex ones. There are certainly many reasons to prefer simpler explanations — for instance, they are easier to understand — but can one mathematically argue for some form of Occam’s razor from the perspective of performance? In particular, should computer programs that learn from experience use some notion of the Occam’s razor principle, and how should they measure simplicity in the first place?
One of the earliest results in Computational Learning Theory is that there is indeed a sound reason, as a matter of policy, to seek out simple explanations when designing prediction rules. In particular, for measures of simplicity including description length in bits, the Vapnik-Chervonenkis dimension (which measures the effective number of parameters), and newer measures being studied in current research, one can convert the level of simplicity into a degree of confidence in future performance. While some of these theoretical results are quite intricate, at a high level the intuition is just the following: there are many more complicated explanations possible than simple ones. Therefore, if a simple explanation happens to fit your data, it is much less likely that this is happening just by chance. On the other hand, there are so many complicated explanations possible that even a large amount of data is unlikely to rule all of them out, and even some that have nothing to do with the task at hand are likely to fit.
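This counting intuition can be made quantitative. A standard Occam’s-razor bound (a well-known result, sketched here as an illustration rather than drawn from this text) observes that at most 2^b rules can be described in b bits, so a union bound over them converts description length directly into confidence: if a b-bit rule fits all m examples, then with probability at least 1 − δ its true error is at most (b·ln 2 + ln(1/δ))/m. A minimal sketch, with illustrative names and numbers:

```python
import math

def occam_error_bound(description_bits: int, m: int, delta: float) -> float:
    """Occam's-razor bound: at most 2^b rules are describable in b bits,
    so if one of them fits all m i.i.d. labeled examples, then with
    probability at least 1 - delta its true error is at most
    (b * ln 2 + ln(1/delta)) / m."""
    return (description_bits * math.log(2) + math.log(1.0 / delta)) / m

# Illustrative numbers: a rule described in 100 bits
# that fits 10,000 labeled examples, at 99% confidence.
print(occam_error_bound(100, 10_000, 0.01))
```

The bound makes the trade-off explicit: shorter descriptions (smaller b) yield tighter error guarantees for the same amount of data, which is exactly the sense in which fitting the data with a simple rule is unlikely to be a coincidence.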