Machine Learning Fundamentals

Understanding AI, ML, and Deep Learning

Dr. Dhaval Patel • 2025

What We'll Explore

Concepts

  • The relationship between Artificial Intelligence, Machine Learning, and Deep Learning
  • Tom Mitchell's formal definition that gives ML its mathematical foundation
  • The essential components every learning algorithm must have
  • Four distinct types of machine learning and when to use each
  • Real-world applications that surround us daily
  • Key terminology that will help you speak the language of ML

Foundation Concepts

AI, Machine Learning, and Deep Learning

Understanding the Relationship

Venn diagram showing AI as the largest circle, ML as a subset within AI, and Deep Learning as a subset within ML
  • AI (Outer Circle): All attempts to make machines intelligent
  • ML (Middle Circle): AI systems that learn from data
  • DL (Inner Circle): ML using neural networks
Key Insight: Every deep learning system is also machine learning, and every machine learning system is also artificial intelligence, but not vice versa!

Artificial Intelligence: The Broader Vision

Artificial Intelligence represents humanity's ambitious goal of creating machines that can "think." The field began in the 1950s with a simple but profound question: Can machines exhibit intelligent behavior?

Modern Definition: "The study and design of intelligent agents, where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success."

Here's what makes this interesting: AI doesn't necessarily require learning. For decades, the dominant approach was symbolic AI, where programmers explicitly wrote rules. Think of early chess programs that followed thousands of hand-coded strategies like "control the center" or "protect your king."

This approach worked brilliantly for logical, rule-based problems like chess, but struggled with fuzzy, real-world challenges.

How do you write explicit rules for recognizing your grandmother's face in a photo, or understanding sarcasm in speech? These limitations led us to seek a different approach.

Question: Can you think of tasks that would be easy for a human but nearly impossible to capture with explicit rules?

Machine Learning: A Paradigm Shift

Machine Learning emerged as a revolutionary response to symbolic AI's limitations. Instead of programming explicit rules, we realized we could let machines discover patterns themselves.

Core Insight: Rather than telling computers HOW to solve problems, we show them examples of solved problems and let them figure out the patterns.

This represents a fundamental shift in thinking. Traditional programming follows this logic:

Data + Program → Output

But machine learning flips this equation:

Data + Desired Output → Program (Model)

This "program" we create is called a model, and it captures the learned patterns from our examples. Once trained, we can apply this model to new, unseen data to make predictions or decisions.

Think of it like teaching a child to recognize animals. Instead of describing every possible feature of a cat, you show them hundreds of pictures labeled "cat" and "not cat." Eventually, they learn to recognize cats they've never seen before. Machine learning works similarly, but with mathematical precision.
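The "show examples instead of writing rules" idea can be sketched in a few lines. This is a hypothetical toy, not a real ML library: the "program" (a decision threshold) is produced from labeled data rather than hand-coded.

```python
# Toy sketch: learn a decision threshold from labeled examples
# (classify numbers as "big" or "small") instead of hand-coding a rule.
def learn_threshold(examples):
    """examples: list of (value, label) pairs with labels 'big'/'small'.
    Returns the midpoint between the largest 'small' value and the
    smallest 'big' value seen during training."""
    smalls = [v for v, lab in examples if lab == "small"]
    bigs = [v for v, lab in examples if lab == "big"]
    return (max(smalls) + min(bigs)) / 2

def predict(threshold, value):
    return "big" if value >= threshold else "small"

train = [(1, "small"), (2, "small"), (3, "small"), (8, "big"), (9, "big")]
t = learn_threshold(train)   # the "program" (model) comes from data
print(predict(t, 7))         # apply the learned rule to an unseen input
```

Note the inversion: the data and desired outputs produce the model, and the model then handles inputs it has never seen.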

Deep Learning: The Neural Network Revolution

Deep Learning represents the most recent breakthrough in machine learning, inspired by how our brains process information through interconnected neurons.

Definition: A specialized approach within machine learning that uses multi-layered neural networks to automatically discover complex patterns in vast amounts of data.

What makes deep learning "deep" is the multiple layers of artificial neurons, each learning different levels of abstraction.

Diagram: a deep neural network (CNN)

The first layer might detect edges in an image, the second layer combines edges into shapes, the third layer combines shapes into objects, and so on.

Three key factors enabled deep learning's recent success:

  • Massive Datasets: The internet provided billions of labeled examples
  • Powerful Hardware: Graphics processors (GPUs) could handle the intense calculations
  • Algorithmic Improvements: Better techniques for training very deep networks

Deep learning has achieved remarkable success in areas like image recognition, language translation, and game playing, often surpassing human performance on specific tasks.

The Mathematics of Learning

Tom Mitchell's Formal Definition

A Precise Framework for Learning

"A computer program is said to learn from
Experience E with respect to some
Class of Tasks T and
Performance Measure P
if its performance at tasks in T, as measured by P,
improves with experience E."
Why This Definition Matters: Tom Mitchell's definition gives us a mathematical framework to evaluate whether a system is truly "learning." Every machine learning algorithm must clearly specify these three components, making the field more rigorous and measurable.

Breaking Down the Learning Components

Task (T) & Experience (E)

Task (T): The specific behavior or problem we want the system to improve at. This could be making predictions, classifying images, or choosing optimal actions.

Experience (E): The data or information the system uses to learn. This is your training material - the examples from which patterns will be discovered.

Example: If T is "identify cats in photos," then E might be "10,000 labeled photos of cats and non-cats."

Performance Measure (P)

Performance (P): How we quantify success. This must be measurable and objective, allowing us to track improvement over time.

Different tasks require different performance measures:

  • Classification: Accuracy percentage
  • Regression: Mean squared error
  • Games: Win rate or score
Without a clear performance measure, we can't determine if learning is actually occurring!
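The first two measures above are straightforward to compute. A minimal sketch with toy predictions (the data here is made up for illustration):

```python
# Two common performance measures (P), computed on toy predictions.
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between predictions and true values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Classification: 3 of 4 labels correct -> accuracy 0.75
print(accuracy(["spam", "ham", "spam", "ham"],
               ["spam", "ham", "ham", "ham"]))

# Regression: squared errors (1 + 4) / 2 -> MSE 2.5
print(mean_squared_error([3.0, 5.0], [4.0, 7.0]))
```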

Applying the Framework: A Checkers Example

Let's make this concrete with a classic example: teaching a computer to play checkers better.

Task (T): Playing checkers - specifically, choosing the best move given any board position.
Experience (E): Thousands of checkers games, including board positions and the outcomes of moves made from those positions.
Performance (P): Percentage of games won against a variety of opponents.

Notice how this framework forces us to be specific. We can't just say "learn to play checkers better." We must define exactly what constitutes the task, what data we'll use for learning, and how we'll measure improvement.

Building a Learning System

The Four Essential Steps

From Concept to Implementation

Creating any machine learning system requires making four crucial decisions. Think of these as the architectural choices that determine your system's capabilities and limitations.

Step 1: Choose the Training Experience
What data will teach your system? This choice impacts everything else. The data should be representative, relevant, and expressed in useful features.
Step 2: Choose the Target Function
What exactly are you trying to learn? This is the ideal function that maps inputs to desired outputs - your "perfect solution."
Step 3: Choose the Function Representation
How will you approximate the target function? This is your hypothesis space - the class of functions your algorithm can consider.
Step 4: Choose the Learning Algorithm
How will you search through possible functions to find the best one? This algorithm explores your hypothesis space to find the function that best fits your training data.

Designing a Learning System: Email Spam Detection

Flowchart showing email features (sender, subject, content) flowing through a learning algorithm to produce spam/not spam classification

Applying the Four Steps

Step 1 - Training Experience: 100,000 emails labeled as "spam" or "legitimate"

Step 2 - Target Function: A function that maps email features to spam probability

Step 3 - Representation: A mathematical model using word frequencies, sender patterns, and subject line characteristics

Step 4 - Algorithm: A process that adjusts the model's parameters to minimize classification errors on training data
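Steps 2-4 can be sketched with a toy word-frequency model. This is a hypothetical simplification (a real system would use probabilities and many more features), but it shows the representation and the decision rule concretely:

```python
from collections import Counter

# Toy spam model: score an email by how often its words appeared
# in spam vs. legitimate training emails.
def train(emails):
    """emails: list of (text, label) pairs, label 'spam' or 'ham'."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in emails:
        counts[label].update(text.lower().split())
    return counts

def classify(counts, text):
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"

model = train([
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch at noon", "ham"),
])
print(classify(model, "free money prize"))  # words seen mostly in spam
```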


Types of Machine Learning

Four Distinct Approaches

Supervised vs Unsupervised Learning

Supervised Learning

Learning from examples with correct answers. Like studying for an exam with an answer key - you know what the right responses should be.

Has labeled training data (input-output pairs)

Goal: Learn to predict outputs for new inputs

Unsupervised Learning

Finding hidden patterns without knowing the "right" answers. Like exploring a new city without a map - you discover the structure as you go.

Has only input data (no labels or target outputs)

Goal: Discover hidden structures or patterns in data


Supervised Learning: Learning from Examples

Classification Tasks

When your output is a category or discrete class. Think of it as sorting things into labeled boxes.

Examples:
• Email: Spam or Not Spam
• Medical: Disease or Healthy
• Finance: High Risk or Low Risk

The key insight: you're predicting which category something belongs to. The output has distinct, separate values with no meaningful order between them.

Regression Tasks

When your output is a continuous number. Think of it as predicting a position on a measuring stick.

Examples:
• House prices in dollars
• Patient survival time in years
• Stock prices or temperature

The key insight: you're predicting a quantity that can take any value within a range. Small changes in input typically lead to small changes in output.

How Supervised Learning Works

Training Phase:
Input: Features + Labels → Algorithm learns patterns

Prediction Phase:
Input: New Features → Trained Model → Predicted Label
The Learning Process: During training, the algorithm examines thousands of examples, gradually adjusting its internal parameters to minimize the difference between its predictions and the true labels. This process is like a student learning from practice problems with answer keys, slowly improving their ability to solve similar problems correctly.

Unsupervised Learning: Discovering Hidden Patterns

Clustering

Finding natural groupings in your data. Imagine having a box of mixed objects and sorting them into piles based on similarity, without knowing ahead of time what groups should exist.

Example: Customer Segmentation
• Group customers by buying behavior
• Discover market segments
• Tailor marketing strategies

The algorithm discovers that some customers buy luxury items, others focus on discounts, and others prefer eco-friendly products - groups you might not have thought to look for.
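A minimal clustering sketch makes this concrete. Below is a toy 1-D k-means on made-up monthly spend figures; the algorithm receives no labels, yet separates low and high spenders on its own:

```python
# Toy 1-D k-means: group values into k clusters by similarity,
# with no labels telling us what the groups should be.
def kmeans_1d(values, centers, iterations=10):
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each point to its nearest current centre.
            i = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            clusters[i].append(v)
        # Move each centre to the mean of its assigned points.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

spend = [10, 12, 11, 95, 102, 98]          # hypothetical monthly spend
centers, groups = kmeans_1d(spend, centers=[0.0, 50.0])
print(sorted(groups[0]), sorted(groups[1]))
```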

Association Rules

Finding relationships between different items or events. Discovering "if this, then that" patterns in your data.

Example: Market Basket Analysis
• "People who buy bread also buy butter"
• "Movie fans who like sci-fi also enjoy fantasy"
• "Patients with symptom A often have symptom B"

These discoveries help with recommendations, inventory management, and understanding complex relationships in your domain.
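The simplest form of market basket analysis is just counting co-occurrences. A toy sketch with made-up baskets:

```python
from itertools import combinations
from collections import Counter

# Count how often each item pair appears together across baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# "People who buy bread also buy butter": the pair co-occurs in 2 baskets.
print(pair_counts[("bread", "butter")])
```

Real association-rule mining (e.g. the Apriori algorithm) extends this counting idea with support and confidence thresholds.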

Reinforcement Learning: Learning Through Interaction

Reinforcement Learning is fundamentally different from both supervised and unsupervised learning. Instead of learning from a fixed dataset, an agent learns by taking actions in an environment and receiving feedback.

Key Concept: An agent learns optimal behavior through trial and error, receiving rewards for good actions and penalties for poor ones.

Think of teaching a child to ride a bicycle. You don't give them a manual or show them thousands of examples. Instead, they try different actions (pedaling, steering, balancing), experience the consequences (staying upright or falling), and gradually learn what works.

Agent → Action → Environment → Reward/Penalty → Learning

The agent's goal is to learn a policy - a strategy that tells it what action to take in any situation to maximize total rewards over time. This approach has achieved remarkable success in complex scenarios like game playing, robotics, and autonomous systems.

Real-world Applications: Self-driving cars learning to navigate safely, AI systems mastering complex games like Go and poker, and robots learning to walk or manipulate objects.
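The agent-action-reward loop can be sketched with the simplest RL setting, a two-armed bandit. Everything here (the environment, the payouts) is hypothetical; the point is that the agent improves its value estimates purely from reward feedback:

```python
import random

# Toy two-armed bandit: arm 1 pays more on average, but the agent
# doesn't know that. It keeps a running value estimate per arm and
# mostly picks the best-looking one (epsilon-greedy exploration).
random.seed(0)

def reward(arm):
    # Hypothetical environment: arm 0 pays ~1.0, arm 1 pays ~3.0.
    return random.gauss(1.0 if arm == 0 else 3.0, 0.1)

values, counts = [0.0, 0.0], [0, 0]
for step in range(500):
    if random.random() < 0.1:                 # explore 10% of the time
        arm = random.randrange(2)
    else:                                     # otherwise act greedily
        arm = values.index(max(values))
    r = reward(arm)
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]  # incremental average

print(values.index(max(values)))   # the agent settles on the better arm
```

Note the trade-off built into the loop: exploit what looks best so far, but keep exploring occasionally, or the agent could get stuck on the first arm it tried.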

Semi-Supervised Learning: The Best of Both Worlds

Diagram showing a small amount of labeled data mixed with a large amount of unlabeled data being used together for learning

Bridging the Gap

Semi-supervised learning addresses a common real-world problem: you have lots of data, but only a small portion is labeled.

Why This Matters:
• Labeling data is expensive
• Requires human experts
• Time-consuming process
• Unlabeled data is abundant

Core Assumption: If two data points are close together in a high-density region, they likely have similar labels.

Strategy: Use unsupervised learning to understand data structure, then use supervised learning to propagate labels through similar regions.
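That strategy can be sketched in its simplest form: give each unlabeled point the label of its nearest labeled neighbour (a toy stand-in for real label-propagation algorithms):

```python
# Toy sketch of the "close points share labels" assumption:
# propagate labels from a few labelled points to unlabelled ones.
labeled = {1.0: "A", 2.0: "A", 9.0: "B"}    # small labelled set
unlabeled = [1.5, 8.5, 10.0]                # abundant unlabelled data

def propagate(labeled, points):
    out = {}
    for p in points:
        nearest = min(labeled, key=lambda x: abs(x - p))
        out[p] = labeled[nearest]           # copy the neighbour's label
    return out

print(propagate(labeled, unlabeled))
```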

Applications & Impact

Machine Learning in the Real World

Machine Learning Surrounds Us

Machine learning has quietly revolutionized nearly every aspect of modern life. Let's explore how these algorithms work behind the scenes to power the applications you interact with daily.

Medicine & Healthcare
Diagnostic systems analyze symptoms, lab results, and genetic data to identify diseases earlier and more accurately than ever before. Treatment recommendation systems examine vast medical databases to suggest optimal therapies for individual patients.
Computer Vision
Your smartphone recognizes faces in photos, autonomous vehicles identify pedestrians and traffic signs, and medical imaging systems detect cancers that human radiologists might miss.
Natural Language Processing
Translation systems break down language barriers, voice assistants understand your commands, and sentiment analysis helps companies understand customer feedback at scale.
Business Intelligence
Recommendation engines suggest products you might like, fraud detection systems protect your financial transactions, and demand forecasting optimizes supply chains globally.

Essential Terminology

Language of Machine Learning

ML Key Terminology

Data & Representation

Features: The measurable properties used to describe each data point. Like the ingredients list for a recipe - the essential characteristics that define each instance.
Feature Vector: A mathematical representation where each data point becomes a list of numbers. Like converting a house description into [bedrooms: 3, bathrooms: 2, square_feet: 2000].
Instance Space (X): The set of all possible objects you could describe with your chosen features. Your universe of possible inputs.
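The house example above translates directly into code. A minimal sketch (feature names are illustrative):

```python
# Turn a raw description into a fixed-order feature vector.
FEATURES = ["bedrooms", "bathrooms", "square_feet"]  # chosen representation

def to_feature_vector(house):
    # Extract the chosen features, in a fixed order, as numbers.
    return [float(house[f]) for f in FEATURES]

house = {"bedrooms": 3, "bathrooms": 2, "square_feet": 2000, "city": "Pune"}
print(to_feature_vector(house))   # -> [3.0, 2.0, 2000.0]
```

Notice that attributes outside the chosen feature set (like the city) are simply dropped: the feature vector is the model's entire view of each instance.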

Learning Components

Example (x, y): A specific instance paired with its correct answer.
Target Function (f): The perfect function we're trying to learn - it always gives the correct output for any input. Usually unknown and impossible to compute directly.
Hypothesis: Our learned model - our best approximation of the target function based on training data. This is what we actually use to make predictions.

Putting It All Together: The Complete Learning Pipeline

Training Phase:
Raw Data → Feature Extraction → Learning Algorithm → Model (Hypothesis)

Prediction Phase:
New Raw Data → Feature Extraction → Trained Model → Prediction
Conclusion: Machine learning transforms raw, messy real-world data into actionable predictions through a systematic process. First, we extract meaningful features from our data. Then, during training, our algorithm examines thousands of examples to discover patterns and relationships. Finally, when we encounter new data, we extract the same features and apply our learned model to make predictions. This pipeline represents the fundamental pattern underlying all machine learning applications.
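The whole pipeline fits in a short, end-to-end toy example. The feature, the learner (a single averaged threshold), and the data are all hypothetical simplifications, but each pipeline stage appears explicitly:

```python
# End-to-end toy pipeline: extract a feature, learn a threshold
# from labelled examples, then predict for new raw data.
def extract(email):
    # Feature extraction: one feature, the count of the word "free".
    return email.lower().split().count("free")

def train(raw, labels):
    # Learning algorithm: threshold midway between the class averages.
    spam = [extract(e) for e, lab in zip(raw, labels) if lab == "spam"]
    ham = [extract(e) for e, lab in zip(raw, labels) if lab == "ham"]
    return (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2

def predict(model, email):
    # Prediction phase: same feature extraction, then apply the model.
    return "spam" if extract(email) > model else "ham"

emails = ["free free prize", "free offer free", "meeting today", "see agenda"]
model = train(emails, ["spam", "spam", "ham", "ham"])
print(predict(model, "claim your free free gift"))
```

The key discipline the pipeline enforces: the prediction phase must apply exactly the same feature extraction as the training phase, or the model sees inputs in a representation it never learned from.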