Introduction to Regression Analysis and Gradient Descent

Supervised Learning:

Supervised learning is probably the most common type of machine learning problem.

Example 1:
Let's start with an example: imagine we want to predict housing prices. How would we approach this problem?

  1. We could collect data on housing prices and how they relate to size in square feet.

Given this data, suppose a friend has a 750-square-foot house. How much can they expect to sell it for?

What approaches can we use to solve this?

  • Fit a straight line through the data
  • Fit a second-order polynomial (a curve)
  • How do we choose between a straight and a curved line?

Each of these approaches represents a way of doing supervised learning.

What does this really mean? We gave the algorithm a data set in which the "right answer" was provided, so we know the actual prices of those houses.
The idea is that, from this training data, the algorithm can learn what makes a price take a certain value.
It should then produce more right answers for new houses whose prices we don't know in advance.

In other words, we want to predict the price of a new house for which no price is available.

We call this a regression problem.

A regression problem is one where:

  • We need to predict a continuous-valued output (in this case, price).
  • There is no real discrete delineation between possible output values.
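As an illustration of the straight-line and second-order-polynomial approaches (a sketch, not part of the original notes), the Python below fits both by least squares and compares their predictions for a 750-square-foot house. The housing figures are made-up toy values, and `fit_polynomial` and `predict` are hypothetical helper names. Solving the normal equations directly like this is fine for tiny examples but is not how you would fit large data sets.

```python
from typing import List, Sequence


def fit_polynomial(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial fit; returns coefficients [c0, c1, ..., c_degree]."""
    n = degree + 1
    # Normal equations (X^T X) c = X^T y, where column j of X holds x ** j.
    a = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - known) / a[row][row]
    return coeffs


def predict(coeffs: Sequence[float], x: float) -> float:
    """Evaluate c0 + c1*x + c2*x**2 + ... at x."""
    return sum(c * x ** j for j, c in enumerate(coeffs))


# Hypothetical toy data: size in thousands of square feet, price in $1000s.
sizes = [0.5, 1.0, 1.5, 2.0, 2.5]
prices = [150.0, 230.0, 290.0, 330.0, 355.0]

line = fit_polynomial(sizes, prices, degree=1)   # straight line
curve = fit_polynomial(sizes, prices, degree=2)  # second-order polynomial
print(f"straight line at 750 sq ft: {predict(line, 0.75):.1f}")
print(f"quadratic at 750 sq ft:     {predict(curve, 0.75):.1f}")
```

Comparing the two printed predictions at 0.75 (750 square feet) shows how the choice between a straight and a curved line changes the answer for the same training data.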

Example 2:
Can we classify a breast tumor as malignant or benign based on its size?

In the example data set there are five points of each kind, and the output can only be 1 (malignant) or 0 (benign).

This is an example of a classification problem.

What is classification?

  • Here we classify data into one of two discrete classes, with nothing in between: a tumor is either malignant or not.
  • In classification problems, the output can take only a discrete number of possible values.
  • We are not confined to two outputs; for example, there might be four values:
    1. benign
    2. type 1
    3. type 2
    4. type 3
  • With only one attribute (tumor size), we can also plot the data differently: on a single axis, with each point marked by its class.

  • In other problems we may have multiple attributes; for example, we may know both the patient's age and the tumor size.

Based on that data, you can try to define separate classes by:

  • Drawing a straight line between the two groups
  • Using a more complex function to separate the two groups

Then, given a new patient with a known tumor size and age, you can hopefully use that boundary to place them into one of your classes.
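The notes don't say which algorithm draws the straight line between the two groups. One classic choice, used here purely as an illustration, is the perceptron, which finds a separating line whenever the groups are linearly separable. The features (age in decades, tumor size in cm), the toy data, and the names `train_perceptron` and `classify` are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical feature vector: (age in decades, tumor size in cm).
Point = Tuple[float, float]


def train_perceptron(
    points: Sequence[Point], labels: Sequence[int], epochs: int = 1000
) -> Tuple[List[float], float]:
    """Learn a straight-line boundary w[0]*x1 + w[1]*x2 + b = 0 (labels are 0/1)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), label in zip(points, labels):
            target = 1 if label == 1 else -1  # map {0, 1} to {-1, +1}
            if target * (w[0] * x1 + w[1] * x2 + b) <= 0:
                # Misclassified (or on the line): nudge the boundary toward this point.
                w[0] += target * x1
                w[1] += target * x2
                b += target
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b


def classify(w: Sequence[float], b: float, point: Point) -> int:
    """Return 1 (malignant) or 0 (benign) depending on which side of the line the point is."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else 0


# Hypothetical toy data: 0 = benign, 1 = malignant.
points = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5), (5.0, 4.0), (6.0, 3.5), (5.5, 5.0)]
labels = [0, 0, 0, 1, 1, 1]

w, b = train_perceptron(points, labels)
print(classify(w, b, (5.5, 4.0)))  # prints 1
```

If the two groups cannot be separated by a straight line, the perceptron keeps making mistakes until it runs out of epochs. That is the situation where the "more complex function" option becomes necessary.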

You might have many features to consider:

  • Clump thickness
  • Uniformity of cell size
  • Uniformity of cell shape

Some algorithms, such as support vector machines, can even work with an infinite number of features.

Summary:

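In supervised learning, the training data comes with the "right" answers. It covers two main problem types:
  • Regression problems (continuous output)
  • Classification problems (discrete output)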



Unsupervised Learning:

This is the second major problem type. In unsupervised learning we are given unlabeled data, and the task is to find structure in it.
One way of doing this is to cluster the data into groups.

This is a clustering problem.
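The notes don't name a particular clustering algorithm. k-means is one widely used example, and the Python below is a minimal sketch of it. It assumes points are tuples of numbers, uses a deterministic "farthest-first" initialization to keep the example reproducible (library implementations commonly use random or k-means++ initialization instead), and `kmeans` is a hypothetical helper name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(
    points: Sequence[Point], k: int, max_iterations: int = 100
) -> Tuple[List[int], List[Point]]:
    """Cluster points into k groups; returns (cluster index per point, centroids)."""
    # Farthest-first initialization: start at the first point, then repeatedly
    # pick the point farthest from every centroid chosen so far.
    centroids: List[Point] = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))

    assignments: List[int] = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:  # converged: no point changed cluster
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignments, centroids


# Hypothetical unlabeled 2-D data with two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centers = kmeans(points, k=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

No labels were supplied; the algorithm discovers the two groups purely from the distances between points, which is what distinguishes this from the supervised examples above.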

Clustering Algorithm:

Examples:

  • Google News
    • Groups news stories into cohesive groups
  • Genomics: microarray data
    • You have a group of individuals
    • For each individual, you measure how strongly each gene is expressed
    • You can then run an algorithm to cluster the individuals into types of people
  • Organizing computer clusters
    • Identify potential weak spots or distribute workload effectively
  • Social network analysis
    • Clustering customer data
  • Astronomical data analysis
    • Clustering algorithms have produced notable results in this field