Establish a Baseline


When starting a machine learning project, it is essential to establish a baseline level of performance before making improvements. This baseline serves as a point of comparison and helps determine where to focus efforts.

Establishing a Baseline

  • Determine the major categories in your data. For speech recognition, these might be clear speech, speech with car noise, speech with people noise, and low-bandwidth audio.
  • Measure the model's accuracy for each category (e.g., 94%, 89%, 87%, and 70%, respectively).
  • Avoid prematurely focusing on the category with the lowest accuracy; low accuracy alone does not show how much room for improvement remains.
  • Human-Level Performance:
    • Have humans label the data and measure human-level performance (HLP) for each category.
    • Compare the model's accuracy with HLP in each category; the largest gaps indicate the most potential for improvement.
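The comparison above can be sketched in a few lines of Python. The model accuracies are the example figures from the list; the human-level values are hypothetical numbers chosen only for illustration, and the function name `gap_to_baseline` is likewise an assumption, not part of any library.

```python
# Model accuracy per category (example figures from the list above).
model_accuracy = {
    "clear speech": 0.94,
    "car noise": 0.89,
    "people noise": 0.87,
    "low bandwidth": 0.70,
}

# Hypothetical human-level performance (HLP), for illustration only.
human_level = {
    "clear speech": 0.95,
    "car noise": 0.93,
    "people noise": 0.89,
    "low bandwidth": 0.70,
}


def gap_to_baseline(model, baseline):
    """Return (category, gap) pairs, largest gap to the baseline first."""
    gaps = {name: round(baseline[name] - model[name], 4) for name in model}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


ranked = gap_to_baseline(model_accuracy, human_level)
for category, gap in ranked:
    print(f"{category:15s} gap to HLP: {gap:.0%}")
```

With these assumed numbers, the low-bandwidth category has the lowest accuracy but no gap to HLP, while the car-noise category shows the largest gap and is the more promising place to focus.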

Baseline for Unstructured Data

  • Unstructured data includes images, audio, and natural language.
  • Humans are good at interpreting unstructured data.
  • Human-level performance (HLP) is a useful baseline for unstructured data projects.
  • Measure the performance of humans in the given task to establish a baseline.
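One simple way to measure HLP is to have a person label a sample and count how often their label matches a trusted reference label. The sketch below assumes exact-match scoring and uses made-up labels; the helper name `human_level_performance` is hypothetical, and real tasks such as speech transcription may need a softer metric.

```python
def human_level_performance(human_labels, reference_labels):
    """Fraction of examples where the human label matches the reference."""
    if len(human_labels) != len(reference_labels):
        raise ValueError("label lists must have the same length")
    if not reference_labels:
        raise ValueError("need at least one labeled example")
    matches = sum(h == r for h, r in zip(human_labels, reference_labels))
    return matches / len(reference_labels)


# Hypothetical labels for five examples (illustration only).
reference = ["cat", "dog", "dog", "cat", "bird"]
human = ["cat", "dog", "cat", "cat", "bird"]

hlp = human_level_performance(human, reference)
print(f"HLP: {hlp:.0%}")
```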

Baseline for Structured Data

  • Structured data refers to databases and spreadsheets.
  • Humans are not as good at analyzing structured data.
  • Human-level performance is less useful for structured data applications.
  • Look for state-of-the-art results in the literature, or open-source results, as a baseline.
  • Consider the performance of previous machine learning systems for comparison.

Importance of Baseline

  • Baseline performance indicates what is possible and helps prioritize areas for improvement.
  • It provides a rough estimate of irreducible error or Bayes error.
  • Establishing a baseline first lets the team set realistic targets, which supports long-term success.

As a machine learning engineer (MLE), avoid guaranteeing a level of performance to your product manager (PM) before you have established a baseline.


References

  1. https://www.coursera.org/learn/introduction-to-machine-learning-in-production/home/week/2
  2. https://community.deeplearning.ai/t/mlep-course-1-lecture-notes/54446 (login required)