Get Started In Modelling


Tips for Getting Started

A few habits help a new machine learning project make progress efficiently.

Getting Started on Modeling

  • Conduct a literature search to understand existing approaches and possibilities.
  • Focus on practical implementations instead of the latest algorithms.
  • Open-source implementations can provide a baseline and a quick start.
  • A reasonable algorithm with good data often outperforms a cutting-edge algorithm with poor data.

Deployment Constraints

Take deployment constraints, such as compute limitations, into account once you are confident the project will succeed.

  • If the project is still at an early stage, it’s acceptable to prioritize establishing a baseline over deployment constraints.

Sanity Checks

  • Before running the learning algorithm on all data, perform quick sanity checks.
  • Overfit a small training dataset to ensure the algorithm is functioning correctly.
  • Test the algorithm with one or a few training examples to catch bugs early.
  • Training on a small subset of images can reveal issues before scaling up.
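
The overfitting check above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: it assumes a hand-written logistic regression and a synthetic dataset of 8 examples with 20 random features and random labels, and the names `train_logreg`, `accuracy`, `X_small`, and `y_small` are hypothetical. A working learner should be able to memorize such a tiny set, so a training accuracy below 100% suggests a bug in the model, the loss, or the update step.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, steps=3000, lr=0.3):
    """Plain gradient descent on the logistic loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def accuracy(X, y, w, b):
    """Fraction of examples whose predicted class matches the label."""
    preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Tiny synthetic dataset (an assumption for illustration): 8 examples, 20 features.
rng = random.Random(0)
X_small = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(8)]
y_small = [rng.randint(0, 1) for _ in range(8)]

w, b = train_logreg(X_small, y_small)
train_acc = accuracy(X_small, y_small, w, b)
print(f"training accuracy on 8 examples: {train_acc:.2f}")
if train_acc < 1.0:
    print("Could not overfit a tiny batch; check the model, loss, or update step.")
```

The same idea carries over to any framework: swap in the real model and a handful of real examples, and expect the training loss to fall close to zero before scaling up.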

Error Analysis and Performance Auditing

  • After training the initial model, conduct error analysis to identify areas for improvement.
  • Analyze errors to understand patterns and make targeted adjustments.
  • Carry out performance auditing to assess the model’s overall performance before deployment.
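
One simple way to look for patterns in errors is to tag each misclassified example by hand and count how often each tag occurs. The sketch below assumes a hypothetical list of error records; the tags (`blurry`, `low_light`) and the helper `tag_shares` are made up for illustration and do not come from the source.

```python
from collections import Counter

# Hypothetical misclassified examples, each hand-labelled with zero or more tags.
errors = [
    {"id": 1, "tags": ["blurry"]},
    {"id": 2, "tags": ["low_light", "blurry"]},
    {"id": 3, "tags": []},
    {"id": 4, "tags": ["low_light"]},
    {"id": 5, "tags": ["blurry"]},
]


def tag_shares(errors):
    """Return the share of errors carrying each tag, most frequent first."""
    counts = Counter(tag for e in errors for tag in e["tags"])
    total = len(errors)
    return {tag: n / total for tag, n in counts.most_common()}


shares = tag_shares(errors)
for tag, share in shares.items():
    print(f"{tag}: {share:.0%} of errors")
```

Tags that cover a large share of the errors are natural candidates for targeted work, such as collecting more data of that kind.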

Handling Large Models and Data

  1. Distributed Training
  2. Pipeline Parallelism
  3. Knowledge Distillation
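
Of the three techniques listed, knowledge distillation is the easiest to sketch briefly. In its common formulation, a small student model is trained to match the temperature-softened output distribution of a larger teacher. The following is a minimal sketch of that loss term only, assuming raw logits as plain Python lists; the function names and the example logits are illustrative, and a real training setup would typically combine this term with the usual loss on the true labels.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by the temperature (shifted for stability)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]  # illustrative teacher logits
print(distillation_loss([4.0, 1.0, 0.5], teacher))  # student agrees with teacher
print(distillation_loss([0.5, 1.0, 4.0], teacher))  # student disagrees
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge.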

References

  1. https://www.coursera.org/learn/introduction-to-machine-learning-in-production/home/week/2
  2. https://community.deeplearning.ai/t/mlep-course-1-lecture-notes/54446 (login required)