
ML Interview Prep: The 15 Questions That Decide the Role


Interviews at 40-plus companies that hire ML interns, seen from both sides of the table, show one clear pattern: interviewers aren't testing whether you can recite definitions. They're testing whether you've actually trained a model and understood what went wrong.

Here are the core questions that come up again and again, grouped by theme — and, more importantly, what a strong answer sounds like.

Fundamentals they always ask

  • Explain the bias-variance trade-off. High bias = underfitting (too simple, misses patterns); high variance = overfitting (memorises noise). A great answer names a time you saw it: "my decision tree hit 99% train and 70% test accuracy, classic high variance, so I pruned it and used cross-validation to choose the depth."
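  • What is overfitting and how do you prevent it? Overfitting is when a model fits the training data, noise included, so closely that it does worse on unseen data. Remedies: more data, regularisation, simpler models, dropout, early stopping. Cross-validation helps you detect it and tune against it. Pick the ones you've used.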
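  • Why do we split data into train / validation / test? Train fits the model, validation tunes hyperparameters and picks between models, and test is held back until the end to estimate real-world performance. Tuning on the test set leaks it and inflates that estimate.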
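
The three-way split is easier to explain once you have written one by hand. The following is a minimal sketch in plain Python, not a substitute for a library splitter; the function name `three_way_split` and the 70/15/15 default fractions are illustrative assumptions, not a recommendation from any particular course or library.

```python
import random


def three_way_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut the data into train / validation / test lists.

    Hypothetical helper for illustration; fractions are assumptions.
    """
    if not 0 < val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be between 0 and 1")

    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible

    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)

    test = shuffled[:n_test]                # set aside and not used for tuning
    val = shuffled[n_test:n_test + n_val]   # used to tune and compare models
    train = shuffled[n_test + n_val:]       # used to fit the model
    return train, val, test


train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

The point to make in an interview is that every row lands in exactly one of the three lists, and that the test list is only scored once, after all tuning is finished.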

Questions about how you work

  • Walk me through cross-validation. Explain k-fold plainly: split into k parts, train on k-1, validate on the rest, rotate, average. Mention why — a more reliable estimate than a single split.
  • How do you handle imbalanced data? Resampling (SMOTE oversampling, undersampling), class weights, and, crucially, the right metric. "I stopped trusting accuracy and switched to precision/recall and the F1 score."
  • Which evaluation metric would you choose and why? Tie it to the problem: precision when false positives are costly, recall when missing a positive is costly.

The strongest answers always sound like "here's what happened when I tried it," not "the textbook says." Stories beat definitions.
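
Two of the answers in this section, k-fold rotation and why accuracy misleads on imbalanced data, can be shown in a few lines. This is a rough sketch in plain Python under stated assumptions: the helper names `k_fold_indices` and `precision_recall_f1` are hypothetical, and the labels are synthetic, made up purely to illustrate the effect.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))  # shuffle these first if the data is ordered
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the positive class; 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Synthetic labels: 95 negatives, 5 positives, and a "model" that always says 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                             # 0.95
print(precision_recall_f1(y_true, y_pred))  # (0.0, 0.0, 0.0)
```

A model that never finds a single positive still scores 95% accuracy here, which is the concrete version of "I stopped trusting accuracy." In a real project you would compute these metrics on each validation fold from `k_fold_indices` and average them.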

Feature engineering and modelling

  • What is feature engineering and why does it matter? Often the biggest lever on performance — encoding categoricals, scaling, creating ratios, handling dates.
  • How do you pick a model? Start simple (logistic regression / a tree), establish a baseline, then justify added complexity with measured gains.
  • Explain the difference between bagging and boosting. Bagging (Random Forest) reduces variance by averaging many models trained independently on bootstrap samples; boosting (XGBoost) reduces bias by training models sequentially, each one correcting the errors of the previous ones.
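
The feature engineering steps named above (encoding categoricals, scaling, creating ratios, handling dates) can be sketched in plain Python. Everything here is an assumption for illustration: the helper names, the column names `city`, `income`, `debt` and `signup`, and the sample row are all hypothetical.

```python
from datetime import date


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over a fixed category list."""
    return [1 if value == c else 0 for c in categories]


def min_max_scale(values):
    """Rescale numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def date_parts(d):
    """Split a date into month, weekday (Monday = 0) and a weekend flag."""
    return {"month": d.month, "weekday": d.weekday(), "is_weekend": d.weekday() >= 5}


def build_features(row, cities):
    """Turn one raw row (hypothetical schema) into a flat numeric feature list."""
    features = one_hot(row["city"], cities)
    features.append(row["debt"] / row["income"])  # ratio feature
    parts = date_parts(row["signup"])
    features += [parts["month"], parts["weekday"], int(parts["is_weekend"])]
    return features


# Made-up example row; 6 January 2024 was a Saturday.
row = {"city": "Pune", "income": 40000, "debt": 10000, "signup": date(2024, 1, 6)}
print(build_features(row, ["Delhi", "Pune"]))  # [0, 1, 0.25, 1, 5, 1]
print(min_max_scale([10, 20, 30]))             # [0.0, 0.5, 1.0]
```

One detail worth mentioning in an interview: the minimum and maximum used for scaling should come from the training rows only, and then be reused on validation and test data, otherwise the split leaks.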

The question behind every question

Almost every interviewer is really asking: have you built something end to end? The candidates who get the offer are the ones who can say "in my plant-disease classifier I hit this exact problem, and here's how I diagnosed and fixed it." Definitions are table stakes; experience is the differentiator.

How to actually have those stories

You can't narrate experience you don't have. The reliable way to build it is to ship two or three real ML projects with someone reviewing your decisions — which is exactly the shape of a mentored Machine Learning internship: real datasets, a model you take from baseline to deployment, and a mentor pushing on your choices until the answers above are just things you've lived.
