Data Scientist Framework : OSEMN

Explains how a data scientist works through a problem, and the thought process behind it

Posted by Dwi Hadyan Harsono on Sept 15, 2021
TL;DR: Know what problem we want to solve first. (O)btain the necessary data, (S)crub it into an analysable format, (E)xplore it to understand its behavior, build a (M)odel using the data, and i(N)terpret the result so that it answers the problem we defined in the first place.

Let's speedrun this!

OSEMN is a 5-stage, iterative framework for how a data scientist works through a problem. The letters stand for (O)btain, (S)crub, (E)xplore, (M)odel, and i(N)terpret.

1) Obtain

Literally just obtain data that is relevant to the problem you want to solve. You can get this data from Kaggle, Google Dataset Search, or from your company. CSV and/or Excel format is the easiest to work with.
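Here's a minimal sketch of the Obtain step in Python with pandas. The CSV content, the column names, and the numbers are all made up for illustration; in practice you would pass the path of the file you downloaded instead of the inline text.

```python
import io

import pandas as pd

# Stand-in for a downloaded CSV file; column names and values are made up.
csv_text = """Gender,Height,Weight
Male,174.0,80.5
Female,160.2,55.1
Male,181.3,90.0
"""

# pd.read_csv accepts a file path, a URL, or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.head())  # first few rows, as a quick sanity check of the load
```

For an Excel file, pandas has `pd.read_excel`, which works much the same way but needs an extra Excel engine installed.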

2) Scrub

Clean the data of any typos & convert everything to the format that suits the model best, which is usually numerical.
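A rough sketch of what scrubbing can look like with pandas. The messy table below is invented, and the 0/1 encoding for gender is just one simple choice I'm assuming here, not the only way to do it.

```python
import pandas as pd

# Invented messy data: inconsistent labels and non-numeric junk in number columns.
raw = pd.DataFrame({
    "Gender": ["Male", " female ", "Female", "MALE", "male"],
    "Height": ["174.0", "160.2", "n/a", "181.3", "168.0"],
    "Weight": ["80.5", "55.1", "62.0", "?", "70.2"],
})

clean = raw.copy()

# Fix label typos: trim whitespace and normalise the casing.
clean["Gender"] = clean["Gender"].str.strip().str.lower()

# Convert text to numbers; anything unparseable becomes NaN instead of raising.
for col in ["Height", "Weight"]:
    clean[col] = pd.to_numeric(clean[col], errors="coerce")

# Drop rows that still have missing values, then encode gender as a number.
clean = clean.dropna().reset_index(drop=True)
clean["Gender"] = clean["Gender"].map({"male": 0, "female": 1})

print(clean)
print(clean.dtypes)
```

Dropping rows is the bluntest way to handle missing values. Filling them in (for example with the column median) is another option when you can't afford to lose data.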

3) Explore

Understand how the variables in the data correlate with one another, so that you can decide which features to enter into the model.
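One quick way to explore is a correlation matrix. The sketch below uses synthetic data I made up: `height` is built to drive `weight`, while `noise` is a hypothetical column with no relationship to anything, so you can see what a useful feature and a useless one look like side by side.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data: weight depends on height, the "noise" column depends on nothing.
height = rng.normal(170, 10, 200)
df = pd.DataFrame({
    "height": height,
    "weight": 0.9 * height - 85 + rng.normal(0, 5, 200),
    "noise": rng.normal(0, 1, 200),
})

# Pearson correlation between every pair of numeric columns (values in [-1, 1]).
corr = df.corr()
print(corr.round(2))

# Rank candidate features by how strongly they correlate with the target.
ranking = corr["weight"].drop("weight").abs().sort_values(ascending=False)
print(ranking)
```

Correlation only captures linear relationships, so it's worth plotting the data too before throwing a feature away.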

4) Model

Put the features you've decided on into the model, and tune it to find the best parameters. Careful not to overfit!
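To make "tune it" and "don't overfit" a bit more concrete, here's a small sketch using only NumPy. Everything in it is an assumption for illustration: the data is synthetic, polynomial degree stands in for "the parameter being tuned", and the candidate degrees 1, 3 and 9 are arbitrary. The idea is to fit on one part of the data and pick the setting that does best on a held-out part.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data: a straight line plus noise, so the "true" model is degree 1.
x = rng.uniform(-1, 1, 40)
y = 2.0 * x + 1.0 + rng.normal(0, 0.3, 40)

# Hold out the last 10 points; the model never sees them while fitting.
x_train, y_train = x[:30], y[:30]
x_val, y_val = x[30:], y[30:]


def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))


degrees = [1, 3, 9]
train_mse, val_mse = {}, {}
for d in degrees:
    coeffs = np.polyfit(x_train, y_train, d)  # least-squares polynomial fit
    train_mse[d] = mse(coeffs, x_train, y_train)
    val_mse[d] = mse(coeffs, x_val, y_val)
    print(f"degree {d}: train MSE {train_mse[d]:.3f}, validation MSE {val_mse[d]:.3f}")

# Tune on validation error, not training error.
best_degree = min(degrees, key=lambda d: val_mse[d])
print("picked degree:", best_degree)
```

Training error can only go down as the model gets more flexible, so it's a bad guide for tuning. The validation error is what tells you when the extra flexibility has stopped helping.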

5) iNterpret

Voila, you've got your result from the model. Time to interpret it in non-technical terms so that you can present it to your boss.

Implementing OSEMN

Since I love you all and you made it this far, I've created a Notebook that shows how we can implement the OSEMN framework in real life. You can see it Here

Problem : Given a person's height, what is that person's weight most likely to be?

Solution : Build a simple Machine Learning Linear Regression model that predicts a person's weight from their height

...

Coding Time!

Obtain : Use public data on random male & female weights & heights, in csv (csv is good for u kids)

Scrub : Make sure there are no empty values & all weights and heights are numerical values

Explore : We look at the correlation between weight & height

Model : Use Linear Regression, training the model on the height data so that it learns to predict the weight

iNterpret : We get the linear regression's general solution (intercept & coefficient) for predicting a person's weight, given their height
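If you just want the gist of the Model and iNterpret steps without opening the Notebook, here's a minimal sketch. It is not the Notebook's code: the data is synthetic, the units (cm and kg) are my assumption, and `np.polyfit` with degree 1 is used as a stand-in for a linear regression library, since it performs the same least-squares straight-line fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumed units: cm and kg), not the real dataset.
height = rng.normal(170, 10, 200)
weight = 0.9 * height - 85 + rng.normal(0, 5, 200)

# Degree-1 polyfit is ordinary least squares for a straight line.
# It returns the highest-order term first: slope (coefficient), then intercept.
coefficient, intercept = np.polyfit(height, weight, 1)


def predict_weight(h):
    """Predicted weight for a given height, using the fitted line."""
    return coefficient * h + intercept


# Interpret: turn the two numbers into sentences a non-technical reader can use.
print(f"weight = {coefficient:.2f} * height + ({intercept:.2f})")
print(f"Each extra cm of height adds about {coefficient:.2f} kg on average.")
print(f"Predicted weight at 170 cm: {predict_weight(170):.1f} kg")
```

The coefficient is the part your boss cares about (how much weight changes per unit of height). The intercept mostly just positions the line, and on its own it has no sensible real-world meaning here, since nobody has a height of zero.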

Thanks for reading, and don't message me if you're lost (lmao jk, you can add me on LinkedIn and ask me there!)