# The chicken-and-egg dilemma

Which came first: the labeled data or the machine learning model?

If you have labeled data, then you can train a machine learning model.

If you have a trained machine learning model, then you can label data.

# Precision vs. Recall

Suppose you are in case 2 (you have a model). How good are the **precision** and **recall** of your model? Unless both are $100\%$, you might still want to label data.
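To make the two numbers concrete, here is a minimal sketch of how they could be measured on a small hand-checked sample. The function name `precision_recall`, the sample data, and the convention of returning `1.0` for an empty denominator are assumptions made for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for one class, computed from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Convention chosen here: a ratio with an empty denominator counts as 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Hypothetical hand-checked sample: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision tells you how many of the labels the model assigned are right; recall tells you how many of the true positives it found at all.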

```
                   recall
                   100%                          | < 100%
                   ------------------------------+------------------------------
           100%    Your model is perfect.        | You can confidently trust
                   You don't need more data.     | the labels you get. But your
                                                 | model might miss some.
precision          ------------------------------+------------------------------
           < 100%  You get labels, but you       | You get some labels, but you
                   still need to confirm that    | still need to confirm that
                   they are correct.             | they are correct.
```

Most people will be in the lower-right quadrant (precision and recall both $< 100\%$). The goal is to get both as close to $100\%$ as possible. Until then, the labels the model outputs can serve as the basis for manual labeling: a person confirms or corrects them instead of starting from nothing.
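One way this could look in practice is sketched below: the model's suggestions are sorted so that a reviewer sees the least confident ones first, and any human correction overrides the suggestion. The field names (`id`, `suggested`, `confidence`) and both helper functions are hypothetical, and the sketch assumes your model reports some confidence score.

```python
def build_review_queue(predictions):
    """Order model suggestions so the least confident ones are reviewed first."""
    return sorted(predictions, key=lambda item: item["confidence"])


def apply_review(queue, corrections):
    """Merge human decisions: a correction overrides the suggested label."""
    return {item["id"]: corrections.get(item["id"], item["suggested"]) for item in queue}


# Hypothetical model output for three unlabeled rows.
predictions = [
    {"id": "a", "suggested": "spam", "confidence": 0.98},
    {"id": "b", "suggested": "ham", "confidence": 0.55},
    {"id": "c", "suggested": "spam", "confidence": 0.71},
]

queue = build_review_queue(predictions)
# Hypothetical reviewer decision: row "b" was mislabeled by the model.
final_labels = apply_review(queue, corrections={"b": "spam"})
print([item["id"] for item in queue], final_labels)
```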

# Your case

If you can label all rows with a set of rules and **you are the one creating** these rules, then you are, in fact, already labeling the data manually. Later on, you could fit a **decision tree** to the rule-labeled data so that it models all your rules and can deal with new data points that come your way.
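As a rough illustration, the sketch below labels a grid of rows with a made-up rule and then grows a small tree from those labels. Everything here is an assumption for the sake of the example: the rule `rule_label`, the two features (`amount`, `is_foreign`), and the toy pure-Python tree, which stands in for a library implementation such as scikit-learn's `DecisionTreeClassifier`.

```python
from collections import Counter


def rule_label(amount, is_foreign):
    """Hypothetical hand-written rule standing in for your own labeling rules."""
    if amount > 1000 or (is_foreign and amount > 500):
        return "review"
    return "ok"


def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Find the (feature, threshold) pair with the lowest weighted Gini impurity."""
    best = None  # (score, feature index, threshold)
    for f in range(len(rows[0])):
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted({row[f] for row in rows})[:-1]:
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels):
    """Grow a tree until each leaf is pure; leaves are labels, nodes are tuples."""
    if len(set(labels)) == 1:
        return labels[0]
    split = best_split(rows, labels)
    if split is None:  # identical rows with conflicting labels: take the majority
        return Counter(labels).most_common(1)[0][0]
    _, f, t = split
    left = [(row, y) for row, y in zip(rows, labels) if row[f] <= t]
    right = [(row, y) for row, y in zip(rows, labels) if row[f] > t]
    return (f, t, build_tree(*zip(*left)), build_tree(*zip(*right)))


def predict(tree, row):
    """Walk from the root to a leaf and return the label stored there."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Label a grid of rows with the rule, then fit the tree to those labels.
rows = [(amount, foreign) for amount in range(0, 1600, 100) for foreign in (0, 1)]
labels = [rule_label(*row) for row in rows]
tree = build_tree(rows, labels)

# New data points that were not in the grid: tree prediction next to the rule.
for row in [(1050, 0), (550, 1), (550, 0)]:
    print(row, predict(tree, row), rule_label(*row))
```

Because the tree grows until every leaf is pure, it reproduces the rule on every row it was fitted on. For values between the training rows it only approximates the rule, so the denser the rule-labeled data, the closer the match.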

What exactly would be the benefit of using a decision tree in that case? Is it just that I don't have to implement the logic because it's covered by the model? – Stev – 2018-09-02T14:30:47.137