What is the difference between Statistics and Probability?

Consider a black box with one input button and two output LEDs. When the button is pressed, one of the two LEDs lights up for a certain time. If it is pressed again, either the same LED lights up again or the other one lights up instead.

What is the probability of LED 1 lighting up when the button is pressed? We can’t answer that unless we MEASURE what the output is for a given input.

Statistics is all about measuring and trying to MODEL the observed behavior. By a model I mean any kind of description that fully describes the behavior of the box.
Once you have enough measurements (in reality you can never have enough) of the observed physical phenomenon, you can try to reproduce the behavior of the black box with a model. This model is a probability model.
So statistics is all about gathering data and constructing the underlying model of a physical process.
Probability is all about using that model to simulate the behavior of the physical process.

Example:
Suppose we have a black box with a midget inside, with food and water for 100 years (plus, of course, ample room for toilet facilities). The midget has an unbiased coin, which it flips when it sees that the button is pressed. Based on the outcome of the coin flip (Heads or Tails) it then pushes one of two internal buttons to light LED 1 or LED 2: Heads is for LED 1 and Tails is for LED 2. It can never be both. The LED lights up for 1 second and then turns off.

Now, from the outside we only see a black box with one button and two LEDs. We can begin to press the button and note the result every time we press it. After about 100 measurements, we see that LED 1 is lit roughly half of the time and LED 2 the other half. Because this holds up as we scale (the fraction stays close to one half, and typically gets closer, as we go from 10 to 100 to 1000 to 1000000 measurements), we conclude that the underlying physical process has a probability of 0.5 of lighting LED 1 and 0.5 of lighting LED 2.
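
The measuring step can be sketched in a few lines of Python. This is only an illustration under assumptions: `make_black_box` is a hypothetical stand-in for the real box (we pretend we cannot look inside it), and `estimate_led1_probability` is a name made up here for "press the button n times and count".

```python
import random


def make_black_box(seed=None):
    """Hypothetical stand-in for the real box: a hidden fair coin picks the LED."""
    rng = random.Random(seed)

    def press():
        # Returns 1 if LED 1 lights up, 2 if LED 2 lights up.
        return 1 if rng.random() < 0.5 else 2

    return press


def estimate_led1_probability(press, n_presses):
    """Statistics: press the button n_presses times and report the LED 1 fraction."""
    led1_count = sum(1 for _ in range(n_presses) if press() == 1)
    return led1_count / n_presses


press = make_black_box(seed=42)
for n in (10, 100, 1000, 100000):
    # The printed fraction is the measured estimate for each sample size.
    print(n, estimate_led1_probability(press, n))
```

With small n the estimate can be visibly off from 0.5; it tends to settle as n grows, which is what justifies adopting 0.5 as the model.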

Now we can SIMULATE the behavior of the box. We build the same box with the same button and LEDs, but instead of the midget we hook up a computer that tosses a virtual coin.
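
As a rough sketch of the simulated box, assuming the model found by measurement is simply "LED 1 with probability 0.5": the class name `SimulatedBox` and its parameter `p_led1` are invented for this example, and Python's pseudo-random generator plays the role of the virtual coin.

```python
import random


class SimulatedBox:
    """Probability: use the model (p_led1) to generate the box's behavior."""

    def __init__(self, p_led1=0.5, seed=None):
        if not 0.0 <= p_led1 <= 1.0:
            raise ValueError("p_led1 must be between 0 and 1")
        self.p_led1 = p_led1
        self._rng = random.Random(seed)

    def press(self):
        # Virtual coin toss: LED 1 with probability p_led1, otherwise LED 2.
        return 1 if self._rng.random() < self.p_led1 else 2


box = SimulatedBox(p_led1=0.5)
print([box.press() for _ in range(10)])
```

Nothing in this box knows about coins or midgets; it only knows the number 0.5 that came out of the measurements.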

If we ask anybody whether there is a difference between the two boxes after 100 button pushes per box, the answer would be NO with a probability of almost (but not exactly) 1.0. That means you can’t distinguish between the two boxes, so the underlying probability MODEL has a very high quality factor.
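
One way to make "can't distinguish" concrete is to compare the LED 1 counts of the two boxes. The sketch below uses a standard two-proportion z-score for that; this is a technique swapped in for illustration, not something the example above prescribes, and the two `press_*` functions are hypothetical stand-ins for the original and the simulated box.

```python
import math
import random


def count_led1(press, n_presses):
    """Press the button n_presses times and count how often LED 1 lights up."""
    return sum(1 for _ in range(n_presses) if press() == 1)


def two_proportion_z(count_a, count_b, n):
    """z-score for the difference between two LED 1 fractions, n presses each."""
    pooled = (count_a + count_b) / (2 * n)
    std_err = math.sqrt(2 * pooled * (1 - pooled) / n)
    if std_err == 0:
        # Both boxes always lit the same LED: no observable difference.
        return 0.0
    return (count_a / n - count_b / n) / std_err


# Hypothetical stand-ins: both boxes are driven by a fair coin.
rng_original = random.Random(1)
rng_simulated = random.Random(2)


def press_original():
    return 1 if rng_original.random() < 0.5 else 2


def press_simulated():
    return 1 if rng_simulated.random() < 0.5 else 2


n = 100
z = two_proportion_z(count_led1(press_original, n),
                     count_led1(press_simulated, n), n)
# By the usual convention, |z| below about 1.96 means no detectable difference.
print(abs(z))
```

Two fair boxes will still occasionally produce a large |z| by chance, which matches the "almost, but not exactly, 1.0" above.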

If we want to EMULATE the behavior of the box, we would have to OPEN it, observe the physical processes inside and COPY those processes as best we can. We will never reach exactly the same behavior (for that to be true, the emulated box and the original box would have to give exactly the same response at exactly the same time whenever the button is pressed), but for all practical purposes the resulting quality level of the emulated box would be close to 1.0.