If the Shoe Fits

My daughter recently turned three. This means my life is full of Disney Princesses. We have the dresses, we watch the movies, we listen to the songs in the car. I'm just praying she never finds out that Disney World is a real place. While the Princess hierarchy is always in flux, Cinderella is usually in the top three.

Most of us are familiar with the Disney Cinderella story, but I'll rehash. A kind-hearted young woman is mistreated by her wicked step-mother and two horrible step-sisters. With the help of her Fairy Godmother, Cinderella dons a beautiful dress and rides in a pumpkin-turned-carriage to Prince Charming's ball, where they fall in love. Before the prince can ask her name, the clock strikes midnight, and Cinderella runs away before her magical ball attire disappears. Providentially, Cinderella loses one of her glass slippers. Prince Charming declares he will marry whomever the shoe fits. So the Grand Duke sets off across the kingdom, slipper in hand, to find Cinderella. The wicked step-mother breaks the slipper before Cinderella can try it on and prove that she is the Prince's true love. Cinderella, seemingly prepared for this situation, presents the other glass slipper, and they live happily ever after.

COME ON!!!

It's a fairytale, so we can excuse most of the flaws.

The inexcusable flaw is Prince Charming's plan. He's going to marry anyone that the shoe fits. I mean sure, it would be great as some form of early filtering criterion, but it can't be the deciding factor. The chances of finding Cinderella with this method are close to zero. How close to zero? Let's find out.

The Approach

First of all, we need to talk about the word "fits". Recently, I tried teaching the transitive property to my three-year-old with very little success. While playing hide-and-seek I explained that if Daddy could fit into a hiding spot, then by definition so could Mommy and so could my daughter. However, just because my daughter could fit into a spot, doesn't mean that Mommy, or therefore Daddy, could. Yea, it didn't really click for her...

Similarly, I recently learned that I've been wearing the wrong size shoes for over a decade. Perhaps I was trying to improve my perceived Blended Exclusivity Score, but I always bought 10.5 shoes when in reality I wear a 9 (I warned you I was dumb). The important thing is shoes that are 1.5 sizes too big still kinda fit. If I was buying shoes 1.5 sizes too small, I doubt I would've worn them for a decade.

The takeaway from both of these examples is that "fit" is really a spectrum. Let's start thinking about the range of foot sizes that could be determined to fit in the glass slipper. Cinderella's foot is a perfect fit for the slippers. However, if someone's foot was just slightly bigger, they could likely still fit into the slipper (at least long enough to marry the Prince and escape to a life of luxury). Much bigger and you couldn't cram your monstrous stompers in them, even temporarily. Conversely, every foot smaller than Cinderella's would technically fit into the slipper, the same way I can fit into Shaq's shoes. At that point the Grand Duke must make a value judgement.

Side Note: Shoe sizes are one of those rare situations where US measurements are unintuitive, especially when you consider the vanity of starting women's sizing at arbitrarily smaller numbers. I'm just going to talk about feet and shoes in terms of inches/mm for the rest of this post.


Let's assume for a minute that Cinderella's foot is exactly 9 inches, and therefore the shoe perfectly fits anyone with a 9 inch foot.

| Maiden Foot Length | Marriage Material? |
|---|---|
| 7.0" | No Chance |
| 7.5" | Probably Not |
| 8.0" | Maybe? |
| 8.5" | Maybe? |
| 9.0" | Cinderella |
| 9.5" | Maybe? |
| 10.0" | No Chance |


Criteria for the Crown

There is a better way to structure how we think about this problem called Upper and Lower Control Limits (UCL & LCL). Let's imagine we are making ball bearings, and the specs call for them to have a diameter of 1". We have a problem. Nothing in reality is exactly 1". With precise enough tools we'll always find some decimal point where it differs. Also, each one is going to slightly differ from the others. The diameter of the ball bearing is Continuous Data.

Presumably, there was some logic and reason behind the specs requiring it to be 1" even though it's essentially unobtainable in reality. Good specs would have included some wiggle room, some margin of error, some Control Limits. Is 0.9" ok? What about 0.9999999" or 1.000001"? Based on the needs, the specs might call for the ball bearings to be 1" with a UCL & LCL of 1.05" and 0.97".

I don't want to talk about Six Sigma too much in this post, but it's the elephant in the room. I'll quickly summarize it as calibrating your process so that your mean result, plus or minus six standard deviations (Six Sigma, get it?), still falls within your UCL and LCL. If your mean is right on target, you can be a little looser with your variation than if your mean is a little off.
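To put rough numbers on that summary, here is a minimal Python sketch using the ball bearing limits from above (0.97" and 1.05"). The function names and the drifted mean of 1.04" are my own illustrative assumptions, not part of any Six Sigma standard.

```python
def max_sigma(mean, lcl, ucl, k=6.0):
    """Largest standard deviation for which mean +/- k*sigma stays inside [lcl, ucl]."""
    if not lcl < mean < ucl:
        raise ValueError("mean must lie strictly between the limits")
    # The limit closest to the mean is the one that constrains us.
    return min(mean - lcl, ucl - mean) / k


def meets_six_sigma(mean, sigma, lcl, ucl, k=6.0):
    """True if mean - k*sigma and mean + k*sigma both fall inside the limits."""
    return mean - k * sigma >= lcl and mean + k * sigma <= ucl


LCL, UCL = 0.97, 1.05  # ball bearing limits from the text, in inches

on_target = max_sigma(1.00, LCL, UCL)  # mean sitting on the 1" spec
drifted = max_sigma(1.04, LCL, UCL)    # hypothetical mean that drifted upward

print(f"allowed sigma on target: {on_target:.4f} in")
print(f"allowed sigma drifted:   {drifted:.4f} in")
```

The drifted process has to hold a much tighter standard deviation to stay inside the same limits, which is the "looser with your variation" point in code form.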

Now, I just adore the fact that I can search for statistics on foot sizes and quickly find a journal article with 1.2 million 3D foot scans. What a time to be alive! Sadly, the raw data is "owned by third-parties" so I can't get that. Also, it's important to note some particulars in how they collected the data. First, they threw out all outliers. For women, this means that feet longer than 280mm and shorter than 210mm are discarded. Second, they converted foot measurements (which would normally be Continuous Data) into Discrete Data by categorizing them in 5mm increments. Instead of measuring a foot as say 223mm, they categorized it as a 220-225mm foot.

Fortunately, with enough discrete data and very liberal data quality standards, I can turn this back into continuous data and estimate a standard deviation. We're back to bell curves baby!

North American & European Women Foot Length Distribution

$$\mu = 245mm = 9.64" $$ $$\sigma = 14mm = 0.55" $$

Is this data exactly right? Nope, but it's good enough for picking apart a fairytale. I also need to make some other assumptions such as the size of Cinderella's slippers and what an acceptable UCL & LCL should be.

Certain nuances of older cartoons are hard to explain to a toddler. The wicked step-sisters aren't bad people because they have monstrous hooves for feet. Not to get all Twitter physiognomy on you, but Disney likely intended the sasquatch nature of their feet as an outward manifestation of their inner nastiness. I digress a bit, but the point is the slippers clearly did not fit them, so Cinderella's foot must be substantially smaller. Extrapolating out that attractiveness equals goodness in Disney, I'll put Cinderella at 1 standard deviation below the mean, at 9.1" or 231mm.

For the UCL & LCL we can be even more subjective. Since these shoes are made of glass I'll assume there is minimal flex and can set our tolerances pretty tight. I'll use 8.75"/222mm and 9.3"/236mm for our LCL & UCL.

Diving In

Now that we have some "data", we can finally start looking at answering our original question. How likely is the Grand Duke to find the right maiden using this ridiculous plan? For context we need to figure out what percentage of the population fits our criteria. This part is relatively simple, so I'll expound on distribution curves a bit.

In reality, we can use either the Cumulative Distribution Function (CDF) or the Probability Density Function (PDF) for our calculation. The PDF is just the derivative of the CDF. Ok cool, but what does that mean and when would I use one vs the other?


Cumulative Distribution Function (CDF)

We'll start here. The CDF is really useful for Discrete Values. The classic example is a dice roll and a simple question: what is the probability of getting a value lower than X? Truthfully, we're talking about ranges. If I say "at least 2 but lower than 3" I mean P(2 <= x < 3). If you managed to roll a 2.5 that would classify as greater than or equal to 2 and less than 3, but I would have questions about your die. In reality, there is only one possible outcome that fits this category: 2. What are the odds of rolling a 2? 1 in 6.

The important (and perhaps obvious) thing is that our probabilities accumulate, they're cumulative. What the CDF is trying to tell you is the probability that something is in that category or one of the previous ones. By the time we reach the value of 6 or greater our probability is 100%. That should make sense because we've captured the full range of possibilities. What are the odds that a single dice roll is less than 7? 100%. What about less than 8? 100%. I could do this all day.

| Range (from) | Range (to) | Formula | Probability "Accumulation" | CDF |
|---|---|---|---|---|
| $$ -\infty $$ | 1 | $$ P( -\infty < x < 1) $$ | $$ +\frac{0}{6} $$ | 0.0000 |
| 1 | 2 | $$ P( 1 \le x < 2) $$ | $$ +\frac{1}{6} $$ | 0.1667 |
| 2 | 3 | $$ P( 2 \le x < 3) $$ | $$ +\frac{1}{6} $$ | 0.3333 |
| 3 | 4 | $$ P( 3 \le x < 4) $$ | $$ +\frac{1}{6} $$ | 0.5000 |
| 4 | 5 | $$ P( 4 \le x < 5) $$ | $$ +\frac{1}{6} $$ | 0.6667 |
| 5 | 6 | $$ P( 5 \le x < 6) $$ | $$ +\frac{1}{6} $$ | 0.8333 |
| 6 | $$\infty$$ | $$ P( 6 \le x < \infty) $$ | $$ +\frac{1}{6} $$ | 1.0000 |
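The accumulation in the dice table can be reproduced in a few lines. This is a small Python sketch with a function name of my own choosing; it keeps exact fractions so the running total lands on exactly 1.

```python
from fractions import Fraction


def die_cdf_table(sides=6):
    """Return (face, P(X = face), P(X <= face)) rows for a fair die."""
    step = Fraction(1, sides)  # every face is equally likely
    running = Fraction(0)      # the accumulated probability so far
    rows = []
    for face in range(1, sides + 1):
        running += step
        rows.append((face, step, running))
    return rows


for face, p, cdf in die_cdf_table():
    print(f"face {face}: +{p} -> CDF {float(cdf):.4f}")
```

Each row adds the same 1/6, and the last column is simply the sum of everything above it.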

In the example of our original foot data, remember that our foot length wasn't recorded as the actual length of the foot (i.e. 223mm) but as fitting into the 220-225mm category. I won't repeat that table with 14 categories, but I'll create fake data using our distribution and show that work in the attached Excel file.
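For readers who prefer code to spreadsheets, here is one way the 14 categories could be generated. This is a sketch under my own assumptions, not a reproduction of the spreadsheet: it uses the mean and standard deviation quoted earlier (245mm and 14mm), cuts the curve off at 210mm and 280mm like the study did, and rescales so the surviving categories add up to 100%.

```python
import math

MU, SIGMA = 245.0, 14.0  # mm, the estimates quoted in this post


def normal_cdf(x, mu=MU, sigma=SIGMA):
    """CDF of the normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def binned_shares(low=210, high=280, width=5):
    """Return (start, end, share, cumulative share) for each length category."""
    kept = normal_cdf(high) - normal_cdf(low)  # share left after the outlier cut
    running = 0.0
    rows = []
    for start in range(low, high, width):
        share = (normal_cdf(start + width) - normal_cdf(start)) / kept
        running += share
        rows.append((start, start + width, share, running))
    return rows


for start, end, share, cumulative in binned_shares():
    print(f"{start}-{end}mm: {share:6.2%}  cumulative {cumulative:7.2%}")
```

The last column plays the same role as the CDF column in the dice table, just with unequal steps.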

Of course, with more and more data points, and categories that become so small you might consider the data to be Continuous, you can extend this out and arrive at a perfectly smooth line between 0 and 1. That curved line would be the CDF of the Normal/Gaussian Distribution. Let's put our CDF next to our PDF and talk about their relationship.

Cumulative Distribution Function
$$ F(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{\frac{-(t-\mu)^2}{2\sigma^2}} dt$$
Probability Density Function
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{\frac{-(x-\mu)^2}{2\sigma^2}}$$

The PDF is the derivative of the CDF, and the CDF is the integral of the PDF. Notice that the only difference in our formulas is the integral symbol after the constant. The CDF is telling you the area under the curve of the PDF from negative infinity to X. Take a look at 245 on the x-axis. The PDF is symmetrical around 245, so exactly half of the area under the curve is left of that point and half is to the right. The CDF value is 0.5 to reflect this relationship.

So what is the probability that a random maiden's foot length is between 222mm and 236mm? There are two ways to get there. One is to integrate the PDF from 222 to 236 in order to find the area underneath the PDF curve between those two x-axis values. The other is to subtract the CDF value at 222 from the CDF value at 236. These two approaches are exactly the same, simply because our CDF is already the integral of our PDF.

Integrate PDF from 222 to 236

$$ PDF = f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{\frac{-(x-\mu)^2}{2\sigma^2}} $$
$$ \int_{-\infty}^{x} f(t) dt = F(x) = CDF $$
$$ CDF = F(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{\frac{-(t-\mu)^2}{2\sigma^2}} dt$$
$$ P(Maiden_{222-236mm}) = \frac{1}{14\sqrt{2\pi}} \int_{222}^{236} e^{\frac{-(x-245)^2}{2 \cdot 14^2}} dx$$
$$ P(Maiden_{222-236mm}) = 21\% $$
Subtract Y-Values from CDF at 222 and 236

$$ P(Maiden_{222-236mm}) = CDF_{236} - CDF_{222} $$ $$ P(Maiden_{222-236mm}) = 0.26 - 0.05 $$ $$ P(Maiden_{222-236mm}) = 21\% $$
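Both routes are easy to check numerically. The Python sketch below assumes the same mean, standard deviation and limits used in this post; the trapezoid-rule integrator and the function names are my own illustration, not the method behind the charts.

```python
import math

MU, SIGMA = 245.0, 14.0   # mm
LCL, UCL = 222.0, 236.0   # mm, the "fits the slipper" window


def normal_pdf(x, mu=MU, sigma=SIGMA):
    """Height of the bell curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu=MU, sigma=SIGMA):
    """Area under the bell curve from negative infinity to x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def integrate_pdf(a, b, steps=10_000):
    """Approximate the area under the PDF between a and b (trapezoid rule)."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a) + normal_pdf(b))
    for i in range(1, steps):
        total += normal_pdf(a + i * h)
    return total * h


by_integration = integrate_pdf(LCL, UCL)
by_subtraction = normal_cdf(UCL) - normal_cdf(LCL)

print(f"integrating the PDF: {by_integration:.2%}")
print(f"subtracting the CDF: {by_subtraction:.2%}")
```

Both lines should agree at roughly 21%, which is the whole point: the CDF has already done the integration for you.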

So 21% of all the maidens in the kingdom would be deemed to "fit" Cinderella's shoe. Not looking that great. If we knew the number of young maidens in the kingdom was, say, 1,000, then 210 of them would fit the shoe and be whisked off to get married. Assuming the Grand Duke visits maidens at random until he finds a "fit", there is a 1 in 210 chance, or about 0.47%, that he will find the right Cinderella.


The Secretary Problem

But we all know the Grand Duke is a shrewd little fellow. How else did he rise to such power? The King is too emotional, and the Prince is too flippant to run the bureaucracy required for such a kingdom. This man behind the throne is actually calling the shots, and his weapon is cold analytical prowess.

The Grand Duke heard the Prince's plan. He knows he needs to both follow his wishes and improve his opportunity for success. So he refers back to the Secretary/Marriage/Fussy Suitor Problem, aka Optimal Stopping Theory. What a great guy! I bet I'd like this Grand Duke.

The premise of the Secretary Problem is this: imagine you want to hire a secretary, or find a Cinderella, and you know there are n applicants. Once you conduct an interview, or try on a shoe, you have to either accept or reject. There is no going back to a previous applicant or maiden. Surely you won't know if the first candidate or maiden is better than average, since you don't have any baseline to compare them to. You can't wait till the very last one, because by then you've already rejected everyone and you're stuck with that person. Where exactly is this middle ground?

To apply the Stopping Rule, you should:

  1. Always reject the first n/e applicants.
  2. Remember which one of those rejects was the best.
  3. Consider the remaining applicants one at a time.
  4. Select the first applicant superior to the best reject.
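The four steps above can be sketched as a small simulation. This is a minimal Python illustration under my own assumptions: every maiden gets a unique rank so they can be compared, the highest rank stands in for Cinderella, and the function names and seed are made up for the example.

```python
import math
import random


def duke_finds_best(n, rng):
    """Run the stopping rule once; True if the best candidate gets picked."""
    ranks = list(range(n))      # higher is better, n - 1 is the best of all
    rng.shuffle(ranks)          # candidates arrive in random order
    cutoff = round(n / math.e)  # step 1: size of the always-rejected group
    best_rejected = max(ranks[:cutoff], default=-1)  # step 2
    for rank in ranks[cutoff:]:                      # step 3
        if rank > best_rejected:                     # step 4
            return rank == n - 1
    return False  # the best was among the rejects, so nobody beats them


def success_rate(n=210, trials=20_000, seed=1):
    """Fraction of simulated searches that end with the best candidate."""
    rng = random.Random(seed)
    wins = sum(duke_finds_best(n, rng) for _ in range(trials))
    return wins / trials
```

Calling `success_rate()` should land somewhere near 0.37, with the exact figure wobbling a little depending on the seed and the number of trials.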

Surprisingly, this will take the Grand Duke's odds of finding the right Cinderella from 0.47% to ~37%. I'll explain using the previous assumption that 210 maidens fit the shoe, so our n = 210. We reject the first 210/e, or 77, maidens that fit the shoe. There is a 36.7% chance that Cinderella is in that group, which means there is a 63.3% chance she is in the second group. But how do we know the odds of Cinderella being the next best maiden in that group, given that she is in that group?

Remember when we're doing statistics there are a few logical operators. The vertical bar or pipe should be read as "given that". Quick example with cards. Let's say I draw a card at random. What are the odds it is a King? There are 4 kings and 52 total cards.

$$ P(King) = \frac{4}{52} = 7.7\% $$

Now, what are the odds that the card is a King, "given that" the card I draw is also a face card? There are 12 face cards. Essentially what we are asking is what percentage of the time we draw a King among the times that we draw a Face Card. This case is simple to intuit, since Kings are a subset of Face Cards, and we know the answer is 4/12 or 33.33%, but let's expound because it will be useful next. The definition of conditional probability states the following:

$$ P(A | B) = \frac{P(A \bigcap B)}{P(B)} $$ $$ P(King | FaceCard) = \frac{ P(FaceCard \bigcap King)}{P(FaceCard)} = \frac{\frac{4}{52}}{\frac{12}{52}} = 33.33\% $$

Which of course we can rewrite with a little algebra as:

$$ P(FaceCard \bigcap King) = P(King | FaceCard) * P(FaceCard) $$
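The card arithmetic is small enough to check by brute force. Here is a Python sketch that builds a 52-card deck and counts; the helper names are my own, and exact fractions keep the results easy to compare.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def is_king(card):
    return card[0] == "K"


def is_face(card):
    return card[0] in {"J", "Q", "K"}


def prob(event, given=None):
    """P(event | given), found by counting cards in the deck."""
    pool = [c for c in DECK if given is None or given(c)]
    hits = [c for c in pool if event(c)]
    return Fraction(len(hits), len(pool))


p_king = prob(is_king)                      # 4/52
p_face = prob(is_face)                      # 12/52
p_king_given_face = prob(is_king, is_face)  # 4/12
p_both = prob(lambda c: is_king(c) and is_face(c))

print(p_king, p_face, p_king_given_face)
print(p_both == p_king_given_face * p_face)
```

The final comparison is the rearranged formula: the odds of "face card and King" equal the odds of a King given a face card, times the odds of a face card.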

In our case, Cinderella is the ideal candidate, so for each individual we want to know the odds that individual is chosen and the odds that individual is Cinderella. Add up all of those individual odds, and you'll end up with the odds that Cinderella, or the ideal candidate, is chosen. We can write the formula as a summation of the odds for each individual meeting both criteria.

$$ P(Cinderella) = \sum_{i=1}^{n} P(i\ is\ selected\ \bigcap\ i\ is\ Cinderella) $$ $$ P(Cinderella) = \sum_{i=1}^{n} P(i\ is\ selected\ |\ i\ is\ Cinderella) * P(i\ is\ Cinderella) $$

Remember we rejected the first n/e applicants, so none of them can ever be selected. Only the maidens in the second group contribute to the sum, which is why our summation begins at r, the first position just past the rejected group, instead of at 1. Each position holds Cinderella with probability 1/n. If Cinderella sits at position i, she is selected only when the best of the i-1 maidens before her was one of the r-1 rejects, which happens with probability (r-1)/(i-1).

$$ P(Cinderella) = \frac{r-1}{n} \sum_{i=r}^{n} \frac{1}{i-1} $$ $$ where\ the\ first\ r-1\ applicants\ are\ rejected $$

$$ P(Cinderella) = \frac{78-1}{210}\sum_{i=78}^{210} \frac{1}{i-1} = 0.3667\sum_{i=78}^{210} \frac{1}{i-1} = 0.3667 * 1.0074$$ $$ P(Cinderella) = 0.3694 = 36.94\% $$
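If you'd rather not add up a hundred-odd fractions by hand, the formula is short enough to evaluate directly. In this Python sketch I take r to be the first position at which the Grand Duke is allowed to say yes, so r - 1 maidens are always rejected; the function name is my own.

```python
def stopping_rule_success(n, r):
    """P(best is chosen) when the first r - 1 of n candidates are always rejected."""
    if r <= 1:
        return 1.0 / n  # nobody is rejected, so the first candidate is taken
    return (r - 1) / n * sum(1.0 / (i - 1) for i in range(r, n + 1))


n = 210
p_77 = stopping_rule_success(n, 77)  # 76 maidens rejected
p_78 = stopping_rule_success(n, 78)  # 77 maidens rejected

# Try every possible cutoff and keep the one with the best odds.
best_r = max(range(1, n + 1), key=lambda r: stopping_rule_success(n, r))

print(f"r = 77: {p_77:.4f}")
print(f"r = 78: {p_78:.4f}")
print(f"best r: {best_r}")
```

Shifting the cutoff by one maiden barely moves the result, and the best cutoff sits within a step of n/e, so the ~37% figure doesn't hinge on exactly how you round.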

With this one weird trick, the Grand Duke has greatly increased his odds of finding Cinderella and will maintain his grasp on power in the kingdom.


How to Ruin Trick or Treating

So tonight is Halloween, and yes, my daughter is going as Cinderella. Even if you don't have kids, or if they dress up as something else, I hope you take this opportunity to bore all of your friends by explaining the Prince's folly.

Or you could decide to teach your kid the Secretary Problem by only allowing them a single piece of candy. Have them turn down candy at the first 37% of the houses you visit in order to determine the best option. That will really get them into the holiday mood.
