hypergeometric

I did laundry the other day, and as I grabbed a chunk of my newly-laundried socks to transition from the living room to my bedroom, I wondered: how many pairs of socks am I holding (on average)?

I couldn’t immediately think of the right probability distribution, and I really just wanted an answer, so I wrote a small Haskell program, below. (I plan on doing many of these sub-fifty-line programs to address tiny computational curiosities in the future, with emphesis on readability more than optimization.)

import Data.List (delete)
import Prelude hiding (Left,Right)
import System.Environment (getArgs)
import System.Random (random, getStdGen, RandomGen)

type Sock    = (Species,Parity)
type N_Pairs = Int
data Parity  = Left | Right deriving (Eq,Show)
type Species = Int

main = do
    args <- getArgs
    g    <- getStdGen

    {- Note that n is the number of initial *pairs*,
     - while k is the number of picked *socks*. Therefore
     - the case n == k corresponds to picking half the
     - jumbled socks at the end. -}
    let (n,k) = ( read (args!!0) :: Int
                , read (args!!1) :: Int )

    putStrLn . show . pairs . take k . jumble g $ socks n

socks :: N_Pairs -> [Sock]
socks n = [ (j,foot) | j    <- [1..n]
                     , foot <- [Left,Right] ]

pairs :: [Sock] -> N_Pairs
pairs []          = 0
pairs (sock:pile) = if other sock `elem` pile
                  then succ . pairs . delete (other sock) $ pile
                  else pairs pile

jumble :: RandomGen g => g -> [Sock] -> [Sock]
jumble _ []   = []
jumble g pile = pick : jumble g' (delete pick pile)
    where (k,g') = random g
          pick   = pile !! (k `mod` length pile)

other :: Sock -> Sock
other (j,foot) = case foot
                   of Left  -> (j,Right)
                      Right -> (j,Left)

A sequence of this program’s output is represented in the image. After running this, I decided to do some “real” math and look up the appropriate distribution.

I don’t feel too bad for not knowing it; this problem (almost) obeys a multivariate hypergeometric distribution. You can read the linked Wikipedia article for more general details, but the result is that, if we pick n socks out of N, the likelikhood of picking k≡(k1,k2, …, kN/2) socks is (∏j  2Ckj) / NCn, where nCk means n choose k. It might be more helpful to read that in pseudo-Haskell:

-- the argument ks is the vector k above,
-- where |k| is the total picked
pmf :: [Integer] -> Double
pmf ks = ( product [ 2 `choose` k | k <- ks ] )
       / ( n `choose` length ks )

In each expression of the probability mass function, the 2 comes from the fact that there are two socks in a pair.

For the laundry problem, though, I wasn’t particularly the interested in which pairs I got, but rather how many pairs. If we’re interested in any of the n picked socks being paired, then we’d actually like to ask about the subspace of all vectors k with L1(Z) norm n (socks picked) which have p “twos” (pairs).

I’m pretty sure, but please correct me if I’m wrong, that for an S dimensional 2×2×…×2 lattice, there are ∑j(S)j, where j runs from ⌈k/2⌉ to min(k,S), points with Manhattan norm k, where (x)j is to be read as a Pochhammer symbol (falling factorial). Correspondingly, for norm k, there are (S)k/2⌉ distinct instances of ⌊k/2⌋ pairs for k picked; there are (S)k/2⌉+1 for (k-1) picked, and so forth (with some weird effects when you get to the case k > S).

I’m not going to tackle this further analytically, but I will try plotting the modified multivariate hypergeometric distribution when I get back to my Mathematica and can have it do the heavy lifting.

Edit: I didn’t really mean 2×2× .. ×2, I meant, a lattice embedded in Rn with points where there are non-negative integral coordinates less than 3.

TL;DR: I really can’t be blamed for not knowing why I have so few matching socks.