# hypergeometric

I did laundry the other day, and as I grabbed a chunk of my newly-laundried socks to transition from the living room to my bedroom, I wondered: how many pairs of socks am I holding (on average)?

I couldn’t immediately think of the right probability distribution, and I really just wanted an answer, so I wrote a small Haskell program, below. (I plan on doing many of these sub-fifty-line programs to address tiny computational curiosities in the future, with emphesis on readability more than optimization.)

``````import Data.List (delete)
import Prelude hiding (Left,Right)
import System.Environment (getArgs)
import System.Random (random, getStdGen, RandomGen)

type Sock    = (Species,Parity)
type N_Pairs = Int
data Parity  = Left | Right deriving (Eq,Show)
type Species = Int

main = do
args <- getArgs
g    <- getStdGen

{- Note that n is the number of initial *pairs*,
- while k is the number of picked *socks*. Therefore
- the case n == k corresponds to picking half the
- jumbled socks at the end. -}
let (n,k) = ( read (args!!0) :: Int
, read (args!!1) :: Int )

putStrLn . show . pairs . take k . jumble g \$ socks n

socks :: N_Pairs -> [Sock]
socks n = [ (j,foot) | j    <- [1..n]
, foot <- [Left,Right] ]

pairs :: [Sock] -> N_Pairs
pairs []          = 0
pairs (sock:pile) = if other sock `elem` pile
then succ . pairs . delete (other sock) \$ pile
else pairs pile

jumble :: RandomGen g => g -> [Sock] -> [Sock]
jumble _ []   = []
jumble g pile = pick : jumble g' (delete pick pile)
where (k,g') = random g
pick   = pile !! (k `mod` length pile)

other :: Sock -> Sock
other (j,foot) = case foot
of Left  -> (j,Right)
Right -> (j,Left)
``````

A sequence of this program’s output is represented in the image. After running this, I decided to do some “real” math and look up the appropriate distribution.

I don’t feel too bad for not knowing it; this problem (almost) obeys a multivariate hypergeometric distribution. You can read the linked Wikipedia article for more general details, but the result is that, if we pick n socks out of N, the likelikhood of picking k≡(k1,k2, …, kN/2) socks is (∏j  2Ckj) / NCn, where nCk means n choose k. It might be more helpful to read that in pseudo-Haskell:

``````-- the argument ks is the vector k above,
-- where |k| is the total picked
pmf :: [Integer] -> Double
pmf ks = ( product [ 2 `choose` k | k <- ks ] )
/ ( n `choose` length ks )
``````

In each expression of the probability mass function, the 2 comes from the fact that there are two socks in a pair.

For the laundry problem, though, I wasn’t particularly the interested in which pairs I got, but rather how many pairs. If we’re interested in any of the n picked socks being paired, then we’d actually like to ask about the subspace of all vectors k with L1(Z) norm n (socks picked) which have p “twos” (pairs).

I’m pretty sure, but please correct me if I’m wrong, that for an S dimensional 2×2×…×2 lattice, there are ∑j(S)j, where j runs from ⌈k/2⌉ to min(k,S), points with Manhattan norm k, where (x)j is to be read as a Pochhammer symbol (falling factorial). Correspondingly, for norm k, there are (S)k/2⌉ distinct instances of ⌊k/2⌋ pairs for k picked; there are (S)k/2⌉+1 for (k-1) picked, and so forth (with some weird effects when you get to the case k > S).

I’m not going to tackle this further analytically, but I will try plotting the modified multivariate hypergeometric distribution when I get back to my Mathematica and can have it do the heavy lifting.

Edit: I didn’t really mean 2×2× .. ×2, I meant, a lattice embedded in Rn with points where there are non-negative integral coordinates less than 3.

TL;DR: I really can’t be blamed for not knowing why I have so few matching socks.