Almost like algorithms trained on biased data will exhibit bias, no?
Basically, the algorithm doesn’t suck. You suck.
I’ve changed my mind on this. You suck, but the algorithm sucks, too.
Think about it this way: take image classification, particularly facial recognition. All the pixels have to be converted to numbers before the computer can use them. Seems reasonable to convert color values to something like RGB, right? But the RGB for a white pixel is (255, 255, 255), while the RGB for a black pixel is (0, 0, 0). Lighter pixels get higher values than darker pixels. What's a picture of a black face mostly made of? Darker pixels, and hence lesser numerical values (take this instance of "lesser" to be purely in the numerical sense here, not as a value judgment, though it does matter to the value judgment later on).
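To make that concrete, here's a toy sketch with made-up pixel values (the patches and the helper function are hypothetical, just to show the numeric gap):

```python
# White and black pixels at the extremes of the RGB range.
white_pixel = (255, 255, 255)
black_pixel = (0, 0, 0)

# Hypothetical 3-pixel patches from a lighter-skinned and a darker-skinned face.
light_patch = [(220, 190, 170), (230, 200, 180), (210, 180, 160)]
dark_patch = [(80, 55, 45), (70, 50, 40), (90, 60, 50)]

def mean_intensity(patch):
    """Average of every channel value across the patch."""
    values = [channel for pixel in patch for channel in pixel]
    return sum(values) / len(values)

print(mean_intensity(light_patch))  # ~193.3
print(mean_intensity(dark_patch))   # 60.0
```

Same camera, same encoding, but the darker patch simply carries smaller numbers everywhere.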
If your algorithm uses something like average pooling (basically, take the average color value of a small patch of the image, like a 3x3 square), those darker areas contribute less overall numerical value to the data that gets propagated through the algorithm.
Therefore, an algorithm that makes some typical default assumptions like "convert pixels to RGB" and "use average pooling" will take a darker-skinned face as input and see a bunch of vectors of lesser overall magnitude. That makes it hard for the algorithm to pick up the key differences that would let it place this instance into one region of a partitioned dataset. In other words, the image becomes harder to categorize because the algorithm may not think there's much there: it's seeing a lot of values close to 0, and 0 is basically the absence of information in many cases. So it may not be able to differentiate a black man from a black woman very easily. Or worse, it may not see a person there at all, and in that case, even if "not a person" isn't one of its possible category labels, the output is going to be kind of random, because the algorithm as structured can't make sense of the image it's presented with.
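You can see the "lesser magnitude" point with made-up feature vectors (all the numbers below are hypothetical, just to show the scale effect):

```python
import numpy as np

# Hypothetical 4-value feature vectors pulled from a light vs. a dark image patch.
light_vec = np.array([200.0, 180.0, 210.0, 190.0])
dark_vec = np.array([30.0, 20.0, 35.0, 25.0])

print(np.linalg.norm(light_vec))  # ~390.6 -- large overall magnitude
print(np.linalg.norm(dark_vec))   # ~56.1 -- much closer to the zero vector

# The gap between two *different* dark patches is numerically compressed too,
# so there's less raw signal for telling two dark faces apart.
dark_other = np.array([35.0, 25.0, 30.0, 20.0])
light_other = np.array([210.0, 190.0, 200.0, 180.0])
print(np.linalg.norm(dark_vec - dark_other))    # 10.0
print(np.linalg.norm(light_vec - light_other))  # 20.0
```

Everything about the dark vectors lives closer to 0, including the differences the classifier needs.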
These assumptions and algorithmic choices don’t exist in a vacuum. “Numbers” themselves aren’t biased, of course, but the way you use them and put them together can cause systems to behave in a biased fashion.









