support vector machines


visualisation of how the kernel trick makes a non-separable collection of points linearly separable.

I guess the kernel mappings really add a dimension, rather than replacing one, don’t they?
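For the usual version of this picture, the answer is yes. A common way to draw it (an assumption here, since the visualisation itself isn’t reproduced) keeps both original coordinates and adds a third, z = x² + y². The sketch below uses made-up points and a hand-picked plane, z = 2.5, to check that a ring of X’s around a cluster of O’s becomes separable by a flat plane once lifted.

```python
def lift(point):
    """Map a 2-D point (x, y) to 3-D: keep x and y, and add z = x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)

# Hypothetical toy data: the square of X's surrounds the O's,
# so no straight line in the X,Y plane separates the two groups.
o_points = [(0.8, 0.0), (0.0, 0.8), (-0.8, 0.0), (0.0, -0.8)]
x_points = [(1.5, 1.5), (-1.5, 1.5), (-1.5, -1.5), (1.5, -1.5)]

# After lifting, the flat plane z = 2.5 has every O below it and every X above it.
PLANE_Z = 2.5
print(all(lift(p)[2] < PLANE_Z for p in o_points))  # True
print(all(lift(p)[2] > PLANE_Z for p in x_points))  # True
```

Pulled back to the original plane, z = 2.5 is the circle x² + y² = 2.5, so the flat cut in 3-D becomes a curvy boundary in 2-D. Not every feature map keeps the original coordinates, though: the map behind the homogeneous degree-2 polynomial kernel, (x², √2·xy, y²), replaces x and y rather than adding to them.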

"Kernel", "Space" and "Hyperplanes" in Support Vector Machines


Space: this refers to the set of axes (plural of axis) you are using, and the number of axes is the dimension of the space. For example, if your plot has just X and Y axes, it is a 2-dimensional space; with X, Y and Z axes, it is a 3-dimensional space.

Kernel: this is how you map your data into higher-dimensional spaces. Why do we want to do this? Remember the straight and curvy lines I mentioned before. If our data can’t be separated by a straight line, we might need a curvy line. Here’s the secret: a straight line in a higher-dimensional space can correspond to a curvy line in a lower-dimensional space (the curve is made of all the original points that land on the straight line once they are mapped up). So what we are really doing is using the kernel to put our data into a high-dimensional space, then finding a hyperplane (a “straight line”; not exactly, but I’ll explain it next) that separates the data there. That straight line looks like a curvy line when we bring it back down to the lower-dimensional space our data lives in.

EXAMPLE TIME! Let’s suppose our labeled data (X’s and O’s) lives in a two-dimensional space (think of an X-axis and Y-axis plot). We need to separate the data with a curvy line, but since the SVM can only use straight lines, we use a kernel to bring the data into a higher-dimensional space and separate it there with a straight line, which looks like a curvy line in the low-dimensional space.
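The mapping doesn’t actually have to be built point by point; that shortcut is the "kernel trick". A minimal sketch, assuming the homogeneous degree-2 polynomial kernel k(a, b) = (a · b)² on 2-D points (a standard textbook choice, not necessarily the one the author had in mind): its value equals an ordinary dot product after the explicit map phi(x1, x2) = (x1², √2·x1·x2, x2²), so an SVM can work in the 3-D space while only computing 2-D dot products. The function names are illustrative.

```python
import math

def dot(a, b):
    """Ordinary dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(p):
    """Explicit feature map for the degree-2 polynomial kernel: 2-D -> 3-D."""
    x1, x2 = p
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(a, b):
    """Degree-2 polynomial kernel (a . b)^2, computed entirely in the original 2-D space."""
    return dot(a, b) ** 2

a, b = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(a, b))   # 121.0
print(dot(phi(a), phi(b)))  # also 121, up to floating-point rounding
```

With n-dimensional inputs the kernel still costs one n-dimensional dot product, while this explicit map grows to n(n+1)/2 coordinates; for kernels such as the Gaussian (RBF) kernel the corresponding space is infinite-dimensional, so the shortcut is the only practical option.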

Hyperplane: this is how we generalize the concept of a straight line in two-dimensional space, because we don’t always use two-dimensional spaces. A hyperplane is just something flat that splits the space into two parts. Imagine our X,Y space again. A straight line splits it into two parts, so it is a hyperplane! Now imagine X,Y,Z space (3-dimensional). A flat piece of paper (a plane) splits that space into two parts, so it is a hyperplane too! In even higher-dimensional spaces we can’t picture it, but there is still a flat thing, with one dimension fewer than the space itself, that splits the space into two parts. That thing is a hyperplane!
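In n dimensions, a hyperplane can be written as the set of points x with w · x + b = 0, for a weight vector w and an offset b (standard notation, not taken from the text above). The sign of w · x + b tells you which of the two parts a point falls in, and, as this small sketch shows, the same function works for a line in 2-D and a plane in 3-D. The name `side` is hypothetical.

```python
def side(w, b, x):
    """Return +1 or -1 for the half of the space x is in, or 0 if x lies on the hyperplane w . x + b = 0."""
    value = sum(wi * xi for wi, xi in zip(w, x)) + b
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0

# 2-D: the line x + y - 1 = 0 splits the X,Y plane into two parts.
print(side((1, 1), -1, (2, 2)))          # 1
print(side((1, 1), -1, (0, 0)))          # -1
print(side((1, 1), -1, (0.5, 0.5)))      # 0 (on the line)

# 3-D: the plane z - 2.5 = 0 splits X,Y,Z space into two parts.
print(side((0, 0, 1), -2.5, (0, 0, 4)))  # 1
```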

To summarize: an SVM uses hyperplanes (straight things) to separate our two differently labeled groups of points (X’s and O’s). Sometimes our points can’t be separated by straight things, so we map them to a higher-dimensional space (using kernels!) where they can be split by straight things (hyperplanes!). The boundary looks like a curvy line in our original space, even though it is really a straight thing in a much higher-dimensional space!
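Putting the pieces together, here is a minimal sketch of a linear SVM trained on lifted points. Assumptions: a hypothetical lift (x, y) to (x, y, x² + y²), toy points standing in for real data, and plain full-batch subgradient descent on the soft-margin hinge-loss objective with hand-picked `lam`, `lr` and `epochs`. Real SVM libraries typically solve a dual problem and plug in a kernel instead of lifting explicitly; this only shows a straight cut in 3-D classifying points that no straight line separates in 2-D.

```python
def lift(point):
    """Hypothetical map: keep x and y, add z = x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)

def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=1000):
    """Full-batch subgradient descent on the soft-margin linear SVM objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b))), with labels in {-1, +1}."""
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge loss is active: point is misclassified or inside the margin
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Which side of the hyperplane w . x + b = 0 the point falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical toy data: the square of X's (+1) surrounds the O's (-1),
# so no straight line in the X,Y plane can separate them.
o_points = [(0.8, 0.0), (0.0, 0.8), (-0.8, 0.0), (0.0, -0.8)]
x_points = [(1.5, 1.5), (-1.5, 1.5), (-1.5, -1.5), (1.5, -1.5)]
points = [lift(p) for p in o_points + x_points]
labels = [-1] * len(o_points) + [1] * len(x_points)

w, b = train_linear_svm(points, labels)
print([predict(w, b, p) for p in points])
print(labels)
```

For this symmetric toy data, w[0] and w[1] stay at (or very near) zero, so the learned boundary w[2]·(x² + y²) + b = 0 is a circle of radius sqrt(-b / w[2]) back in the original plane: the curvy line from the summary.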