

Adversarial Machines

There have been some interesting developments recently in adversarial training, but I thought it would probably be a good idea to first talk about what adversarial images are in the first place. This Medium article by @samim is an accessible explanation of what’s going on. It references this talk by Ian Goodfellow, asking if statistical models understand the world.

Machine learning can do amazing magical things, but the computer isn’t looking at things the same way that we do. One way to exploit that is by adding patterns that we can’t detect but that create enough of a difference in the data to completely fool the computer. Is it a dog or an ostrich?

There’s been quite a lot of research into finding ways around this problem, as well as into exploiting it to avoid facial recognition or other surveillance. And, like I said, there have been some interesting recent developments that I hope to talk about here.


Statistics involves the study of variability. When a researcher works with something that involves a lot of uncertainty, the idea of random behavior comes into play: random behavior is unpredictable in the short run, but it has a regular and predictable distribution in the long run.

The foundation of the concept of probability is that the outcome cannot be predicted ahead of time, but it can be described by a regular pattern that emerges after many repeated trials.

When the recorded proportions of a sequence of trials for an event approach a fixed value, that value is said to be the probability of the event.

The following graph demonstrates the proportions of getting a heads in repeated coin tosses:

Notice how the proportions begin to level off around the value of 0.5. Therefore, the probability of getting heads in a fair coin toss is 0.5.
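This long-run leveling is easy to check with a quick simulation. A minimal sketch (the function name, seed, and toss count are my own choices, not from the post):

```python
import random

def running_proportion_of_heads(n_tosses, seed=0):
    """Simulate fair coin tosses and track the running proportion of heads."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for i in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1 head
        proportions.append(heads / i)
    return proportions

props = running_proportion_of_heads(100_000)
# Early proportions swing widely; after many tosses they settle near 0.5.
print(props[9], props[-1])
```

With any seed, the final proportion lands very close to 0.5, which is exactly the regular long-run pattern the graph illustrates.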

In statistics, a phenomenon is called random if the individual outcomes are uncertain, yet these outcomes have a regular and predictable distribution with a large number of repetitions.

Then the probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in an infinitely long series of trials.

The difference between proportions and probabilities is that a proportion is a known or observed value, while a probability is the theoretical value a proportion approaches after an infinite series of trials. In that sense, proportions deal with the present tense and probabilities deal with the future tense.

Probability Model
The mathematical model used to describe random behavior is called a probability model.

The following are two components to a probability model:
1. A list of possible outcomes called a sample space.
2. A probability for each outcome in the sample space.

Sample Space
The sample space Ꮥ of a random phenomenon is the set of all possible outcomes.

If a coin were tossed three times, the sample space of outcomes would be the following:
Ꮥ = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
If instead only the number of heads observed after three coin tosses is recorded, the sample space would be the following:
Ꮥ = {0,1,2,3}

The following is the sample space of all possible outcomes when two six-sided dice are rolled:
Ꮥ = {11,12,13,14,15,16,21,22,23,24,25,26,31,32,33,34,35,36,41,42,43,44,45,46,51,52,53,54,55,56,61,62,63,64,65,66}
The first digit in each outcome represents the number on the top face of the first die and the second digit represents the number on the top face of the second die. Notice there are 6 x 6 = 36 possible outcomes.
If a researcher was interested in the possible sums from the two dice, the following is its sample space:
Ꮥ = {2,3,4,5,6,7,8,9,10,11,12}
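These sample spaces are small enough to enumerate directly. A quick sketch using Python's itertools (variable names are my own):

```python
from itertools import product

# Sample space for two six-sided dice: each outcome is an (a, b) pair of faces.
S = [(a, b) for a, b in product(range(1, 7), repeat=2)]
print(len(S))  # 6 x 6 = 36 possible outcomes

# Sample space of possible sums from the two dice.
sums = sorted({a + b for a, b in S})
print(sums)  # the integers 2 through 12
```

Note that the sum sample space has only 11 elements even though 36 outcomes produce them, because different outcomes (e.g. 36 and 45) map to the same sum.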

Probabilities of Outcomes
Suppose the sample space for an experiment is Ꮥ = {O₁,O₂,…,Oₙ}. Then the probability of outcome Oᵢ is denoted as pᵢ.

These probabilities of the outcomes must satisfy the following two conditions:

1. 0 ≤ pᵢ ≤ 1 for all i = 1, 2, …, n
2. p₁ + p₂ + … + pₙ = 1
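The two conditions are easy to check programmatically. A minimal sketch (the helper name and tolerance are my own choices):

```python
def is_valid_probability_model(probs, tol=1e-9):
    """Check the two conditions: each p_i lies in [0, 1] and the p_i sum to 1."""
    return all(0.0 <= p <= 1.0 for p in probs) and abs(sum(probs) - 1.0) <= tol

# A fair six-sided die assigns probability 1/6 to each of its six outcomes.
print(is_valid_probability_model([1/6] * 6))   # True
print(is_valid_probability_model([0.5, 0.6]))  # False: these sum to 1.1
```

The tolerance matters because floating-point sums like 6 × (1/6) are only approximately 1.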

An event is any subset of outcomes from the sample space.

When rolling two dice, the event of “at least one 4” is the following subset:
A = {14,24,34,44,54,64,41,42,43,45,46}
The event of “sum is 9” is the following subset:
B = {36,45,54,63}

The probability of any event is equal to the sum of probabilities of the outcomes contained in that event.

If all of the outcomes in the sample space are equally likely, then the probability of an event A occurring is equal to the following:

P(A) = |A|/|Ꮥ|

where |A| is the number of outcomes in A and |Ꮥ| is the number of outcomes in the sample space.
From the previous example, the probability of the event A = “at least one 4” occurring is the following:
P(A) = |A|/|Ꮥ| = 11/36 ≈ 0.3056
The probability of the event B = “sum is 9” occurring is the following:
P(B) = |B|/|Ꮥ| = 4/36 ≈ 0.1111
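With equally likely outcomes, these probabilities reduce to counting. A sketch that builds the two events as sets (variable and function names are mine):

```python
from itertools import product
from fractions import Fraction

S = set(product(range(1, 7), repeat=2))   # all 36 equally likely outcomes
A = {o for o in S if 4 in o}              # "at least one 4"
B = {o for o in S if sum(o) == 9}         # "sum is 9"

def prob(event, space=S):
    """P(event) = |event| / |space| when outcomes are equally likely."""
    return Fraction(len(event), len(space))

print(prob(A))  # 11/36
print(prob(B))  # 4/36 = 1/9
```

Using Fraction keeps the probabilities exact instead of rounding them to decimals.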

Intersection and Union
The intersection of two events A,B, denoted as A ⋂ B, consists of all outcomes that are contained in both A and B.

The probability of A ⋂ B occurring, denoted as P(A ⋂ B), is the probability that both events occur at the same time.

From the previous example, the only two outcomes that contain at least one 4 and have a sum of 9 are A ⋂ B = {45,54}.
Therefore, the probability of event A and event B occurring at the same time is the following:
P(A ⋂ B) = |A ⋂ B|/|Ꮥ| = 2/36 ≈ 0.0556

The union of two events A,B, denoted as A ⋃ B, consists of all outcomes that are contained in at least one of the events A or B.

The probability of A ⋃ B, denoted as P(A ⋃ B), is the probability that either of the two events occurs, including when they both occur.

From the previous example, the following event contains at least one 4 or has the sum of 9:
A ⋃ B = {14,24,34,44,54,64,41,42,43,45,46,36,63}
Therefore, the probability of getting at least one 4 or getting a sum of 9 is the following:
P(A ⋃ B) = |A ⋃ B|/|Ꮥ| = 13/36 ≈ 0.3611
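Set operations mirror these definitions directly. A sketch computing the same intersection and union probabilities (names are mine):

```python
from itertools import product
from fractions import Fraction

S = set(product(range(1, 7), repeat=2))
A = {o for o in S if 4 in o}          # "at least one 4"
B = {o for o in S if sum(o) == 9}     # "sum is 9"

both = A & B     # intersection: outcomes in both events
either = A | B   # union: outcomes in at least one event

print(Fraction(len(both), len(S)))    # 2/36 = 1/18
print(Fraction(len(either), len(S)))  # 13/36
```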

The probability of the union of two events can use the following formula:

P(A ⋃ B) = P(A) + P(B) – P(A ⋂ B)

The reason P(A ⋂ B) is subtracted is because when P(A) and P(B) are added, then P(A ⋂ B) has been counted twice, and so P(A ⋂ B) is subtracted once to avoid over-counting.

From the previous example, the union probability formula could have been used instead:
P(A ⋃ B) = P(A) + P(B) – P(A ⋂ B)
= 11/36 + 4/36 – 2/36 = 13/36
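The over-counting argument can be verified by comparing raw counts, since every probability here is a count divided by the same 36 (names are mine):

```python
from itertools import product

S = set(product(range(1, 7), repeat=2))
A = {o for o in S if 4 in o}          # "at least one 4"
B = {o for o in S if sum(o) == 9}     # "sum is 9"

lhs = len(A | B)                      # direct count of the union
rhs = len(A) + len(B) - len(A & B)    # inclusion-exclusion count
print(lhs, rhs)  # both are 13, so P(A ⋃ B) = 13/36 either way
```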

Mutually Exclusive Events
Two events A,B are mutually exclusive, or disjoint, if the two events do not have any outcomes in common, where A ⋂ B = ∅ and so P(A ⋂ B) = |∅|/|Ꮥ| = 0.

Consider the following events A,B:
A = “first die shows a 1” = {11,12,13,14,15,16}
B = “sum is at least 8” = {26,35,36,44,45,46,53,54,55,56,62,63,64,65,66}
Since A ⋂ B = ∅ and thus P(A ⋂ B) = 0, A and B are mutually exclusive.

Therefore, when A and B are mutually exclusive, the probability of A ⋃ B becomes the following:

P(A ⋃ B) = P(A) + P(B)

From the previous example, since P(A ⋂ B) = 0, P(A ⋃ B) = P(A) + P(B) = 6/36 + 15/36 = 21/36 ≈ 0.5833.
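A quick check that these two events are disjoint and that the union probability is then just the sum (names are mine):

```python
from itertools import product

S = set(product(range(1, 7), repeat=2))
A = {o for o in S if o[0] == 1}       # "first die shows a 1"
B = {o for o in S if sum(o) >= 8}     # "sum is at least 8"

print(A & B == set())                 # True: the events are disjoint
# Since they are disjoint, |A ⋃ B| = |A| + |B|.
print(len(A | B), len(A) + len(B))    # 21 and 21, so P(A ⋃ B) = 21/36
```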

Exhaustive Events
Two events A,B are exhaustive of the sample space if they together contain all outcomes of the sample space.

Consider the events A = “first die is 5 or 6” = {51,52,53,54,55,56,61,62,63,64,65,66} and B = “sum is less than 11” = {11,12,13,14,15,16,21,22,23,24,25,26,31,32,33,34,35,36,41,42,43,44,45,46,51,52,53,54,55,61,62,63,64}
Then A ⋃ B = Ꮥ. Therefore, A and B are exhaustive of the sample space.

The complement of an event A, denoted as Aᶜ, consists of all outcomes in the sample space that are not outcomes in A.

P(Aᶜ) = 1 – P(A)

Two events are complements of each other if they are both mutually exclusive and exhaustive of the sample space.

Consider the events A = “exactly one die shows an odd number” = {12,14,16,21,23,25,32,34,36,41,43,45,52,54,56,61,63,65} and B = “sum is even” = {11,13,15,22,24,26,31,33,35,42,44,46,51,53,55,62,64,66}.
The events A and B are complements of each other, because they together contain all possible outcomes, but they do not contain any outcomes in common.
Therefore, A = Bᶜ and B = Aᶜ.
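The complement relationship can be confirmed with set operations, since exactly one die being odd is the same as the two faces having different parities (names are mine):

```python
from itertools import product

S = set(product(range(1, 7), repeat=2))
A = {o for o in S if (o[0] % 2) != (o[1] % 2)}   # "exactly one die shows an odd number"
B = {o for o in S if sum(o) % 2 == 0}            # "sum is even"

print(A & B == set())   # True: mutually exclusive
print(A | B == S)       # True: exhaustive of the sample space
print(B == S - A)       # True: so B = Aᶜ and A = Bᶜ
```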

koeskull  asked:

I saw a link to your post talking about how the common ancestor of dinosaurs was not scaly, but do you have any links to scientific papers that discuss that? Even just titles. I'm doing a paper on feather evolution and I'm pretty new to scientific literature about dinosaurs.

Well, some of the most important papers about feather evolution these days would probably be the discoveries of Kulindadromeus and Tianyulong, dinosaurs very distant to birds with feathers or feather-like covering.

The post you saw was probably my critique of this paper, which used statistical modeling techniques to estimate the probability of the ancestor of dinosaurs having feathers. Other notable papers on the subject of feather evolution include this one and this one. Both address the topic of how feathers initially developed.

Once you have a few papers on a subject, it can be useful to look at the papers’ reference lists to see older papers on the same subject, and to look up the paper on Google Scholar and click to see which other, more recent papers the paper you have was cited by.

AI Death and You: A guide to why...

I think Tucker is gonna be ok……

  • Tucker lacks the necessary neural interface implants to actually integrate with an AI in the way the Agents of Project Freelancer did. So Epsilon did not “kill himself in Tucker’s brain” as Epsilon did with Agent Washington.
  • Epsilon-Church voluntarily deconstructed his memory-based personality, not in a desperate and chaotic act of self-demolition but in a calculated and organized act of self-renewal.
  • Epsilon was not stupid or cruel and I don’t believe he would have fragmented if he had run through even one possible scenario in his statistical models where Tucker ended up damaged because of it. He would have found another way.
  • Tucker is significantly more resilient than anyone gives him credit for. He will be alright and he will not suffer the same sort of damage that Wash did.

I’m not saying it’ll be easy for him to accept that Church is gone (again) but the emotional fallout will not be a physical result of Epsilon’s fragmentation.

“I met my wife on a flight from Chicago to Houston. I was sitting in 3A and she was sitting in 3B. I fly first class all the time because I get free upgrades, but she had one of those bargain basement tickets, so we still have no idea how she got seated in a first class seat. I remember saying ‘howdy’ when she sat down, but for the first part of the flight, I just put in my earplugs and worked on my laptop. I’d already been divorced, so I had no inclination of meeting someone. But when the food was served, we started talking, and I learned that she had a PhD in children’s learning studies. I’m an engineer, so we started talking about statistical models. At the end of the flight, I gave her my email address, and told her: ‘If you’re interested in getting dinner, send me an email.’ When I told my friend about it, he said: ‘You gave her an email address? You’ll never hear from her again!’”

New role for immature brain neurons in the dentate gyrus identified

University of Alabama at Birmingham researchers have proposed a model that resolves a seeming paradox in one of the most intriguing areas of the brain — the dentate gyrus.

This region helps form memories such as where you parked your car, and it also is one of only two areas of the brain that continuously produces new nerve cells throughout life.

“So the big question,” said Linda Overstreet-Wadiche, Ph.D., associate professor in the UAB Department of Neurobiology, “is why does this happen in this brain region? Entirely new neurons are being made. What is their role?”

In a paper published in Nature Communications on April 20, Overstreet-Wadiche and colleagues at UAB; the University of Perugia, Italy; Sandia National Laboratories, Albuquerque, New Mexico; and Duke University School of Medicine; present data and a simple statistical network model that describe an unanticipated property of newly formed, immature neurons in the dentate gyrus.

These immature granule cell neurons are thought to increase pattern discrimination, even though they are a small proportion of the granule cells in the dentate gyrus. But it is not clear how they contribute.

This work is one small step — along with other steps taken in a multitude of labs worldwide — towards cracking the neural code, one of the great biological challenges in research. As Eric Kandel and co-authors write in Principles of Neural Science, “The ultimate goal of neural science is to understand how the flow of electrical signals through neural circuits gives rise to the mind — to how we perceive, act, think, learn and remember.”

Newly formed granule cells can take six-to-eight weeks to mature in adult mice. Researchers wondered if the immature cells had properties that made them different. More than 10 years ago, researchers found one difference — the cells showed high excitability, meaning that even small electrical pulses made the immature cells fire their own electrical spikes. Thus they were seen as “highly excitable young neurons,” as described by Alejandro Schinder and others in the field.

But this created a paradox. Under the neural coding hypothesis, high excitability should degrade the ability of the dentate gyrus — an important processing center in the brain — to perceive the small differences in input patterns that are crucial in memory, to know your spatial location or the location of your car.

“The dentate gyrus is very sensitive to pattern differences,” Overstreet-Wadiche said. “It takes an input and accentuates the differences. This is called pattern separation.”

The dentate gyrus receives input from the entorhinal cortex, a part of the brain that processes sensory and spatial input from other regions of the brain. The dentate gyrus then sends output to the hippocampus, which helps form short- and long-term memories and helps you navigate your environment.

In their mouse brain slice experiments, Overstreet-Wadiche and colleagues did not directly stimulate the immature granule cells. They instead stimulated neurons of the entorhinal cortex.

“We tried to mimic a more physiological situation by stimulating the upstream neurons far away from the granule cells,” she said.

Use of this weaker and more diffuse stimulation revealed a new, previously underappreciated role for the immature dentate gyrus granule cells. Since these cells have fewer synaptic connections with the entorhinal cortex cells, as compared with mature granule cells, this lower connectivity meant that a lower signaling drive reached the immature granule cells when stimulation was applied at the entorhinal cortex.

The experiments by Overstreet-Wadiche and colleagues show that this low excitatory drive makes the immature granule cells less — not more — likely to fire than mature granule cells. Less firing is known in computational neuroscience as sparse coding, which allows finer discrimination among many different patterns.

“This is potentially a way that immature granule cells can enhance pattern separation,” Overstreet-Wadiche said. “Because the immature cells have fewer synapses, they can be more selective.”

Seven years ago, paper co-author James Aimone, Ph.D., of Sandia National Laboratories, had developed a realistic network model for the immature granule cells, a model that incorporated their high intrinsic excitability. When he ran that model, the immature cells degraded, rather than improved, overall dentate gyrus pattern separation. For the current Overstreet-Wadiche paper, Aimone revised a simpler model incorporating the new findings of his colleagues. This time, the statistical network model showed a more complex result — immature granule cells with high excitability and low connectivity were able to broaden the range of input levels from the entorhinal cortex that could still create well-separated output representations.

In other words, the balance between low synaptic connectivity and high intrinsic excitability could enhance the capabilities of the network even with very few immature cells.

“The main idea is that as the cells develop, they have a different function,” Overstreet-Wadiche said. “It’s almost like they are a different neuron for a little while that is more excitable but also potentially more selective.”

The proposed role of the immature granule cells by Overstreet-Wadiche and colleagues meshes with prior experiments by other researchers who found that precise removal of immature granule cells of a rodent, using genetic manipulations, creates difficulty in distinguishing small differences in contexts of sensory cues. Thus, removal of this small number of cells degrades pattern separation.

Molly Wood, Searching for the Best Weather App Among Weather Underground, Weatherbug and More -

That work [building the prediction side of Dark Sky] evolved into longer-term forecasting, and led to a separate product called Forecast. Mr. Grossman says the weather predictions all come from computers; no meteorologists are involved. Algorithms compare predictions from various stations or weather sources with historical accuracy and spit out purely statistical predictions. “We get a lot of flak from meteorologists who say computers can’t do it and you always need a human in there to call the shots,” Mr. Grossman said. “But humans are really bad at forecasting. When it comes to weather forecasting it’s best to leave it to the computers.”

It’s a bit like “Moneyball” hits weather prediction — the idea that statistical models can, in the long run, offer better forecasts than a mix of information and gut instinct.

The only models & celebs who have been on all 4 of the main Vogue covers (American, British, French, Italian).

*Gisele is the only Brazilian.

Arrow S5 Ratings Models

I could not be more apathetic about Arrow this season and nothing in 5x01 made me feel any different, but the math nerd in me is not going to abandon my ratings model. As always, ratings tell you IF people watch not WHY they watch or WHY they don’t watch. You cannot extrapolate ratings to say “ratings are up because of the thing I like” or “ratings are down because of the thing I don’t like” with any accuracy. You can guess or speculate about the why but no one has any evidence for the why. #PLEASESTOPLYINGWITHMATH #ITHURTSMYHEART 

My CW rating theory remains as follows 

  1. Viewership gradually declines over time,
  2. Within a single season, viewership rises and falls across a predictable trend based on prior season viewership, 
  3. TV watching is cyclical and seasonal based on the counterfactual variance of what else is airing,
  4. Thus, ratings tell us very little about the popularity of any specific storylines or characters or even ship.

My Season Four Model showed that using a statistical model to predict weekly viewership was inconsistent because of counterfactual variance. However, a statistical model was useful for predicting the overall season average rating.

I think it is extremely unlikely any S5 model will be as accurate as the S4 model, but here goes nothing. That was statistical lightning in a bottle. I have made two adjustments to the model for S5 with the intent of improving accuracy.

First, I did not do a statistical model to predict the season premiere rating number. The season premiere rating is the engine for the entire predictor model. That one number is used in the model to predict all the others. Using the actual 5x01 rating will (hopefully) give the model increased accuracy.

Second, my model is based on the aggregated rate of change (the % ratings went up or down) for each episode in prior seasons. As an example, the 5x02 predicted rating is calculated by averaging the rate of change from episode 1 to episode 2 in the prior four seasons. Accurately predicting the crossover rating was an epic fail last season because the Season 1 & 2 ratings for episode 8 cannot be compared to the Season 3 ratings for episode 8. I will run 2 predictor models for S5, one that uses the aggregated rate of change for all 4 prior seasons (“The Oliver”) and one that mirrors the other model for other episodes, but for episode 8 uses the rate of change ONLY for Seasons 3 & 4 (“The Felicity”). This means episodes 2-7 will be identical in both models. 
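The aggregated rate-of-change approach can be sketched in a few lines. This is my own illustrative reconstruction, not the author’s actual spreadsheet, and the prior-season ratings below are entirely made-up placeholder numbers, not real Arrow data:

```python
def predict_season(premiere, prior_seasons):
    """Predict each episode's rating by chaining the actual premiere rating
    through the average episode-to-episode rate of change across prior seasons.

    prior_seasons: list of lists of per-episode ratings (placeholder data here).
    """
    n_eps = min(len(s) for s in prior_seasons)
    predictions = [premiere]
    for ep in range(1, n_eps):
        # Average rate of change from episode ep to ep+1 over the prior seasons.
        avg_change = sum(s[ep] / s[ep - 1] for s in prior_seasons) / len(prior_seasons)
        predictions.append(predictions[-1] * avg_change)
    return [round(p, 2) for p in predictions]

# Entirely hypothetical ratings for two prior seasons, four episodes each.
prior = [[2.6, 2.5, 2.4, 2.5], [2.2, 2.1, 2.1, 2.0]]
print(predict_season(1.9, prior))  # starts at the actual premiere, then scales by trend
```

The “Felicity” variant described above would simply average over a different subset of prior seasons for episode 8 while leaving the other episodes unchanged.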

Third, I’m just going to publish the full models below and you can refer to them for the rest of the season to save myself some headache. Feel free to mock or praise me at your leisure.

I also noted last year how Season 5 has historically been a point of high viewership drop-off, with other CW dramas averaging about a 16% decline from Season 4 to Season 5. If Arrow follows this trend the season viewership could drop 16% from S4 to S5 and that would be both normal and expected. My best estimate is that:

  • a decline of about 13%-19% in the S5 average would be within the range of typical for a CW S5 drama,
  • a decline of 12% or less in the S5 average would be performing better than a typical CW S5 drama,
  • a decline of 20% or higher in the S5 average would be performing worse than a typical CW S5 drama.

“The Oliver” predicts an S5 average of 1.80 for a 27.7% decline from S4. “The Felicity” predicts an S5 average of 1.97 for a decline of 20.9%. The low 5x01 rating results in both statistical models predicting a worse than typical season average for a CW S5 drama. However, “The Felicity” at 20.9% is a lot closer statistically to an on-par average rating (13% to 19% range) than “The Oliver”.

I dig that some of you are rooting for “The Oliver Model” and some of you are rooting for “The Felicity Model.” I now leave you to your regularly scheduled programming of attempting to use my ratings models to prove you “won” arguments you probably didn’t.
