Friday, November 16, 2012

Big Data Vote Review

The press has been assessing Big Data's performance in the US presidential election.
Nate Silver's Big Data state-by-state vote analysis reportedly came in at 96 percent.
It is said to be more accurate than the pundits.

Presidential election
・In 38 states, the probability of either Obama or Romney winning was 99% or higher
・In 44 states, the leading candidate's probability was 90% or higher
・Five or six states were uncertain.

US Senate races
・Predictions were accurate

The data sources behind the Big Data are unclear, but one possibility is that simply separating out people who did or did not take part in a candidate's campaign would already raise each probability.
If the analysis merely combined routinely collected personal data - race, area of residence, area of employment, occupation, income, sex, and age - I think it would be hard to call it an algorithm.
If it combined social-network data with personal information, as some police departments have begun doing, it might deserve to be called one.
Did the statistical model amount to a diagram of a collection of keys and classifications?

Obama's side seems comparatively easy to classify by campaign participation, donation amounts, supporter relationships, and so on.
I am more interested in the relationship between the collected personal data and the vote analysis than in the accuracy of the vote predictions.

Big Data and personal information showing through
Information leaks from browser toolbars
Abigael Evans Prove This Message


---The Barack Obama Edition of "Moneyball": Thorough Use of Big Data Was the Key to Winning the Presidential Election---
Taiyoh Mikuni, November 8, 2012, 22:10
http://japan.zdnet.com/cio/sp_12mikunitaiyoh/35024226/

 President Obama won re-election in the US presidential election.
 A decisive factor in this victory appears to lie in the Obama camp's "information warfare."
 A TIME article published the day after the vote reported that the Obama campaign made thorough use of big data.
 Topics that need covering right away - Apple, Microsoft (especially Xbox), and more - are piling up, but this time I want to introduce this timely story about the US presidential election.
The "years of instinct and experience" that campaign professionals rely on
 "Moneyball" is the sports nonfiction work whose Michael Lewis original, published nearly a decade ago, was a huge hit and was made into a film starring Brad Pitt the year before last. Its front-stage protagonist is Billy Beane, the hired GM (general manager) of the small-budget Oakland Athletics, and it depicts him reforming the team under tight resource constraints. It hardly needs a detailed re-introduction (Note 1).
 The work's backstage protagonist, so to speak, is the data-driven method known as "sabermetrics": analyzing player-performance data to acquire and develop undervalued players (high performers who can be hired cheaply), and devising - and rigorously enforcing - strategy and tactics that prize on-base percentage and runs scored above all, in order to challenge big-city teams many times better off in management resources (people, goods, money). Anyone who has seen the work will know this already.
 In "Moneyball," conflict plays out between those who judge by data and those who judge by instinct and experience.
 On the data side are Billy Beane - blessed with superb athletic ability and once expected to become a star, yet unable to produce results in MLB - and assistant GM Paul DePodesta, who studied economics at Harvard. On the other side, drawn as symbols of those being "disrupted," are the scouts who recruit players by long-honed instinct and experience, and manager Art Howe, who decides his lineups the same way.
 Those veteran scouts were the first thing that came to mind while I was reading the TIME article. Even a mere voter like me has heard phrases like "door-to-door campaigning" and "campaign professionals," but today data is used even in door-to-door campaigning: even the shortlisting and prioritizing of whom to visit to "ask for a vote" rests on the results of data analysis.
 Moreover, amid this change, the role of campaign professionals who rely on instinct and experience is shrinking, and, in inverse proportion, the importance of people called data scientists is rising.
 The keynote of the TIME article, it seems fair to say, is to convey that change.

Note 1: Moneyball
Aaron Sorkin's name appears on the "Moneyball" screenwriting team.
Sorkin - who wrote "The Social Network," depicting Facebook's founding, and has also been tapped to script the Steve Jobs biography now said to be moving toward a film adaptation - is one of the hottest screenwriters of the day.
Paul DePodesta, who served as Billy Beane's right-hand man, later became GM of the Los Angeles Dodgers and is now a scout with the New York Mets.


---Nate Silver’s book sales jump 850% after the NYT statistician predicts outcomes in 49 of 50 states---
National Post Staff and Reuters | Nov 8, 2012 7:02 PM ET | Last Updated: Nov 8, 2012 8:08 PM ET
http://news.nationalpost.com/2012/11/08/nate-silver-2012-election-prediction/

Nate Silver - crowned golden boy of electoral statistics - saw a huge jump in book sales this week after correctly predicting election outcomes in 49 of 50 states. (That record may yet become 50 of 50, depending on how Florida turns out.)

On Amazon.com, sales for The Signal and the Noise: Why So Many Predictions Fail-but Some Don’t were up 850% the day after the U.S. election, according to CNNMoney. By Thursday, it was #2 on the site’s U.S. best seller list and #8 in Canada.

Silver, the statistician behind the popular FiveThirtyEight blog, finds himself the poster child of what is sure to be a new data-driven approach to politics after adding to his already astonishing prediction record.

On Thursday, he was still trending on Twitter. There were also 500,000 searches for his name on Google.

Silver appeared on the Daily Show Wednesday, where Jon Stewart encouraged the understated statistician to bask in the victory more publicly. “You are so reasonable. Don’t you want to stand up and say ‘I am Nate Silver, bow down to me,’? ” Stewart said. “I am Nate Silver, lord and god of the algorithm!”

Silver declined.

The victory lap of sorts was well-deserved for a man who received widespread criticism from conservatives for giving President Barack Obama a 90.9% chance of re-election in the weeks leading up to Election Day, said Clifford Young, managing director of polling at firm Ipsos, the polling partner of Reuters.

But beyond getting the results right overall, as other pollsters did in this election cycle, Silver’s true genius is his ability to make statistical modeling accessible to a lay audience, Young said.

    Don’t you want to stand up and say ‘I am Nate Silver, bow down to me,’?

“Ultimately, what he’s done is take a lot of the mysticism out of politics. This puts a check on the traditional pundits and the state of punditry in general. It makes me wonder if we have a changing of the guard,” Young said.

It has been an impressive start for a man who does no polling himself. After graduating from the University of Chicago with a degree in economics in 2000, Silver worked as an economic consultant at an accounting firm before creating a model to predict baseball players' future performance. He sold it to stats firm Baseball Prospectus for an undisclosed amount and then turned to politics during the 2008 primaries with a model that emphasized demographics and past polling history.

Unlike traditional pollsters, who put questions to a field of voters, Silver incorporates the averages of several polls and weights them based on factors like the past accuracy of the polling firm, the number of likely voters on Election Day and the composition of each state’s electorate. He then runs multiple simulations of the results, which results in his probability forecast.
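
The aggregate-then-simulate approach described above can be sketched roughly as follows. This is a minimal illustration, not Silver's actual model: the poll numbers, pollster weights, and error size are all invented for the example.

```python
import random

# Hypothetical polls for one state: (candidate_share, sample_size, weight).
# The weight stands in for factors like a firm's past accuracy.
polls = [(0.51, 800, 1.0), (0.49, 600, 0.6), (0.52, 1000, 0.9)]

def weighted_average(polls):
    """Combine several polls into one weighted estimate of vote share."""
    total_w = sum(w for _, _, w in polls)
    return sum(share * w for share, _, w in polls) / total_w

def simulate(polls, n_sims=10_000, polling_error=0.03):
    """Estimate the probability of carrying the state by repeatedly
    perturbing the weighted poll average with a random polling error."""
    mean = weighted_average(polls)
    wins = sum(1 for _ in range(n_sims)
               if random.gauss(mean, polling_error) > 0.5)
    return wins / n_sims

random.seed(0)
print(f"weighted average: {weighted_average(polls):.3f}")
print(f"win probability:  {simulate(polls):.2%}")
```

Repeating this per state and summing electoral votes across simulations is what turns poll averages into the probability forecast the article describes.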

The end result often mirrors other aggregate data that is available. Real Clear Politics and Pollster.com, for instance, also showed that Obama held an advantage in all of the swing states except North Carolina. Yet Silver’s probability simulations as well as his status as, essentially, a one-man shop, has helped burnish his image and reputation, especially in light of the performance of traditional polling firms.

Rasmussen Reports, for instance, was wrong on six of the nine swing-state polls and showed Romney winning the popular vote by one percentage point. The NBC News/Wall Street Journal/Marist College poll incorrectly predicted that Obama would win North Carolina, while the CBS/Quinnipiac University poll incorrectly showed Obama losing Colorado.

    He’s at a level about four times as high as I am

Silver’s track record in the 2008 election led Penguin Books to sign him to a two-book deal worth more than $700,000, according to a person with knowledge of the deal. The New York Times reached a license agreement with Silver to host his blog through at least the 2012 election.

At the Times, Silver has branched out from politics to include more day-to-day topics, including a post that investigated whether KFC’s Double Down Sandwich was the unhealthiest sandwich ever. But it is his electoral predictions that have paid dividends: on the day before the election, 20% of all visitors to the Times website clicked on a 538 post, according to press reports.

Silver’s status as the electoral sage has led him to be courted throughout the business and media worlds. “He’s at a level about four times as high as I am,” said Jack Bogle, founder of investment-management company the Vanguard Group.

    If Jon Stewart and Stephen Colbert are falling over themselves to talk to you, you are in the hippest/coolest/most-insidery group

The understated Silver is not especially social and definitely not schmoozy, says Colby Hall, the founding editor of Mediaite.com, which tracks media news. Like many Silver fans, Hall has been trying to cultivate a relationship with Silver for several years, with no luck.

“He relies on his track record,” Hall says. The nerdy shtick works in Silver’s favour. “If Jon Stewart and Stephen Colbert are falling over themselves to talk to you, you are in the hippest/coolest/most-insidery group,” Hall says.

Even so, Silver’s methods have been criticized by political pundits, who detected a Democratic slant in his results, leading some to come up with so-called “unskewed polls” that showed Republicans winning handily.

Politico, a hub of news for D.C. wonks, skewered him as "over-rated" and wondered if a Romney win would turn Silver into a "one-term celebrity." The New York Times' own public editor criticized Silver for betting MSNBC host Joe Scarborough that President Obama would be re-elected.


---Nobody's perfect: Nate Silver and the imperfect art of prediction (UPDATE)---
Posted by Andrew Mooney  November 8, 2012 02:56 AM
http://www.boston.com/sports/blogs/statsdriven/2012/11/nobodys_perfect_nate_silver_an.html

For one small subcommunity of America, the man who benefited the most from the country’s decisions at the polls on Tuesday was not Barack Obama - it was Nate Silver, statistician and creator of the FiveThirtyEight blog on The New York Times website.

Based on current election returns, Silver correctly predicted the outcomes of all 50 states, with the result in Florida still pending. Given his track record - he got 49 out of 50 right in 2008 - Silver appears to have ushered in a new level of credibility for statistical analysis in politics.

But if Silver has a crystal ball, its surface is still somewhat clouded; in any sort of forecasting, there are elements of uncertainty and margins of error, something Silver notes constantly in his writing.

Still, near-perfect results two elections in a row suggest that Silver's model is particularly powerful, especially considering the confused pundit-blather in the weeks preceding Election Day. Just how unlikely was it that Silver would go 50-for-50?

The best place to turn is Silver's own projections.

Based on state polling data, Silver projected the probability that either Obama or Romney would carry each state. In one sense, much of the work was already done for him; the majority of states were so polarized as to be no-brainers. According to Silver, 38 states had more than a 99 percent chance of going to either Obama or Romney, and 44 states were more than 90 percent likely to be won by one candidate over the other.

Essentially, Silver was faced with the task of calling five or six states in which some significant uncertainty remained.

Now, finding the probability that Silver would go a perfect 50-for-50 isn't as simple as multiplying all the individual probabilities for each state. That would assume that each state's polling was independent from that of all of the other states, which doesn't seem realistic, especially since the same polling companies - YouGov, PPP, etc. - factor into Silver's analysis for many different states. In fact, Silver was guilty of this error in a post he authored after the conclusion of the 2011 MLB season, when he attempted to calculate the unlikelihood of the events of the season's last day.

However, we can look elsewhere in Silver's analysis for a better answer. On his blog, Silver also provides a histogram representing the probabilities of President Obama winning specific numbers of electoral votes. He lists the odds of Obama winning exactly 332 electoral votes - which, assuming Florida goes to the president, would match Silver's 50-for-50 prediction - at just over 20 percent. This suggests that Silver was the beneficiary of quite a bit of luck himself; his chances of perfectly predicting every state were only about one in five, or four-to-one odds against.
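
The independence fallacy mentioned above is easy to demonstrate numerically. In the toy simulation below (the state margins and error sizes are invented, not Silver's figures), a polling error shared by every state makes a clean sweep of six toss-up states noticeably more likely than the naive product of the individual win probabilities suggests:

```python
import random

random.seed(1)

# Six hypothetical toss-up states: projected margin for the favourite.
margins = [0.02, 0.015, 0.03, 0.01, 0.025, 0.02]
STATE_SD = 0.03      # state-specific polling error
NATIONAL_SD = 0.02   # error shared by every state (the correlation)

def sweep_probability(n_sims=50_000):
    """P(favourite wins all six states) when errors are correlated."""
    sweeps = 0
    for _ in range(n_sims):
        national = random.gauss(0, NATIONAL_SD)  # one shared shock
        if all(m + national + random.gauss(0, STATE_SD) > 0 for m in margins):
            sweeps += 1
    return sweeps / n_sims

def individual_probability(margin, n_sims=50_000):
    """Win probability for a single state, errors drawn independently."""
    return sum(1 for _ in range(n_sims)
               if margin + random.gauss(0, NATIONAL_SD)
                         + random.gauss(0, STATE_SD) > 0) / n_sims

naive = 1.0
for m in margins:
    naive *= individual_probability(m)  # multiplying assumes independence

print(f"naive product (assumes independence): {naive:.3f}")
print(f"correlated simulation:                {sweep_probability():.3f}")
```

Because a single national error moves all six states together, extreme outcomes (sweeps in either direction) are more common than independence would imply, which is exactly why multiplying per-state probabilities understates the chance of a 50-for-50 run.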

But there may be a better way of evaluating Silver's predictions than a binary right-wrong analysis. After all, the large number of states that were sure things makes it difficult to determine just exactly how impressive his accomplishment was. To see just how precise Silver's projections were, it is more instructive to compare the exact percentages he predicted for each state with the actual results from Election Day. Below, I've listed these numbers along with the margin of error Silver estimated in his predictions for each state and the amount his projections differed from Tuesday's returns - the actual margin of error.

Using this methodology, Silver’s record looks a lot less clean. The actual election results in 16 states fell outside the margin of error Silver allotted himself in his projections, reducing his total to 34-for-50, or 68 percent. He was furthest off in Mississippi, which wasn't nearly as lopsided as he predicted, and West Virginia, which voted more Republican than expected. Of course, Silver was still within 2 percent on 19 states, an impressive feat in itself.

The takeaway here is that, while Silver’s work the last four years has been impressive, he is not a mysterious wizard - for example, both the Huffington Post and Princeton's Sam Wang had similarly accurate results. He is also not infallible, and he would be the first to admit it.

Forecasting is never an area where we should expect 100 percent accuracy, and though Silver's work is bringing a lot of positive attention to statistical analysis in general, it's important that people keep their expectations of its applications realistic.

UPDATE: The graph above actually understates the projected margin of error Silver allows himself by a factor of two. Here is the updated table.

http://www.boston.com/sports/blogs/statsdriven/assets_c/2012/11/silver2-thumb-607x769-87543.png


Silver did much better than I gave him credit for initially. Forty-eight out of 50 states actually fell within his margin of error, giving him a success rate of 96 percent. And assuming that his projected margin of error figures represent 95 percent confidence intervals, which it is likely they did, Silver performed just about exactly as well as he would expect to over 50 trials. Wizard, indeed.
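
The closing back-of-the-envelope check can be reproduced directly: if each of the 50 margin-of-error bands is a 95 percent interval, the number of states landing inside them is binomial, and 48 hits is close to the expected 47.5. The figures come from the article; the code merely evaluates the binomial distribution.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 50, 0.95      # 50 states, 95% confidence intervals
expected = n * p     # expected number of states inside their interval
print(f"expected hits: {expected}")

# Probability of 48 or more hits purely from the intervals working as
# advertised - i.e. how unsurprising Silver's 48-for-50 result was.
p_48_or_more = sum(binom_pmf(k, n, p) for k in range(48, 51))
print(f"P(>= 48 of 50 inside): {p_48_or_more:.3f}")
```

(This treats the 50 states as independent trials, which, per the earlier caveat, is a simplification; correlated errors would widen the spread of possible hit counts.)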


---Big data spells death-knell for punditry---
Posted by Dan Vos
Wednesday 7 November 2012 18.50 GMT
http://www.guardian.co.uk/media-network/media-network-blog/2012/nov/07/big-data-us-election-silver

Nate Silver's successful US election predictions show the merit of data analysis and why businesses should embrace it

The results of the US presidential election are now in. Barack Obama has been re-elected, but the election has also officially ushered in a new era. An era where analysis has displaced opinion; where data scientists have pushed out the so-called experts. These results provide an insight that progressive organisations should take note of: data must be a key driving factor in your business.

Prior to the election, there were countless heated debates aimed at data scientists in the political domain. The poster child of the election data scientist is Nate Silver of the New York Times blog FiveThirtyEight. Unfortunately for the pundits who decried Silver's pragmatic approach, he could hardly have been more accurate: Silver predicted every state correctly - 50 out of 50, 100% right. His successful analysis didn't stop there; Silver continued to be spot-on in his predictions for nearly every US Senate contest. Silver's success should also have a profound impact on how businesses leverage data analysis.

It provides clear evidence that the business world needs to rely on measured performance and projection over opinion and spur-of-the-moment inspiration. Organisations should reject the so-called expert opinion, based on instinct and outdated metrics, as a viable way to set the direction of decision making. This new era, based on more complex data models, has the potential to permeate many areas of life.

Of course, this data-driven approach is being fuelled by advancements in technology. Our datasets have expanded faster than our ability to count them, processing capability has exploded through the use of the cloud, and outlets to display data have multiplied to include tablets, phones, and many more platforms. We are at the perfect intersection of tools and capability to deliver on this possibility, here and now.

There is another driving force that is just starting to gain momentum in the business space. Organisations are becoming aware that seemingly unpredictable events can be broken down and presented in meaningful ways to project outcomes and therefore inform decisions. As this concept gains momentum, businesses of all types will need to find new ways to embrace predictive analytics. Organisations will need the tools and talent to identify patterns of behaviour that, left unnoticed, could prove fatal to the health of the company. The stage is set for future success: businesses that know v businesses that guess.

Businesses, like elections, have for too long relied on an expert's voice to guide decisions and predict how the winds would shift in their favour. There was an allowance for expert-driven decisions because every business operated this way. That meant every business could tolerate a certain number of poor decisions and continue to survive. But as big data takes hold and decisions are made based on information, the number of poor choices a business can afford starts to drop. The analysis and subsequent decisions that used to be acceptable are going to lead to increasingly poor performance relative to competitors. Businesses that guess cannot survive in this analytical marketplace.

Pundits made a huge mistake in this election. So where did they go wrong? They went after the pollsters, the analysts, and the people who lay out quantitative projections based on models, not opinions. This time, however, the pundits can't escape their accusations, because the models were undeniably correct. This same pattern is going to emerge in organisations, where the established experts will hotly contest new ideas that go against their intuition. Opinions will vary, but at the end of the day there will be a clear winner: the people who understand the data.

We need to expand the world of analysis to drive our decision making by the insights found in statistical models and the predictions of likelihood of success.


---Inside the Secret World of the Data Crunchers Who Helped Obama Win---
By Michael Scherer | Nov. 07, 2012
http://swampland.time.com/2012/11/07/inside-the-secret-world-of-quants-and-data-crunchers-who-helped-obama-win/

"The cave" at President Obama's campaign headquarters in Chicago

In late spring, the backroom number crunchers who powered Barack Obama’s campaign to victory noticed that George Clooney had an almost gravitational tug on West Coast females ages 40 to 49. The women were far and away the single demographic group most likely to hand over cash, for a chance to dine in Hollywood with Clooney - and Obama.

So as they did with all the other data collected, stored and analyzed in the two-year drive for re-election, Obama’s top campaign aides decided to put this insight to use. They sought out an East Coast celebrity who had similar appeal among the same demographic, aiming to replicate the millions of dollars produced by the Clooney contest. “We were blessed with an overflowing menu of options, but we chose Sarah Jessica Parker,” explains a senior campaign adviser. And so the next Dinner with Barack contest was born: a chance to eat at Parker’s West Village brownstone.

For the general public, there was no way to know that the idea for the Parker contest had come from a data-mining discovery about some supporters: affection for contests, small dinners and celebrity. But from the beginning, campaign manager Jim Messina had promised a totally different, metric-driven kind of campaign in which politics was the goal but political instincts might not be the means. “We are going to measure every single thing in this campaign,” he said after taking the job. He hired an analytics department five times as large as that of the 2008 operation, with an official “chief scientist” for the Chicago headquarters named Rayid Ghani, who in a previous life crunched huge data sets to, among other things, maximize the efficiency of supermarket sales promotions.

Exactly what that team of dozens of data crunchers was doing, however, was a closely held secret. “They are our nuclear codes,” campaign spokesman Ben LaBolt would say when asked about the efforts. Around the office, data-mining experiments were given mysterious code names such as Narwhal and Dreamcatcher. The team even worked at a remove from the rest of the campaign staff, setting up shop in a windowless room at the north end of the vast headquarters office. The “scientists” created regular briefings on their work for the President and top aides in the White House’s Roosevelt Room, but public details were in short supply as the campaign guarded what it believed to be its biggest institutional advantage over Mitt Romney’s campaign: its data.

On Nov. 4, a group of senior campaign advisers agreed to describe their cutting-edge efforts with TIME on the condition that they not be named and that the information not be published until after the winner was declared. What they revealed as they pulled back the curtain was a massive data effort that helped Obama raise $1 billion, remade the process of targeting TV ads and created detailed models of swing-state voters that could be used to increase the effectiveness of everything from phone calls and door knocks to direct mailings and social media.

How to Raise $1 Billion
For all the praise Obama’s team won in 2008 for its high-tech wizardry, its success masked a huge weakness: too many databases. Back then, volunteers making phone calls through the Obama website were working off lists that differed from the lists used by callers in the campaign office. Get-out-the-vote lists were never reconciled with fundraising lists. It was like the FBI and the CIA before 9/11: the two camps never shared data. “We analyzed very early that the problem in Democratic politics was you had databases all over the place,” said one of the officials. “None of them talked to each other.” So over the first 18 months, the campaign started over, creating a single massive system that could merge the information collected from pollsters, fundraisers, field workers and consumer databases as well as social-media and mobile contacts with the main Democratic voter files in the swing states.

The new megafile didn’t just tell the campaign how to find voters and get their attention; it also allowed the number crunchers to run tests predicting which types of people would be persuaded by certain kinds of appeals. Call lists in field offices, for instance, didn’t just list names and numbers; they also ranked names in order of their persuadability, with the campaign’s most important priorities first. About 75% of the determining factors were basics like age, sex, race, neighborhood and voting record. Consumer data about voters helped round out the picture. “We could [predict] people who were going to give online. We could model people who were going to give through mail. We could model volunteers,” said one of the senior advisers about the predictive profiles built by the data. “In the end, modeling became something way bigger for us in ’12 than in ’08 because it made our time more efficient.”
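
A call list ranked by persuadability, as described above, might be sketched like this. Everything here is invented for illustration - the scoring weights, field names, and voter records are hypothetical, since the campaign's actual models were never published:

```python
# Each voter record mixes basics (age, voting history) with other data,
# echoing the article's "75% basics, rounded out by consumer data".
voters = [
    {"name": "A", "age": 34, "votes_cast": 1, "donor_2008": False},
    {"name": "B", "age": 52, "votes_cast": 4, "donor_2008": True},
    {"name": "C", "age": 27, "votes_cast": 0, "donor_2008": False},
]

def persuadability(v):
    """Toy linear score: younger, sporadic voters rank as more movable,
    mimicking 'rank names in order of their persuadability'."""
    score = 0.0
    score += 0.3 if v["age"] < 40 else 0.1        # younger: more movable
    score += 0.2 * (2 - min(v["votes_cast"], 2))  # sporadic voters
    score += 0.3 if v["donor_2008"] else 0.0      # prior engagement
    return score

# The field office's call list: most persuadable names first.
call_list = sorted(voters, key=persuadability, reverse=True)
for v in call_list:
    print(v["name"], round(persuadability(v), 2))
```

A real campaign model would presumably be fitted from experiments rather than hand-weighted, but the output is the same shape: a phone list sorted so the highest-priority contacts come first.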

Early on, for example, the campaign discovered that people who had unsubscribed from the 2008 campaign e-mail lists were top targets, among the easiest to pull back into the fold with some personal attention. The strategists fashioned tests for specific demographic groups, trying out message scripts that they could then apply. They tested how much better a call from a local volunteer would do than a call from a volunteer from a non-swing state like California. As Messina had promised, assumptions were rarely left in place without numbers to back them up.
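
The script-testing described above - e.g. local versus out-of-state volunteer calls - amounts to a simple A/B comparison. A minimal sketch, with entirely invented counts:

```python
from math import sqrt

# Hypothetical experiment: persuasion rates for two kinds of callers.
local   = {"calls": 2000, "persuaded": 260}
distant = {"calls": 2000, "persuaded": 200}

def rate(g):
    return g["persuaded"] / g["calls"]

def z_score(a, b):
    """Two-proportion z-test: is the observed difference bigger than
    what random noise alone would produce?"""
    p = (a["persuaded"] + b["persuaded"]) / (a["calls"] + b["calls"])
    se = sqrt(p * (1 - p) * (1 / a["calls"] + 1 / b["calls"]))
    return (rate(a) - rate(b)) / se

print(f"local: {rate(local):.1%}, distant: {rate(distant):.1%}")
print(f"z = {z_score(local, distant):.2f}")  # |z| > 1.96: significant at 5%
```

This is the sense in which "assumptions were rarely left in place without numbers to back them up": a hypothesis about call scripts becomes a measured difference with a significance test attached.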
