YouTube Channel Upgrade

Several of my YouTube videos have been seeing an enormous amount of traffic. This is largely a consequence of my dipping a toe into the internet feud between contrarian blogger Sargon of Akkad and the left side of the internet. The fight concerns his recent petition to disband “social justice classes” on American college campuses.

I haven’t typically posted YouTube videos to this site, since the topics tend to suit a terser format that doesn’t fit into longform blogging. However, you can find my response to Sargon’s petition here, my reaction to his debate with academic Kristi Winters here, and my reaction to her “social science” teachings here.

At any rate, since the subscribership has reached 100+, I have created a video intro for the channel, which can be viewed below.

And, of course, my channel can be found here!


The One Word Turing Test

A friend forwarded me a math riddle over the weekend:

Imagine: You and an artificial intelligence are to be subjected to a test. Each of you will anonymously submit a single English word to a (human) judge, who will try to determine which response came from a human being (as in a Turing test). Whoever is judged a human shall live; whoever is judged a machine shall be destroyed.

What word would you choose?

The trick, I think, is to realize that you are trying to psych out the human judge, not the computer rival. This means finding a word that is subtly evocative of a human experience that would be hard for a computer to guess.

In other words, it would have to be a word associated with a human experience that nonetheless is not widely talked about in literature, history, or other media that a highly intelligent AI could search through. If we wanted to use a Venn diagram, we could express the problem like this:


This is not an easy task since, to be effective, the word would have to be on the edge of humanity’s ability to effectively express its own nature.

Any suggestions? What do you think are the most human words in our language?

Dismantling The Cult of Confidence, New YouTube Channel

I gave a lecture this last weekend on the application of probability theory to modern epistemology. It outlines a lot of my own thoughts about the mistakes in public discourse when we talk about confidence and certainty. This was delivered at the Socratic Forum for Thought in Seattle, Washington:

This blog hasn’t been updated recently since I haven’t had time for writing during my grad program. However, I have recently participated in several debates and delivered some lectures on topics corresponding to this blog’s subject matter. I am starting a new YouTube channel to host these videos, although everything I upload will be re-posted here.

The channel:

A previous discussion between me and Jersey Flight can be found here:

Mining Healthcare Data: A Modern Rumpelstiltskin Story

Via Megan McArdle: the New York Times and the Washington Post are reporting on recent problems stemming from the Obama Administration’s healthcare-data project. Apparently, data analysts studying the health impacts of new programs are not controlling their experimental samples. Whereas ideally the government’s analysis would be the basis for crafting intelligent policy, the New York Times’s description calls into question the robustness of the research being conducted.

“The studies that are regarded as the most reliable randomly assign people or institutions to participate in a program or to go on as usual, and then compare outcomes for the two groups to see if the intervention had an effect.

Instead, the Innovation Center has so far mostly undertaken demonstration projects; about 40 of them are now underway. Those projects test an idea, like a new payment system that might encourage better medical care — with all of a study’s participants, and then rely on mathematical modeling to judge the results.”

The superficial approach described above is odd because it seemingly flies in the face of conventional approaches to statistical modeling. For those not familiar, establishing a randomized control is essential to getting results that don’t just confirm the hypothesis being tested. You can see this problem in the infamous Israeli Air Force Study (a really informative overview of this concept can be found on YouTube), and it has long been understood in statistics that, when possible, randomized control samples are always preferable.
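To make the point about controls concrete, here is a small simulation sketch. It is purely illustrative: the cost figures, noise levels, and the `simulate` function are my own assumptions, not anything taken from the Innovation Center's projects. It assumes the problem at work is regression to the mean, the effect usually associated with the Air Force anecdote: enroll the costliest patients in a program that does nothing at all, and a before/after comparison will still show "savings", while a randomized control group shows roughly none.

```python
import random
import statistics


def simulate(n=10_000, seed=0):
    """Compare an uncontrolled before/after estimate with a randomized one.

    All numbers are hypothetical. The "intervention" has zero real effect.
    """
    rng = random.Random(seed)

    # Each patient has a stable underlying cost plus independent noise per period.
    true_cost = [rng.gauss(100, 10) for _ in range(n)]
    before = [t + rng.gauss(0, 20) for t in true_cost]
    after = [t + rng.gauss(0, 20) for t in true_cost]

    # Uncontrolled design: enroll the costliest 10% and compare before vs. after.
    cutoff = sorted(before)[int(0.9 * n)]
    enrolled = [i for i in range(n) if before[i] >= cutoff]
    naive_effect = statistics.mean(after[i] - before[i] for i in enrolled)

    # Randomized design: split the same enrolled group at random and
    # compare the treated half with the control half in the "after" period.
    rng.shuffle(enrolled)
    half = len(enrolled) // 2
    treated, control = enrolled[:half], enrolled[half:]
    randomized_effect = (
        statistics.mean(after[i] for i in treated)
        - statistics.mean(after[i] for i in control)
    )
    return naive_effect, randomized_effect


if __name__ == "__main__":
    naive, randomized = simulate()
    print(f"uncontrolled before/after estimate: {naive:+.1f}")
    print(f"randomized treatment-vs-control estimate: {randomized:+.1f}")
```

The uncontrolled estimate comes out strongly negative (apparent savings) even though nothing was done, because patients selected for an unusually bad period tend to drift back toward their average. The randomized comparison hovers near zero, which is the right answer in this toy setup.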

So why do government analysts feel so confident that they can dispense with what has, until recently, been an essential feature of any statistical experiment? Well, because they’ve got great data-mining technology! Here, the phrase “mathematical modeling” does a lot of work in obscuring the real methods that the government is using. Mathematical modeling can really mean anything, and ironically the NYT’s link on this description is broken.

Megan McArdle has a good take on the possible sources of the mistake: sloppy thinking on the part of federal bureaucrats. Says McArdle:

Gold’s article implies that the administration is looking at gross savings — which is to say, it’s just reporting the amount of money saved by the accountable-care organizations that ended up on the positive side of the ledger, even though this is less than half the total. Statisticians have a term for this: the Texas sharpshooter fallacy…..
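The gross-versus-net distinction in that quote can be sketched in a few lines. This is an illustration under assumptions of my own (400 hypothetical organizations whose true savings are all zero, with results drawn as pure noise in millions of dollars); none of these figures come from the article or from the administration's data.

```python
import random


def gross_vs_net(n_orgs=400, seed=1):
    """Return (gross, net, share_positive) for organizations with no real effect.

    Hypothetical setup: every organization's true savings is zero, so each
    reported result is just noise around zero (in millions of dollars).
    """
    rng = random.Random(seed)
    results = [rng.gauss(0, 1.0) for _ in range(n_orgs)]

    # "Gross savings": count only the organizations that landed in the black.
    gross = sum(r for r in results if r > 0)

    # Net savings: count everyone, winners and losers alike.
    net = sum(results)

    share_positive = sum(r > 0 for r in results) / n_orgs
    return gross, net, share_positive


if __name__ == "__main__":
    gross, net, share = gross_vs_net()
    print(f"share of organizations showing savings: {share:.0%}")
    print(f"gross savings (winners only): {gross:.1f}")
    print(f"net savings (everyone): {net:.1f}")
```

Summing only the winners produces a large positive total out of nothing but noise, while the net figure stays comparatively close to zero. That is the sharpshooter drawing the target after the shots have landed.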

Perhaps I am even more cynical than McArdle, but my take is somewhat different.

Given that the administration has been unable to produce evidence of healthcare savings from increased coverage, it is fair to say that the president is feeling pressure to come up with some statistical result that will make costs appear more reasonable (at least ahead of the next CBO estimate). Moreover, without speculating too much as to the overall structure of bureaucratic management, I don’t think it is unlikely that individual analysts are also feeling the pressure to deliver “good” results, especially with all of these cool new “big data” tools so prominently featured in the news.

The result is predictable: a sort of magical thinking arises where data-mining and complex models become a panacea for turning poorly conducted statistical tests into predictive models showing large savings from new “innovative” approaches to delivering healthcare. Of course the results are all confirmation bias, but who’s going to look a gift horse in the mouth? Certainly not an administration desperate for good news on the healthcare front.

Now admittedly, I have no inside information, but if this kind of sloppy analysis is indeed going on, then it is certainly a cause for concern. The one-sided use of over-optimistic healthcare predictions could lead the CBO to perennially underestimate the cost of supporting programs like Medicare in their current state. This in turn could ultimately doom these programs’ long-term solvency (not to mention the long-term solvency of the country), since politicians are all too willing to forgo necessary reform in the light of CBO reports that tell them healthcare costs will come down of their own accord.

But ultimately this problem is not political. It stems from a cultural approach to data analysis that is far too prevalent in industry and in government. I like to think of it as a modern-day Rumpelstiltskin story. What do we have? Reams of uncontrolled data. What do we want? Optimistic predictive results. With this point of view, it’s tempting to simply lock analysts in a room and ask them to build mathematical models until they finally manage to spin the straw data into golden predictive models, like the miller’s daughter from the aforementioned fairy tale.

But just as in the fairy tale, when we force someone to spin straw into gold, it shouldn’t be surprising when magical methods play a large role in their process. Moreover, in the case of the government’s own analysis, the Rumpelstiltskin metaphor can be taken yet further. For in trusting their magic numbers, our current leaders may have put the next generation on the line for the results.

The Truth About Data Science


From a recent conference on data science:

“A data scientist is a statistician who lives in San Francisco.”

“Data Science is statistics on a Mac.”

“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.”

True words.

Memento Mori: Where Does Our Data Go When We Die?

When we shuffle off the mortal coil, what eventually becomes of the terabytes of social data, chat records, photographs, and files that we leave behind us? I know some of the more macabre in the tech community have already speculated about how they would like to send forth their accumulated data into the great blue yonder; some more advanced users have even set up scripts that will activate on their deaths to publicize or delete certain pieces of information. But in the past few months I’ve noticed this very concern becoming more pronounced in mainstream blogs. I was even surprised to see PBS do a segment on this question earlier this year. Maybe this is an indication of maturity on our part. The internet has been driven by people too young to ever grapple with death, but this reflection was bound to come sooner or later.

But death on the internet raises a number of legal difficulties, especially as it applies to our digital property. For previous generations the matter was easy. There were the physical assets that for the most part could be bequeathed, taxed, and to which the state and creditors might have claim. Then there were the memories and personal information that family and friends would take and keep alive as long as personal memory would allow. But in our new digital age this distinction is disintegrating. Personal information has become an asset, and an asset of increasing value.

The medieval dance of death.

The song of our data may persist well after death

Once again, the great question of who owns the data rears its ugly head. And sure enough there are already fights between families and social networking sites over whether the accounts of the dead should be opened or monetized (not that I’m really sure why a corporation would be interested in having a million dead followers). However, in a rare stroke of good fortune, it looks like the families have been winning this battle, at least for now. Some celebrities, like Roger Ebert, have even had their online identities posthumously managed and updated by their families as if nothing had ever happened. A strange digital afterlife to be sure.

I had always thought that people’s chief concern when facing death would be to make sure that their most private secrets were shielded from the prying eyes of the public. But it seems that, for the most part, people want to have their data live in the public domain. Death may awaken the sentimentalist in us all. And I suppose I would have to agree with the sentiment. As troubling as it would be to have one’s personal data tossed to the four winds, the alternative consequences seem far worse. Beyond the possibility of a Digital Dark Age emerging when historians of the future have essentially no way of accessing records from the generation before (who among the millennials keeps a physical diary or physical photos?), one need look no further than the grieving parents of dead teens trying to get some access to their children’s photos, which for the most part remain recorded on private social networks.

There are now even websites dedicated to posthumous digital preservation. This is certainly an ambitious endeavor and in many ways it seems very much like the Mormons’ use of [...]. We may have before us not so much a tool for the living as a mechanism for creating a bridge to past generations long departed to their eternal reward.

The situation may become still stranger as the information revolution and birth rates continue to decelerate. As the decades pass, more and more of the information stored on the internet will be from previous generations, and one day the cadre of living souls who surf the web may be dwarfed by the legions of the dead. Our children’s generation may find themselves navigating a massive digital catacomb wherein lies the accumulated knowledge of those gone before them. An internet where the dead whisper their wisdom to future generations. It’s a haunting but not altogether uncomforting thought. The tool that was born as the province of youth in our generation might in the future become the ultimate Memento Mori.

Byte Counter Byte: Human vs. Machine Judgment

The central question in the digital age may be “who owns our data?” but this could just as easily be rephrased as “who makes the decisions over how our data is used?”. So far the decisions have been increasingly made by machines. The conversation shown here illustrates that there may be some caveats to this assumption.

In a nutshell, this is more or less the disagreement as it’s seen from a data-scientist’s perspective. But I think that there are more fundamental questions for consumers. Beyond which of the two models will ultimately dominate the market, do we want machines to manage how our data is used and analyzed?

What's The Big Data?

Here’s a simple rule for the second machine age we’re in now: as the amount of data goes up, the importance of human judgment should go down… The practical conclusion is that we should turn many of our decisions, predictions, diagnoses, and judgments—both the trivial and the consequential—over to the algorithms. There’s just no controversy any more about whether doing so will give us better results… I don’t know how quickly it’ll happen, but I’m very confident that data-dominated firms are going to take market share, customers, and profits away from those who are still relying too heavily on their human experts.

–Andrew McAfee, Harvard Business Review Blog

Human judgment is at the center of successful data analysis. This statement might initially seem at odds with the current Big Data frenzy and its focus on data management and machine learning methods. But while these tools provide immense value, it is important…
