A Lecture on Incompleteness and Truth

Below is a lecture I gave to the Socratic Forum for Thought on the subject of Kurt Gödel’s marvelous Incompleteness Theorem and its relevance to philosophy.

I have to admit, I found this to be a hard topic, since it walks a fine line between rigorous abstract logic and looser metaphysics. Most in attendance received the talk well, so perhaps I will do another in the future.

Dismantling The Cult of Confidence, New YouTube Channel

I gave a lecture this last weekend on the application of probability theory to modern epistemology. It outlines a lot of my own thoughts about the mistakes in public discourse when we talk about confidence and certainty. This was delivered at the Socratic Forum for Thought in Seattle, Washington:

This blog hasn’t been updated recently, since I haven’t had time for writing during my grad program. However, I have recently participated in several debates and delivered some lectures on topics related to this blog’s subject matter. I am starting a new YouTube channel to host these videos, although everything I upload will be re-posted here.

 

Mining Healthcare Data: A Modern Rumpelstiltskin Story

Via Megan McArdle: the New York Times and the Washington Post are reporting on recent problems stemming from the Obama Administration’s healthcare-data project. Apparently, data analysts studying the health impacts of new programs are not controlling their experimental samples. Whereas ideally the government’s analysis would be the basis for crafting intelligent policy, the New York Times’s description calls into question the robustness of the research being conducted.

“The studies that are regarded as the most reliable randomly assign people or institutions to participate in a program or to go on as usual, and then compare outcomes for the two groups to see if the intervention had an effect.

Instead, the Innovation Center has so far mostly undertaken demonstration projects; about 40 of them are now underway. Those projects test an idea, like a new payment system that might encourage better medical care — with all of a study’s participants, and then rely on mathematical modeling to judge the results.”

The superficial approach described above is odd because it seemingly flies in the face of conventional approaches to statistical modeling. For those not familiar, establishing a randomized control is essential to getting results that don’t just confirm the hypothesis being tested. You can see this problem in the infamous Israeli Air Force Study (a really informative overview of this concept can be found on YouTube), and it’s been a long-standing statistical understanding that, when possible, randomized control samples are always preferable.

So why do government analysts feel so confident that they can dispense with what has, until recently, been an essential feature of any statistical experiment? Well, because they’ve got great data-mining technology! Here, the phrase “mathematical modeling” does a lot of work in obscuring the real methods that the government is using. Mathematical modeling can really mean anything, and ironically the NYT’s link on this description is broken.

Megan McArdle has a good take on the possible sources of the mistake: sloppy thinking on the part of federal bureaucrats. Says McArdle:
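To make the point concrete, here is a minimal sketch of the regression-to-the-mean effect behind the Air Force example. Everything in it is an assumption of mine for illustration (the function name `run_trial`, the sample size, the noise levels); it is not the Innovation Center’s method. Each simulated subject has a fixed skill plus random noise, the “intervention” does nothing at all, and we look only at the worst first-round performers.

```python
import random
from statistics import mean

def run_trial(n=10000, seed=1):
    """Simulate two rounds of scores where the 'intervention' has no true effect."""
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n)]
    # Observed score = stable skill + independent luck in each round
    first = [s + rng.gauss(0, 1) for s in skill]
    second = [s + rng.gauss(0, 1) for s in skill]

    # Pick roughly the worst 10% of first-round performers
    cutoff = sorted(first)[n // 10]
    worst = [i for i in range(n) if first[i] <= cutoff]

    # Randomly split them into a "treated" group and a control group
    rng.shuffle(worst)
    half = len(worst) // 2
    treated, control = worst[:half], worst[half:]

    def average_change(group):
        return mean(second[i] - first[i] for i in group)

    return average_change(treated), average_change(control)

treated_change, control_change = run_trial()
print(f"treated improved by: {treated_change:.2f}")
print(f"control improved by: {control_change:.2f}")
```

Looking at the treated group alone, the do-nothing intervention appears to work, because people selected for an unusually bad round tend to drift back toward their average. Only the randomized control group, which improves by about the same amount, reveals that the apparent effect is an artifact of selection.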

Gold’s article implies that the administration is looking at gross savings — which is to say, it’s just reporting the amount of money saved by the accountable-care organizations that ended up on the positive side of the ledger, even though this is less than half the total. Statisticians have a term for this: the Texas sharpshooter fallacy…..
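The gap between gross and net savings is easy to see in a toy simulation. This is only a sketch under assumptions of my own (the function names, 1,000 organizations, and a true average effect of exactly zero are all hypothetical, not figures from the article): each organization’s result is pure noise, and we compare adding up only the winners against adding up everyone.

```python
import random

def simulate_savings(n_orgs=1000, noise=1.0, seed=0):
    """Draw per-organization savings whose true mean effect is zero."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, noise) for _ in range(n_orgs)]

def gross_savings(savings):
    """Count only the organizations that ended up on the positive side."""
    return sum(s for s in savings if s > 0)

def net_savings(savings):
    """Count every organization, winners and losers alike."""
    return sum(savings)

savings = simulate_savings()
print(f"gross savings: {gross_savings(savings):.1f}")
print(f"net savings:   {net_savings(savings):.1f}")
```

Even though the program does nothing on average, the gross figure comes out large and positive while the net figure hovers near zero. Drawing the target around the bullet holes after the fact is exactly what the sharpshooter does.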

Perhaps I am even more cynical than McArdle, but my take is somewhat different.

Given that the administration has been unable to produce evidence of healthcare savings from increased coverage, it is fair to say that the president is feeling pressure to come up with some statistical result that will make costs appear more reasonable (at least ahead of the next CBO estimate). Moreover, without speculating too much as to the overall structure of bureaucratic management, I think it likely that individual analysts are also feeling the pressure to deliver “good” results, especially with all of these cool new “big data” tools so prominently featured in the news.

The result is predictable: a sort of magical thinking arises where data-mining and complex models become a panacea for turning poorly conducted statistical tests into predictive models showing large savings from new “innovative” approaches to delivering healthcare. Of course the results are all confirmation bias, but who’s going to look a gift horse in the mouth? Certainly not an administration desperate for good news on the healthcare front.

Now admittedly, I have no inside information, but if this kind of sloppy analysis is indeed going on, then it is certainly a cause for concern. The one-sided use of over-optimistic healthcare predictions could lead the CBO to perennially underestimate the cost of supporting programs like Medicare in their current state. This in turn could ultimately doom these programs’ long-term solvency (not to mention the long-term solvency of the country), since politicians are all too willing to forgo necessary reform in light of CBO reports that tell them healthcare costs will come down of their own accord.

But ultimately this problem is not political. It stems from a cultural approach to data analysis that is far too prevalent in industry and in government. I like to think of it as a modern-day Rumpelstiltskin story. What do we have? Reams of uncontrolled data. What do we want? Optimistic predictive results. With this point of view, it’s tempting to simply lock analysts in a room and ask them to build mathematical models until they finally manage to spin the straw data into golden predictions, like the miller’s daughter from the aforementioned fairy tale.

But just as in the fairy tale, when we force someone to spin straw into gold, it shouldn’t be surprising when magical methods play a large role in their process. Moreover, in the case of the government’s own analysis, the Rumpelstiltskin metaphor can be taken further still. For in trusting their magic numbers, our current leaders may have put the next generation on the line for the results.

The Truth About Data Science

From a recent conference on data science:

“A data scientist is a statistician who lives in San Francisco.”

“Data Science is statistics on a Mac.”

“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.”

True words.

Byte Counter Byte: Human vs. Machine Judgment

The central question in the digital age may be “who owns our data?” but this could just as easily be rephrased as “who makes the decisions over how our data is used?”. So far, the decisions have increasingly been made by machines. The conversation shown here illustrates that there may be some caveats to this assumption.

In a nutshell, this is more or less the disagreement as it’s seen from a data-scientist’s perspective. But I think that there are more fundamental questions for consumers. Beyond which of the two models will ultimately dominate the market, do we want machines to manage how our data is used and analyzed?

What's The Big Data?

Here’s a simple rule for the second machine age we’re in now: as the amount of data goes up, the importance of human judgment should go down… The practical conclusion is that we should turn many of our decisions, predictions, diagnoses, and judgments, both the trivial and the consequential, over to the algorithms. There’s just no controversy any more about whether doing so will give us better results… I don’t know how quickly it’ll happen, but I’m very confident that data-dominated firms are going to take market share, customers, and profits away from those who are still relying too heavily on their human experts.

–Andrew McAfee, Harvard Business Review Blog

Human judgment is at the center of successful data analysis. This statement might initially seem at odds with the current Big Data frenzy and its focus on data management and machine learning methods. But while these tools provide immense value, it is important…


Distributist Resolutions for the Digital Age

I’ve been thinking a lot about becoming more responsible for my digital property in 2014. It’s not just the scandal with the NSA. It’s realizing how much of one’s life is essentially tied up in strings of “1’s” and “0’s” stored on large corporate-owned servers. If there is one thing I’ve learned in 2013, it’s how fundamentally essential my digital information is to my personal well-being.

This general attitude has only been reinforced since I heard from a friend who lost $10,000 in Bitcoin when he cancelled a cloud account without making a proper backup. At first it sounded outlandish to be so invested in pieces of information that essentially didn’t exist beyond their presence on a third-party server. Then I thought of the copious amounts of ebooks, apps, and music I “owned” but that could be easily rescinded by the party in charge of the DRM.

So in 2014 I will be trying out some new resolutions, not just to ensure my own information is secure, but to really become part of the collective solution that will eventually be needed to solve the issue.

1.) Use Non-Proprietary and DRM-free file formats

I think one of the main ways to ensure privacy is to establish boundaries between data owned by the user and data owned by the service. Nothing has hurt this distinction more than the existence of pervasive DRM. By now everyone is familiar with horror stories of people losing their collection of ebooks or mp3’s because of legalistic mismanagement on the part of Amazon or Apple. But beyond ridiculous worst-case scenarios, the truly destructive part of DRM is the implicit understanding it embodies: that a user does not own a digital piece of media the way they own a physical copy of the same material. In order for any sort of rational concept of information ownership to emerge, DRM must go.

For myself this is a daunting task. Like most users of my generation, I bought into the digital marketplace early and without thinking of the infrastructure I was creating. As a result, I have invested thousands in media formats that are DRM-locked. Yes, I can strip it off, but this takes time and is not exactly legal. For the time being, at least I can stick to the formats that are open. No more Kindle books or iTunes media.

2.) Use the Open Source Alternative

Alright, alright. I have already written about the general futility of trying to work without proprietary software. But I am also sick and tired of companies abusing their market dominance in applications to rope people’s data into their own personal cloud systems. There is no way to survive (in the corporate world at least) without Microsoft Office; but I’ve noticed increasingly that the application is trying to move my documents from the hard drive to the cloud. Creepy, but especially creepy considering that, due to Microsoft’s dominance, the open source alternatives for word processing and spreadsheet management provide no real alternative in a modern workflow.

For the time being it’s baby steps: using Firefox instead of Google Chrome, GIMP instead of Adobe Photoshop. Though it might be a while before I can ditch iTunes or Microsoft Office.

3.) Keep Updated With Privacy News and Networks

There are plenty of ways to keep abreast of the various updates to the status of online privacy. However, I have to confess, as much as I love talking about privacy in the abstract, I hate actually following the day-to-day news concerning which new groups have, most recently, been abusing digital privacy. Still, there is no real way to handle the issue without being informed. Not to mention, I’d be a bit of a hypocrite to complain about user apathy when I can’t be bothered to read a three-page article about the new Google terms of service.

I should have my work cut out for me for the next year. I also plan to exercise regularly, sustain a low-carb diet, and lose ten pounds; but, of course, that resolution should be relatively easy to keep.

Why I’m Praying for More Judicial Activism against Online Privacy

We all knew this was coming. Yesterday, the courts pushed back against earlier rulings on privacy and the NSA’s data-collection schemes. From the New York Times:

“A federal judge on Friday ruled that a National Security Agency program that collects enormous troves of phone records is legal, making the latest contribution to an extraordinary debate among courts and a presidential review group about how to balance security and privacy in the era of big data

In just 11 days, the two judges and the presidential panel reached the opposite of consensus on every significant question before them, including the intelligence value of the program, the privacy interests at stake and how the Constitution figures in the analysis.”

I do hope that the US Supreme Court picks this case up. Not that I’m expecting the court to rule in favor of privacy; I just want some definitive status quo so that an honest discussion of the issue can take place. I get the sense, reading the news, that no one really understands what’s at stake or the relevant legal precedent for online privacy. The technology is changing fast and, consequently, no one feels like it’s worth developing a strong opinion. At this time, a large judicial decision might help people wake up and become involved with the issue of digital privacy.

I think this is more or less the role that Roe vs. Wade played in the issue of abortion. Before the landmark ruling, the anti-abortion movement was a disorganized coalition of church groups shell-shocked by the sexual revolution and unable to put forward any argument beyond dogma. Forty years later, with the specter of Roe vs. Wade still looming, the Pro-Life community has formed itself into a cohesive and burgeoning movement dwarfing its opposition on the national stage.

I would hope something similar might be possible for the advocates of online privacy. A setback wrought through judicial activism would be bad; but could anything be worse than the slow deterioration of privacy through apathy and public ignorance?