A new generation of nuclear power opponents?

Nuclear power is as close to magic as we've come when it comes to economically viable energy production, but it's also controversial. A lot of the opposition is from the older generation who remember back to events like Chernobyl and Three Mile Island.

Will the ongoing nuclear scare in Japan create a new generation of people opposed to nuclear power?

We're witnessing the cost of a nuclear meltdown right now. Hopefully this is the worst things will get, but it's already scary enough for me.

People don't place bets if losing the bet carries too high a cost. And any engineering system is a bet. The output of a nuclear power plant isn't actually Electricity + Waste, as we typically think of it. Coarsely, the output is:

- Success with probability p
- Failure with probability (1-p)

Presumably, Success has some positive value to society, Failure has some negative value, and p is very close to 1.

How do we, as a society, evaluate the output of the plant? As the expected return, Success*p + Failure*(1-p)? This is how the auto industry works, for example: they design to a tolerable expected number of deaths per vehicle-mile (like 0.0000000something).

The problem is that expected return doesn't work so well if failure is catastrophic. The reason is that whatever mathematical model we use to estimate a value for p is inherently flawed, as all models are. Models don't reflect the world; they are just rough approximations of it.

So the real probabilities are

- Success with probability p-epsilon
- Failure with probability (1-p+epsilon)

Where epsilon is some probability mass that represents the un-modeled failure situations -- Donald Rumsfeld's "unknown unknowns". The problem with this term is that we don't know how big it is. If we have catastrophic failure with probability (1-p+epsilon) and we don't know how big epsilon is, that's pretty scary.
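To make that concrete, here's a minimal sketch in Python of how a tiny epsilon can flip the sign of the bet. Every number in it is made up for illustration (the values of Success and Failure, and the modeled p); only the formula comes from the argument above. The helper names `expected_return` and `break_even_epsilon` are mine.

```python
def expected_return(success, failure, p, epsilon=0.0):
    """Success*(p - epsilon) + Failure*(1 - p + epsilon)."""
    return success * (p - epsilon) + failure * (1.0 - p + epsilon)


def break_even_epsilon(success, failure, p):
    """The epsilon at which the expected return drops to exactly zero."""
    return (success * p + failure * (1.0 - p)) / (success - failure)


# Hypothetical inputs, chosen only to show the shape of the problem.
SUCCESS = 1.0          # value of the plant running without incident
FAILURE = -100_000.0   # cost of a catastrophic failure, in the same units
P = 0.999999           # the success probability our model gives us

for eps in (0.0, 1e-7, 1e-6, 1e-5):
    value = expected_return(SUCCESS, FAILURE, P, eps)
    print(f"epsilon = {eps:.0e}   expected return = {value:+.6f}")

print("break-even epsilon:", break_even_epsilon(SUCCESS, FAILURE, P))
```

With these invented inputs the bet looks good when epsilon is zero and turns negative by the time epsilon reaches 1e-5. The more catastrophic Failure is, the smaller the epsilon it takes to do that.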

The failure in Japan right now is happening somewhere in the probability mass covered by this epsilon term:

When Fukushima was built, TEPCO rated and tested reactors 1 & 2 to tremors equivalent of a 7.9 earthquake -- the highest they thought was possible for the region. No need to test for an 8.9er -- not factored into their risk model.

An 8.9 quake is the strongest in recorded history for Japan. Because the magnitude scale is logarithmic, that is 10 times the ground motion of the 7.9 that Fukushima was rated for, and roughly 32 times the energy released. And it wasn't just the quake that caused the reactor problems. Engineers spend a lot of time planning for shit to hit the fan. And when you look at catastrophic failures -- the (1-p+epsilon) scenarios -- it's often a perfect storm of cascading failures that causes all of the checks and balances to fail.

In the case of this reactor, it was:
- Earthquake hits, shuts down nuclear reactor, power goes out
- Tsunami floods, takes out diesel backup generators that keep cooling system running
- Backup batteries run out because the surrounding national infrastructure is too torn up to replace them
- Plant infrastructure (pipes, outer containers) crumbles and burns, further damaging cooling system

That's pretty complicated. And low probability.

The worst failure condition for a nuclear reactor doesn't just kill people, it curses the earth for kilometers and decades. And that's the scary thing: that there are unknown unknowns not factored into the risk assessments of engineering systems with potentially catastrophic results. 

I think a whole new generation of people will grow up thinking that any power solution that places a nonzero -- and worse, inherently unknowable -- probability mass on nuclear fallout is unacceptable.

There's an easy solution to this, of course: don't build systems whose cost of failure is catastrophic. Will that be the policy mandate for years to come, or will we give in to the economic pressures created by scarce energy?

Join us for a Crowdsourcing Community Service Hackathon next weekend

Calling all hackers, designers, and journalists in Boston!

A friend and I are organizing a "Crowdsourcing for Community Service" hackathon at MIT Friday-Sunday January 14-16. It should be a fun time and we would love for you to come out and join us. 

All the details, and a signup form, can be found on the website

We're trying to get a group of creative people together for a weekend to try to solve a community service problem using mobile and crowdsourcing technology. 

How often do you think: "If only we could each pitch in a small bit, the world would be a much better place." Well, let's get together, brainstorm a bit (Friday), create a tool or platform that helps us each pitch in that little bit (Sat & Sun), and then use it to better the Cambridge and Boston communities!

We're interested in people of all skills that could contribute to a project like this, whether you're a hardcore web hacker, a graphic designer, a writer, or a community organizer. 

Hope to see you there,

Ted

The Semantic Web needs a MySQL

One thing was clear in the comments of many industry-facing participants of ISWC 2010: a big impediment to adoption of semantic web technologies is the lack of an off-the-shelf triplestore that "just works." 

There are many other problems, of course: RDF is an awkward format when it comes to real world programming because the graph model doesn't align with the object-dictionary model of OO programming; JavaScript favors JSON instead of RDF; URIs and namespaces can be a burden to craft the first time around. But these problems can be lessened, or eradicated, with good development frameworks.

Underlying these surface problems is a deployment one: even if a company wanted to, there's no clear hassle-free solution to getting a triplestore up and running with the same ease, access, and reliability that relational solutions such as MySQL and Postgres provide. And as long as this is the case, otherwise semantic-web savvy individuals are going to continue to live in the relational world. When people are spread thin, and want to focus on user experience instead of database administration, they'll pick the database product that allows them to focus on other things. 

So what gives? Do we wait for a Mike Stonebraker of the triplestore world to come around? Or do we try to bolt our technologies onto non-relational databases that are gaining momentum, such as MongoDB or CouchDB?

The Toothpaste Problem & Choosing the “right” data to publish

People who visit a toothpaste aisle with only 4 products walk away much happier than those who visit the typical supermarket aisle crammed with 40 variants of Colgate. Why? Because they don’t get overwhelmed by a tsunami of possibilities that leaves them wondering if they made the wrong choice.

When it comes to a large organization publishing data, perhaps a similar problem arises. Given all the information in the world that we could publish in structured form, how are we to know which important bits to address first?

Hans-Jörg Happel proposed an interesting way to solve this problem in the Social Semantic Web track at ISWC 2010 today. If we can quantify the need for a particular morsel of information, we can prioritize our efforts to structure and publish data. The question, then, becomes how to quantify information need.

Happel’s idea is to do this by examining missing values from query results. When someone performs a query, they’re stating that they need a particular data set. When one of the items in the query result is empty (such as a missing 2010 GDP value for Mexico), that’s a known piece of information that someone needed and didn’t get. If we count up the number of times each of these NULL values occurs, we can begin to keep a priority queue of desired, but missing, data.

So if Mexico’s 2010 GDP is missing from Wikipedia, is that a problem? Well, count up the number of queries that returned a NULL for this item and judge quantitatively. If the number is comparatively high, maybe we should prioritize the addition of Mexican economic stats.
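The counting idea is simple enough to sketch in a few lines of Python. To be clear, this is not how the Semantic Need plugin is implemented. It's just my own toy illustration of "tally the NULLs, then rank them." The query results, the property names, and the `record_missing` helper are all hypothetical, and the string "present" stands in for any real value.

```python
from collections import Counter


def record_missing(need_counts, query_result):
    """Tally every (subject, property) cell that came back empty."""
    for subject, row in query_result.items():
        for prop, value in row.items():
            if value is None:  # None plays the role of a NULL in the result
                need_counts[(subject, prop)] += 1


# Three made-up query results; "present" stands in for any real value.
query_log = [
    {"Mexico": {"gdp_2010": None, "population": "present"},
     "Canada": {"gdp_2010": "present", "population": "present"}},
    {"Mexico": {"gdp_2010": None},
     "Brazil": {"gdp_2010": "present"}},
    {"Mexico": {"gdp_2010": None},
     "Canada": {"capital": None}},
]

need_counts = Counter()
for result in query_log:
    record_missing(need_counts, result)

# Most-wanted missing data first.
for (subject, prop), count in need_counts.most_common():
    print(f"{subject}.{prop} was missing from {count} queries")
```

In this toy log, Mexico's 2010 GDP lands at the top of the list because three separate queries asked for it and came up empty.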

He’s created a plugin for Semantic MediaWiki, called Semantic Need, which does exactly this. The list of prioritized information is called the “Extended Knowledge Base” — those things that we want to know, but don’t. As a programmer, I find this project very clever. Developers usually think of NULL values in query results as mere annoyances. But this work turns that around and makes them useful.

One of the themes of the Haystack Group is that focusing on user needs can direct research toward results that are immediately useful. On the semantic web, picking an explicit user goal (helping users communicate effectively using data) can be more effective than picking an abstract goal (building a web of linked data). Our project DataPress attempts to follow this philosophy by helping users add interesting visualizations to their blogs, and as a side effect, showing those users the value of structuring their data. Semantic Need follows this philosophy in another way: it attempts to quantify an existing, realized need for pieces of data so that we know which data is actually useful for structuring right now.

While the presentation didn’t address it, the idea behind this talk could be incredibly useful for government data. What if governments provided not links to data sets (as data.gov does) but rather some ontology and a query interface? Then they sit back and see what users query for. Using an approach like this, the “what data should we publish” problem solves itself: the queries people ask will tell you what data to prioritize for publishing.

Here’s a link to the paper: Semantic Need: Guiding metadata annotations by questions people #ASK

If James Bond were a Linguist

MIT is hosting the 2010 Empirical Methods in Natural Language Processing conference this year, and I noticed a clustering of papers in the program that would make for a fun session, possibly titled "Linguistic Security". The session would cover both offense (what can we tell about you from the language you use) and defense (how can you hide messages in your word choice).

Since the EMNLP program is already fixed, the "2010 Ted's Blog Workshop on Linguistic Security (TBWLS)" will have to suffice :).

Here's the program. Each item is taken from the real program:
  • Keynote: Why do we call it decoding?
    Kevin Knight
  • Improving Gender Classification of Blog Authors
    Arjun Mukherjee and Bing Liu
  • Modeling Perspective using Adaptor Grammars
    Eric Hardisty, Jordan Boyd-Graber and Philip Resnik
  • Practical Linguistic Steganography using Contextual Synonym Substitution and Vertex Colour Coding
    Ching-Yun Chang and Stephen Clark
  • A Latent Variable Model for Geographic Lexical Variation
    Jacob Eisenstein, Brendan O'Connor, Noah A. Smith and Eric P. Xing

Bit by Flexibility: Implicit Conversions to Java with Scala 2.8

Scala 2.8 includes a library that helps implicitly convert Scala objects to Java objects so you can keep your data in Scala-land while still using Java API calls. Just import this package in your code:

The problem is sometimes the conversion library fails at compile time because there are just too many possible conversions it can make. It can't decide between all the possibilities. Talk about being a victim of your own success!

Here's an example: I have a scala.Iterable of items, and I want to implicitly convert it to a java.lang.Iterable

But the implicit conversion dies here with the following message:

So here's the fix: you can wrap your data to indicate the particular conversion you would like to occur. A list of wrappers is here. In my case, I want a java.lang.Iterable, so I'll wrap it as so:

This removes the ambiguity, allowing the compiler to proceed without baffling itself by its own cleverness.

The right to advertise adult services?

I don't have a deep understanding of the law, but I think the recent closing of the "Adult Services" section on Craigslist is a fascinating moment to reflect on how complicated and confusing regulating sin is.

Here's the situation, as described by Matt Zimmerman of the Electronic Frontier Foundation, in case you don't read geek news:

On Saturday, after years of pressure from law enforcement officials, Internet classified ad web site Craigslist bowed to demands to remove its "Adult Services" section which critics charged encouraged prostitution and other sex-related crimes.

At first glance, it seems like this might be pretty cut and dried:
  • X is illegal
  • Y has an advertising section for X
  • Y is therefore an accomplice to acts of X
  • So Y should be punished
But Craigslist isn't advertising sex -- its users are -- and this turns out to be an important difference. Important because this situation represents a sort of edge-case between two separate goals we have as a society:
  • We want to make it illegal to do, or be an accomplice to, certain acts  
  • We also want to protect telecom carriers from being liable for the messages they carry  
The 1996 Communications Decency Act puts this second goal into law. It protects people who serve as a carrier of information from having liability for the third-party information they carry. An analogy in the physical world would be that the Department of Transportation isn't responsible for how people choose to use their roads. If you drink and drive, it's your responsibility, and the DOT isn't an accomplice just because it "carried" your car. In the same way, providers of "online roads" (like Comcast, Google, and Craigslist) are not liable for the particular 0s and 1s that individuals choose to put into their systems. This is a critical protection for the internet to be a viable business platform. If it weren't there, your ISP would be an accomplice if you planned a bank robbery over instant messenger.

But when Craigslist hosts a classified forum titled "Adult Services," it is pretty clear what the intent of that forum is. They're not just asking for any third-party messages, they're asking for a particular type of third party messages, in this case one that tends to be illegal in most places. 

Does the debate actually come down to semantics? Is it a crime to host a site titled "Post illegal prostitution ads here", while a site with the innuendo-laden, yet ultimately nonspecific title "Adult Services" would be protected under the Communications Decency Act? What about forums where drug users hang out? Or a forum where people discuss ways to speed without getting a ticket? It gets grayer and grayer pretty quick, which is why this is an important issue to stop for a moment and think about. 

My gut reaction on this issue is, "Well, yeah. If the act is illegal then of course they shouldn't be allowed to host advertisements for it." 

But then I am reminded that the junctures at which it is most important to stand our ground on freedom of speech tend to be exactly those situations where it might not be comfortable to do so. Because if we start putting footnotes on *which* types of communication carriers are allowed to carry, then we've removed their protections entirely. If Google has to filter one message, then they have to filter them all. And that removes the very foundation of freedom of speech online.