5 posts about meedan

Recent Meedan press

Mar 15 2010

I've had a great time working at Meedan recently as Director of Miscellany. We recently rolled out an update of the site and took off the "beta" label.

I was pleasantly surprised at the amount of interest generated — turns out people are actually pretty interested in crowdsourced translation!

We've got a lot of work ahead of us, but at least we are iterating live again now, and we have an incredibly supportive community getting our little nonprofit through the buggy spots.

Anyway, the recent press clippings:

The Economist discusses the human-machine "cyborg translation" approach in a summary piece mentioning Meedan as well as our friends at Worldwide Lexicon and Global Voices.

The London Guardian wrote a nice article:

The system, which has been in development for more than three years, is based on advanced automatic translation technology developed by IBM and uses an international team of 30 translators and editors to find news and polish the language. ... With the potential for highly localised websites that cannot even be reached by outsiders — let alone understood — many have worried about the potential for a series of so-called "splinternets" to evolve.

The Guardian also wrote a blog post mentioning our Arabic-English Open Translation Memory. I appreciated this one because it at least mentioned our desire to move to WWL (away from a proprietary IBM backend which we have been loaned):

Meedan's data — its 'translation memory' of over 3m words — is available to other translators. Weyman says: "the translations that are done with the Transbrowser are part of our agreement with IBM that makes sure all those translations are open source." This isn't true of some other web-based translation services, which are open access but not open source data services. The 'translation memory' is important because having a corpus of texts in two languages allows you to apply statistical techniques to improve a translation engine.

Le Monde covered Meedan in French! Here's the original French and the translation. (It's actually quite readable MT for a change — French to English machine translation does not need humans nearly so much as Arabic to English.)

The Slashdot discussion of Meedan is hilariously bad yet somehow deeply insightful as a document of the Slashdot community (they are mostly terrified that Meedan will cause WWIII — hasn't happened yet, guys.) At least they properly crashed the site for a bit, truly Slashdotted! (Or maybe that's what I get for pretending to be a Postgres admin.)

Wired ran a great article and even ventured to mention the interaction design:

Meedan is not the first to make machine translation tools publicly usable — Yahoo's Babel Fish has been around for years, and Google's Translate continues to improve and broaden its scope as the company uses its massive trove of search queries to tune its translation technology. Meedan takes a different tack, first using Machine Translation technology, and then letting translators fix and refine translations. The status of a translation is always apparent, and learning a lesson from Wikipedia, Meedan makes the history of each translation publicly and quickly available. The point is to not hide the messiness of translation and keep the reminder that this is a cross-cultural endeavor embedded in the site design.

Lastly, a bit about Meedan appeared on BBC Arabic, in the New Statesman, and on Al Jazeera.

'Slashtags' for citizen editors

Nov 9 2009

Updated Nov 16, 2009: @chrismessina created a wiki for the Twitter syntax http://microsyntax.pbworks.com/Slashtags

The NYT reported today on how the #fthood hashtag has failed:

Until lately, the main way to make sense of an urgent outpouring of tweets on a particular subject was to use text searches: look for the phrase 'Fort Hood,' for example, or maybe an agreed-upon label, '#fthood,' within tweets. Yet during events like the shootings on Thursday at Fort Hood that left 13 people dead, this method is useless. Hundreds of 'relevant' tweets pop up every minute, most repeating the same news reports over and over again or expressing concern from far away.

"Refining the Twitter Explosion" on nyt.com

I believe that there is an enormous potential to do citizen journalism better on the web, and that we need the leadership of people who are willing to help clean up the mess. Unlike some people, I do not think that the poor citizen journalism around #fthood is an indictment of citizen journalism — rather I would say it points to the absence of citizen editors.

In the Vote Report and Swift parlance, these are "Sweepers," the custodians working to clean the stream, validate claims, and generally insert some professionalism.

Taken to their logical next step, you can see the emergence of volunteer "citizen editors," who appreciate journalistic rigor and take time to bring signal to the noise in dozens of different ways.

Recently around Meedan we have been talking a lot about using Delicious and Twitter tagging to more effectively manage our content across our many networks, and to bring more meaningful conversations to our users.

This is the power of tags: they are impossible to contain in a single network.

By relying on Delicious and other social bookmarking systems, we've been able to build our editorial backchannel into numerous social platforms. Rather than being stuck with the limitations of some CMS, and having to copy everything out to our social network, we can use the social network and then bring it in to our own domains.

That's always a smart approach for nonprofits, because it builds your conversation in a meaningful and searchable way. Metadata value (real usable value!) accumulates like interest in your bank account. And citizen editors are the people who are trying to make this system provide even more of a return, because fundamentally we want more people to care, understand and take action.

Twitter Lists Taken Seriously

So we've been looking into some of the existing pseudo-standards like the #hashtag, and looking for ways to improve our journalistic rigor. George recently posted about using the new Twitter Lists feature to curate groups of sources for our Iran Twitter feed:

Rather than treating our Twitter list as a gizmo, with shoddy maintenance and dubious output, what if we put some rigor into it by beginning with Journalism 101?

George, our lead editor, knows this stuff all too well:

What is the reported location of the Twitter Stream? Is the Twitter Stream using Farsi or a local language? How long has the Twitter Stream account been up and running?

(And oh yes there are many more criteria.)

I think these are good, basic questions that some organizations may not be asking, and their lists are thus quantifiably worse, in the sense that they are less reliable, less meaningful, and probably noisier. So we can see that by following basic journalistic standards, your attention data becomes more valuable. Garbage in, garbage out, or, more positively, the system can be improved.

For nonprofits, which typically do not have a microgram of energy to spare, these kinds of tricks can be really helpful.

#hashtags and /slashtags

A great example of this type of "attention data enhancement" is the #hashtag, which clarifies the context of a short statement on Twitter with a globally recognizable tagging syntax. (I'll spare us the debate around hashtags, but suffice it to say, they can be done better.)

Chris Messina, one of the biggest advocates of #hashtags and other microsyntax, has just described a few extra bits of attribution using the "slasher." (I think we could just call it a "slashtag.")

'Pointers' are short words with different intentions. A group of pointers should typically be prefixed by ONE slasher character. You can daisy-chain multiple pointer phrases together, padded on both sides with one whitespace character. There should be NO space following the slasher. Hashtags should be appended to the very end of a tweet, except when they are part of the content of the message itself and indicate some proper name or abbreviation. Normal words that would be part of the content of a tweet anyway SHOULD NOT be hashed.

"New microsyntax for Twitter: three pointers and the slasher"

Particularly I think using /by is a great idea to reference an article or direct quote.

Using /by gives a very specific meaning to the username that follows it. It's intuitive enough that I don't think it even needs to be explained; you can just read it:

[http://farm3.static.flickr.com/2455/4088475379_cf90c0b1e5_o.jpg]

Not beautiful, but very clear.

This is useful for when you need to be more precise — say, if you wanted to use your attention data in another application.
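To give a rough idea of what "using your attention data in another application" could look like, here is a minimal Python sketch that pulls pointer/username pairs out of a tweet. The function name `extract_pointers`, the set of pointer words in `POINTERS`, and the sample tweet are all assumptions I made up for illustration; this is one possible reading of the slasher rules quoted above, not a reference implementation of them.

```python
import re

# Pointer words assumed for this sketch; swap in whatever set you adopt.
POINTERS = {"via", "by", "cc"}


def extract_pointers(tweet):
    """Return {pointer: [usernames]} for the pointer group after the slasher."""
    # The slasher is a "/" preceded by whitespace (or the start of the tweet)
    # and immediately followed by a pointer word, with no space in between.
    pattern = r"(?:^|\s)/(" + "|".join(sorted(POINTERS)) + r")\b(.*)"
    match = re.search(pattern, tweet)
    if not match:
        return {}

    current = match.group(1)
    result = {current: []}
    for token in match.group(2).split():
        if token in POINTERS:
            # Daisy-chained pointer: no extra slasher needed.
            current = token
            result.setdefault(current, [])
        elif token.startswith("@"):
            result[current].append(token.lstrip("@").rstrip(".,;:"))
        elif token.startswith("#"):
            # Trailing hashtags end the pointer group.
            break
    return result


if __name__ == "__main__":
    sample = ("Three pointers and the slasher http://example.com "
              "/by @chrismessina via @unthinkingly #microsyntax")
    print(extract_pointers(sample))
```

Because the slasher has to follow whitespace, slashes inside URLs are ignored, which is the main thing that makes this syntax cheap to parse.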

For us at Meedan, this is the direction we are headed, fast. We are working on developing a clear and simple standard for using tags on the Delicious network. This standard will be something that our editorial team (and anyone who cares to participate) can use to route information to our hand-curated database. You don't have to leave the comfort of your own Twitter client, or use any fancy tools, just the simple, clear standards that we are figuring out.

We are already making great use of social bookmarks at Meedan as an editorial backchannel. For example, you can see all of Meedan's Iraq sources on Delicious, from our lead editor:

http://delicious.com/gweyman/iraq_newspaper

And everything that the Meedan user unthinkingly (me) has tagged as being generically "for meedan" (using an informal tag "for_meedan").

http://delicious.com/unthinkingly/for_meedan

Because George also uses this tag, we can get a nice community of practice working together. This page shows the shared pool:

http://delicious.com/tag/for_meedan

So, as you can see, we are using underscores, which is a common tagging convention because it looks like a space. We're not so happy with this: it's simply not expressive enough.

(Even though you can do a lot with a single little shared tag like #nptech.)

A more robust tagging system, which I believe would be very compelling if it were well designed, would extend some of this syntax. The question is: how to extend the syntax without making it overwhelming?

Setting some goals

I think that any tag needs to follow a standard that meets several criteria:

1.) it should read naturally when spoken out loud (no dots, equals signs, or weird abbreviations)
2.) it should be as cross-network as possible (for now the syntax should not break on Twitter or Delicious) [1]
3.) it should rely on aliases instead of strict taxonomies (tag first, fix it later)

So what I'm talking about is extending the tag that George used to curate Iraqi newspapers, iraq_newspaper, to something like this:

/newspaper/iraq

which I think has several advantages.

  1. It separates the parts of the tag in ways that make the taxonomy immediately clearer. Iraq is nested "inside" a type of source.
  2. It works on Twitter
  3. It works on Delicious
  4. It is still very short (adds only one character over the underscore)
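To make the nesting concrete, here is a small Python sketch that splits slashtags into their parts and folds a handful of them into a tree. The helper names `parse_slashtag` and `build_tree` are hypothetical, and the sample tags (including /newspaper/iran) are made up for the example rather than taken from our actual bookmarks.

```python
def parse_slashtag(tag):
    """Split a slashtag like '/newspaper/iraq' into its nested parts."""
    if not tag.startswith("/"):
        raise ValueError("not a slashtag: %r" % tag)
    parts = [part for part in tag.split("/") if part]
    if not parts:
        raise ValueError("empty slashtag: %r" % tag)
    return parts


def build_tree(tags):
    """Nest a list of slashtags into a dict-of-dicts taxonomy."""
    tree = {}
    for tag in tags:
        node = tree
        for part in parse_slashtag(tag):
            # Walk down one level, creating the branch if it is new.
            node = node.setdefault(part, {})
    return tree


if __name__ == "__main__":
    tags = ["/newspaper/iraq", "/newspaper/iran", "/by/chrismessina"]
    print(parse_slashtag(tags[0]))
    print(build_tree(tags))
```

The point of the sketch is only that the hierarchy falls out of a plain string split; nobody has to agree on a taxonomy up front.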

On Delicious, spaces are not allowed, so I have started using two slashes. So where previously I might have tagged the article with a kind of meaningless tag:

chrismessina

but now I can tag it

/by/chrismessina

Which is still a pretty meaningless tag, but is at least prefixed meaningfully to mean "this content is by this person" as per Chris's helpful article above.

Also I can improve the previous technique of using the for_meedan shorthand

from

for_meedan

to

/for/meedan

Which has the benefit of being equally readable, while obeying a more general rule of syntax.
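The "tag first, fix it later" idea from the goals above can be sketched as a simple alias table that maps old tags onto slashtags. This is a minimal Python illustration under my own assumptions: the `ALIASES` table and both function names are hypothetical, and a real table would be maintained by the editorial team rather than hard-coded.

```python
# Hypothetical alias table: old tags on the left, slashtags on the right.
ALIASES = {
    "for_meedan": "/for/meedan",
    "iraq_newspaper": "/newspaper/iraq",
    "chrismessina": "/by/chrismessina",
}


def normalize_tag(tag, aliases=ALIASES):
    """Return the slashtag for a known alias; leave anything else untouched."""
    if tag.startswith("/"):
        return tag
    return aliases.get(tag, tag)


def normalize_tags(tags, aliases=ALIASES):
    """Normalize a bookmark's tags, dropping duplicates but keeping order."""
    seen = []
    for tag in tags:
        slashtag = normalize_tag(tag, aliases)
        if slashtag not in seen:
            seen.append(slashtag)
    return seen


if __name__ == "__main__":
    print(normalize_tags(["for_meedan", "/for/meedan", "iraq_newspaper", "nptech"]))
```

Note that iraq_newspaper cannot be converted mechanically, because the order of the parts flips; that is exactly why an explicit alias is more forgiving than a strict rewriting rule.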

Machine tags are not what we want, we are not machines

By far the most complete standard that is being used to solve these problems is the machine tag. This tag uses a colon and an equals sign to indicate a much more specific (though not necessarily accurate) structure. The history is from the geo community, mostly for this:

geo:long=45.353452

These namespaced key value pairs are admirably used as the output of some web apps, but are quite intimidating for human input.
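For comparison, this is roughly what reading a machine tag looks like on the receiving end. It is a minimal Python sketch assuming the usual namespace:predicate=value shape; the regular expression and the name `parse_machine_tag` are my own choices, not any service's official parser.

```python
import re

# namespace:predicate=value, e.g. geo:long=45.353452
MACHINE_TAG = re.compile(
    r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$"
)


def parse_machine_tag(tag):
    """Return the three parts of a machine tag, or None for an ordinary tag."""
    match = MACHINE_TAG.match(tag)
    if match is None:
        return None
    namespace, predicate, value = match.groups()
    return {"namespace": namespace, "predicate": predicate, "value": value}


if __name__ == "__main__":
    print(parse_machine_tag("geo:long=45.353452"))
    print(parse_machine_tag("for_meedan"))
```

Easy for a program to read, in other words, which is not the same thing as easy for a person to type.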

Common opinion seems to be that they are too "dorky" to be usable at this point, considering especially that any good taxonomy is constantly in slight flux. (Though Flickr has made great use of them to kick off custom actions in their UI.)

Similarly, what might be called a "double tag" is an interesting simplification down to a context-less key value pair:

color=red

In fact this is what comprises almost all of the tags in OSM, one of the most ambitious tagging innovations on the web. (I have said before that tagging is the secret sauce that makes a crazy project like OSM work.)

Finding a balance

Replace the equals sign in that last example, and you have slashtags, which I think are much better at communicating that "color" is a parent of the "red" value:

color/red

In this way, this "slashtag" or "slasher" approach, extended with a tiny bit of folksonomic convention, could really strike the right balance between editorial simplicity and powerful machine-readability.
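The swap really is that mechanical. As a quick Python sketch (both function names are hypothetical), a double tag can be turned into a slashtag and read back as a parent/value pair:

```python
def double_tag_to_slashtag(tag):
    """Turn a key=value double tag such as 'color=red' into 'color/red'."""
    key, sep, value = tag.partition("=")
    if not sep or not key or not value:
        raise ValueError("not a key=value tag: %r" % tag)
    return key + "/" + value


def slashtag_to_pair(tag):
    """Read a slashtag back as (parent, value), splitting on the last slash."""
    parent, sep, value = tag.strip("/").rpartition("/")
    if not sep or not parent or not value:
        raise ValueError("slashtag needs a parent and a value: %r" % tag)
    return parent, value


if __name__ == "__main__":
    print(double_tag_to_slashtag("color=red"))
    print(slashtag_to_pair("/newspaper/iraq"))
```

Splitting on the last slash means a deeper tag keeps its whole path as the parent, so nothing is lost when the taxonomy grows another level.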

Finding better editorial tools for realtime crises

I think that a better-defined tagging approach could really help make sense of critical, breaking news.

A wiki about hurricane Ida, for example, is probably not the right way to manage news about a critical event:

[http://farm3.static.flickr.com/2712/4089097932_350f83174c.jpg Ida]

Mediawiki makes me groan just looking at it. I'd much rather help update that information by tagging links into delicious, and knowing that someone is listening on the other end. This would motivate me to learn the emergent standards, follow a loose taxonomy, and generally try to be more articulate.

If we could react in realtime to create a more sophisticated picture of the news by expressing ourselves more clearly in the tagging interaction, I think we could ultimately make great strides in improving citizen journalism (even if all the idiots keep on tweeting, which, naturally, they will.)

This is why the usability of a citizen editor tagging scheme is so critical: it needs to be flexible enough (to handle hurricanes) but maintain a low barrier to participation (to cultivate citizen editors). The tagging approach has already proven itself in many trivial domains; now we need to step it up using our journalistic standards and our shared interest in making sense of the news, particularly crises.

We are early in this strange distributed crisis data management effort, but I think that some of the ideas proposed by Chris Messina, and the experiments of the OSM community, go a really long way in this regard. Particularly the nestability and readability seem like great virtues of this tagging system. Overall the "slash" is a widely understood metaphor, used by all major operating systems to indicate traversing "down" or "up" a taxonomy.

I'm going to transition some of my tagging habits accordingly, and see where it ends up!

I would love to know what you think. Stop by the contact page or find me at @unthinkingly on Twitter.

[1] Notice how it breaks on gnolia.com and on flickr.com, although it appears that Flickr preserves the slashes in the background and just doesn't display them on output.

[2] On Twitter there is a bit of a variation required if we are to follow existing patterns: 1.) I can omit the space, so I will, and 2.) you need to prefix a user's name with the @ sign, like /for @meedan. I think this is still quite readable, but the difference between networks might need to be cleared up. We could in fact collapse the Twitter tags to /for/meedan (i.e., identical to the Delicious tag) but this would probably break some automation in Twitter clients that are expecting the @ prefix.

Swift

May 7 2009

For the last 5 months I've been working with friends at Ushahidi and Meedan on a project nicknamed "Swift."

Our goal with Swift is to provide a crowdsourcing platform for "data triage." Imagine something like Mechanical Turk used only for tagging news, photos, microblogging and videos. There's no business model or anything like that — it's strictly Open Source Nonprofity Goodness(tm). Meedan and Ushahidi are partners in hacking it out.

As a user of Swift you can sit down at an "assembly line" of news and tag it. Swift gives you a straightforward aggregator for news (say, news about earthquakes in California) then asks you to tag all of the people, places and organizations in that firehose of data. With a little bit of effort (collecting a few RSS feeds and marking up all the content) it becomes possible to put a very bright light on an emerging part of the web. You can, for example, tag violations of electoral code in an election, as we are doing with Vote Report India, which uses Ushahidi and WordPress as a platform for grassroots reporting in the month-long Indian election.

I'm especially interested in knowing how much we can actually do with the public data that emerges in realtime during a crisis. From a journalistic perspective, it seems like there is an opportunity to understand more concretely what the hell is going on.

For Ushahidi, Swift is an extension of their existing SMS reporting cycle. By "listening" to the "outside" web in a more structured way, the hope is that we can provide more relevant alerts to people on the ground in a crisis.

For Meedan, Swift is a tool for a team of editors who need to produce interesting content for their digital newsroom. Because it is an aggregator, Swift serves naturally as a listening post as well as a tagging workbench. Rope in a few feeds (such as a Twitter search results feed for "election") and then do location extraction for the Middle East with Calais on that feed, and you have a pretty cool stream of entities.
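To show the shape of that "feed in, entities out" flow, here is a toy Python sketch. It is not Swift's code and it does not call Calais: the tiny `GAZETTEER` lookup is a stand-in I invented for a real entity extractor, and the feed items, field names, and function names are all hypothetical.

```python
# Tiny hand-made gazetteer standing in for a real entity extraction service.
GAZETTEER = {"tehran": "Tehran", "baghdad": "Baghdad", "cairo": "Cairo"}


def extract_locations(text, gazetteer=GAZETTEER):
    """Return the known place names mentioned in text, in order of appearance."""
    found = []
    for word in text.split():
        place = gazetteer.get(word.strip(".,;:!?\"'()").lower())
        if place and place not in found:
            found.append(place)
    return found


def entity_stream(items):
    """Yield one entity record per location found in each feed item's title."""
    for item in items:
        for place in extract_locations(item["title"]):
            yield {"entity": place, "type": "location", "source": item["link"]}


if __name__ == "__main__":
    # Made-up items shaped like entries from an aggregated feed.
    feed = [
        {"title": "Election rally in Tehran draws crowds",
         "link": "http://example.com/1"},
        {"title": "Cairo and Tehran react to election results.",
         "link": "http://example.com/2"},
    ]
    for record in entity_stream(feed):
        print(record)
```

Each record keeps a link back to its source item, which is what lets an editor check the extraction instead of trusting it blindly.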

Today we had a great meeting at InSTEDD, with a crazy good crowd of people — everybody was in town for the conference at Berkeley. Thanks to everyone for their ideas and support!

Here's my presentation from today:

[http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=dashboardshort-090507011606-phpapp02&rel=0&stripped_title=swift-update-may-6]

[http://www.slideshare.net/unthinkingly]

All of the photos and links can be found on my Flickr page.

Swift seeks to publish all of the entities that concerned communities publish about crises, both hot flash and slow burn events. The core use case is for the period immediately following a disaster or crisis, during the hours and days of confusion.

One thing that is always interesting about Swift is that it is a very unusual use case. The tragedy of a crisis creates a temporary period of great social empathy during which many "rules" of interaction design break down. This is a design opportunity. Many people are willing to match their #have to someone else's #need, but they don't have a medium for volunteering, or a network of supporters who can contextualize and respect their work. We just watch CNN and feel powerless; we would love a way to help, as an individual, from across the world. An improved marketplace of volunteerism is possible if we can design the appropriate interactions.