About 140kit

DGaff at 2012-03-26 00:36:07 UTC

Basic Background

140kit is a project that has been around in some form or implementation since October 2009. Ian Pearce and Devin Gaffney, along with additional help during Bennington’s Spring 2010 term from Max Nanis, Max Darham, and the fine folks at the Web Ecology Project, have been working on this labor of love for… forever. When Gaffney was working on his senior thesis on the Iran Election and the impact of social media, he realized that the analytical processes, Tweet collection systems, and printouts could be written in a much less tailored way – by adding a management layer on top of them, you could effectively provide a way for researchers to research Twitter.

Basically, if you’re going to research Twitter on your own, you need to make sure you meet the following requirements:
1. You have a good set of code to analyze with,
2. This code has no bugs and consistently returns all the possible data of interest, and
3. It’s constantly connected to the internet.

While theoretically possible, in practice this is a high bar. If you can’t meet it, you can quickly find yourself shut out of researching Twitter when, in all likelihood, you would be able to come up with very useful results. We created 140kit so that you can do your research without worrying about any of these issues. You give us the job, and we take care of it for you, free of charge.

The Catch

There is one catch, however – Twitter doesn’t allow us to give you raw data. What we can do is run any analytical process you would ever want, and we can hold on to the data for as long as you want. When new analytical processes are created, you can run them on your existing sets of data. We do not claim any control of the analysis – you can use it without worrying about us scooping your research. We do, however, ‘own’ the data collected. We give you the analytics, but we can’t give you the raw data. But who wants to deal with that anyways?

Limits

We would love to serve everyone in full all of the time without exception. In practice, however, we’re a very small outfit. For this reason, until we see that we can do it with no problem, we’re limiting any given collection of data in the following way. Users have different “roles” on the site, meaning they have different access levels. When you sign up for an account, you’re a User by default. You have to get in touch with us about changing your status – this can be done by pinging us on Twitter or by e-mail. This is the current breakdown:

These are soft settings – a collection will stop when it gets to around those numbers, not at exactly those numbers. The final count will always be above 20,000 or 500,000, however.
What this means for you is that if you create a dataset that does something like a week-long pull of all geocoded tweets, or a week-long pull of all tweets that use the word “lol”, at some point we really can’t collect all of it, and chances are very, very high that you’re not going to want that type of data anyways. If you want random large sets, let us know and we’ll incorporate a random constant feed.

Other than that, game on. Oh, and feel free to tell us how we’re doing via Twitter or e-mail.