Friday 4 March 2011

More ripples in the Twitter API clampdown

Interesting email from the 140kit team, not least because I didn't realise TwapperKeeper - where you can archive, export and download your own tweets - was affected.
However, there are still good people out there; 140kit has come up with a workaround that satisfies the new Twitter guidelines and helps non-coders get at data that was once freely available:
"...we plan on re-structuring this system to a point where it is trivial to download a scratch copy of our service, test one’s own analytics locally, then send the analytical process to the site for vetting, which would be a simple process. If the language you work with isn’t included in our system yet, we’ll add it. If you don’t know how to code, tell us the general algorithm and we’ll code it if we have the time and resources."

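To make that concrete, here's a rough sketch of the kind of pluggable analytic they're describing. This is my own illustration, not 140kit's actual code - it's in Ruby because that's the language the team says the project is written in, and every class, method and field name here is hypothetical:

    # A hypothetical 140kit-style analytic, written for illustration only -
    # the real plugin interface may look nothing like this.
    # A "tweet" here is anything that responds to #text.
    Tweet = Struct.new(:text)

    class HashtagFrequency
      # Count hashtag occurrences across a dataset of tweets.
      def run(tweets)
        counts = Hash.new(0)
        tweets.each do |tweet|
          tweet.text.scan(/#\w+/) { |tag| counts[tag.downcase] += 1 }
        end
        counts
      end
    end

    # Test locally against a scratch copy of the data...
    sample = [Tweet.new("RT #iranelection update"), Tweet.new("#IranElection news")]
    puts HashtagFrequency.new.run(sample).inspect  # {"#iranelection"=>2}

The point of the design is that something this small gets written and tested locally, submitted for vetting, then run server-side against the full datasets that can no longer be exported - the researcher gets the answer without ever holding the raw data.
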
The email below explains it in more detail, but I was particularly struck by the last few pars on why 140kit was established:

"[we] realized that if we generalized the process of data collection and analysis, we could open the door to doing very meaningful comparative analysis of datasets, which in turn could help us actually figure out A. If Twitter matters, B. If it does, what its impacts are, and C. What this implies for the internet and social networks as a whole. We have never been in this for money - we have never looked for funding, this has never been our job, and our systems were given to us by the Web Ecology Project and are hosted at Harvard’s Berkman Center for Internet and Society. We have one machine we pay for, which in May will be coming out of our own pockets (the machine was purchased for a year as part of a class Ian and I slapped together at Bennington College). We are solely interested in the data and its implications, and this is a labor of love. We are more than happy to continue on this project"
 
Cool people.
---------- Forwarded message ----------
From: 140Kit Team 
Date: Fri, Mar 4, 2011 at 12:29 AM
Subject: 140kit: Regarding Twitter's API Change

Hello,

You’re receiving this e-mail because you signed up for our service, 140kit, sometime in the last 8 months. We are writing to inform you about the current state of data exports, as well as our solution to the problem that has arisen.

A few weeks ago, Twitter caused some news by publicly stating that no more whitelisted IPs would be granted for any purposes - this essentially ends any REST-based data collection for new researchers (doing collections of tweets based on user names, for instance, requires this access). Within a few days, they also sent a letter to TwapperKeeper, another major data collector, which compelled their leadership to turn off all export services as of March 20th. The same has basically happened for all other collectors, including ours. In short, the time when a researcher could export a full, unfiltered, unadulterated dataset is completely over.

The particular section of the TOS that is violated by exports states (Section I.4.a, at http://bit.ly/9LD7XQ):

I. Access to Twitter Content

4. You will not attempt or encourage others to:

a. sell, rent, lease, sublicense, redistribute, or syndicate the Twitter API or Twitter Content to any third party for such party to develop additional products or services without prior written approval from Twitter;

Where "Twitter Content" is defined as: "All use of the Twitter API and content, documentation, code, and related materials made available to you on or through Twitter"

Meaning that 140kit, as a service, cannot provide the datasets wholesale - Twitter uses "products/services" to mean basically anything, even academic reports. For many of our users, this effectively shuts them out of the ability to research the platform. If one doesn’t know how to code, it’s very difficult to do this alone - a problem compounded when you don’t have the access levels needed to research a given subject. We at 140kit have more than enough access, however, and still retain the right to keep our data, so we came up with a novel solution, which Twitter has agreed to.

On our site, we have a library of analytical processes, which in turn have their own online viewers, a few of which contain their own exports. All of our services, from CSV export to gender analysis, run via a modular library of analytics with their own administrative structure. We built this system with the view that someday we would open up our system for researchers to build their own analytics and add them to our site, so that all researchers would have access to these processes as well. We wrote our project in Ruby, but want to make this plugin system work with any language, which should actually be quite easy.

Over the next few months, then, we plan on re-structuring this system to a point where it is trivial to download a scratch copy of our service, test one’s own analytics locally, then send the analytical process to the site for vetting, which would be a simple process. If the language you work with isn’t included in our system yet, we’ll add it. If you don’t know how to code, tell us the general algorithm and we’ll code it if we have the time and resources. 

In this way, as the library grows, we will be able to answer more of the core questions researchers are interested in, and at a certain threshold, all the important questions will already have their analysis on the site. Since we can keep our data, we will be able to re-run any analysis on any previous dataset. In short, we can’t give you the exports of data, but we can answer any question you want answered. It’s not the best solution, but it will save many projects from the grief of doing this alone.

This project was started in October 2009 by two people: myself (Devin Gaffney) and Ian Pearce. We were profoundly interested in analysis I was doing about the Iran Election, and realized that if we generalized the process of data collection and analysis, we could open the door to doing very meaningful comparative analysis of datasets, which in turn could help us actually figure out A. If Twitter matters, B. If it does, what its impacts are, and C. What this implies for the internet and social networks as a whole.

We have never been in this for money - we have never looked for funding, this has never been our job, and our systems were given to us by the Web Ecology Project and are hosted at Harvard’s Berkman Center for Internet and Society. We have one machine we pay for, which in May will be coming out of our own pockets (the machine was purchased for a year as part of a class Ian and I slapped together at Bennington College). We are solely interested in the data and its implications, and this is a labor of love.

We are more than happy to continue on this project, and are glad you have used our service. Our hope is to be more on the ball with tickets, issues, and other problems as we go through this re-structuring, and to come out of it making analysis even easier for people. Thank you for reading this admittedly long e-mail - a fuller description of the current situation is on our front page, if you need any more details. For any other questions, feel free to reach out to us personally or contact us via this email account.



Read the full report here: http://bit.ly/ddarvF


Thanks much, 


Devin Gaffney and Ian Pearce

