Incubator Blog .gov Data Processing

Remove PostgreSQL Dataset

Refocusing Efforts for the rest of the Quarter

Because the PostgreSQL dataset is expensive to operate on Amazon, and because many of the users at the conference still found it inaccessible, John and I decided to remove it. Instead, I will spend the rest of the quarter creating a set of mini-datasets of term frequencies that scholars can use off the shelf and load into R or other statistical programs. While this will not allow others to do text processing on the full cluster, we hope to make the .gov data usable as a variable that could be paired with other datasets (such as data on campaign finance). Because I will make all the scripts used to construct these datasets public, our hope is that anyone interested will have a strong starting point for altering the scripts to serve their own purposes. I am including a paragraph-long description before each chunk of code to discuss other possible ways to do each step, in the hope of making the scripts as accessible as possible.
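
To give a rough sense of what one of these mini-datasets might look like, here is a minimal sketch in Python. It is not one of the project scripts: the sample pages, the term list, the column names, and the helper functions `term_frequencies` and `write_term_table` are all made up for illustration. It counts a few terms per page and writes a small CSV table of the kind that R's `read.csv` could load.

```python
import csv
import io
import re
from collections import Counter

# Assumption: a "term" is a run of lowercase ASCII letters.
TOKEN_RE = re.compile(r"[a-z]+")


def term_frequencies(text):
    """Count the lowercase alphabetic tokens in one document."""
    return Counter(TOKEN_RE.findall(text.lower()))


def write_term_table(docs, terms, out):
    """Write one row per document, with one count column per term of interest."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["doc_id"] + list(terms))
    for doc_id, text in docs.items():
        counts = term_frequencies(text)
        # A Counter returns 0 for terms that never appear in the document.
        writer.writerow([doc_id] + [counts[t] for t in terms])


# Hypothetical sample pages, standing in for real .gov text.
docs = {
    "agency-a.gov": "Budget hearing on energy policy. Energy budget approved.",
    "agency-b.gov": "Health policy update and health budget.",
}

buffer = io.StringIO()
write_term_table(docs, ["budget", "energy", "health"], buffer)
table = buffer.getvalue()
print(table)
```

A table like this has one row per site, so it could be joined to another dataset on the site identifier. Writing to a file instead of an in-memory buffer only requires passing an open file handle as `out`.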