Refocusing Efforts for the Rest of the Quarter

Because the PostgreSQL dataset is expensive to operate on Amazon, and because many users at the conference still found it inaccessible, John and I decided to remove it. Instead, I will spend the rest of the quarter creating a set of mini-datasets of term frequencies that scholars can use off the shelf and load into R or other statistical programs. While this will not allow others to do text processing on the full cluster, we hope it makes the .gov data usable as a variable that could be paired with other datasets (such as data on campaign finance). I will make all the scripts used to construct these datasets public, so that anyone interested has a strong starting point for altering them to serve their own purposes. I am also including a paragraph-long description before each chunk of code that discusses other possible ways to do each step, in the hope of making the scripts as accessible as possible.
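To give a rough sense of what one of these term-frequency mini-datasets could look like, here is a minimal sketch in Python. It is an illustration only, not the scripts described above: the sample domains, the page text, the tokenization rule, and the function names `term_frequencies` and `write_term_table` are all assumptions made for the example.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample: domain -> page text.
# Real input would come from the .gov data, not from a dict like this.
SAMPLE_PAGES = {
    "example-agency.gov": "Campaign finance reports and campaign finance rules",
    "example-bureau.gov": "Finance data for the public",
}


def term_frequencies(text):
    """Count lowercase alphabetic tokens in one text (a deliberately simple tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def write_term_table(pages, out):
    """Write one row per (domain, term) pair in long format to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["domain", "term", "count"])
    for domain in sorted(pages):
        counts = term_frequencies(pages[domain])
        for term in sorted(counts):
            writer.writerow([domain, term, counts[term]])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_term_table(SAMPLE_PAGES, buffer)
    print(buffer.getvalue(), end="")
```

A long-format table like this, saved as a CSV file, could then be read into R with `read.csv` and merged with another dataset on the `domain` column.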