How I learned to stop worrying and love the LLMs
(by processing campaign finance data with Claude’s help)
I finally found an LLM to be pretty helpful for something. I haven’t spent a huge amount of time searching for cases where LLMs are particularly useful to me (and I have no desire to send people emails under my name that were written by an LLM), but this project turned out to be a good fit.
One thing I’ve been missing recently is regular reports on campaign contributions by big tech companies and their executives. For a while, Maciej Ceglowski was keeping track of big tech PAC contributions on the Pinboard Twitter account, but Twitter is now a dumpster fire and it’s not clear whether Maciej is still tracking this (on Twitter or elsewhere). Also, beyond contributions from the top-level corporate PACs, there are interesting contributions from the senior execs at some of these companies.
The FEC has a nice website with a reasonable tool to search for contributions, both from PACs and from individuals, but it’s not great for systematic analysis. There is an API, so writing a bit of code to query contributions and process them seemed possible. However, given the lack of nice wrappers for the FEC API and the API’s hard-to-follow documentation, I had never gotten around to writing that code myself. That made it a natural task to try handing off to an LLM.
Food for Claude’s thoughts
I had a couple of fuzzy ideas for things that would be useful to do with FEC data:
- Alerts on interesting contributions - Every once in a while I look at contributions from big tech PACs or execs and find that since I last checked there were a slew of interesting ones. It would be nice to have notifications when contribution records for PACs or execs I’m interested in are made available.
- Clustering of contributions - Sometimes a number of executives from a company donate to a single politician or committee within a short period of time. These clusters of contributions might indicate that the execs are making a statement on behalf of their employer rather than contributing based on their own political preferences.
In my limited experience playing around with the free versions of ChatGPT, Gemini, and Claude, I’ve found the output of Claude to be the most coherent and least hallucinatory. My plan then was to just ask Claude to write code for me to do these things, and then tie up whatever loose ends I needed to manually.
Contribution alerting
I started by opening up Claude and providing the following prompt, with a bit of example individual contributor API output attached:
You are an expert applications developer writing tools to scrape campaign
finance information and send alerts based on that information. Please write a
Google Cloud Run Function that gets data from the FEC's individual contributor
endpoint for an inputted set of contributors and then sends an email with
contribution amounts for the inputted contributors.
Here is an example call to the endpoint, with results in the attached file:
https://api.open.fec.gov/v1/schedules/schedule_a/?api_key=aaaaaaaaaabbbbbbbbbb1111111111222222222&sort_hide_null=false&sort_nulls_last=false&contributor_name=Sundar+pichai&min_date=01%2F01%2F2008&max_date=12%2F31%2F2024&sort=-contribution_receipt_date&per_page=5&is_individual=true
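For reference, the example call above boils down to a small stdlib-only helper like the following. This is my own after-the-fact sketch, not Claude’s output; the function names are mine and the key is a placeholder.

```python
import json
import urllib.parse
import urllib.request

FEC_SCHEDULE_A = "https://api.open.fec.gov/v1/schedules/schedule_a/"

def build_query(api_key, contributor_name, min_date, max_date, per_page=20):
    """Assemble the query string used in the example URL above."""
    params = {
        "api_key": api_key,
        "contributor_name": contributor_name,
        "min_date": min_date,          # e.g. "01/01/2008"
        "max_date": max_date,          # e.g. "12/31/2024"
        "sort": "-contribution_receipt_date",
        "per_page": per_page,
        "is_individual": "true",
    }
    return FEC_SCHEDULE_A + "?" + urllib.parse.urlencode(params)

def fetch_contributions(api_key, name, min_date, max_date):
    """Fetch one page of Schedule A results for a contributor."""
    with urllib.request.urlopen(build_query(api_key, name, min_date, max_date)) as resp:
        return json.load(resp)["results"]
```

Note that `per_page` caps each response, so a thorough version would need to follow the API’s pagination rather than grabbing a single page.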
Sure enough, Claude spat out a couple hundred lines of python that queried the API for a set of contributors and then emailed a nicely-formatted version of the results to a specified email address. The code also called GCP APIs to get secrets and properly decorated a function to be called by Cloud Run. I asked Claude to modify the code to take in a load date and only return contributions after that date, and it did that without any issues. I also asked it to explain how the email notifications worked, and it walked through the code just fine, helpfully noting that I would need to get an app password for the gmail account I wanted to use to send the notifications.
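The two pieces I asked about, the load-date filter and the app-password email, look roughly like this. This is a hedged sketch of the shape of the generated code, with my own placeholder names rather than Claude’s actual output; the field names (`contribution_receipt_date` and so on) come from the Schedule A response.

```python
import smtplib
from datetime import date
from email.message import EmailMessage

def contributions_since(contributions, load_date):
    """Keep only contributions received after load_date ("YYYY-MM-DD").

    Schedule A results carry an ISO-style `contribution_receipt_date`,
    so comparing on the date prefix as a string is enough."""
    return [
        c for c in contributions
        if c.get("contribution_receipt_date", "")[:10] > load_date
    ]

def send_alert(contributions, sender, app_password, recipient):
    """Email a plain-text summary via Gmail. `app_password` must be a
    Gmail app password, not the account password."""
    body = "\n".join(
        f'{c["contributor_name"]}: ${c["contribution_receipt_amount"]} '
        f'on {c["contribution_receipt_date"][:10]}'
        for c in contributions
    ) or "No new contributions."
    msg = EmailMessage()
    msg["Subject"] = f"FEC contribution alert ({date.today()})"
    msg["From"], msg["To"] = sender, recipient
    msg.set_content(body)
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
        smtp.login(sender, app_password)
        smtp.send_message(msg)
```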
Not everything was completely smooth sailing, though. There were a few minor issues:
- I downloaded the generated code from Claude and it created a file called `fec-monitor.py`, but from what I can tell GCP wants the main file for a Cloud Run Function to be named `main.py` (at least by default).
- I asked Claude to generate a `requirements.txt` file for the code, but the generated requirements list was incomplete.
- When I asked Claude to help me test the function locally, it generated a script that called the main function in the code it had generated earlier. But Cloud Run has its own tools for local testing, so Claude probably would have been more useful walking me through how to use those.
- Claude split configuration for the Cloud Run Function between environment variables stored in a `.env` file and secrets stored in GCP’s Secret Manager tool. But dealing with Secret Manager (and in particular managing permissions for access) is annoying compared to just putting the secrets in the `.env` file and adding `.env` to the `.gitignore` list.
Indeed, the Secret Manager annoyances pointed to the general pattern I encountered while setting up the contribution alerts: Claude produced code that worked well with only a few straightforward tweaks needed, but getting things properly configured in GCP was a pain. Secret Manager, Cloud Run Functions, and Cloud Scheduler (to run the function regularly) each required their own setup and had various little issues that I needed to figure out. Claude couldn’t easily automate that.
I did eventually manage to figure out the GCP configuration, and got myself a nice email with recent contributions from Google execs:
You can find the contribution alerting code on Github.
Contribution clustering
Once I had the email alerting working, I turned to the second problem I thought Claude could help with. One thing I’ve found myself doing a few times is trying to find groups of tech execs who all contributed to a specific candidate in a short time window, especially if some of those execs usually contribute to candidates of the other party.
I again started with a prompt giving Claude some example API usage and asking it to write code:
You are an expert applications developer writing tools to scrape campaign
finance information and find clusters of contributions to a politician or
committee across a set of contributors. Please write a python script that gets
data from the FEC's individual contributor endpoint for an inputted set of
contributors (specified by name and employer) and then finds contributions from
those contributors to the same committee or politician that were made within an
specific amount of time from each other.
Here is an example call to the endpoint, with results in the attached file:
https://api.open.fec.gov/v1/schedules/schedule_a/?api_key=aaaaaaaaaabbbbbbbbbb1111111111222222222&sort_hide_null=false&sort_nulls_last=false&contributor_name=Sundar+pichai&min_date=01%2F01%2F2008&max_date=12%2F31%2F2024&sort=-contribution_receipt_date&per_page=5&is_individual=true
The output was again a few hundred lines of Python that seemed like it should query the API and cluster the results by contribution date and recipient. The code provided by Claude ran successfully without any modifications, but it was pretty clear from the output that it was duplicating contributions:
Rather than trying to debug the code (particularly since it’s been a while since I last tangled with Pandas dataframes, which Claude’s code was relying on), I asked Claude to fix it:
This code is returning duplicate contributions in clusters. I also don't want
clusters with only one contributor.
Claude handed me back a modified code file that seemed to work:
You can find the contribution clustering code on Github.
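For the curious, the core of the clustering, including the two fixes from my follow-up prompt (deduplication and dropping single-contributor clusters), can be sketched in plain Python like this. This is my reading of what the code should do, not Claude’s pandas implementation, and the field names are simplified.

```python
from datetime import date
from itertools import groupby

def find_clusters(contributions, window_days=7):
    """Group contributions by recipient committee, then collect runs of
    contributions whose dates fall within window_days of each other.
    Only clusters with 2+ distinct contributors are kept, and each
    contribution lands in at most one cluster (the dedup fix)."""
    rows = sorted(contributions, key=lambda c: (c["committee_id"], c["date"]))
    clusters = []
    for committee, group in groupby(rows, key=lambda c: c["committee_id"]):
        current = []
        for c in list(group) + [None]:  # None is a sentinel to flush the last run
            if current and (c is None or (c["date"] - current[-1]["date"]).days > window_days):
                if len({x["contributor"] for x in current}) >= 2:
                    clusters.append((committee, current))
                current = []
            if c is not None:
                current.append(c)
    return clusters

# Example: two execs giving to the same committee two days apart
contribs = [
    {"committee_id": "C001", "contributor": "Exec A", "date": date(2024, 1, 1)},
    {"committee_id": "C001", "contributor": "Exec B", "date": date(2024, 1, 3)},
]
```

A sliding-window pass like this only merges adjacent contributions, so a long chain of closely spaced donations ends up in one cluster; whether that is the right behavior is a judgment call.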
Future work
It’s worth noting that I haven’t (yet) asked Claude to write tests for the code it has given me. This is partly because I find it a little weird to have an LLM both write code and then write its own tests for that code: am I really going to understand what the tests are verifying? I can probably spend the time to understand generated tests, and they would be useful for catching breaking changes I might make to the code later, so I’ll probably end up asking Claude for tests anyway. For now, though, I’m going to confirm any information these tools provide with a direct lookup in the FEC’s search tool, so I’ll be doing my own verification regardless.
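The kind of test I’d be comfortable with is small enough to audit at a glance, something like this (written against a hypothetical date-filter helper, not the actual generated code):

```python
def filter_after(contributions, load_date):
    """Hypothetical helper standing in for the real date filter."""
    return [c for c in contributions if c["date"] > load_date]

def test_filter_after_excludes_old_contributions():
    rows = [{"date": "2024-01-15"}, {"date": "2024-10-01"}]
    assert filter_after(rows, "2024-06-01") == [{"date": "2024-10-01"}]

test_filter_after_excludes_old_contributions()
```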
I’m also likely to make quality-of-life tweaks to the code (either myself or with Claude). For the email code, it might be useful to link to the contribution entries on the FEC website, for example, and I’d probably like the clusterer not to return contributions to company PACs. LLMs also seem to be pretty good at putting together quick web apps for tools where a web app might not otherwise be worth the effort, so trying to put together one of those might be interesting.
Can Claude replace my intern self?
Simon Willison has compared using an LLM to working with a “weird intern” (and others have made similar comparisons). The comparison felt particularly apt to me for this project, as one thing I spent time doing as an intern at the New York Times was working with campaign finance data. So while the LLMs might not be able to replace me yet, it seems like maybe they could replace a slightly less-experienced version of me?
But I would have some qualms about swapping out intern-me for Claude. The tasks I put Claude to work on here are probably among the more straightforward campaign finance processing tasks, and they are the types of tasks (e.g. calling an API, sending an email) for which LLMs have many examples to draw on. Even there, the output required a little massaging and (for the clusterer) had some obvious inefficiencies. For my intern campaign finance project, I ended up using webdriver to scrape information from an old state campaign finance site built on ASP.NET, a messier job that I suspect an LLM would have had a harder time with.
It’s also worth noting that one of the main benefits of having intern-me play with campaign finance data was that I actually got to learn a bit about how campaign contributions and campaign finance data works. This allowed me to do things like help a reporter dig through contribution data during my internship. That experience was also probably a good part of the reason why I’ve kept poking around at campaign finance data occasionally and why, much later on, I had the idea for this project. If we simply hand off campaign finance data projects to LLMs, who’s going to actually understand campaign finance data to make any use of it? Maybe the AGI will do it someday, but I don’t think we’re quite there yet.