Critique of the TechCrunch Article on Google's Call-Scanning AI

2024-05-16

Yesterday, I predicted that the unwarranted outcry from certain privacy experts over Google’s new local LLM that scans calls for scams would result in misleading press coverage. Today, we have one such article over at TechCrunch. It is written by veteran journalist Natasha Lomas, but unfortunately it is comprehensive only in its treatment of one side of the argument, and it fails to take into account the same fundamental distinctions that these privacy experts have also skimmed over.

The article raises concerns about Google’s new AI feature for scanning voice calls, which Google announced here two days ago:

Crucially, Google’s feature is based on an entirely local LLM (large language model) which, according to Google, is simply not designed to send any data about your conversation back to Google, to law enforcement, or to any other third party.

The primary arguments made in the TechCrunch article, however, revolve around the potential for surveillance and censorship, and the broader implications of client-side scanning.

The TechCrunch article quotes well-known, genuine experts in the field who, in my view, are being unusually wrong and one-sided about this particular issue. Here is how:

Expert 1: Matthew Green

Matthew Green argues that while the technology may start with benign intentions, it could expand to more invasive forms of surveillance.

Green’s argument is based on a slippery slope fallacy, assuming that the initial use of AI for call scanning will inevitably lead to broader and more intrusive applications. Vitally, this perspective overlooks the clear distinction between on-device scanning and remote data collection. Historically, privacy concerns have centered on data being sent back to central servers without user consent. As long as the AI operates locally, the primary privacy risk is mitigated. The function creep argument requires concrete evidence of policy changes, not speculative scenarios.

Matthew suggests a dystopian future in which users might need to attach zero-knowledge proofs to their data in order for it to pass through service providers, effectively blocking open clients. This argument is entirely speculative. The idea that AI could enforce such stringent measures and require zero-knowledge proofs for everyday communication is simply not supported by Google’s local LLM, and there is no indication that such a system is being built on top of it.

In fact, pairing zero-knowledge proofs with the output of AI algorithms in order to gate access to civil privileges is an extremely novel and untested idea. Local AI processing, as described by Google, does not imply automatic reporting or zero-knowledge proof requirements. His concerns appear to conflate local AI inference with broader, more invasive monitoring, which isn’t supported by Google’s current implementation.

Some historical context: Apple’s CSAM scanning controversy

Matthew’s analysis, as well as the article at large, draws parallels with Apple’s 2021 CSAM (child sexual abuse materials) scanning controversy, where the primary issue was that the scanning algorithm was specifically built, from day one, to report back to Apple’s servers and to law enforcement if objectionable materials were found locally on the user’s device. Privacy advocates (including myself; I lobbied extensively against Apple’s CSAM proposal) were concerned about the lack of transparency and the potential for misuse.

Similarly, I am also strongly against current legislative proposals for scanning messaging apps for CSAM. However, there is simply no clear relationship between Google’s proposed local scanning model and that legislative push.

The key difference with Google’s AI is the on-device nature of the scanning, which does not involve transmitting data off the device. This distinction addresses the primary concern that led to the backlash against Apple’s proposal. By keeping the scanning local, Google avoids the privacy risks associated with remote reporting. The article fails to adequately differentiate between these two approaches, leading to a conflation of issues. This is the central point of my critique.
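To make this distinction concrete, here is a minimal, purely illustrative sketch in Python. Both designs run the exact same hypothetical on-device classifier; the only difference is what happens with the result afterwards. Every name here (classify_locally, the reporting endpoint, and so on) is my own invention for illustration and does not correspond to Google’s or Apple’s actual code.

```python
# Illustrative sketch only: hypothetical names, not any vendor's real code.
# Both designs run the exact same on-device classifier; what differs is
# where the result goes afterwards.

import urllib.request


def classify_locally(transcript: str) -> bool:
    """Hypothetical on-device model: returns True if the call looks like a scam."""
    suspicious = ("wire transfer", "gift card", "verify your account")
    return any(phrase in transcript.lower() for phrase in suspicious)


def scan_with_remote_reporting(transcript: str) -> None:
    """Apple-CSAM-style design: a positive match is reported off-device."""
    if classify_locally(transcript):
        # The privacy harm lives in this step: data leaves the device.
        payload = transcript.encode("utf-8")
        urllib.request.urlopen("https://example.com/report", data=payload)  # hypothetical endpoint


def scan_and_warn_user_only(transcript: str) -> None:
    """Design as Google describes it: the result stays on the device."""
    if classify_locally(transcript):
        # No network call anywhere on this path; the user just sees a warning.
        print("Warning: this call looks like a scam.")
```

The classifier is identical in both cases; the privacy properties are determined entirely by the reporting path. That reporting path is exactly what the TechCrunch article glosses over.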

Experts 2 and 3: Lukasz Olejnik and Michael Veale

Lukasz Olejnik welcomes the anti-scam feature but cautions about the potential for repurposing this technology for social surveillance. He highlights that while Google’s AI runs on-device, the existence of such capabilities could lead to future misuse. Michael Veale similarly points out the risk of function creep, emphasizing the potential for regulatory and legislative abuse. This concern is rooted in ongoing legislative proposals in the EU that could mandate client-side scanning to detect illegal activities.

Lukasz and Michael’s concerns about function creep are somewhat reasonable. However, again, these statements lack concrete evidence that Google or any other entity plans to repurpose this AI for monitoring a wide range of behaviors. The critique would be stronger with specific examples of how similar technologies have been misused and, especially, with evidence that Google’s current implementation is evolving into such a scenario, particularly given that all scanning is done locally and the data never leaves your device.

Again: with Apple’s 2021 CSAM proposal, the local scanning model was designed “out of the gate” to secretly send reports to Apple and to law enforcement should illicit content be detected on users’ private devices. That is what made it dangerous, not the fact that it did local scanning. This is, however, simply not the case with Google’s LLM. Again, that is the fundamental difference here.

We’ve had local content scanning for decades

The main difference between Google’s AI and traditional software lies in the sophistication of the algorithm, not in the fundamental function of local content scanning. We’ve had local content scanning for decades. This context is crucial because it shows that the technology itself isn’t new; the concern has always been about whether this data is sent back to servers, or whether these algorithms secretly “report” on their users when suspect content is detected.

Historical examples:

Apple’s Photos app uses background content scanners to sort photos by faces, locations, topics, and pets. Apple Mail’s spam filter has been reading emails and determining whether they’re spam or scams for decades.

Google’s proposed scam call scanning model, if truly completely local, would simply be equivalent to a slightly more advanced version of the algorithms cited above.
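To illustrate just how mundane local content scanning is, here is a deliberately simplistic, spam-filter-style sketch in Python. It is not how Apple Mail’s filter or Google’s LLM actually work; it only shows that classifying content entirely on the device, with no network access whatsoever, is decades-old technology.

```python
# A deliberately simplistic, spam-filter-style local scanner.
# Purely illustrative: real filters (and certainly a local LLM) are far more
# sophisticated, but the data-flow property is the same: nothing leaves the device.

SCAM_HINTS = {
    "lottery": 3.0,
    "prince": 2.0,
    "urgent": 1.0,
    "wire transfer": 3.0,
    "password": 2.0,
}


def scam_score(text: str) -> float:
    """Sum the weights of suspicious phrases found in the text."""
    lowered = text.lower()
    return sum(weight for phrase, weight in SCAM_HINTS.items() if phrase in lowered)


def is_probably_scam(text: str, threshold: float = 3.0) -> bool:
    """Local decision only; there is deliberately no I/O beyond this return value."""
    return scam_score(text) >= threshold


if __name__ == "__main__":
    sample = "URGENT: you have won the lottery, please send a wire transfer fee."
    print(is_probably_scam(sample))  # True
```

Swapping the keyword table for an LLM changes only the quality of the classification; whether results are ever transmitted off the device remains a separate design decision.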

The critical point is that these technologies have operated entirely separately from cloud-based data sharing. The privacy risk primarily arises when local content scanning results are transmitted to external servers without user consent, potentially exposing personal information.

The current panic over AI-enhanced local content scanning, such as Google’s call-scanning AI, seems misplaced if it operates entirely on-device. The sophistication of AI doesn’t inherently increase privacy risks unless the data is sent to external servers. Therefore, equating AI with a privacy invasion is misleading when the fundamental function remains local and isolated from cloud-based reporting: the term “AI” seems to be simply used as a bogeyman here, when in reality it’s just a slightly more sophisticated algorithm. The question of whether that algorithm phones back home remains totally distinct.

In conclusion, the experts’ concerns about Google’s AI reflect a misunderstanding of the technology’s operational scope. By ensuring that the AI processes data locally without phoning home, the fundamental privacy safeguards remain intact, just as they have with previous local content scanning technologies. Unlike what we recently faced with Apple’s CSAM proposal, where the harm was baked into the official technical specification from day one, the only real issue here lies in completely speculative potential future misuse, not in the current implementation of AI-enhanced local scanning.

Google often makes privacy-invasive technology, and the above-cited privacy experts are seldom wrong. But this really does seem like one rare instance where Google isn’t necessarily doing anything wrong, and where the privacy experts seem to have collectively jumped the gun. I urge a moment of critical thinking and reflection.