A continuation of my previous post on Filtering Drupal comments for spam

Overview

  • Old Drupal 5 blog with my old posts stuck in the Drupal node table
  • ~19k comments on my pages before I fixed spam filtering
  • Keeping it local with Ollama
  • Google's Gemma3 struck a good balance of speed and quality for detecting spam
  • DeepSeek R1 takes far too long because it reasons through every comment
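
A minimal sketch of what classifying one comment against a local model could look like. It assumes Ollama is serving on its default port (11434) and that a model is available under the tag `gemma3`. The prompt wording and the helper names (`build_prompt`, `parse_verdict`, `classify_comment`) are hypothetical illustrations, not the exact ones used for this blog.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "gemma3"  # assumed model tag; use whatever `ollama list` shows


def build_prompt(comment_text: str) -> str:
    """Hypothetical prompt: ask for a one-word verdict so parsing stays simple."""
    return (
        "You are a spam filter for blog comments. "
        "Reply with exactly one word, SPAM or HAM.\n\n"
        f"Comment:\n{comment_text}"
    )


def parse_verdict(model_reply: str) -> bool:
    """Return True when the reply starts with SPAM; anything else counts as not spam."""
    return model_reply.strip().upper().startswith("SPAM")


def classify_comment(comment_text: str) -> bool:
    """Send one comment to the local model and return True if it is judged spam."""
    payload = json.dumps(
        {"model": MODEL, "prompt": build_prompt(comment_text), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    # With "stream": False the server returns a single JSON object
    # whose "response" field holds the generated text.
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.loads(response.read())
    return parse_verdict(body["response"])
```

Treating any unclear reply as "not spam" is a deliberate choice in this sketch: it keeps a questionable comment around for manual review instead of silently dropping it.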

TLDR:

  • I filtered out a lot of spam comments with Gemma3, boiling the comments down to a more manageable set
  • Automating this process means I could easily lose legitimate comments to false positives
  • Need to log the Drupal node ID so that I can link filtered comments back to my markdown blog posts
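
One way the logging could work is sketched below: every comment the classifier rejects is written to a CSV along with its node ID, so false positives can be reviewed later and traced back to the right post. The field names `nid`, `cid`, and `comment` are assumed to match the columns pulled from the Drupal database, and `log_filtered_comments` and the `is_spam` callable are hypothetical names introduced for illustration.

```python
import csv
from typing import Callable, Iterable, Mapping, TextIO


def log_filtered_comments(
    comments: Iterable[Mapping[str, object]],
    is_spam: Callable[[str], bool],
    out: TextIO,
) -> int:
    """Write every comment judged as spam to `out` as CSV and return how many were filtered.

    Each comment is a mapping with (assumed) keys "nid", "cid" and "comment".
    Keeping the node ID next to the comment ID makes it possible to find
    which blog post a wrongly filtered comment belongs to.
    """
    writer = csv.writer(out)
    writer.writerow(["nid", "cid", "comment"])
    filtered = 0
    for row in comments:
        if is_spam(str(row["comment"])):
            writer.writerow([row["nid"], row["cid"], row["comment"]])
            filtered += 1
    return filtered
```

Passing the classifier in as a callable means the same function works with a real model call or with a simple stand-in while testing, and the resulting CSV can be skimmed by hand to rescue any legitimate comments.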