Post-Mortem: Antivirus Integration on a 1 GB Nextcloud VPS (failed due to load)

Failing Experiments Are Useful

Today I tried to push my togo-lab.io setup (see the second block on my landing page for all services) a bit further than strictly necessary. You may have noticed some outages as a result.

Why did I do this? Not only for production security reasons, but also out of curiosity: how far can I go with limited resources? I want to understand where the real limits of my setup are.

In this experiment, I wanted to see whether a single small VPS could reasonably host Nextcloud, a Matrix server, and Gitea, and still handle antivirus scanning for file uploads. At first glance it looked tight, but maybe possible. In practice, it pushed this small VPS over the edge.

That outcome was useful for my learning and understanding. When things fail or become unstable, I usually learn much more than when everything works smoothly. As a technician, I find that breaking things on purpose is typically the fastest way to understand them.

The following section is a post-mortem of that experiment: what it revealed, what I learned from it, and what would be required if I want to add antivirus scanning in the future.


Context:
This server hosts multiple self-managed services on a small VPS (1 GB RAM, 1 vCPU):

  • Nextcloud (primary collaboration platform)
  • Matrix server
  • Gitea
  • Supporting stack (PHP-FPM, MariaDB, Redis, Apache/Nginx, Fail2Ban)

The goal was to add antivirus scanning for uploaded files in Nextcloud, as preparation for future collaborative use.


Initial Goal

Enable server-side antivirus scanning for Nextcloud uploads using ClamAV, with the following constraints:

  • Lightweight
  • Automated
  • No interactive maintenance
  • Suitable for a self-hosted environment

This is a reasonable baseline requirement once multiple external contributors are involved.


Attempted Approaches

1. ClamAV Daemon (clamd) + Nextcloud (Socket Mode)

What was tried

  • Installed ClamAV daemon

  • Tuned clamd.conf aggressively (single thread, reduced parsers, size limits)

  • Added strict systemd memory limits

  • Disabled background scans

  • Socket-based integration with Nextcloud

  • Changes to the default /etc/clamav/clamd.conf:

          # === VPS-safe limits ===
          MaxThreads 1
          ConcurrentDatabaseReload no
    
          # File size limits (Nextcloud uploads)
          MaxFileSize 50M
          MaxScanSize 75M
          StreamMaxLength 75M
    
          # Archive / recursion limits
          MaxRecursion 10
          MaxFiles 5000
    
          # Timeouts
          ReadTimeout 120
          CommandReadTimeout 120
    
          # Disable low-value / memory-heavy scanners
          ScanHTML false
          ScanMail false
          ScanSWF false
          ScanHWP3 false
          ScanXMLDOCS false
    
          # Reduce bytecode impact
          Bytecode true
          BytecodeTimeout 20000
    
          # Reduce RAM further (Nextcloud upload use-case)
          PhishingSignatures false
          PhishingScanURLs false
          DisableCache true
          ExtendedDetectionInfo false
  • Best result I got, via free -h:

    $ free -h
                   total        used        free      shared  buff/cache   available
    Mem:           960Mi       926Mi        68Mi        31Mi       101Mi        33Mi
    Swap:          1.0Gi       843Mi       180Mi

    $ swapon --show
    NAME      TYPE  SIZE   USED PRIO
    /swapfile file 1024M 848.5M   -2
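The memory snapshot above can also be captured programmatically, which makes it easier to compare tuning runs. The following is a minimal sketch, assuming a Linux host where /proc/meminfo is readable. The function names `parse_meminfo` and `memory_pressure` are illustrative and not part of any tool used in this experiment.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> integer.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_pressure(info):
    """Return (ram_used_fraction, swap_used_fraction).

    RAM "used" is computed as MemTotal - MemAvailable, which is not
    identical to the "used" column printed by free(1).
    """
    ram_used = 1 - info["MemAvailable"] / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    # Hosts without swap report SwapTotal as 0; avoid dividing by zero.
    swap_used = 0.0 if swap_total == 0 else 1 - info["SwapFree"] / swap_total
    return ram_used, swap_used
```

On the host itself, this would be called with the content of /proc/meminfo, for example `memory_pressure(parse_meminfo(open("/proc/meminfo").read()))`.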

Observed behavior

  • clamd resident memory usage: ~500–600 MB
  • Heavy swap usage even after tuning
  • Periodic stalls, SSH lag, partial service unresponsiveness
  • OOM kills during database reload or startup
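Under heavy swapping, the resident figure alone understates what a process really needs, because part of it has been pushed out to swap. One way to look at both numbers is to read the VmRSS and VmSwap fields from /proc/<pid>/status. This is a sketch under the assumption of a Linux kernel that exposes both fields. The helper names are hypothetical, and this is not necessarily how the figures above were measured.

```python
def status_field_mib(status_text, field):
    """Return a kB-valued field from /proc/<pid>/status in MiB, or None."""
    prefix = field + ":"
    for line in status_text.splitlines():
        if line.startswith(prefix):
            # Format is "VmRSS:    563200 kB"; the second token is the value.
            return int(line.split()[1]) / 1024
    return None


def footprint_mib(status_text):
    """Resident plus swapped-out memory of one process, in MiB."""
    rss = status_field_mib(status_text, "VmRSS") or 0.0
    swap = status_field_mib(status_text, "VmSwap") or 0.0
    return rss + swap
```

The input would be the content of /proc/<pid>/status for the clamd process, with the PID looked up separately (for example via pidof).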

Conclusion: Even heavily tuned, a resident ClamAV daemon is not viable on a 1 GB VPS that already runs multiple services.


2. ClamAV Executable Mode (clamscan on upload)

What was tried

  • Disabled clamd entirely
  • Used Nextcloud Antivirus for Files app in Executable mode
  • Scanning only on upload
  • Strict size limits
  • No background scans

For reference, the final configuration in the Nextcloud app “Antivirus for Files”:

  • Mode: ClamAV Executable
  • Path to clamscan: /usr/bin/clamscan
  • Extra command line options (comma-separated): --no-summary,--infected,--max-filesize=50M,--max-scansize=75M
  • Stream Length: 104857600
  • Block uploads when scanner is not reachable: Yes
  • Block unscannable files: No
  • Background scans: effectively off (unchecked)
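To reproduce the cost of a single scan outside of Nextcloud, the same clamscan options can be run by hand against a test file. The sketch below assumes clamscan is installed at the path configured above and scans a file path directly; it does not claim to mirror how the Nextcloud app invokes the scanner internally. The `scan` wrapper and its 120-second timeout are illustrative choices. The exit codes are those documented for clamscan: 0 means no virus found, 1 means a virus was found, 2 means an error occurred.

```python
import subprocess

CLAMSCAN = "/usr/bin/clamscan"
# Same options as in the app configuration above.
OPTIONS = ["--no-summary", "--infected", "--max-filesize=50M", "--max-scansize=75M"]

VERDICTS = {0: "clean", 1: "infected", 2: "error"}


def build_command(path):
    """Return the argument list for scanning one file."""
    return [CLAMSCAN, *OPTIONS, path]


def classify(returncode):
    """Map a clamscan exit code to a verdict; unknown codes count as errors."""
    return VERDICTS.get(returncode, "error")


def scan(path, timeout=120):
    """Scan one file and return 'clean', 'infected', or 'error'."""
    try:
        proc = subprocess.run(
            build_command(path), capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, permission problem, or scan exceeded the timeout.
        return "error"
    return classify(proc.returncode)
```

Watching memory and IO while `scan()` runs on a large file shows the per-upload cost in isolation, without PHP-FPM or the other services in the picture.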

Observed behavior

  • Technically functional
  • No permanent memory footprint
  • However:
    • Uploads caused noticeable CPU + IO spikes
    • PHP-FPM workers stalled under load
    • Combined service activity still led to instability

Conclusion: Even non-resident scanning adds too much peak load for this VPS when combined with:

  • Nextcloud
  • Matrix
  • Gitea
  • Database and cache services

Final Decision

Antivirus disabled (for now)

The Nextcloud Antivirus app is currently disabled.

Reasons:

  • System stability has higher priority than partial security measures
  • Trusted users only
  • Strict file permissions
  • Regular backups
  • No public upload endpoints

After disabling AV and rebooting:

  • System became stable
  • Swap usage normalized
  • All services responsive:
    • Nextcloud
    • Matrix
    • Gitea

Post-Mortem Summary

Item                   Result
Configuration error    ❌ No
ClamAV bug             ❌ No
Nextcloud bug          ❌ No
VPS resource limit     ✅ Yes
Wrong architecture     ✅ Yes (for this size)

Meaning:

  • This was not a misconfiguration.
  • It was a capacity mismatch.

Lessons Learned

  1. A 1 GB VPS is already at its limit when running these services combined:

    • Nextcloud
    • Matrix
    • Gitea

  2. Antivirus scanning is not “free” in terms of load, even in executable mode.

  3. Security features that trigger CPU + IO spikes must be sized for worst-case concurrency, not idle averages.

  4. Adding AV without increasing resources has a net negative effect on security, because it destabilizes the system.
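Lesson 3 can be made concrete with a back-of-the-envelope check: peak demand is the baseline of all services plus the per-scan cost multiplied by the number of scans that may run at once. The sketch below is illustrative only. The 960 MiB total comes from the free output earlier; the 550 MiB per scan is an assumption (midpoint of the observed clamd footprint, on the premise that each clamscan run loads the same signature database); the 600 MiB baseline and the 10 % headroom are hypothetical placeholders, not measurements.

```python
def peak_demand_mib(baseline_mib, per_scan_mib, concurrent_scans):
    """Worst-case memory demand: baseline plus all concurrent scans."""
    return baseline_mib + per_scan_mib * concurrent_scans


def fits(total_mib, baseline_mib, per_scan_mib, concurrent_scans, headroom=0.1):
    """True if peak demand stays within total RAM minus a safety headroom."""
    budget = total_mib * (1 - headroom)
    return peak_demand_mib(baseline_mib, per_scan_mib, concurrent_scans) <= budget


# Hypothetical inputs, see the assumptions stated above.
BASELINE_MIB = 600
PER_SCAN_MIB = 550

for total in (960, 2048):
    for scans in (1, 2, 3):
        ok = fits(total, BASELINE_MIB, PER_SCAN_MIB, scans)
        print(f"{total} MiB RAM, {scans} concurrent scan(s): fits={ok}")
```

With these placeholder numbers, not even one scan fits into 960 MiB, while a 2 GiB host has room for a small number of concurrent scans. The real baseline should be measured before drawing conclusions from this kind of estimate.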


When Antivirus Will Be Re-Enabled

Antivirus scanning will be mandatory once this instance is used for real group collaboration.

That will require one of the following options:

Option A: VPS Upgrade (currently preferred)
  • Upgrade to ≥ 2 GB RAM
  • Re-enable ClamAV (daemon or executable mode)
  • Keep all services on one host

Option B: Service Split
  • VPS 1: Nextcloud
  • VPS 2: Matrix + Gitea
  • Antivirus enabled only on the Nextcloud host

Current Security Posture (Interim)

  • Trusted users only, if any
  • No public upload endpoints
  • Strict permissions
  • Fail2Ban + firewall
  • Frequent backups
  • Fast restore tested

This is acceptable temporarily, but not a final state.


Closing Notes

This experiment was intentional and valuable. I learned a lot, including during configuration and tuning.

It clarified:

  • the real resource cost of antivirus scanning
  • the practical limits of small VPS setups
  • and the architectural decisions required for future growth

When collaboration expands, the infrastructure will be expanded accordingly. Until then, ClamAV and its configuration stay in place; the Nextcloud app is disabled for now, but not uninstalled.