
Monthly Server Maintenance Checklist

A monthly server maintenance checklist keeps your infrastructure healthy, catches problems before they become outages, and gives you a documented record that everything has been checked. Here is a practical checklist covering the tasks that matter most, with guidance on what to look for in each.

How to Use This Checklist

Run through this checklist once a month — or more frequently for critical servers. Keep a log (even a simple spreadsheet) recording the date, who performed the check, and any findings. A documented maintenance history is valuable when diagnosing recurring problems or planning hardware replacements.
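If you would rather script the log than hand-edit a spreadsheet, it can be a CSV file that each check appends to. Below is a minimal Python sketch. The file name and the column names (date, performed_by, server, findings) are illustrative assumptions, not a required format.

```python
import csv
import datetime
from pathlib import Path

# Illustrative column layout: when, who, which server, what was found.
FIELDS = ["date", "performed_by", "server", "findings"]


def log_check(path, performed_by, server, findings, when=None):
    """Append one maintenance entry to a CSV log, writing the header row if the file is new or empty."""
    path = Path(path)
    if when is None:
        when = datetime.date.today()
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date": when.isoformat(),
            "performed_by": performed_by,
            "server": server,
            "findings": findings,
        })
```

For example, log_check("maintenance_log.csv", "J. Smith", "FS01", "RAID Optimal; C: at 18% free") appends one row. The names in that call are hypothetical. A CSV file still opens in a spreadsheet, so the log stays easy to read.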

Hardware and Physical

  • Visual inspection: check front panel LEDs. Any amber light on the system health or drive bay indicators needs immediate investigation. See the full server visual inspection guide for details.
  • Fan noise: listen for any grinding, rattling, or changes in pitch from server fans
  • Cable check: confirm all network and power cables are fully seated
  • Temperature check: review thermal sensor data in iDRAC/iLO — inlet temperature, CPU temperatures, any thermal events in the hardware event log
  • UPS status: confirm UPS is on mains power, battery is healthy, and estimated runtime is acceptable. Test under load annually.
  • Server room temperature: confirm ambient temperature is within the recommended 18–27°C range
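The room and inlet temperature checks reduce to a simple range test. The Python sketch below assumes you have already read the ambient or inlet values (in °C) from iDRAC/iLO or a room sensor into a dictionary. The sensor names are hypothetical. CPU temperatures have much higher normal limits and should not be compared against this range.

```python
# Recommended ambient range from the checklist, in degrees Celsius.
RECOMMENDED_MIN_C = 18.0
RECOMMENDED_MAX_C = 27.0


def out_of_range(readings):
    """Return {sensor_name: temp_c} for ambient/inlet readings outside the recommended range."""
    return {
        name: temp
        for name, temp in readings.items()
        if not (RECOMMENDED_MIN_C <= temp <= RECOMMENDED_MAX_C)
    }
```

For example, out_of_range({"server room": 22.5, "rack 3 inlet": 29.0}) returns {"rack 3 inlet": 29.0}. The boundary values 18 and 27 are treated as within range.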

Storage

  • Disk space: check free space on all volumes. Raise an alert when free space falls to 20% and treat 10% as critical. Run Get-PSDrive -PSProvider FileSystem in PowerShell to see used and free space for each drive.
  • Drive health: run Get-PhysicalDisk | Select-Object FriendlyName, HealthStatus — all drives should be Healthy
  • RAID status: check via iDRAC/iLO or RAID management software. The array should report Optimal. A Degraded status means at least one drive has failed and the array is running with reduced redundancy, so replace the drive promptly rather than waiting for the next check.
  • Event Viewer disk errors: filter the System log for Event IDs 7 and 11 (source: Disk). Repeated disk I/O errors usually point to a failing drive, although Event ID 11 can also come from a controller or cabling fault.
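The free-space thresholds can be expressed as a small function. This Python sketch uses shutil.disk_usage from the standard library. It assumes that reaching exactly 20% or 10% free already counts as crossing the threshold. Which paths to check is up to you.

```python
import shutil

WARN_FREE_PCT = 20.0  # alert threshold from the checklist
CRIT_FREE_PCT = 10.0  # critical threshold from the checklist


def classify_free(free_bytes, total_bytes):
    """Classify a volume as 'ok', 'warning' or 'critical' by percentage of free space."""
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct <= CRIT_FREE_PCT:
        return "critical"
    if free_pct <= WARN_FREE_PCT:
        return "warning"
    return "ok"


def check_volume(path):
    """Measure the volume containing `path` and classify its free space."""
    usage = shutil.disk_usage(path)
    return classify_free(usage.free, usage.total)
```

On Windows you would call check_volume("C:\\") for each drive letter. On Linux you would call check_volume("/") and the other mount points.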

Performance

  • CPU usage baseline: review average CPU usage over the past week from Performance Monitor or your monitoring tool. Sustained usage above 80% suggests the server is undersized or has a runaway process.
  • Memory usage: check available memory — consistent low availability may require a RAM upgrade or application tuning
  • Page file activity: check Memory → Pages/sec in Performance Monitor. Regular high paging indicates memory pressure.
  • Slow queries or application issues: review application logs for repeated errors or performance timeouts that did not generate a helpdesk ticket
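"Sustained" needs a concrete definition before it can be checked. One option, sketched in Python below, is to flag the server when a run of consecutive samples all exceed 80%. The window of 12 samples (for example, one hour of 5-minute averages exported from your monitoring tool) is an assumption. Adjust it to your sample interval.

```python
def average(samples):
    """Mean CPU usage (%) over the sampled period."""
    return sum(samples) / len(samples)


def sustained_above(samples, threshold=80.0, window=12):
    """True if at least `window` consecutive samples exceed `threshold` percent."""
    run = 0
    for value in samples:
        # Extend the current run while usage stays high; reset it on any dip.
        run = run + 1 if value > threshold else 0
        if run >= window:
            return True
    return False
```

Requiring an unbroken run means a short spike does not trigger a flag, and neither does a busy period interrupted by a quiet sample. Only a long stretch of high usage, which is what the 80% guideline is about, will do so.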

Security

  • Windows Updates: verify the server is up to date with security patches — run Get-HotFix | Sort-Object InstalledOn -Descending | Select-Object -First 5 to check the most recent patches
  • Security event log, failed logons: review the Security log for clusters of Event ID 4625 (an account failed to log on). Repeated failures against the Administrator account in a short period suggest a brute-force attempt.
  • Local administrator accounts: confirm no unexpected local admin accounts have been added — run Get-LocalGroupMember -Group Administrators
  • Antivirus definitions: confirm AV definitions are up to date and no threats have been detected
  • SSL certificates: check that no certificates expire within 60 days — run Get-ChildItem Cert:\LocalMachine\My | Where-Object {$_.NotAfter -lt (Get-Date).AddDays(60)}
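What counts as a "cluster" of 4625 events is a judgment call. The Python sketch below flags any account with at least 10 failed logons inside a 5-minute sliding window. Both numbers are assumptions. The sketch expects the events to have already been exported from the Security log as (timestamp, account name) pairs.

```python
from collections import defaultdict, deque
from datetime import timedelta


def find_bursts(events, threshold=10, window=timedelta(minutes=5)):
    """Return the set of accounts with `threshold` or more failed logons within any `window`.

    `events` is an iterable of (datetime, account_name) pairs, one per Event ID 4625.
    """
    times_by_account = defaultdict(list)
    for ts, account in events:
        times_by_account[account].append(ts)

    flagged = set()
    for account, times in times_by_account.items():
        times.sort()
        recent = deque()
        for ts in times:
            recent.append(ts)
            # Drop failures that fall outside the sliding window ending at ts.
            while ts - recent[0] > window:
                recent.popleft()
            if len(recent) >= threshold:
                flagged.add(account)
                break
    return flagged
```

The export step is not shown. In PowerShell, Get-WinEvent -FilterHashtable @{LogName='Security'; Id=4625} is one way to retrieve the events before reducing them to timestamp and account pairs.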

Backup and Recovery

  • Backup job status: confirm the last backup completed successfully — check backup software logs or Event Viewer backup events
  • Backup age: ensure backups are current — a “last successful backup” older than 48 hours for a production server is a risk
  • Offsite / cloud copy: confirm that an offsite copy of backup data exists (local-only backup does not protect against fire, theft, or ransomware that encrypts the backup destination)
  • Restore test: at least quarterly, restore a single file from backup to confirm the backup is actually usable
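The 48-hour rule is easy to automate once you can read each server's last successful backup time from your backup software. How you obtain that time depends on the product and is not shown. A minimal Python sketch follows. The server names in the usage are hypothetical, and timestamps are assumed to be timezone-aware UTC.

```python
from datetime import datetime, timedelta, timezone

MAX_BACKUP_AGE = timedelta(hours=48)  # production threshold from the checklist


def backup_is_stale(last_success, now=None, max_age=MAX_BACKUP_AGE):
    """True if the last successful backup is older than max_age, or there has never been one.

    Both datetimes should be timezone-aware (UTC here) so the subtraction is unambiguous.
    """
    if last_success is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_success > max_age


def stale_servers(last_success_by_server, now=None):
    """Return the sorted names of servers whose backups are stale."""
    return sorted(
        name
        for name, last in last_success_by_server.items()
        if backup_is_stale(last, now)
    )
```

A server with no recorded successful backup (None) is treated as stale, because a missing record is at least as risky as an old one.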

Services and Applications

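  • Services check: confirm all critical services are in the expected state. Get-Service | Where-Object {$_.StartType -eq 'Automatic' -and $_.Status -ne 'Running'} lists automatic-start services that are not running. Some of these stop by design after finishing their work, so review the list rather than treating every entry as a fault.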
  • Event Viewer review: check the Administrative Events custom view (all Critical and Error events) for any recurring issues that need attention
  • Uptime: review server uptime — any unplanned reboots in the past month? Check Event ID 6008 in the System log.
  • Application logs: review logs for the key application the server hosts (SQL Server error log, IIS logs, Exchange health report)
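Not every automatic service is critical, so you can instead compare the services you care about against an expected-state list. The Python sketch below assumes the current states have already been collected into a dictionary, for example from Get-Service output. The service names in the test data are illustrations, and the expected states are yours to define.

```python
def service_mismatches(expected, actual):
    """Compare expected service states with observed ones.

    Returns {service: (expected_state, actual_state)} for every mismatch.
    A service missing from `actual` is reported with an actual state of None.
    """
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }
```

A missing service is reported rather than ignored, which catches a service that was uninstalled or renamed since the last check.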

Documentation

  • Update records: note any changes made this month — new roles installed, configuration changes, hardware swaps
  • Hardware age: flag any components approaching expected end of life — most enterprise hard drives and SSDs have a 3–5 year warranty; servers are typically replaced on a 5–7 year cycle
  • Warranty and support status: check that hardware warranty and OS support contracts are current and not approaching expiry
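Hardware age and warranty dates can be checked against an asset list instead of relying on memory. This Python sketch flags anything whose warranty has already ended or ends within the next 180 days. The asset names, purchase dates and the 180-day horizon are assumptions. The sketch counts warranty from the purchase date, which may differ from how your vendor counts it.

```python
from datetime import date


def add_years(d, years):
    """Add whole years to a date, mapping 29 February to 28 February in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def warranty_expiring(assets, today, horizon_days=180):
    """Return names of assets whose warranty has ended or ends within horizon_days.

    `assets` is an iterable of (name, purchase_date, warranty_years) tuples.
    """
    flagged = []
    for name, purchased, years in assets:
        ends = add_years(purchased, years)
        if (ends - today).days <= horizon_days:
            flagged.append(name)
    return flagged
```

A six-month horizon leaves time to budget for a renewal or replacement before support lapses. Shorten or lengthen it to match your procurement lead time.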