Monthly Server Maintenance Checklist

A monthly server maintenance checklist keeps your infrastructure healthy, catches problems before they become outages, and gives you a documented record that everything has been checked. Here is a practical checklist covering the tasks that matter most, with guidance on what to look for in each.

How to Use This Checklist

Run through this checklist once a month — or more frequently for critical servers. Keep a log (even a simple spreadsheet) recording the date, who performed the check, and any findings. A documented maintenance history is valuable when diagnosing recurring problems or planning hardware replacements.
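
If the log is kept as a CSV file that opens in a spreadsheet, each entry can be appended from PowerShell. This is a minimal sketch: the function name Add-MaintenanceLogEntry, the column names, and the example path are illustrative assumptions, not part of any standard tool.

```powershell
# Hypothetical helper: append one maintenance log row to a CSV file.
# The function name, columns, and path are assumptions for illustration.
function Add-MaintenanceLogEntry {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Server,
        [Parameter(Mandatory)] [string] $CheckedBy,
        [string] $Findings = 'None'
    )

    # One row per check: date, server, who checked, and what was found.
    [pscustomobject]@{
        Date      = (Get-Date).ToString('yyyy-MM-dd')
        Server    = $Server
        CheckedBy = $CheckedBy
        Findings  = $Findings
    } | Export-Csv -Path $Path -Append -NoTypeInformation
}

# Example call (path and names are placeholders):
# Add-MaintenanceLogEntry -Path 'C:\Logs\maintenance.csv' -Server 'SRV01' -CheckedBy 'J. Smith' -Findings 'Amber LED on drive bay 3'
```

Export-Csv creates the file on the first call and appends a row on every later call, so the same command works for the whole life of the log.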

Hardware and Physical

  • Visual inspection: check front panel LEDs — any amber lights on the system health or drive bay indicators need immediate investigation. See the full server visual inspection guide for detail.
  • Fan noise: listen for any grinding, rattling, or changes in pitch from server fans
  • Cable check: confirm all network and power cables are fully seated
  • Temperature check: review thermal sensor data in iDRAC/iLO — inlet temperature, CPU temperatures, any thermal events in the hardware event log
  • UPS status: confirm UPS is on mains power, battery is healthy, and estimated runtime is acceptable. Test under load annually.
  • Server room temperature: confirm ambient temperature is within the recommended 18–27°C range

Storage

  • Disk space: check free space on all volumes — alert threshold at 20% free, critical at 10%. Run Get-PSDrive -PSProvider FileSystem in PowerShell.
  • Drive health: run Get-PhysicalDisk | Select-Object FriendlyName, HealthStatus — all drives should be Healthy
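  • RAID status: check via iDRAC/iLO or RAID management software; the array should be Optimal. A Degraded status usually means a drive has failed or dropped out of the array. The array has lost redundancy, so replace the drive promptly rather than leaving it until the next monthly check.
  • Event Viewer disk errors: filter the System log for Event IDs 7 (bad block) and 11 (controller error) from the disk source. Repeated disk I/O errors indicate a failing drive, cable, or controller.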
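
The free-space thresholds above can be applied to the output of Get-PSDrive. This is a rough sketch, assuming "at 20%" and "at 10%" mean at or below those values; Get-FreeSpaceStatus is a hypothetical helper name.

```powershell
# Hypothetical helper: classify a volume against the checklist thresholds
# (warning at 20% free or less, critical at 10% free or less).
function Get-FreeSpaceStatus {
    param(
        [Parameter(Mandatory)] [double] $FreeBytes,
        [Parameter(Mandatory)] [double] $TotalBytes
    )

    if ($TotalBytes -le 0) { return 'Unknown' }

    $percentFree = 100 * $FreeBytes / $TotalBytes
    if ($percentFree -le 10) { 'Critical' }
    elseif ($percentFree -le 20) { 'Warning' }
    else { 'OK' }
}

# Report every file system drive that exposes size information.
Get-PSDrive -PSProvider FileSystem |
    Where-Object { ($_.Used + $_.Free) -gt 0 } |
    ForEach-Object {
        $total = $_.Used + $_.Free
        [pscustomobject]@{
            Drive       = $_.Name
            PercentFree = [math]::Round(100 * $_.Free / $total, 1)
            Status      = Get-FreeSpaceStatus -FreeBytes $_.Free -TotalBytes $total
        }
    }
```

Drives that report no size (an empty optical drive, for example) are skipped rather than flagged.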

Performance

  • CPU usage baseline: review average CPU usage over the past week from Performance Monitor or your monitoring tool. Usage that stays above 80% suggests the server is undersized or has a runaway process.
  • Memory usage: check available memory — consistent low availability may require a RAM upgrade or application tuning
  • Page file activity: check Memory → Pages/sec in Performance Monitor. Regular high paging indicates memory pressure.
  • Slow queries or application issues: review application logs for repeated errors or performance timeouts that did not generate a helpdesk ticket
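
Where no monitoring tool holds a week of history, a short live sample gives a spot check of CPU load. This sketch is not a substitute for the weekly baseline. Get-CounterSummary and Get-CpuSpotCheck are hypothetical helper names, and the counter path shown is the English one (counter names are localized on non-English Windows).

```powershell
# Hypothetical helper: summarize a set of counter readings against a threshold.
function Get-CounterSummary {
    param(
        [Parameter(Mandatory)] [double[]] $Values,
        [Parameter(Mandatory)] [double] $Threshold
    )

    $stats = $Values | Measure-Object -Average -Maximum
    [pscustomobject]@{
        Average        = [math]::Round($stats.Average, 1)
        Peak           = $stats.Maximum
        AboveThreshold = $stats.Average -gt $Threshold
    }
}

# Hypothetical helper: sample total CPU usage for about a minute (Windows only)
# and compare the average with the 80% figure from the checklist.
function Get-CpuSpotCheck {
    param(
        [int] $Samples = 12,
        [int] $IntervalSeconds = 5
    )

    $readings = (Get-Counter -Counter '\Processor(_Total)\% Processor Time' `
            -SampleInterval $IntervalSeconds -MaxSamples $Samples).CounterSamples.CookedValue
    Get-CounterSummary -Values $readings -Threshold 80
}
```

Running Get-CpuSpotCheck on the server returns the average, the peak, and whether the average exceeded 80% during the sample window.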

Security

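  • Windows Updates: verify the server is up to date with security patches. Run Get-HotFix | Sort-Object InstalledOn -Descending | Select-Object -First 5 to check the most recent patches. Get-HotFix does not list every type of update, so treat an old date as a prompt to check Windows Update history or your patching tool, not as proof that patches are missing.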
  • Security event log — failed logins: review the Security log for clusters of Event ID 4625 (failed login). Repeated failures for the Administrator account indicate a brute-force attempt.
  • Local administrator accounts: confirm no unexpected local admin accounts have been added — run Get-LocalGroupMember -Group Administrators
  • Antivirus definitions: confirm AV definitions are up to date and no threats have been detected
  • SSL certificates: check that no certificates expire within 60 days by running Get-ChildItem Cert:\LocalMachine\My | Where-Object {$_.NotAfter -lt (Get-Date).AddDays(60)}. The filter also returns certificates that have already expired.
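
To spot clusters of failed logins, the 4625 events can be counted per target account. The sketch below makes some assumptions: the helper names are hypothetical, the 10-failure cutoff is an arbitrary example rather than a recommended value, and reading the Security log requires an elevated session.

```powershell
# Hypothetical helper: count failures per account and keep the noisy ones.
# The default cutoff of 10 is an illustrative assumption.
function Group-FailedLogon {
    param(
        [Parameter(Mandatory)] [string[]] $AccountName,
        [int] $MinimumCount = 10
    )

    $AccountName |
        Group-Object |
        Where-Object { $_.Count -ge $MinimumCount } |
        Sort-Object Count -Descending |
        Select-Object Name, Count
}

# Hypothetical helper: return the target account name of each 4625 event
# from the last N days (Windows only, run as administrator).
function Get-FailedLogonAccount {
    param([int] $Days = 30)

    $filter = @{ LogName = 'Security'; Id = 4625; StartTime = (Get-Date).AddDays(-$Days) }
    # SilentlyContinue hides the "no events found" error, but also other errors.
    $events = Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue
    foreach ($logonEvent in $events) {
        $data = ([xml]$logonEvent.ToXml()).Event.EventData.Data
        ($data | Where-Object { $_.Name -eq 'TargetUserName' }).'#text'
    }
}

# Example: Group-FailedLogon -AccountName (Get-FailedLogonAccount -Days 30)
```

A long list of failures against Administrator or other well-known account names is the pattern the checklist item describes.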

Backup and Recovery

  • Backup job status: confirm the last backup completed successfully — check backup software logs or Event Viewer backup events
  • Backup age: ensure backups are current — a “last successful backup” older than 48 hours for a production server is a risk
  • Offsite / cloud copy: confirm that an offsite copy of backup data exists (local-only backup does not protect against fire, theft, or ransomware that encrypts the backup destination)
  • Restore test: at least quarterly (it does not have to be every month), restore a single file from backup and open it to confirm the backup is actually usable
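
The backup-age and restore checks can be expressed as two small functions. This is a sketch under stated assumptions: Test-BackupAge and Test-RestoredFile are hypothetical names, the last-success timestamp has to come from your backup software, and the hash comparison is only meaningful for a file that has not changed since the backup ran.

```powershell
# Hypothetical helper: flag a backup older than the 48-hour limit in the checklist.
function Test-BackupAge {
    param(
        [Parameter(Mandatory)] [datetime] $LastSuccess,
        [int] $MaxAgeHours = 48,
        [datetime] $Now = (Get-Date)
    )

    $age = $Now - $LastSuccess
    [pscustomobject]@{
        AgeHours = [math]::Round($age.TotalHours, 1)
        Stale    = $age.TotalHours -gt $MaxAgeHours
    }
}

# Hypothetical helper: confirm a restored file is byte-identical to the
# original by comparing SHA-256 hashes.
function Test-RestoredFile {
    param(
        [Parameter(Mandatory)] [string] $OriginalPath,
        [Parameter(Mandatory)] [string] $RestoredPath
    )

    $original = Get-FileHash -Path $OriginalPath -Algorithm SHA256
    $restored = Get-FileHash -Path $RestoredPath -Algorithm SHA256
    $original.Hash -eq $restored.Hash
}
```

Restoring to a separate folder and comparing against the live file avoids overwriting production data during the test.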

Services and Applications

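  • Services check: confirm all critical services are in the expected state by running Get-Service | Where-Object {$_.StartType -eq 'Automatic' -and $_.Status -ne 'Running'}. Some Automatic services are trigger-started and stop by design, so compare the output against what is normal for that server.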
  • Event Viewer review: check the Administrative Events custom view (all Critical and Error events) for any recurring issues that need attention
  • Uptime: review server uptime — any unplanned reboots in the past month? Check Event ID 6008 in the System log.
  • Application logs: review logs for the key applications the server hosts (for example the SQL Server error log, IIS logs, or the Exchange health report)
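
The Event ID 6008 check can be run from PowerShell instead of filtering Event Viewer by hand. A minimal sketch, assuming a 30-day window; Get-UnexpectedShutdown is a hypothetical helper name.

```powershell
# Hypothetical helper: list unexpected-shutdown events (ID 6008) from the
# System log for the last N days (Windows only).
function Get-UnexpectedShutdown {
    param([int] $Days = 30)

    $filter = @{
        LogName   = 'System'
        Id        = 6008
        StartTime = (Get-Date).AddDays(-$Days)
    }

    # Get-WinEvent raises an error when nothing matches; SilentlyContinue
    # turns that into empty output (it also hides other errors).
    Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, Message
}

# Example: Get-UnexpectedShutdown -Days 30
```

Empty output means no unexpected shutdowns were logged in the window. Any row returned is worth matching against the maintenance log to see whether the reboot was planned.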

Documentation

  • Update records: note any changes made this month — new roles installed, configuration changes, hardware swaps
  • Hardware age: flag any components approaching expected end of life — most enterprise hard drives and SSDs have a 3–5 year warranty; servers are typically replaced on a 5–7 year cycle
  • Warranty and support status: check that hardware warranty and OS support contracts are current and not approaching expiry
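
Hardware age can be flagged from the in-service date recorded in the documentation. In this sketch the five-year review point is taken from the lower end of the 5–7 year replacement cycle above; the function name and the choice to flag at that point are assumptions.

```powershell
# Hypothetical helper: report years in service and flag hardware that has
# reached the review age (default 5 years, an assumption).
function Get-HardwareAgeStatus {
    param(
        [Parameter(Mandatory)] [datetime] $InServiceDate,
        [double] $ReviewAtYears = 5,
        [datetime] $Now = (Get-Date)
    )

    # 365.25 days per year is an approximation that absorbs leap years.
    $years = ($Now - $InServiceDate).TotalDays / 365.25
    [pscustomobject]@{
        YearsInService = [math]::Round($years, 1)
        ReviewDue      = $years -ge $ReviewAtYears
    }
}

# Example: Get-HardwareAgeStatus -InServiceDate '2019-03-15'
```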
