Monthly Server Maintenance Checklist

A monthly server maintenance checklist keeps your infrastructure healthy, catches problems before they become outages, and gives you a documented record that everything has been checked. Here is a practical checklist covering the tasks that matter most, with guidance on what to look for in each.

How to Use This Checklist

Run through this checklist once a month — or more frequently for critical servers. Keep a log (even a simple spreadsheet) recording the date, who performed the check, and any findings. A documented maintenance history is valuable when diagnosing recurring problems or planning hardware replacements.
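If a spreadsheet feels heavy, the same log can be kept as a CSV file appended from PowerShell. The following is a minimal sketch, not a prescribed tool: the function name `Add-MaintenanceLogEntry`, the column names, and the example path are all hypothetical and chosen only for illustration.

```powershell
# Hypothetical helper: appends one row per maintenance check to a CSV log.
# The file is created on first use because Export-Csv -Append creates it if missing.
function Add-MaintenanceLogEntry {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Server,
        [Parameter(Mandatory)] [string] $CheckedBy,
        [string] $Findings = 'None'
    )
    [pscustomobject]@{
        Date      = (Get-Date).ToString('yyyy-MM-dd')
        Server    = $Server
        CheckedBy = $CheckedBy
        Findings  = $Findings
    } | Export-Csv -Path $Path -Append -NoTypeInformation
}

# Example call (path and names are placeholders):
# Add-MaintenanceLogEntry -Path 'C:\Logs\maintenance.csv' -Server 'SRV01' -CheckedBy 'jsmith' -Findings 'Drive bay 2 amber LED'
```

A CSV opens directly in a spreadsheet, so the history stays readable by anyone who needs it when diagnosing a recurring problem.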

Hardware and Physical

Visual inspection: check front panel LEDs — any amber lights on the system health or drive bay indicators need immediate investigation. See the full server visual inspection guide for detail.
Fan noise: listen for any grinding, rattling, or changes in pitch from server fans
Cable check: confirm all network and power cables are fully seated
Temperature check: review thermal sensor data in iDRAC/iLO — inlet temperature, CPU temperatures, any thermal events in the hardware event log
UPS status: confirm UPS is on mains power, battery is healthy, and estimated runtime is acceptable. Test under load annually.
Server room temperature: confirm ambient temperature is within the recommended 18–27°C range

Storage

Disk space: check free space on all volumes, treating 20% free or less as an alert and 10% free or less as critical. Run Get-PSDrive -PSProvider FileSystem in PowerShell to list used and free space per drive.
Drive health: run Get-PhysicalDisk | Select-Object FriendlyName, HealthStatus — all drives should be Healthy
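RAID status: check via iDRAC/iLO or RAID management software. The array should be Optimal. A Degraded status usually means a drive has failed or dropped out of the array; replace it as soon as possible rather than waiting for the next monthly check, because the array may not survive a second failure.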
Event Viewer disk errors: filter the System log for Event ID 7 and 11 — repeated disk I/O errors indicate a failing drive
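The disk space thresholds above can be applied automatically instead of read off by eye. The sketch below assumes a hypothetical helper named `Get-FreeSpaceStatus` and treats the thresholds as "at or below"; adjust both to your own policy.

```powershell
# Hypothetical helper: classify a volume by percentage free, using the
# checklist thresholds (20% free or less = Alert, 10% free or less = Critical).
function Get-FreeSpaceStatus {
    param(
        [Parameter(Mandatory)] [double] $FreeBytes,
        [Parameter(Mandatory)] [double] $TotalBytes
    )
    $percentFree = 100 * $FreeBytes / $TotalBytes
    if     ($percentFree -le 10) { 'Critical' }
    elseif ($percentFree -le 20) { 'Alert' }
    else                         { 'OK' }
}

# Report every file system drive that has a known, non-zero size.
Get-PSDrive -PSProvider FileSystem |
    Where-Object { ($_.Used + $_.Free) -gt 0 } |
    ForEach-Object {
        $total = $_.Used + $_.Free
        [pscustomobject]@{
            Drive       = $_.Name
            PercentFree = [math]::Round(100 * $_.Free / $total, 1)
            Status      = Get-FreeSpaceStatus -FreeBytes $_.Free -TotalBytes $total
        }
    }
```

Drives without size information (an empty optical drive, for example) are skipped by the `Where-Object` filter rather than reported as full.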

Performance

CPU usage baseline: review average CPU usage over the past week from Performance Monitor or your monitoring tool. Sustained above 80% indicates the server is undersized or has a runaway process.
Memory usage: check available memory — consistent low availability may require a RAM upgrade or application tuning
Page file activity: check Memory → Pages/sec in Performance Monitor. Regular high paging indicates memory pressure.
Slow queries or application issues: review application logs for repeated errors or performance timeouts that did not generate a helpdesk ticket

Security

Windows Updates: verify the server is up to date with security patches. Run Get-HotFix | Sort-Object InstalledOn -Descending | Select-Object -First 5 to list the five most recently installed updates, and confirm the newest install date falls within your patch cycle.
Security event log, failed logins: review the Security log for clusters of Event ID 4625 (failed login). Repeated failures against the Administrator account may indicate a brute-force attempt, although a stale saved credential or misconfigured service can produce the same pattern.
Local administrator accounts: confirm no unexpected local admin accounts have been added — run Get-LocalGroupMember -Group Administrators
Antivirus definitions: confirm AV definitions are up to date and no threats have been detected
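SSL certificates: check that no certificates expire within 60 days. Run Get-ChildItem Cert:\LocalMachine\My | Where-Object {$_.NotAfter -lt (Get-Date).AddDays(60)}, which lists certificates that have already expired or will expire within 60 days; an empty result is the goal.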
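Clusters of Event ID 4625 are easier to spot when the failures are counted per account. The sketch below assumes a hypothetical function name, `Get-FailedLogonSummary`, and a 30-day default window. It reads the target account from each event's XML and needs an elevated session to read the Security log.

```powershell
# Hypothetical helper: count failed logons (Event ID 4625) per target account.
function Get-FailedLogonSummary {
    param([int] $Days = 30)

    $filter = @{ LogName = 'Security'; Id = 4625; StartTime = (Get-Date).AddDays(-$Days) }

    # SilentlyContinue suppresses the error Get-WinEvent raises when no events match.
    Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
        ForEach-Object {
            # Each event stores its fields as named <Data> elements under EventData.
            $data = ([xml]$_.ToXml()).Event.EventData.Data
            ($data | Where-Object { $_.Name -eq 'TargetUserName' }).'#text'
        } |
        Where-Object { $_ } |
        Group-Object |
        Sort-Object Count -Descending |
        Select-Object Count, @{ Name = 'Account'; Expression = { $_.Name } }
}

# Example: failed logons per account over the past week
# Get-FailedLogonSummary -Days 7
```

An account near the top of the list with a high count is the one to investigate first.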

Backup and Recovery

Backup job status: confirm the last backup completed successfully — check backup software logs or Event Viewer backup events
Backup age: ensure backups are current — a “last successful backup” older than 48 hours for a production server is a risk
Offsite / cloud copy: confirm that an offsite copy of backup data exists (local-only backup does not protect against fire, theft, or ransomware that encrypts the backup destination)
Restore test: at least once a quarter, and monthly if you can, restore a single file from backup to confirm the backup is actually usable
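The 48-hour backup age rule is simple date arithmetic. The sketch below assumes a hypothetical function name, `Test-BackupStale`, and assumes you can obtain the last successful backup time from your backup software; how you get that timestamp is product-specific and not shown.

```powershell
# Hypothetical helper: returns $true when the last successful backup is
# older than the allowed age (48 hours by default, per the checklist).
function Test-BackupStale {
    param(
        [Parameter(Mandatory)] [datetime] $LastSuccess,
        [int] $MaxAgeHours = 48,
        [datetime] $Now = (Get-Date)
    )
    ($Now - $LastSuccess).TotalHours -gt $MaxAgeHours
}

# Example (the timestamp is a placeholder for a value read from your backup tool):
# Test-BackupStale -LastSuccess (Get-Date).AddHours(-60)
```

The `-Now` parameter exists only so the check can be exercised against a fixed reference time.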

Services and Applications

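Services check: confirm all critical services are in the expected state by running Get-Service | Where-Object {$_.StartType -eq 'Automatic' -and $_.Status -ne 'Running'}. Some automatic services (delayed-start or trigger-start) stop legitimately when idle, so compare the output against what you expect for this server rather than treating every entry as a fault.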
Event Viewer review: check the Administrative Events custom view (all Critical and Error events) for any recurring issues that need attention
Uptime: review server uptime — any unplanned reboots in the past month? Check Event ID 6008 in the System log.
Application logs: review logs for the key application the server hosts (SQL Server error log, IIS logs, Exchange health report)
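The unplanned reboot check can be scripted rather than filtered by hand in Event Viewer. The sketch below assumes a hypothetical function name, `Get-UnexpectedShutdown`, and a 30-day default window to match the monthly cadence.

```powershell
# Hypothetical helper: list unexpected shutdowns (System log, Event ID 6008)
# recorded in the last N days.
function Get-UnexpectedShutdown {
    param([int] $Days = 30)

    $filter = @{ LogName = 'System'; Id = 6008; StartTime = (Get-Date).AddDays(-$Days) }

    # SilentlyContinue suppresses the error Get-WinEvent raises when no events match,
    # so an empty result simply means no unexpected shutdowns were logged.
    Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Message
}

# Example: check the past month
# Get-UnexpectedShutdown
```

Any row returned is worth cross-checking against the hardware event log and UPS status for the same time.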

Documentation

Update records: note any changes made this month — new roles installed, configuration changes, hardware swaps
Hardware age: flag any components approaching expected end of life — most enterprise hard drives and SSDs have a 3–5 year warranty; servers are typically replaced on a 5–7 year cycle
Warranty and support status: check that hardware warranty and OS support contracts are current and not approaching expiry
