Linux Server Administration


Overview

Most production software runs on Linux. Web servers, application backends, database servers, message brokers, monitoring infrastructure — the vast majority of the server-side software that powers production systems runs on Linux distributions, typically Ubuntu, Debian, or CentOS/RHEL derivatives. The reliability, security, and operational performance of a production application are determined not just by the quality of the application code but by the quality of the Linux infrastructure it runs on — the server configuration, the security hardening, the network setup, the process management, the backup regime, and the monitoring that surfaces problems before they become outages.

Linux server administration is the operational discipline that keeps production infrastructure running correctly, securely, and reliably. It encompasses the initial server configuration that establishes a secure baseline, the web server and reverse proxy setup that routes traffic correctly, the process management that keeps application services running and restarts them when they fail, the firewall configuration that limits the attack surface, the certificate management that maintains HTTPS, the monitoring that surfaces performance degradation and failures, the backup processes that protect data, and the ongoing maintenance that keeps the system secure and up to date.

The gap between a server that has been properly administered and one that has not becomes apparent over time — in the security incidents that result from unpatched vulnerabilities, in the application downtime that results from processes that are not managed correctly, in the data loss that results from backup processes that were never tested, and in the performance degradation that results from disk space that filled silently without alerting.

We manage Linux server infrastructure for the applications we build and deploy, and provide server administration services for existing infrastructure that needs to be configured correctly, secured, or maintained.


What Linux Server Administration Covers

Initial server setup and hardening. The baseline configuration that every production server should have before any application software is installed.

User account management: creating application-specific user accounts with minimal permissions rather than running application services as root. Disabling root SSH login. SSH public key authentication configured and password authentication disabled — eliminating the brute-force attack surface that password-based SSH access creates. Sudo configuration that grants specific elevated permissions to specific users without granting unrestricted root access.
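A minimal sketch of this setup, assuming an application called "myapp" and an administrative user called "deploy" (both names are illustrative, not from the text above):

```shell
# Dedicated, unprivileged account for the application service — no shell, no password
adduser --system --group --home /opt/myapp --shell /usr/sbin/nologin myapp

# Administrative user with key-based SSH access only
adduser deploy
mkdir -p /home/deploy/.ssh
cp authorized_keys /home/deploy/.ssh/authorized_keys
chown -R deploy:deploy /home/deploy/.ssh
chmod 700 /home/deploy/.ssh
chmod 600 /home/deploy/.ssh/authorized_keys

# Grant one specific elevated permission via a sudoers drop-in rather than full root:
# /etc/sudoers.d/deploy
#   deploy ALL=(root) NOPASSWD: /usr/bin/systemctl restart myapp
```

Validate any sudoers drop-in with `visudo -c` before relying on it — a syntax error in a sudoers file can lock out elevated access entirely.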

SSH hardening: configuring the SSH daemon with the security settings that reduce the attack surface — disabling protocol version 1, disabling empty passwords, configuring the allowed authentication methods, setting appropriate timeouts, and limiting the users and groups that can authenticate via SSH.
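An illustrative sshd drop-in covering these settings (values are reasonable starting points, not universal requirements; modern OpenSSH only speaks protocol 2, so no Protocol directive is needed):

```
# /etc/ssh/sshd_config.d/99-hardening.conf
PermitRootLogin no
PasswordAuthentication no
PermitEmptyPasswords no
PubkeyAuthentication yes
MaxAuthTries 3
LoginGraceTime 30
ClientAliveInterval 300
ClientAliveCountMax 2
AllowGroups ssh-users
```

Run `sshd -t` to validate the configuration and keep an existing session open while restarting the daemon, so a mistake does not lock you out.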

Firewall configuration: ufw or iptables / nftables rules that allow only the traffic the server needs to receive and deny everything else. Default-deny ingress policy with explicit allow rules for the specific ports and protocols the server's services use — typically SSH, HTTP, and HTTPS for a web application server, with additional ports for specific service requirements. Rate limiting for SSH connection attempts to reduce brute-force attack effectiveness.
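With ufw, the default-deny policy described above comes down to a handful of commands (the port list is the typical web-server case, adjusted per server):

```shell
ufw default deny incoming
ufw default allow outgoing
ufw limit 22/tcp        # SSH, with ufw's built-in rate limiting of connection attempts
ufw allow 80/tcp        # HTTP (redirected to HTTPS)
ufw allow 443/tcp       # HTTPS
ufw enable
ufw status verbose
```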

Automatic security updates: unattended-upgrades configuration that automatically installs security patches for installed packages without requiring manual intervention. The automatic update policy that ensures known vulnerabilities are patched promptly without requiring scheduled maintenance windows for every security update.
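The relevant apt configuration is split across two files; an illustrative excerpt restricting automatic installs to the security pocket:

```
# /etc/apt/apt.conf.d/20auto-upgrades
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";

# /etc/apt/apt.conf.d/50unattended-upgrades (excerpt)
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
};
Unattended-Upgrade::Automatic-Reboot "false";
```

Whether to enable Automatic-Reboot (for kernel updates) depends on whether the service can tolerate unscheduled restarts.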

System limits and kernel parameters: sysctl configuration for the kernel parameters that affect network performance and security — TCP connection settings, file descriptor limits, network buffer sizes. limits.conf configuration for the process-level resource limits that prevent runaway processes from consuming excessive resources.
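A sketch of both files, with values that are illustrative starting points rather than recommendations for every workload ("myapp" is a placeholder service account):

```
# /etc/sysctl.d/99-tuning.conf — apply with `sysctl --system`
net.core.somaxconn = 4096
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_local_port_range = 1024 65535
fs.file-max = 2097152

# /etc/security/limits.d/myapp.conf — per-process file descriptor limits
myapp  soft  nofile  65536
myapp  hard  nofile  65536
```

Note that limits.conf does not apply to systemd services; those take `LimitNOFILE=` in the unit file instead.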

Nginx configuration and reverse proxying. Nginx is our primary web server and reverse proxy — the component that handles TLS termination, routes HTTP requests to application backends, serves static files, and enforces the HTTP-level security headers and redirects that production web applications require.

Virtual host configuration: Nginx server blocks for each domain and application the server hosts, with the correct root directory, index file, and error page configuration. HTTP to HTTPS redirect configuration that ensures all traffic is served over encrypted connections. Server name configuration for correct virtual host selection.

Reverse proxy configuration: proxy_pass directives that forward requests from Nginx to the application backend running on a local port. Proxy header configuration — X-Real-IP, X-Forwarded-For, X-Forwarded-Proto — that passes client connection information to the application backend. Proxy timeout configuration for applications with long-running request handling. WebSocket proxy configuration for applications that use WebSocket connections.
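A server block combining the redirect, proxy, header, and WebSocket points above might look like this (domain, certificate paths, and backend port 3000 are illustrative):

```nginx
# /etc/nginx/sites-available/example.com
server {
    listen 80;
    server_name example.com;
    return 301 https://$host$request_uri;   # force HTTPS
}

server {
    listen 443 ssl http2;
    server_name example.com;

    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 60s;

        # WebSocket upgrade support
        proxy_http_version 1.1;
        proxy_set_header Upgrade    $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

Validate with `nginx -t` before reloading; a syntax error found at reload time is recoverable, one found at boot time is an outage.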

TLS configuration: Let's Encrypt certificate provisioning with Certbot and the certificate auto-renewal that keeps HTTPS working without manual intervention. TLS 1.2 and 1.3 protocol configuration with modern cipher suites. HSTS headers that instruct browsers to always use HTTPS for the domain. OCSP stapling for improved certificate validation performance.
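The Certbot workflow for an Nginx-fronted site is short (domain names are illustrative):

```shell
# Provision a certificate and let Certbot write the Nginx TLS configuration
certbot --nginx -d example.com -d www.example.com

# Verify that automatic renewal works — Certbot installs a systemd timer or cron job
certbot renew --dry-run
systemctl list-timers | grep certbot
```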

Security headers: X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy, and Content-Security-Policy headers configured appropriately for each application. The security header configuration that prevents common web vulnerabilities — clickjacking, MIME type sniffing, information leakage.
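An illustrative Nginx header set; the Content-Security-Policy in particular must be written per application, so it is omitted here rather than shown with a generic value:

```nginx
add_header X-Frame-Options        "DENY" always;
add_header X-Content-Type-Options "nosniff" always;
add_header Referrer-Policy        "strict-origin-when-cross-origin" always;
add_header Permissions-Policy     "camera=(), microphone=(), geolocation=()" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
```

The `always` parameter matters: without it, Nginx omits the headers on error responses.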

Static file serving and caching: Nginx configuration for serving static assets directly without forwarding to the application backend, with appropriate cache control headers that allow browsers and CDN layers to cache static content. Gzip and Brotli compression configuration that reduces response sizes for text-based content.

Rate limiting and access control: Nginx rate limiting directives that limit request rates from individual IP addresses — protecting against simple denial-of-service attacks and credential stuffing. IP allowlist configuration for administrative interfaces that should not be publicly accessible.
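A sketch of both mechanisms — the zone definition lives in the `http {}` context, its application in a `location {}` block (endpoint paths, backend port, and the allowlisted range are illustrative; 203.0.113.0/24 is a documentation range):

```nginx
# http {} context: shared zone keyed by client IP, 5 requests per minute
limit_req_zone $binary_remote_addr zone=login:10m rate=5r/m;

# server {} context: apply it to the sensitive endpoint
location /login {
    limit_req zone=login burst=10 nodelay;
    proxy_pass http://127.0.0.1:3000;
}

# Restrict an administrative interface to known addresses
location /admin {
    allow 203.0.113.0/24;   # e.g. an office network
    deny  all;
    proxy_pass http://127.0.0.1:3000;
}
```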

Process management with systemd. systemd service units that manage application process lifecycle — starting services when the server boots, restarting them when they crash, and providing the service management interface for starting, stopping, and checking the status of application services.

Service unit file creation: [Service] section configuration with the correct ExecStart command, the user account the service runs as, the working directory, environment variable configuration, and the restart policy. [Unit] section with the correct dependency declarations — services that depend on the network being available, on the database service being running, or on other prerequisites. [Install] section for enabling automatic start at boot.

Restart policies: Restart=always or Restart=on-failure configuration with appropriate RestartSec delay — the configuration that ensures the service is automatically restarted after crashes without creating restart loops that mask persistent failures. StartLimitIntervalSec and StartLimitBurst configuration that limits restart attempts and triggers a failure state when a service crashes repeatedly.
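A unit file combining the sections and restart settings above might look like this (service name, user, paths, and the PostgreSQL dependency are illustrative):

```ini
# /etc/systemd/system/myapp.service
[Unit]
Description=myapp application server
After=network-online.target postgresql.service
Wants=network-online.target
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Type=simple
User=myapp
WorkingDirectory=/opt/myapp
EnvironmentFile=/etc/myapp/env
ExecStart=/opt/myapp/bin/server
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

After creating or editing a unit, `systemctl daemon-reload` picks up the change and `systemctl enable --now myapp` starts it and enables it at boot.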

Logging integration: systemd journal integration for service logs — the logs from the application process captured by journald and accessible via journalctl. Log forwarding configuration that ships systemd journal logs to centralised log aggregation when the server is part of a larger monitored infrastructure.

Socket activation: systemd socket units for services that should start on demand when a connection arrives — the socket activation model that eliminates the need for the service to be running before any connections are attempted.
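A minimal socket unit paired with a service of the same name (port and name illustrative; the service itself must support receiving the listening socket from systemd):

```ini
# /etc/systemd/system/myapp.socket
[Socket]
ListenStream=127.0.0.1:3000

[Install]
WantedBy=sockets.target
```

With `systemctl enable --now myapp.socket`, systemd listens on the port and starts myapp.service on the first incoming connection.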

Disk and storage management. The storage configuration that keeps production servers from running out of disk space unexpectedly.

Partition and filesystem setup: the disk partitioning and filesystem layout for new servers — separating the operating system, application data, and log storage onto different partitions or volumes where appropriate. Filesystem choice and mount option configuration. /tmp and /var/log management for systems where log growth or temporary file accumulation can fill the root filesystem.

Disk space monitoring: alerting when disk utilisation exceeds configured thresholds — the alert that fires at 80% utilisation rather than at 100% when the filesystem is full and the application has stopped working. Log rotation configuration with logrotate that prevents application and system logs from filling the disk, with appropriate retention periods and compression.
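The check behind such an alert is simple enough to sketch directly. This minimal version reads `df -P`-style output on stdin so it can be exercised with canned data; in production it would run from cron and mail or post its output:

```shell
# check_disk_usage THRESHOLD — reads `df -P` output on stdin and prints an
# ALERT line for every filesystem at or above THRESHOLD percent utilisation.
check_disk_usage() {
    awk -v limit="$1" 'NR > 1 { sub(/%/, "", $5); if ($5 + 0 >= limit) printf "ALERT: %s at %s%%\n", $6, $5 }'
}

# Live usage, e.g. from a cron job:
#   df -P | check_disk_usage 80
```

Real deployments would usually route this through proper alerting (Prometheus node_filesystem metrics, for instance) rather than ad-hoc scripts, but the threshold logic is the same.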

LVM and volume management: Logical Volume Manager configuration for flexible storage management — the ability to resize logical volumes online when additional capacity is needed, to take volume snapshots for consistent backups, and to manage the physical storage pool across multiple disks.

Database server administration. PostgreSQL and MySQL server administration for databases that run alongside application services.

PostgreSQL configuration: postgresql.conf tuning for the server's memory and workload characteristics — shared_buffers, effective_cache_size, work_mem, max_connections, and the other parameters that determine how PostgreSQL uses the available resources. pg_hba.conf authentication configuration for local and remote connection authentication. Backup configuration with pg_dump or pg_basebackup and the backup rotation that maintains a useful recovery history.
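Illustrative starting values for a server with around 4 GB of RAM — the right numbers depend on the workload, so these are a baseline to tune from, not a recommendation:

```
# postgresql.conf excerpt
shared_buffers = 1GB             # ~25% of RAM
effective_cache_size = 3GB       # ~75% of RAM; a planner hint, not an allocation
work_mem = 16MB                  # per sort/hash operation, per connection
max_connections = 100

# Nightly logical backup with dated files (/etc/cron.d entry; paths illustrative)
# 0 2 * * * postgres pg_dump -Fc myapp > /var/backups/pg/myapp-$(date +\%F).dump
```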

MySQL configuration: my.cnf tuning for InnoDB buffer pool size, connection limits, and logging behaviour (the legacy query cache was removed in MySQL 8.0 and is best left disabled on older versions under write-heavy workloads). Binary log configuration for point-in-time recovery capability. User account management with the minimum privileges required for each application's database access.

Slow query logging: enabling and configuring the slow query log that captures queries exceeding the configured execution time threshold — the operational data that identifies the queries that need optimisation before they become performance problems.
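The relevant settings in both databases (thresholds are illustrative — 500 ms here):

```
# PostgreSQL (postgresql.conf): log statements slower than 500 ms
log_min_duration_statement = 500

# MySQL (my.cnf)
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 0.5
```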

Monitoring and alerting. The visibility infrastructure that surfaces server and application health issues before they become user-facing problems.

System metrics monitoring: CPU utilisation, memory usage, disk I/O, network I/O, and disk space — the fundamental system health metrics. Node Exporter with Prometheus for structured metrics collection. Simple monitoring with periodic checks and alerting for environments that do not require the full Prometheus/Grafana stack.

Application health monitoring: uptime monitoring for the HTTP endpoints that the application exposes — the HTTP check that verifies the application is responding correctly, with alerting when the application returns an error or stops responding. Monitoring of application-specific metrics — request rates, error rates, response times — where the application exposes these metrics.

Log monitoring and alerting: monitoring application and system logs for error patterns that indicate problems — the log-based alert that fires when an application starts logging errors at a rate that exceeds the configured threshold, or when a specific error pattern that indicates a critical failure appears.
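A crude sketch of the threshold logic, reading log lines on stdin so it can be tested with canned data (real deployments would use a log shipper or journald-based tooling, but the decision is the same):

```shell
# error_rate_alert THRESHOLD — reads log lines on stdin and prints an ALERT
# when the number of ERROR-level lines meets or exceeds THRESHOLD.
error_rate_alert() {
    count=$(grep -c 'ERROR' || true)
    if [ "$count" -ge "$1" ]; then
        echo "ALERT: $count error lines"
    fi
}

# e.g. from cron, over the last five minutes of a service's journal:
#   journalctl -u myapp --since "-5 min" | error_rate_alert 10
```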

Alert routing: configuring the alerting infrastructure to route alerts to the appropriate channels — email, SMS, Slack, PagerDuty — with appropriate urgency levels and routing rules that ensure critical alerts reach the people who need to respond to them.

Backup and recovery. The backup processes that protect application data and make disaster recovery possible.

Application data backup: scheduled backup jobs that export application databases, capture uploaded file content, and export any other application state that needs to be preserved. Backup storage to off-server locations — S3, B2, or another remote storage destination — ensuring that server failures do not take the backups with them.
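The shape of such a job, as a sketch — database name, paths, and the bucket are all illustrative, and credentials are assumed to be configured for the AWS CLI:

```shell
#!/bin/sh
# Nightly backup: dump the database, capture uploads, ship both off-server.
set -eu
STAMP=$(date +%F)

pg_dump -Fc myapp > "/var/backups/myapp-$STAMP.dump"
tar -czf "/var/backups/uploads-$STAMP.tar.gz" /opt/myapp/uploads

# A failed upload should fail the job loudly (hence set -e above)
aws s3 cp "/var/backups/myapp-$STAMP.dump"     "s3://example-backups/myapp/"
aws s3 cp "/var/backups/uploads-$STAMP.tar.gz" "s3://example-backups/myapp/"
```

A silent failure here is the dangerous case, which is why the job exits non-zero on any error and why the restore testing described below exists.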

Backup testing: periodic restore tests that verify the backups are valid and restorable — the test that distinguishes a backup process that is working from a backup process that is running but producing corrupted or incomplete backups. Documented restore procedures that have been tested rather than improvised at the moment of data loss.

Backup retention: the retention policy that keeps enough historical backups to recover from data corruption or accidental deletion that is not immediately noticed — typically daily backups for a week, weekly backups for a month, and monthly backups for a year, adjusted based on the recovery requirements of the specific application.

Performance tuning. Configuration adjustments that improve server and application performance for the specific workload profile.

Memory and swap configuration: appropriate swap configuration for the server's memory profile — the swap space that provides buffer against memory spikes without the performance impact of excessive swap usage. Swappiness configuration that controls how aggressively the kernel uses swap.

Network performance: TCP settings for the connection volume and connection pattern the server handles — net.core.somaxconn for the connection backlog, net.ipv4.tcp_fin_timeout for TIME_WAIT connection management, net.ipv4.ip_local_port_range for high connection volume servers.

Application-specific tuning: Nginx worker process and connection configuration for the server's CPU count and expected connection volume, PHP-FPM pool configuration for PHP applications, Node.js cluster mode configuration for multi-core utilisation.


Our Own Infrastructure

Our production services — including this website — run on Linux VPS infrastructure with Nginx as the reverse proxy, systemd for process management, and Plesk for hosting management where applicable. The server administration practices described here are the same practices we apply to our own infrastructure, not just recommendations we make for others.


Technologies Used

  • Ubuntu / Debian — primary Linux distributions for production server deployments
  • Nginx — web server and reverse proxy
  • systemd — process and service management
  • ufw / iptables / nftables — firewall configuration
  • Certbot / Let's Encrypt — TLS certificate provisioning and renewal
  • PostgreSQL / MySQL — database server administration
  • logrotate — log rotation and retention management
  • unattended-upgrades — automatic security update management
  • Prometheus / Node Exporter — metrics collection
  • Grafana — metrics visualisation and dashboards
  • rsync / pg_dump / pg_basebackup — backup tooling
  • AWS S3 / Backblaze B2 — off-server backup storage
  • Plesk — hosting management panel where applicable
  • Docker / Docker Compose — containerised application deployment
  • fail2ban — intrusion prevention for SSH and application services

Infrastructure That Does Not Draw Attention to Itself

The measure of well-administered Linux infrastructure is that it operates reliably without requiring frequent attention — applications stay running, certificates renew automatically, backups complete without errors, security patches are applied, and the monitoring that would surface problems if they occurred confirms that everything is operating normally. Production infrastructure administered to this standard is infrastructure that supports the business applications running on it rather than becoming a source of operational problems in its own right.


The Foundation That Applications Run On

Applications can only be as reliable as the infrastructure they run on. Linux server administration that is done correctly — secure baseline configuration, correct process management, working backups, and the monitoring that surfaces problems early — provides the foundation that production applications require to operate reliably.