How to Use Filebeat

Introduction

Filebeat is a lightweight, open-source log shipper developed by Elastic as part of the Elastic Stack (ELK). Designed to efficiently collect, forward, and centralize log data from files on your systems, Filebeat plays a critical role in modern observability architectures. Whether you're managing a small application server or a distributed microservices environment, Filebeat ensures your logs reach your chosen destination, whether Elasticsearch, Logstash, or Kafka, reliably and with minimal resource overhead.

However, simply installing Filebeat is not enough. Many users encounter issues such as log duplication, missed entries, configuration errors, or security vulnerabilities that compromise data integrity. Trust in Filebeat doesn't come from its reputation alone; it comes from knowing how to use it correctly. This article presents the top 10 proven, battle-tested methods to use Filebeat that you can trust. These are not theoretical suggestions; they are practices validated by DevOps teams managing petabytes of log data across production environments worldwide.

Each method is grounded in real-world use cases, official documentation, and community best practices. By following these steps, you'll not only improve reliability and performance but also enhance the security, scalability, and maintainability of your logging infrastructure. Let's begin by understanding why trust in your log pipeline matters more than ever.

Why Trust Matters

In today's digital landscape, logs are the primary source of truth for diagnosing system failures, detecting security breaches, monitoring performance, and ensuring compliance. A single missed log entry could mean the difference between identifying a critical vulnerability before exploitation or discovering a breach after data has been exfiltrated. Similarly, duplicated logs can inflate metrics, skew analytics, and waste storage and processing resources.

Filebeat, while robust, is only as trustworthy as its configuration. Misconfigured inputs, unsecured outputs, or improper file handling can lead to data loss, latency spikes, or exposure of sensitive information. For example, a misconfigured path pattern might cause Filebeat to skip rotated log files, or an unencrypted connection to Elasticsearch could expose logs containing passwords, tokens, or PII.

Trust in Filebeat means confidence that:

  • Every relevant log line is captured without omission.
  • Logs are delivered in the correct order and without duplication.
  • Security policies are enforced end-to-end.
  • Performance remains stable under high load.
  • Configuration changes are version-controlled and auditable.

These are not optional features; they are baseline expectations for any production-grade logging system. The 10 methods outlined in this guide are designed to instill that trust. They address the most common failure points and provide clear, actionable steps to ensure Filebeat operates with precision, consistency, and resilience.

Top 10 Methods for Using Filebeat

1. Use Explicit File Paths and Glob Patterns, Not Broad Wildcards

One of the most common mistakes when configuring Filebeat inputs is using overly broad wildcards like /var/log/*.log. While convenient, this approach can lead to unintended behavior. Filebeat may pick up temporary files, backup files, or logs that are still being written to, resulting in partial or corrupted log entries.

Instead, use explicit glob patterns that match only the intended log files. For example:

filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - /var/log/app/*.log
    - /var/log/app/*.out
    - /var/log/app/*.err

By specifying exact extensions and directories, you reduce the risk of including transient or irrelevant files. Combine this with an exclude pattern (for the filestream input, the setting is prospector.scanner.exclude_files) to explicitly ignore files like .tmp, .bak, or .old:

prospector.scanner.exclude_files: ['\.tmp$', '\.bak$', '\.old$']

This precision ensures Filebeat only processes logs you intend to monitor, improving both accuracy and performance.

2. Enable Log Rotation Handling with close.on_state_change and clean_removed

Log rotation is a standard practice to prevent disk space exhaustion. Tools like logrotate rename or compress old log files and create new ones. If Filebeat isn't configured to handle this, it can lose track of rotated files or continue reading from closed file descriptors, leading to data loss.

Configure the following settings in your input:

filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - /var/log/app/*.log
  close.on_state_change.inactive: 5m
  close.on_state_change.removed: true
  close.on_state_change.renamed: true
  clean_removed: true

  • close.on_state_change.inactive: Closes a file handle after 5 minutes of inactivity, freeing resources and allowing new files to be picked up.
  • close.on_state_change.removed: Ensures Filebeat stops reading a file if it's deleted (e.g., during cleanup).
  • close.on_state_change.renamed: Closes the file when it's renamed (e.g., during rotation).
  • clean_removed: Removes the file's state from Filebeat's registry after deletion, preventing stale state entries.

These settings ensure Filebeat adapts seamlessly to log rotation cycles without missing a single line. Test this with a manual rotation using logrotate -f to verify behavior in your environment.
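
A quick way to verify this in practice, assuming a logrotate rule already exists for the application (the rule path and log file below are illustrative):

# Force an immediate rotation
sudo logrotate -f /etc/logrotate.d/app

# Append a line to the freshly created log file
echo "post-rotation test $(date +%s)" | sudo tee -a /var/log/app/application.log

# Watch Filebeat's own log for the harvester closing the old file
# and opening the new one
sudo tail -f /var/log/filebeat/filebeat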

3. Use the Registry File with Persistent Storage

Filebeat maintains a registry (default: /var/lib/filebeat/registry) that tracks how far into each file it has already read, as a byte offset per file. This prevents duplication and ensures continuity after restarts. However, if the registry is stored on a volatile filesystem (like tmpfs) or is not backed up, Filebeat may reprocess entire files on restart, causing duplicate logs in your destination.

Ensure the registry file is stored on a persistent, reliable disk partition. Verify its location in your configuration:

filebeat.registry.path: /var/lib/filebeat/registry
filebeat.registry.flush: 5s

Use filebeat.registry.flush to control how often Filebeat writes state to disk. A value of 5 seconds balances performance and durability. Avoid setting it too low (e.g., 100ms), as this can cause excessive I/O under high log volume.

Additionally, back up the registry file periodically or include it in your system's backup strategy. If you migrate Filebeat to a new host, copy the registry file along with the configuration to maintain continuity.
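
A minimal nightly backup sketch, assuming /backup/filebeat as the destination (adjust to your own backup tooling; for a fully consistent copy, take it from a filesystem snapshot or while Filebeat is stopped):

#!/usr/bin/env bash
set -euo pipefail
# Copy the registry directory to a dated location (illustrative path)
BACKUP_DIR="/backup/filebeat/$(date +%F)"
mkdir -p "$BACKUP_DIR"
rsync -a /var/lib/filebeat/registry/ "$BACKUP_DIR/registry/"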

4. Configure Output with Retry Logic and Backoff

Network outages, Elasticsearch downtime, or Kafka broker failures can disrupt log delivery. Filebeat's default behavior is to retry failed deliveries, but without proper tuning, it may overwhelm your destination or lose logs during prolonged outages.

Optimize your output configuration with these settings:

output.elasticsearch:
  hosts: ["https://elasticsearch.example.com:9200"]
  username: "filebeat_internal"
  password: "securepassword123"
  index: "filebeat-%{[agent.version]}-%{+yyyy.MM.dd}"
  timeout: 30s
  max_retries: 5
  backoff.init: 1s
  backoff.max: 60s
  bulk_max_size: 50
  compression_level: 6

  • max_retries: Limits the number of retry attempts before giving up (5 is a safe default).
  • backoff.init and backoff.max: Define exponential backoff, starting at 1 second and doubling on each failed attempt (1s, 2s, 4s, 8s, ...) until the 60-second cap is reached.
  • bulk_max_size: Controls how many events are sent per bulk request. 50 is a conservative starting point; raise it only if your network and cluster can absorb larger batches reliably.
  • compression_level: Reduces bandwidth usage without significant CPU overhead (6 is a balanced point on the 0-9 scale).

For Kafka outputs, use similar retry and backoff settings, as shown in the sketch below. Never disable retries entirely. Even a 10-minute outage should not result in lost logs: Filebeat's built-in persistence and retry logic holds events locally until delivery succeeds.
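
For reference, a comparable Kafka output sketch (the hosts and topic are placeholders; tune required_acks to your durability needs):

output.kafka:
  hosts: ["kafka1.example.com:9092", "kafka2.example.com:9092"]
  topic: "filebeat-logs"
  required_acks: 1
  max_retries: 5
  backoff.init: 1s
  backoff.max: 60s
  compression: gzip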

5. Enable TLS/SSL for All Output Connections

Transmitting logs in plaintext exposes sensitive data to interception, especially over public or untrusted networks. Filebeat supports TLS encryption for its network outputs, including Elasticsearch, Logstash, and Kafka.

Always enable TLS when sending logs to remote destinations:

output.elasticsearch:
  hosts: ["https://elasticsearch.example.com:9200"]
  ssl.enabled: true
  ssl.certificate_authorities: ["/etc/pki/tls/certs/ca.crt"]
  ssl.certificate: "/etc/pki/tls/certs/filebeat.crt"
  ssl.key: "/etc/pki/tls/private/filebeat.key"

For Logstash:

output.logstash:
  hosts: ["logstash.example.com:5044"]
  ssl.enabled: true
  ssl.certificate_authorities: ["/etc/pki/tls/certs/ca.crt"]

Use certificates signed by a trusted Certificate Authority (CA). Avoid self-signed certificates in production unless you explicitly configure Filebeat to trust them via ssl.certificate_authorities.

Additionally, enforce TLS 1.2 or higher by adding:

ssl.supported_protocols: [TLSv1.2, TLSv1.3]

Regularly rotate certificates and monitor expiration dates using monitoring tools or scripts. Unexpired, trusted TLS connections are non-negotiable for secure log pipelines.
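
A small cron-friendly expiry check using openssl (the certificate path and the 30-day window are illustrative):

# Fail if the client certificate expires within 30 days (2,592,000 seconds)
if ! openssl x509 -in /etc/pki/tls/certs/filebeat.crt -checkend 2592000 -noout; then
  echo "WARNING: filebeat.crt expires within 30 days" | logger -t cert-check
fi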

6. Use Processors to Sanitize and Enrich Logs Before Sending

Raw logs often contain sensitive data: IP addresses, API keys, user IDs, or stack traces with environment variables. Sending these unfiltered to centralized systems violates compliance standards like GDPR, HIPAA, or PCI-DSS.

Use Filebeat's built-in processors to sanitize logs before transmission:

processors:
  - drop_fields:
      fields: ["password", "token", "api_key", "secret"]
  - add_fields:
      target: "service"
      fields:
        environment: "production"
        team: "backend"
  - rename:
      fields:
        - from: "source"
          to: "file.path"
  - decode_json_fields:
      fields: ["message"]
      target: "json"
      overwrite_keys: true

  • drop_fields: Removes sensitive fields entirely.
  • add_fields: Adds contextual metadata like environment or team for easier filtering.
  • rename: Standardizes field names across different sources.
  • decode_json_fields: Parses structured logs embedded in message fields.

Validate your configuration syntax with filebeat test config and connectivity with filebeat test output before deployment. Always verify that sensitive data is actually stripped from delivered events; never assume it is.
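
One practical way to eyeball the processed events is to run a scratch copy of the config with the console output in place of the real one (a debugging technique only, never for production; the config file name is illustrative):

output.console:
  pretty: true

Then run Filebeat in the foreground against that config and confirm the sensitive fields are gone from the printed events:

sudo filebeat -e -c /etc/filebeat/filebeat-debug.yml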

7. Monitor Filebeat with Built-in Metrics and External Tools

Filebeat collects internal metrics on events processed, bytes sent, errors, and registry state, and can ship them to a monitoring cluster in real time. Enable this by adding:

monitoring.enabled: true
monitoring.elasticsearch:
  hosts: ["https://elasticsearch.example.com:9200"]
  username: "monitoring_user"
  password: "monitoring_password"

This sends Filebeat's internal metrics to Elasticsearch, where they can be explored in Kibana's Stack Monitoring UI.

Additionally, monitor Filebeat using external tools:

  • Use systemd to track service status: systemctl status filebeat
  • Set up log monitoring for Filebeat's own logs (/var/log/filebeat/filebeat) to catch startup failures or permission issues.
  • Use Prometheus + Node Exporter to monitor system resources (CPU, memory, disk I/O) consumed by Filebeat.
  • Implement alerting for high error rates, registry file size anomalies, or prolonged output delays.

Trust in Filebeat requires visibility. Without metrics, you're flying blind. Set up dashboards and alerts as soon as Filebeat is deployed.
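
For quick ad-hoc checks, Filebeat can also expose its internal stats over a local HTTP endpoint (disabled by default; 5066 is the documented default port):

http.enabled: true
http.host: localhost
http.port: 5066

With that enabled, curl -s http://localhost:5066/stats returns the current event and output counters as JSON.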

8. Run Filebeat as a Non-Root User with Minimal Permissions

Running Filebeat as root is a security risk. If Filebeat is compromised, an attacker gains full system access. Always create a dedicated, low-privilege user:

sudo useradd --system --no-create-home --shell /bin/false filebeat
sudo chown -R filebeat:filebeat /var/lib/filebeat
sudo chown -R filebeat:filebeat /var/log/filebeat/

Ensure Filebeat has read access only to the log files it needs:

sudo chmod 644 /var/log/app/*.log
sudo setfacl -R -m u:filebeat:rX /var/log/app/

Use ACLs (Access Control Lists) for granular control; the capital X above grants traversal on directories without marking regular files executable. Avoid granting write or execute permissions on the log files themselves. Note that Filebeat does not execute external commands and does not read compressed files directly, so if you need to ingest rotated .gz logs, decompress them in a separate, controlled step before Filebeat picks them up.

Verify permissions with:

sudo -u filebeat cat /var/log/app/application.log

Never run Filebeat as root. Even in containers, use non-root users in your Dockerfile or Kubernetes securityContext.
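
On systemd hosts, a drop-in override is a clean way to pin the service to the dedicated user (a sketch; the packaged unit may need further adjustments, e.g., for data-directory permissions):

# /etc/systemd/system/filebeat.service.d/override.conf
[Service]
User=filebeat
Group=filebeat

Apply it with sudo systemctl daemon-reload && sudo systemctl restart filebeat.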

9. Use Configuration Management for Version Control and Consistency

Manually editing Filebeat configuration files on dozens or hundreds of servers is error-prone and unsustainable. Use configuration management tools like Ansible, Puppet, Chef, or Terraform to deploy and version-control your Filebeat configs.

Store your Filebeat configuration in a Git repository:

  • filebeat.yml.j2: Jinja2 template for dynamic environments
  • filebeat.service: systemd unit file
  • filebeat-roles/: Ansible roles per environment

Example Ansible task:

- name: Deploy Filebeat configuration
  template:
    src: filebeat.yml.j2
    dest: /etc/filebeat/filebeat.yml
    owner: root
    group: filebeat
    mode: '0644'
  notify: restart filebeat
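
The notify directive assumes a matching handler exists; a minimal sketch for the role's handlers/main.yml:

- name: restart filebeat
  service:
    name: filebeat
    state: restarted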

Use environment-specific variables to inject values like Elasticsearch hosts, TLS paths, or processor rules. This ensures consistency across dev, staging, and production.

Automate deployment with CI/CD pipelines. After a config change is merged, trigger a rollout to a subset of hosts, validate metrics, then proceed to the rest. This minimizes risk and ensures auditability.

10. Regularly Test and Validate Your Filebeat Pipeline

Configuration drift, log format changes, or upstream application updates can break your Filebeat pipeline without warning. A pipeline that worked yesterday may silently fail today.

Implement a routine validation process:

  1. Test configuration syntax: Run filebeat test config before every deployment.
  2. Test connectivity: Use filebeat test output to verify communication with Elasticsearch, Logstash, or Kafka.
  3. Simulate log generation: Use echo "test log line" >> /var/log/app/test.log and verify it appears in your destination within 10 seconds.
  4. Validate field mapping: Check that enriched fields and JSON parsing work as expected in Kibana Discover.
  5. Monitor for duplicates: Query Elasticsearch for duplicate @timestamp and message combinations.
  6. Check registry integrity: Inspect the registry file for anomalies or excessive size (could indicate stuck files).

Consider writing a simple shell script that automates these checks and runs daily via cron. Integrate it into your monitoring system to trigger alerts if validation fails.
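
A minimal sketch of such a script, assuming an app log under /var/log/app (the paths and the sleep window are illustrative):

#!/usr/bin/env bash
# Daily Filebeat pipeline validation; exits non-zero on the first failure.
set -euo pipefail

# 1. Configuration syntax
filebeat test config -c /etc/filebeat/filebeat.yml

# 2. Output connectivity
filebeat test output -c /etc/filebeat/filebeat.yml

# 3. End-to-end smoke test: write a marker line and give the pipeline time
marker="filebeat-validation-$(date +%s)"
echo "$marker" >> /var/log/app/test.log
sleep 10
# Verifying arrival is environment-specific: e.g., query Elasticsearch
# for $marker here and exit 1 if it is missing.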

Trust is earned through verification. Never assume Filebeat is working; prove it.

Comparison Table

The table below compares the top 10 methods against common pitfalls and their impact on reliability, security, and maintainability.

| Method | Common Pitfall | Impact if Ignored | Best Practice Outcome |
|---|---|---|---|
| 1. Explicit File Paths | Using broad wildcards like /var/log/*.log | Captures temporary or backup files, causing noise and duplication | Clean, targeted log collection with zero irrelevant entries |
| 2. Log Rotation Handling | Not setting the close.on_state_change options | Lost logs during rotation cycles; incomplete data | Seamless log rotation with zero data loss |
| 3. Registry File Management | Storing registry on tmpfs or not backing it up | Full file reprocessing on restart; duplicate logs | Persistent state tracking; no duplicates after restarts |
| 4. Output Retry Logic | Disabling retries or using default values | Log loss during network or destination outages | Automatic recovery; logs held locally until delivery |
| 5. TLS/SSL Encryption | Sending logs over plaintext connections | Exposure of sensitive data; compliance violations | End-to-end encrypted log transport |
| 6. Log Sanitization | Forwarding raw logs with PII or secrets | Data breaches; regulatory fines | Compliant, secure log data with sensitive fields stripped |
| 7. Metrics Monitoring | No metrics enabled or dashboards configured | Silent failures; no awareness of performance degradation | Real-time visibility into health and throughput |
| 8. Non-Root Execution | Running Filebeat as root | System compromise if Filebeat is exploited | Reduced attack surface; principle of least privilege |
| 9. Configuration Management | Manual edits across servers | Inconsistent configs; hard to audit or rollback | Version-controlled, repeatable, auditable deployments |
| 10. Regular Validation | Assuming it worked yesterday | Undetected pipeline failures; false sense of security | Proactive detection of issues; continuous trust |

Each of these methods addresses a specific risk area. Implementing all 10 transforms Filebeat from a simple log forwarder into a resilient, secure, and enterprise-grade logging component.

FAQs

Can Filebeat lose logs during high traffic?

Generally no, if configured correctly. Filebeat buffers events in an in-memory queue and applies backpressure: when the output cannot keep up, it slows down reading rather than dropping events, and the registry records the last acknowledged position so a restart resumes where it left off. Combined with retry logic, this lets Filebeat absorb bursts of traffic without data loss. However, if the disk fills up, files are rotated away before they are fully read, or the registry becomes corrupted, delivery can pause or data can be lost. Monitor disk usage and registry size to prevent this.
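
If you want buffered events to survive a crash or restart as well, Filebeat offers an optional disk queue; a minimal sketch (the size is illustrative):

queue.disk:
  max_size: 10GB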

Should I use Filebeat or Logstash for log processing?

Use Filebeat for collection and lightweight forwarding. Use Logstash for heavy transformation, filtering, or enrichment. Filebeat is designed to be lightweight and efficient. If you need complex parsing, conditional logic, or database lookups, offload that to Logstash. Filebeat + Logstash is a proven, scalable combination.

How often should I rotate Filebeat's registry file?

You should never manually rotate the registry. It is an internal data store that Filebeat manages automatically. Instead, ensure it's backed up regularly and stored on persistent storage. If the registry becomes corrupted, delete it; Filebeat will then re-read all log files from the beginning. Only do this if you're prepared to accept potential duplicates.

Can Filebeat send logs to multiple outputs at once?

No, not directly. Filebeat allows only one output to be enabled at a time; enabling, say, output.elasticsearch and output.kafka together is a configuration error. The loadbalance option distributes events across multiple hosts of the same output, not across different output types. If you need to feed several systems (e.g., real-time analytics and long-term storage), ship to an intermediary such as Kafka or Logstash and fan out from there.

Does Filebeat support Docker and Kubernetes?

Yes. Filebeat has built-in autodiscover features for Docker and Kubernetes. You can configure it to automatically detect containers and collect logs from their stdout/stderr or mounted log files. Use the autodiscover section in your config to define templates based on container labels or Kubernetes annotations.
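
A hints-based Kubernetes sketch, assuming Filebeat runs as a DaemonSet with the container log directory mounted:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log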

Whats the difference between filestream and log input types?

Filebeat 7.10+ introduced filestream as the newer, recommended input type. It replaces the legacy log input. Filestream offers better performance, improved file handling, and more consistent behavior with log rotation. New deployments should always use filestream. Legacy configurations can continue using log, but migration is strongly advised.

How do I troubleshoot Filebeat if it's not sending logs?

Start by checking:

  • Filebeat logs: tail -f /var/log/filebeat/filebeat
  • Configuration syntax: filebeat test config
  • Output connectivity: filebeat test output
  • File permissions: Can the filebeat user read the log files?
  • Registry state: Is the registry file present and writable?
  • Network connectivity: Can the host reach the output destination?

Enable debug logging temporarily with logging.level: debug for detailed diagnostics.

Is Filebeat suitable for real-time log streaming?

Yes. Filebeat tails open files continuously, so new lines are typically shipped within a second. To pick up newly created files faster, lower the scanner interval from its 10-second default (for the filestream input, prospector.scanner.check_interval). Filebeat is widely used in real-time monitoring, alerting, and security analytics. For ultra-low-latency requirements (under 100ms), consider integrating with Kafka or using a dedicated streaming agent like Fluent Bit.
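
A sketch of a lower-latency filestream input (1s scanning is aggressive; watch CPU on hosts with many files):

filebeat.inputs:
- type: filestream
  paths:
    - /var/log/app/*.log
  # Check for new/changed files every second instead of the 10s default
  prospector.scanner.check_interval: 1s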

Can Filebeat parse multiline logs (e.g., Java stack traces)?

Yes. Use multiline settings to combine lines that belong to a single event. With the filestream input, multiline is configured as a parser on the input (it is not a processor). For example:

filebeat.inputs:
- type: filestream
  paths:
    - /var/log/app/*.log
  parsers:
    - multiline:
        type: pattern
        pattern: '^\['
        negate: true
        match: after

This groups lines that don't start with a bracket (common in Java stack traces) with the previous line. Test multiline patterns carefully; an incorrect pattern can merge unrelated events or split a single one.

Conclusion

Filebeat is not a black box. It's a powerful, flexible tool that demands thoughtful configuration to perform reliably in production. The top 10 methods outlined in this guide are not suggestions; they are foundational practices used by organizations that depend on their logging infrastructure to operate safely, efficiently, and compliantly.

Trust in Filebeat is earned through precision: precise file paths, precise permissions, precise encryption, precise processing, and precise validation. It's not enough to install it and forget it. You must monitor it, test it, secure it, and automate its management.

By following these methods, you transform Filebeat from a simple log collector into a trusted pillar of your observability stack. You reduce risk, eliminate data loss, ensure compliance, and gain confidence that every log line, no matter how small, is captured, secured, and delivered.

Start with one method. Implement it. Validate it. Then move to the next. Over time, your logging pipeline will become as reliable as the systems it monitors. And in an era where data is the lifeblood of operations, that reliability isn't just valuable; it's essential.