Filebeat: Introduction, Installation, Configuration, and Pipelines
1. Introduction
Filebeat consists of two main components:
Inputs:
Inputs manage the harvesters and find all sources to read from. If the input type is log, the input locates every file on disk that matches the configured paths and starts a harvester for each of them. Each input runs in its own Go routine.
Each input type can be defined multiple times. (Inputs were called prospectors before version 6.3.)
Harvesters:
- A harvester reads the content of a single file, line by line, and sends the lines to the output; one harvester is started per file. If the file is deleted or renamed while a harvester is reading it, Filebeat keeps reading that file. The side effect is that the disk space cannot be released as long as that harvester stays open. By default, Filebeat keeps the file open until close_inactive is reached.
- Closing a harvester has the following consequences (see the sketch after this list):
- If the file is deleted while the harvester is still reading it, the file handle is closed and the underlying resources are released.
- Harvesting of the file restarts only after scan_frequency has elapsed.
- If the file is moved or removed while its harvester is closed, harvesting of that file does not continue.
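A minimal sketch of the settings involved, assuming a hypothetical application log path:
```
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log   # hypothetical path
    # Close the file handle after 5 minutes without new lines, so that
    # deleted or rotated files can actually free their disk space.
    close_inactive: 5m
    # How often Filebeat scans for new files and restarts harvesting
    # of files whose harvesters were closed.
    scan_frequency: 10s
```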
2. Installation
Default installation paths
Type | Description | Default Location | Config Option |
---|---|---|---|
home | Home of the Filebeat installation. | | path.home |
bin | The location for the binary files. | {path.home}/bin | |
config | The location for configuration files. | {path.home} | path.config |
data | The location for persistent data files. | {path.home}/data | path.data |
logs | The location for the logs created by Filebeat. | {path.home}/logs | path.logs |
YUM/RPM
Save the following repository definition (for example as /etc/yum.repos.d/elastic.repo):
```
[elastic-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
```
Then install:
```
yum install filebeat-7.4.0
```
Alternatively, download the RPM from https://www.elastic.co/cn/downloads/beats/filebeat and install it locally:
```
yum localinstall -y filebeat-7*.rpm
```
Installation paths
Type | Description | Location |
---|---|---|
home | Home of the Filebeat installation. | /usr/share/filebeat |
bin | The location for the binary files. | /usr/share/filebeat/bin |
config | The location for configuration files. | /etc/filebeat |
data | The location for persistent data files. | /var/lib/filebeat |
logs | The location for the logs created by Filebeat. | /var/log/filebeat |
Binary packages
Binary packages in zip, tar.gz and tgz format can be downloaded from https://www.elastic.co/cn/downloads/beats/filebeat
Installation paths
Type | Description | Location |
---|---|---|
home | Home of the Filebeat installation. | {extract.path} |
bin | The location for the binary files. | {extract.path} |
config | The location for configuration files. | {extract.path} |
data | The location for persistent data files. | {extract.path}/data |
logs | The location for the logs created by Filebeat. | {extract.path}/logs |
Running Filebeat from the command line
```
/usr/share/filebeat/bin/filebeat SUBCOMMAND [FLAGS]
```
Command | Description |
---|---|
export | Exports the configuration to the console, including the index template, ILM policy, and dashboards |
help | Shows the help documentation |
keystore | Manages the secrets keystore |
modules | Manages configured modules |
run | Runs Filebeat. This command is used by default if you start Filebeat without specifying a command. |
setup | Sets up the initial environment, including the index template, ILM policy, write alias, Kibana dashboards (when available), and machine learning jobs (when available) |
test | Tests the configuration file |
version | Shows version information |
Global Flag | Description |
---|---|
-E "SETTING_NAME=VALUE" | Overrides a setting from the configuration file |
-M "VAR_NAME=VALUE" | Overrides a setting from a module configuration file |
-c FILE | Specifies the Filebeat configuration file to use; the path must be relative to path.config |
-d SELECTORS | Enables debugging for the specified comma-separated list of selectors ("*" debugs everything) |
-e | Logs to stderr instead of the configured log output |
--path.config | Sets the path for configuration files |
--path.data | Sets the path for data files |
--path.home | Sets the home path of the installation |
--path.logs | Sets the path for log files |
--strict.perms | Sets strict permission checking on configuration files |
Examples:
```
/usr/share/filebeat/bin/filebeat --modules mysql -M "mysql.slowlog.var.paths=[/root/slow.log]" -e
/usr/share/filebeat/bin/filebeat -e -E output.console.pretty=true --modules mysql -M "mysql.slowlog.var.paths=[/root/mysql-slow-sql-log/mysql-slowsql.log]" -M "mysql.error.enabled=false" -E output.elasticsearch.enabled=false
```
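Before starting the service it can be worth validating the setup with the test subcommand; a quick sketch:
```
# Check the configuration file for syntax errors
/usr/share/filebeat/bin/filebeat test config -c /etc/filebeat/filebeat.yml
# Check that the configured output (e.g. Elasticsearch) is reachable
/usr/share/filebeat/bin/filebeat test output -c /etc/filebeat/filebeat.yml
```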
Running with systemd
```
systemctl enable filebeat        # start automatically at boot
systemctl start filebeat
systemctl stop filebeat
systemctl status filebeat
journalctl -u filebeat.service   # read Filebeat's logs from the journal
systemctl daemon-reload          # reload unit files after editing them
systemctl restart filebeat
```
Filebeat's systemd unit file (/usr/lib/systemd/system/filebeat.service):
```
[Unit]
Description=Filebeat sends log files to Logstash or directly to Elasticsearch.
Documentation=https://www.elastic.co/products/beats/filebeat
Wants=network-online.target
After=network-online.target

[Service]
Environment="BEAT_LOG_OPTS=-e"
Environment="BEAT_CONFIG_OPTS=-c /etc/filebeat/filebeat.yml"
Environment="BEAT_PATH_OPTS=-path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat"
ExecStart=/usr/share/filebeat/bin/filebeat $BEAT_LOG_OPTS $BEAT_CONFIG_OPTS $BEAT_PATH_OPTS
Restart=always

[Install]
WantedBy=multi-user.target
```
Variable | Description | Default value |
---|---|---|
BEAT_LOG_OPTS | Log options | -e |
BEAT_CONFIG_OPTS | Flags for configuration file path | -c /etc/filebeat/filebeat.yml |
BEAT_PATH_OPTS | Other paths | -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat |
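These variables can be overridden with a systemd drop-in instead of editing the unit file; a minimal sketch (the override value is just an example):
```
# Run: systemctl edit filebeat
# then add, e.g. to drop the -e flag:
[Service]
Environment="BEAT_LOG_OPTS="
# apply with: systemctl daemon-reload && systemctl restart filebeat
```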
3. Docker Images
The image is published on Elastic's registry:
```
docker pull docker.elastic.co/beats/filebeat:7.4.0
```
Installation paths inside the image
Type | Description | Location |
---|---|---|
home | Home of the Filebeat installation. | /usr/share/filebeat |
bin | The location for the binary files. | /usr/share/filebeat |
config | The location for configuration files. | /usr/share/filebeat |
data | The location for persistent data files. | /usr/share/filebeat/data |
logs | The location for the logs created by Filebeat. | /usr/share/filebeat/logs |
Deploying on Kubernetes
- By default Filebeat is deployed into the kube-system namespace.
- The deployment type is a DaemonSet, so Filebeat runs on every node.
- Each node's /var/lib/docker/containers directory is mounted into the Filebeat container.
- By default Filebeat ships logs to the elasticsearch service in the kube-system namespace; to send them to a different Elasticsearch, change these environment variables:
```
- name: ELASTICSEARCH_HOST
  value: elasticsearch
- name: ELASTICSEARCH_PORT
  value: "9200"
- name: ELASTICSEARCH_USERNAME
  value: elastic
- name: ELASTICSEARCH_PASSWORD
  value: changeme
```
```
curl -L -O https://raw.githubusercontent.com/elastic/beats/7.4/deploy/kubernetes/filebeat-kubernetes.yaml
kubectl create -f filebeat-kubernetes.yaml
kubectl --namespace=kube-system get ds/filebeat
```
Deploying on OKD
```
curl -L -O https://raw.githubusercontent.com/elastic/beats/7.4/deploy/kubernetes/filebeat-kubernetes.yaml
```
Modify the deployment file so the container runs privileged as root:
```
securityContext:
  runAsUser: 0
  privileged: true
```
Grant the Filebeat service account the privileged SCC:
```
oc adm policy add-scc-to-user privileged system:serviceaccount:kube-system:filebeat
```
4. Configuration
Filebeat's configuration file: /etc/filebeat/filebeat.yml
The configuration syntax is YAML.
Option | Description | Example |
---|---|---|
processors.* | Processor configuration | processors: - include_fields: fields: ["cpu"] - drop_fields: fields: ["cpu.user", "cpu.system"] |
filebeat.modules | Module configuration | filebeat.modules: - module: mysql error: enabled: true |
filebeat.inputs | Input configuration | filebeat.inputs: - type: log enabled: false paths: - /var/log/*.log |
output.* | Output configuration | output.console: enabled: true |
path.* | Paths for the files Filebeat creates | path.home: /usr/share/filebeat path.data: ${path.home}/data path.logs: ${path.home}/logs |
setup.template.* | Index template setup | |
logging.* | Logging configuration | logging.level: info logging.to_stderr: false logging.to_files: true |
monitoring.* | X-Pack monitoring configuration | monitoring.enabled: false monitoring.elasticsearch.hosts: ["localhost:9200"] |
http.* | HTTP endpoint configuration | http.enabled: false http.port: 5066 http.host: localhost |
filebeat.autodiscover.* | Autodiscover configuration | |
General settings (see below) | | |
Global settings (see below) | | |
queue.* | Internal queue settings | |
Global settings
Option | Default | Description | Example |
---|---|---|---|
registry.path | ${path.data}/registry | Root path of the registry files | filebeat.registry.path: registry |
registry.file_permissions | 0600 | Permissions of the registry data file; has no effect on Windows | filebeat.registry.file_permissions: 0600 |
registry.flush | 0s | Timeout after which registry entries are flushed to disk; 0s flushes after each batch of events is published | filebeat.registry.flush: 5s |
registry.migrate_file | | Path of an old registry file to migrate to the new format on startup | filebeat.registry.migrate_file: /path/to/old/registry_file |
config_dir | | Path to a directory holding additional input configuration files | filebeat.config_dir: path/to/configs |
shutdown_timeout | 0 (disabled) | How long Filebeat waits on shutdown for the publisher to finish sending events | filebeat.shutdown_timeout: 5s |
General settings
Option | Default | Description | Example |
---|---|---|---|
name | hostname | Name of the shipper, included in every event | name: "my-shipper" |
tags | | List of tags added to the tags field of every event | tags: ["service-X", "web-tier"] |
fields | | Optional fields added to every event, grouped under a fields sub-dictionary by default | fields: {project: "myproject", instance-id: "57452459"} |
fields_under_root | false | If set to true, custom fields are stored at the root of the output document instead of under fields; custom fields then overwrite Filebeat's default fields on name conflicts | fields_under_root: true |
processors | | Processors applied to all data exported by Filebeat; see section 7 | |
max_procs | number of logical CPUs | Maximum number of CPUs that can execute simultaneously | |
Configuration example
```
# Modules
filebeat.modules:
  - module: system
# General settings
fields:
  level: debug
  review: 1
fields_under_root: false
# Processors
processors:
  - decode_json_fields:
# Inputs
filebeat.inputs:
  - type: log
# Outputs
output.elasticsearch:
output.logstash:
```
5. Input Plugin Types
Input types
Type | Description | Example |
---|---|---|
Log | Reads every line from log files | filebeat.inputs: - type: log paths: - /var/log/messages - /var/log/*.log |
Stdin | Reads events from standard input | filebeat.inputs: - type: stdin |
Container | Reads container log files | filebeat.inputs: - type: container paths: - '/var/lib/docker/containers/*/*.log' |
Kafka | Reads messages from Kafka topics | filebeat.inputs: - type: kafka hosts: - kafka-broker-1:9092 - kafka-broker-2:9092 topics: ["my-topic"] group_id: "filebeat" |
Redis | Reads entries from the Redis slowlog | filebeat.inputs: - type: redis hosts: ["localhost:6379"] password: "${redis_pwd}" |
UDP | Reads events over UDP | filebeat.inputs: - type: udp max_message_size: 10KiB host: "localhost:8080" |
Docker | Reads Docker container logs (superseded by the container input) | filebeat.inputs: - type: docker containers.ids: - 'e067b58476dc57d6986dd347' |
TCP | Reads events over TCP | filebeat.inputs: - type: tcp max_message_size: 10MiB host: "localhost:9000" |
Syslog | Reads syslog messages over TCP or UDP | filebeat.inputs: - type: syslog protocol.udp: host: "localhost:9000" |
s3 | Retrieves log files from S3 buckets via SQS notifications | filebeat.inputs: - type: s3 queue_url: https://test.amazonaws.com/12/test access_key_id: my-access-key secret_access_key: my-secret-access-key |
NetFlow | Reads NetFlow and IPFIX records over UDP | |
Google Pub/Sub | Reads messages from a Google Cloud Pub/Sub topic subscription | |
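A common need with the log input is stitching multi-line events such as Java stack traces back together; a minimal sketch, assuming a hypothetical log path and lines that start with a date:
```
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/app.log   # hypothetical path
    # Lines NOT starting with a yyyy-MM-dd timestamp are appended
    # to the preceding line (typical for stack traces).
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    multiline.negate: true
    multiline.match: after
```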
6. Output Plugin Types
Type | Description | Example |
---|---|---|
Elasticsearch | Sends events directly to Elasticsearch via its HTTP API | output.elasticsearch: hosts: ["https://localhost:9200"] protocol: "https" index: "filebeat-%{[agent.version]}-%{+yyyy.MM.dd}" ssl.certificate_authorities: ["/etc/pki/root/ca.pem"] ssl.certificate: "/etc/pki/client/cert.pem" ssl.key: "/etc/pki/client/cert.key" username: "filebeat_internal" password: "YOUR_PASSWORD" |
Logstash | Sends events to Logstash over the Beats protocol | output.logstash: hosts: ["127.0.0.1:5044"] |
Kafka | Publishes events to Kafka topics | output.kafka: hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"] topic: '%{[fields.log_topic]}' partition.round_robin: reachable_only: false required_acks: 1 compression: gzip max_message_bytes: 1000000 |
Redis | Pushes events to a Redis list or channel | output.redis: hosts: ["localhost"] password: "my_password" key: "filebeat" db: 0 timeout: 5 |
File | Writes events to files in JSON format | output.file: path: "/tmp/filebeat" filename: filebeat #rotate_every_kb: 10000 #number_of_files: 7 #permissions: 0600 |
Console | Writes events as JSON to stdout | output.console: pretty: true |
Cloud | Shorthand for sending to Elastic Cloud via cloud.id and cloud.auth | |
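When several Logstash hosts are available, the Logstash output can distribute batches across all of them; a minimal sketch with placeholder hostnames:
```
output.logstash:
  hosts: ["logstash1:5044", "logstash2:5044"]   # placeholder hosts
  # Balance batches across all listed hosts instead of
  # sticking to one randomly chosen host.
  loadbalance: true
```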
7. Processors
Configuration syntax
```
processors:
  - if:
      <condition>
    then:
      - <processor_name>:
          <parameters>
      - <processor_name>:
          <parameters>
      ...
    else:
      - <processor_name>:
          <parameters>
      - <processor_name>:
          <parameters>
```
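A concrete sketch of this conditional form (the field values are made up):
```
processors:
  - if:
      equals:
        http.response.code: 200
    then:
      - add_tags:
          tags: [ok]
    else:
      - drop_event: {}
```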
Processors can also be added at the input level:
```
- type: <input_type>
  processors:
    - <processor_name>:
        when:
          <condition>
        <parameters>
```
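For example, a sketch that drops noisy DEBUG lines right at the input (path and pattern are assumptions):
```
- type: log
  paths:
    - /var/log/myapp/app.log   # hypothetical path
  processors:
    - drop_event:
        when:
          regexp:
            message: '^DEBUG'
```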
Condition syntax
- equals — compares a field against a given value; accepts integers, strings, and booleans:
  ```
  equals:
    http.response.code: 200
  ```
- contains — checks whether a string field contains a substring:
  ```
  contains:
    status: "Specific error"
  ```
- regexp — matches a field against a regular expression:
  ```
  regexp:
    system.process.name: "foo.*"
  ```
- range — the condition supports lt, lte, gt and gte, and accepts only integer or float values:
  ```
  range:
    http.response.code:
      gte: 400
  ```
- network — checks whether an IP field is inside a CIDR range or a named range such as private or loopback; a single value or a list can be given:
  ```
  network:
    source.ip: private
    destination.ip: ['192.168.1.0/24', '10.0.0.0/8', loopback]
  ```
- has_fields — checks whether all of the given fields exist in the event:
  ```
  has_fields: ['http.response.code']
  ```
- or — matches when any sub-condition matches:
  ```
  or:
    - equals:
        http.response.code: 304
    - equals:
        http.response.code: 404
  ```
- and — matches when all sub-conditions match; and/or can be nested:
  ```
  and:
    - equals:
        http.response.code: 200
    - equals:
        status: OK
  or:
    - <condition1>
    - and:
        - <condition2>
        - <condition3>
  ```
- not — negates a condition:
  ```
  not:
    equals:
      status: OK
  ```
Supported processors
Type | Purpose | Example |
---|---|---|
add_cloud_metadata | Adds metadata about the cloud instance the host is running on | |
add_docker_metadata | Enriches events with metadata about the Docker container they came from | processors: - add_docker_metadata: host: "unix:///var/run/docker.sock" |
add_fields | Adds static fields to the event | processors: - add_fields: target: project fields: name: myproject id: '574734885120952459' |
add_host_metadata | Adds metadata about the host (name, OS, network, geo) | processors: - add_host_metadata: netinfo.enabled: false cache.ttl: 5m geo: name: nyc-dc1-rack1 location: 40.7128, -74.0060 continent_name: North America country_iso_code: US region_name: New York region_iso_code: NY city_name: New York |
add_kubernetes_metadata | Enriches events with metadata about the Kubernetes Pod they came from | processors: - add_kubernetes_metadata: host: kube_config: ~/.kube/config default_indexers.enabled: false default_matchers.enabled: false indexers: - ip_port: matchers: - fields: lookup_fields: ["metricset.host"] |
add_labels | Adds a set of key/value labels to the event, flattening nested keys | processors: - add_labels: labels: number: 1 with.dots: test nested: with.dots: nested array: - do - re - with.field: mi |
add_locale | Adds the machine's timezone offset or name | processors: - add_locale: ~ processors: - add_locale: format: abbreviation |
add_observer_metadata | Adds metadata about the observer (the monitoring host) | |
add_process_metadata | Enriches events with metadata about the process identified by a PID field | |
add_tags | Adds tags to a target field | processors: - add_tags: tags: [web, production] target: "environment" |
community_id | Computes the Community ID hash of network flow data | |
convert | Converts field values to other types, optionally renaming the fields | processors: - convert: fields: - {from: "src_ip", to: "source.ip", type: "ip"} - {from: "src_port", to: "source.port", type: "integer"} ignore_missing: true fail_on_error: false |
decode_base64_field | Decodes a Base64-encoded field | |
decode_cef | Decodes Common Event Format (CEF) messages | |
decode_csv_fields | Decodes fields containing comma-separated values | |
decode_json_fields | Decodes fields containing JSON strings (see the sketch after this table) | |
decompress_gzip_field | Decompresses a gzip-compressed field | |
dissect | Tokenizes a string field into individual target fields | processors: - dissect: tokenizer: "%{key1} %{key2}" field: "message" target_prefix: "dissect" |
dns | Performs DNS queries (e.g. reverse lookups) on configured fields | |
drop_event | Drops the entire event when the condition matches | processors: - drop_event: when: condition |
drop_fields | Removes the given fields from the event | processors: - drop_fields: when: condition fields: ["field1", "field2", ...] ignore_missing: false |
extract_array | Copies elements of an array field into named target fields | processors: - extract_array: field: my_array mappings: source.ip: 0 destination.ip: 1 network.transport: 2 |
include_fields | Keeps only the given fields (plus a few that are always kept, such as @timestamp) | processors: - include_fields: when: condition fields: ["field1", "field2", ...] |
registered_domain | Extracts the registered domain (eTLD+1) from a fully qualified domain name | |
rename | Renames fields | processors: - rename: fields: - from: "a.g" to: "e.d" ignore_missing: false fail_on_error: true |
script | Executes Javascript code against each event | |
timestamp | Parses a timestamp from a field and writes it to @timestamp | |
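As one worked example, a sketch of decode_json_fields for logs whose message field holds a JSON string (the field and target names are assumptions):
```
processors:
  - decode_json_fields:
      fields: ["message"]   # fields containing JSON strings
      target: "json"        # decode into the json sub-dictionary ("" would mean the event root)
      process_array: false
      max_depth: 1
      overwrite_keys: false
```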
8. The Registry File Explained
Registry file path: /var/lib/filebeat/registry/filebeat/data.json
```
[{"source":"/root/mysql-slow-sql-log/mysql-slowsql.log","offset":1365442,"timestamp":"2019-10-11T09:29:35.185399057+08:00","ttl":-1,"type":"log","meta":null,"FileStateOS":{"inode":2360926,"device":2051}}]
```
```
source       # full path of the harvested log file
offset       # number of bytes already harvested, i.e. the byte position reached in the file
timestamp    # time the file state last changed
ttl          # time-to-live of the state; -1 means harvest as long as the log file exists
type         # input type
meta         # extra metadata (null here)
FileStateOS  # OS-specific state:
  inode      #   inode number of the log file
  device     #   device number of the disk holding the file
```
When a disk is formatted, the operating system divides it into two regions:
- a data region, which stores the file contents
- an inode region, which stores file metadata such as the owner, creation time, and size
Every file has a corresponding inode containing this information; it can be inspected with the stat command:
```
> stat /var/log/messages
File: ‘/var/log/messages’
Size: 56216339 Blocks: 109808 IO Block: 4096 regular file
Device: 803h/2051d Inode: 1053379 Links: 1
Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2019-10-06 03:20:01.528781081 +0800
Modify: 2019-10-12 13:59:13.059112545 +0800
Change: 2019-10-12 13:59:13.059112545 +0800
Birth: -
```
The decimal 2051 corresponds to hexadecimal 803, matching the Device field (803h/2051d) above and the "device":2051 entry in the registry file.
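A quick shell one-liner to verify the decimal-to-hex mapping:
```
printf '%x\n' 2051   # prints 803
```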