Version: 0.10.5

Elasticsearch

Certified

Important Capabilities

Capability | Status | Notes
Platform Instance | | Enabled by default

This plugin extracts the following:

  • Metadata for indexes
  • Column types associated with each index field

CLI based Ingestion

Install the Plugin

pip install 'acryl-datahub[elasticsearch]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: "elasticsearch"
  config:
    # Coordinates
    host: 'localhost:9200'

    # Credentials
    username: user # optional
    password: pass # optional

    # SSL support
    use_ssl: False
    verify_certs: False
    ca_certs: "./path/ca.cert"
    client_cert: "./path/client.cert"
    client_key: "./path/client.key"
    ssl_assert_hostname: False
    ssl_assert_fingerprint: "./path/cert.fingerprint"

    # Options
    url_prefix: "" # optional url_prefix
    env: "PROD"
    index_pattern:
      allow: [".*some_index_name_pattern*"]
      deny: [".*skip_index_name_pattern*"]
    ingest_index_templates: False
    index_template_pattern:
      allow: [".*some_index_template_name_pattern*"]

sink:
  # sink configs
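
The index_pattern and index_template_pattern entries in the recipe take allow and deny lists of regular expressions. The Python sketch below shows one plausible way such a filter behaves. It assumes that deny patterns take precedence over allow patterns, that patterns are matched from the start of the name (as with re.match), and that matching is case-insensitive by default, mirroring the ignoreCase option. The function is_allowed and the sample index names are hypothetical and are not part of DataHub; the connector's actual implementation may differ.

```python
import re
from typing import Iterable


def is_allowed(
    name: str,
    allow: Iterable[str],
    deny: Iterable[str],
    ignore_case: bool = True,
) -> bool:
    """Return True if `name` passes the allow/deny filter (illustrative only)."""
    flags = re.IGNORECASE if ignore_case else 0
    # Assumption: any matching deny pattern rejects the name outright.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    # Otherwise the name must match at least one allow pattern.
    return any(re.match(pattern, name, flags) for pattern in allow)


# Hypothetical index names, used only for illustration.
indexes = ["orders-2023", "ORDERS-archive", "_internal", "ilm-history-1", "customers"]

# Only the visible part of the documented default deny list is used here.
kept = [n for n in indexes if is_allowed(n, allow=[".*"], deny=["^_.*", "^ilm-history.*"])]
print(kept)  # ['orders-2023', 'ORDERS-archive', 'customers']
```

Trying candidate patterns against a list of real index names in this way can catch an overly broad allow pattern before an ingestion run.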

Config Details

Note that a . is used to denote nested fields in the YAML recipe.
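
To make the dotted notation concrete, the Python sketch below expands dotted field names into the nested structure a YAML recipe would contain under config. The helper expand_dotted is hypothetical and not part of DataHub; the field names are taken from the table in this section, and the values are arbitrary examples.

```python
from typing import Any, Dict


def expand_dotted(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {'a.b.c': 1} into {'a': {'b': {'c': 1}}} (illustrative helper)."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        # Walk (or create) one dictionary level per parent segment.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


flat = {
    "host": "localhost:9200",
    "profiling.enabled": True,
    "profiling.operation_config.lower_freq_profile_enabled": True,
    "profiling.operation_config.profile_day_of_week": 0,
}
config = expand_dotted(flat)
print(config)
```

Here profiling.operation_config.profile_day_of_week ends up as the key profile_day_of_week nested under operation_config, which is itself nested under profiling.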

Field | Type | Description | Default
ca_certs | string | Path to a certificate authority (CA) certificate. |
client_cert | string | Path to the file containing the private key and the certificate, or cert only if using client_key. |
client_key | string | Path to the file containing the private key if using separate cert and key files. |
host | string | The Elasticsearch host URI. | localhost:9200
ingest_index_templates | boolean | Ingests Elasticsearch index templates if enabled. | False
password | string | The password credential. |
platform_instance | string | The instance of the platform that all assets produced by this recipe belong to. |
ssl_assert_fingerprint | string | Verify the supplied certificate fingerprint if not None. |
ssl_assert_hostname | boolean | Use hostname verification if not False. | False
url_prefix | string | Some enterprises run multiple Elasticsearch clusters behind a single endpoint; url_prefix is used to route requests to a specific cluster. | ""
use_ssl | boolean | Whether to use SSL for the connection. | False
username | string | The username credential. |
verify_certs | boolean | Whether to verify SSL certificates. | False
env | string | The environment that all assets produced by this connector belong to. | PROD
collapse_urns | CollapseUrns | |
collapse_urns.urns_suffix_regex | array(string) | |
index_pattern | AllowDenyPattern | Regex patterns for filtering indexes during ingestion. | {'allow': ['.*'], 'deny': ['^_.*', '^ilm-history.*...
index_pattern.allow | array(string) | |
index_pattern.deny | array(string) | |
index_pattern.ignoreCase | boolean | Whether to ignore case during pattern matching. | True
index_template_pattern | AllowDenyPattern | Regex patterns for filtering index templates during ingestion. | {'allow': ['.*'], 'deny': ['^_.*'], 'ignoreCase': ...
index_template_pattern.allow | array(string) | |
index_template_pattern.deny | array(string) | |
index_template_pattern.ignoreCase | boolean | Whether to ignore case during pattern matching. | True
profiling | ElasticProfiling | |
profiling.enabled | boolean | Whether to enable profiling for the Elasticsearch source. | False
profiling.operation_config | OperationConfig | Experimental feature. Specifies operation configs. |
profiling.operation_config.lower_freq_profile_enabled | boolean | Whether to profile at a lower frequency. This does not schedule anything; it only adds checks for when profiling should be skipped. | False
profiling.operation_config.profile_date_of_month | integer | Day of the month, from 1 to 31 inclusive. If not specified, the field is unset and has no effect. |
profiling.operation_config.profile_day_of_week | integer | Day of the week, from 0 to 6 inclusive, where 0 is Monday and 6 is Sunday. If not specified, the field is unset and has no effect. |
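
The profiling.operation_config fields describe a gate rather than a scheduler: profiling still runs only when ingestion itself runs, and the extra checks decide whether to skip it on a given day. The Python sketch below is one reading of that description, not the connector's actual code. The function should_profile is hypothetical; its parameter names mirror the documented fields. It relies on Python's date.weekday(), which also numbers Monday as 0 and Sunday as 6.

```python
import datetime
from typing import Optional


def should_profile(
    today: datetime.date,
    lower_freq_profile_enabled: bool = False,
    profile_day_of_week: Optional[int] = None,
    profile_date_of_month: Optional[int] = None,
) -> bool:
    """Decide whether profiling should run on `today` (illustrative only)."""
    # Assumption: with the lower-frequency switch off, the day checks are ignored.
    if not lower_freq_profile_enabled:
        return True
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    if profile_day_of_week is not None and today.weekday() != profile_day_of_week:
        return False
    if profile_date_of_month is not None and today.day != profile_date_of_month:
        return False
    return True


monday = datetime.date(2024, 1, 1)   # 1 January 2024 was a Monday
tuesday = datetime.date(2024, 1, 2)

print(should_profile(monday, True, profile_day_of_week=0))   # True
print(should_profile(tuesday, True, profile_day_of_week=0))  # False
```

Under this reading, setting both profile_day_of_week and profile_date_of_month would require both conditions to hold on the same day, so profiling would run rarely.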

Code Coordinates

  • Class Name: datahub.ingestion.source.elastic_search.ElasticsearchSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Elasticsearch, feel free to ping us on our Slack.