We collect telemetry data (see docs) for the Meltano CLI, UI, website, MeltanoHub, Newsletter, etc. in order to understand usage patterns. This is primarily done with Snowplow, but pre-2.0 Meltano versions still send some events via Google Analytics. All telemetry data is limited at its core because tracking is easy to disable: through the application using the send_anonymous_usage_stats setting, with tracker-blocking software, or simply by running the application without an internet connection.
Snowplow is an open source event data collection platform that we use for collecting telemetry data. Our Snowplow instance is hosted by SnowcatCloud; see the Squared Snowplow README for more details on how the data is received and loaded into Snowflake via Snowpipes.
Google Analytics (GA) is the legacy event aggregator that we used for telemetry prior to Snowplow. Our implementation has some known flaws, including the use of a user_id that is commonly short-lived in a Meltano project. This means that some projects look like a new user on every instantiation. Because of this, we retrieve the raw events from GA and use project_id as the unique identifier for a Meltano project. This is currently done by using Meltano in the squared repo to pull the GA data into Snowflake, where it is analyzed.
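The effect of switching identifiers can be illustrated with a minimal sketch. The event shape (plain dictionaries with user_id and project_id keys) and the function name count_identities are assumptions made for illustration; they are not the actual GA export schema or the squared repo's models.

```python
from typing import Dict, Iterable, Tuple


def count_identities(events: Iterable[Dict[str, str]]) -> Tuple[int, int]:
    """Return (distinct user_id count, distinct project_id count)."""
    users = set()
    projects = set()
    for event in events:
        users.add(event["user_id"])
        projects.add(event["project_id"])
    return len(users), len(projects)


# Hypothetical raw events: project p-1 was instantiated three times,
# and each instantiation generated a fresh, short-lived user_id.
sample_events = [
    {"user_id": "u-1", "project_id": "p-1"},
    {"user_id": "u-2", "project_id": "p-1"},
    {"user_id": "u-3", "project_id": "p-1"},
    {"user_id": "u-4", "project_id": "p-2"},
]

user_count, project_count = count_identities(sample_events)
```

Counting by user_id overstates the number of distinct projects in this sample, while counting by project_id groups the repeated instantiations together.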
The goal of telemetry is three-fold:
We will never monetize your telemetry data. This means we will never sell your data or trade your data, nor will we mine your data with the goal of monetizing insights.
The only approved use of Meltano telemetry is to improve the Meltano experience for users of Meltano.
As a company principle, Meltano will make every reasonable attempt to not record any data which could lead to damages or which could compromise sensitive information if leaked.
Developers should use this section as a guide for telemetry development.
Long-running processes should have both a started and completed event log. This is to ensure we detect failures when the running Meltano process (or container) may be killed before getting a chance to send a “failed” or “aborted” message at completion time.
The completed version of an event log should be identical to its corresponding started event, with the inclusion of the final ending state data.
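The started/completed pairing can be sketched as follows. This is a minimal illustration, not Meltano's tracking code: the function run_tracked, the emit callback, and the field names phase, end_state, and execution_id are all hypothetical.

```python
import uuid
from typing import Any, Callable, Dict

Event = Dict[str, Any]


def run_tracked(
    emit: Callable[[Event], None],
    name: str,
    context: Event,
    work: Callable[[], None],
) -> None:
    """Emit a started event, run the work, then emit a completed event."""
    started: Event = {
        "event": name,
        "phase": "started",
        "execution_id": str(uuid.uuid4()),  # hypothetical key pairing the two events
        **context,
    }
    emit(started)
    try:
        work()
    except Exception:
        # Same payload as the started event, plus the ending state.
        emit({**started, "phase": "completed", "end_state": "failed"})
        raise
    emit({**started, "phase": "completed", "end_state": "success"})


# Usage: collect events in a list instead of sending them anywhere.
sent: list = []
run_tracked(sent.append, "elt_run", {"project_id": "abc"}, lambda: None)
```

Under these assumptions, a started event whose execution_id never appears on a completed event points to a process that was killed before it could report its ending state.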
Unless otherwise approved or required, we do not emit “heartbeat” events. A starting event and a completion event are sufficient to achieve our goals.
Anonymization is performed via one-way hash algorithms. We default to MD5 because it generates shorter digests. We may also use SHA256 for any aspects we believe should receive a more robust hashing treatment, or for any aspects which have a higher likelihood of hash collision when using MD5.
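The difference between the two algorithms can be shown with Python's standard hashlib module. This is a minimal sketch: the helper name anonymize and its robust flag are hypothetical, and the actual Meltano code may encode or select values differently.

```python
import hashlib


def anonymize(value: str, robust: bool = False) -> str:
    """Return a one-way hex digest of value.

    MD5 is the default (32 hex characters); robust=True switches to
    SHA256 (64 hex characters).
    """
    algorithm = hashlib.sha256 if robust else hashlib.md5
    return algorithm(value.encode("utf-8")).hexdigest()


short_digest = anonymize("my-plugin-name")
long_digest = anonymize("my-plugin-name", robust=True)
```

The same input always yields the same digest, so anonymized values can still be grouped and counted without exposing the original string.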