- Excludes the `netty-transport-native-epoll` dependency, which isn't handled correctly by Spark's `--packages` support.
- Too many dependencies were still excluded. Please use 1.4.4 instead.
- Clean up dependencies to avoid evictions. This resolves issues on Databricks where evicted dependencies were loaded that were incompatible with the versions that should have been used.
We excluded too many dependencies in this release. Please use 1.4.2 instead.
- Clean up dependencies to avoid evictions.
- Metadata values are no longer silently truncated to 512 characters.
- Deletes are now supported for `datapoints`. See README.md for examples.
- An incorrect version was used for one of the library dependencies.
Although not breaking for most users, this release updates some core dependencies to new major releases. In particular, it is not possible to load 1.3.x releases at the same time as 0.4.x releases.
- Sequences are now supported; see README.md for examples.
- Files now support upsert and delete, and several new fields like `dataSetId` have been added.
- Files now support parallel retrieval.
- Improved error message when a column has an incorrect type.
- Filter pushdown can now handle null values in cases like `p in (NULL, 1, 2)`.
- Asset hierarchy now handles duplicated root `parentExternalId`.
- NULL fields in metadata are ignored for all resource types.
- Improved data points read performance by concurrently reading different time ranges and streaming the results to Spark as data is received.
- GZip compression is enabled for all requests.
- `name` is now optional for upserts on assets when the external id is specified and the asset already exists.
- More efficient usage of threads.
- Reimplement draining the read queue on a separate thread pool.
- Include the latest data point when reading aggregates. Please note that this is a breaking change and that updating to this version may change the result of reading aggregated data points.
- Data points are now written in batches of 100,000 rather than 1,000.
- The error messages thrown when one or more columns don't match will now say which columns have the wrong type.
- Time series delete now supports the `ignoreUnknownIds` option.
- Assets now include
- Schema for RAW tables will now correctly be inferred from the first 1,000 rows.
- Release threads from the thread pool when they are no longer going to be used.
- Fixes a bug where not all data points would be read if a time series had less than 10,000 data points per 300 days.
- `dataSetId` can now be set for asset hierarchies.
- Metrics are now reported for deletes.
- Empty updates of assets, events, or time series no longer cause errors.
- `assethierarchy` now supports metrics.
- Upserts are now supported when using
- `dataSetId` can now be set for events, assets, and time series.
- The `useLegacyName` option now supports setting `externalId` as the legacy name. Use `.option("useLegacyName", "externalId")` to enable this.
- A new option `project` allows the user to specify the CDF project to use. If omitted, the project will be fetched using the API key.
- A new resource type `assethierarchy` is now supported, allowing you to create asset hierarchies from Spark DataFrames. See the README for more information.
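As a hedged sketch of how this could look (the `cognite.spark.v1` format string, the `type` option, and the column names are assumptions based on the README; check it for the exact schema):

```scala
import org.apache.spark.sql.SparkSession

// Assumes an existing SparkSession; the source DataFrame must contain
// the columns described in the README (e.g. externalId, parentExternalId, name).
val spark = SparkSession.builder().getOrCreate()
val hierarchy = spark.read
  .format("csv")
  .option("header", "true")
  .load("assets.csv")

hierarchy.write
  .format("cognite.spark.v1")               // format string assumed
  .option("apiKey", sys.env("CDF_API_KEY"))
  .option("type", "assethierarchy")         // the new resource type
  .save()
```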
- Uses Cognite Scala SDK version 1.1.2, with further improved functionality to retry requests.
- Fixes a bug where aggregations such as `discreteVariance` could not be read due to case errors.
- Java `ConnectionException` errors will now be retried, improving the robustness of the Spark data source.
- Multiple rows with the same `externalId` are now allowed for upserts, but the order in which they are applied is undefined. We currently only guarantee that at least one upsert will be made for each `externalId`, and at least one update will be made for each `id` that is set. This is based on the assumption that upserts for the same `externalId` will have the same values. If you have a use case where this is not the case, please let us know.
- We now limit the number of threads being used for HTTP connections. In some cases it was possible to use too many threads for HTTP connections, and run out of ephemeral ports.
- The `useLegacyName` option for time series is now also respected when doing upserts.
- Upserts can now be done by internal id.
- Metrics are now collected for inserts and updates.
- Added support for time series fields such as `isStep` when doing upserts.
- Fixed a bug where certain resources could not write to other tenants than the main CDF tenant.
- RAW tables now respect the `baseUrl` option for writes.
- String data points now respect the `baseUrl` option for writes.
- Support new option `ignoreUnknownIds` for asset and event deletes. Deletes will now ignore ids that do not exist in CDF. The default value is true. Use `.option("ignoreUnknownIds", "false")` to revert to the old behavior, where the job is aborted when an attempt to delete an unknown id is made.
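A sketch of a delete that keeps the old strict behaviour (the `onconflict = "delete"` write mode and the `type` option are assumptions based on the rest of this changelog and the README):

```scala
// Assumes eventsToDelete is a DataFrame holding the ids to delete.
eventsToDelete.write
  .format("cognite.spark.v1")
  .option("apiKey", sys.env("CDF_API_KEY"))
  .option("type", "events")
  .option("onconflict", "delete")
  .option("ignoreUnknownIds", "false") // abort if an id does not exist
  .save()
```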
- Use Cognite Scala SDK version 1.1.0.
- Fetch data points at the end of the available count aggregates, even if they are not ready yet. This will fetch all data points even if the last aggregates claim there are no data points available. Some edge cases may still not have been properly addressed yet.
- Use Cognite Scala SDK version 1.0.1
- `stringdatapoints` now supports save mode.
- Increased the default number of partitions from 20 to 200.
- `stringdatapoints` now correctly fetches all data points.
- Fixed a bug in the pushdown implementation that would cause no filters to be pushed down when combining filters on pushdown and non-pushdown fields.
- `datapoints` will no longer fail when aggregates aren't ready in CDF yet.
- `datapoints` should now retrieve all aggregates. Previously it could miss some aggregates due to a rounding error.
- `stringdatapoints` only retrieves the first 100,000 data points. This will be fixed in the next release.
- The corresponding issue for `datapoints` is fixed in this release.
- The `assets` resource type now has
- `datapoints` will now retrieve all numerical data points again.
- New `maxRetries` option to allow configuration of the number of retries to attempt.
- `timeseries` now supports parallel retrieval.
- `timeseries` does filter pushdown for the `name`, `unit`, `isStep`, and `isString` columns.
- `datapoints` uses count aggregates for improved performance when retrieving numerical data points.
- The library has been renamed to "cdf-spark-datasource" instead of "cdp-spark-datasource".
- Fields such as `unit` have been removed from the data points schema. They were only used for reads.
- Failed requests will be retried when appropriate failures are detected. You can set the number of retries using the `maxRetries` option.
- All schemas updated to match API v1
- Parallel retrieval is now a lot faster, and the parallelism can be specified using the `partitions` option.
- All datetime columns are now Timestamps rather than milliseconds since Epoch.
- Format has been shortened, for convenience: `.format("cognite.spark.v1")`.
- Filtering time series on `assetId` is now applied API-side.
- Fixed a bug with time series upsert where `insertInto` would only work under special conditions.
- Assets will now do upsert when the
- When filtering events on ids, the filter is now applied API-side.
- Filter pushdown with `OR` clauses has been optimized.
- Metadata keys with null values are now removed, avoiding NullPointerExceptions from the API.
- Filters are now pushed to CDF when possible for assets, events, files and RAW tables.
- RAW tables now expose a `lastUpdatedTime` column, and filters for it are pushed to CDF.
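A sketch of reading a RAW table with a pushed-down `lastUpdatedTime` filter (the format string and option names such as `database` and `table` are assumptions; see the README):

```scala
// Assumes an existing SparkSession named spark.
val rows = spark.read
  .format("cognite.spark.v1")
  .option("apiKey", sys.env("CDF_API_KEY"))
  .option("type", "raw")
  .option("database", "mydb")
  .option("table", "mytable")
  .load()
  .where("lastUpdatedTime > to_timestamp(1561000000)") // filter pushed to CDF
```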
- Better error messages for invalid
- An error is now thrown when trying to update with null as the id.
- Infer schema limit for RAW is now being used again.
- Support for deleting time series, events and assets with `.save()`.
- `x-cdp-sdk` header for all API calls.
- Speed up time series and events by avoiding unions.
- Support Scala 2.12.
- New write mode using `.save()` allows specifying behaviour on conflicts.
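As a hedged sketch of the new write mode (the `onconflict` option name and its values are assumptions based on the README):

```scala
// Upsert assets: update rows that already exist, create the rest.
assetsDf.write
  .format("cognite.spark.v1")
  .option("apiKey", sys.env("CDF_API_KEY"))
  .option("type", "assets")
  .option("onconflict", "upsert")
  .save()
```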
- Partial updates are now possible for assets, events and time series.
- Assets now support asset types.
- Bearer tokens can now be used for authentication.
- Allow `lastUpdatedTime` to be "null" on inserts.
- Allow time series `id` to be null on insert, and always attempt to create the time series if `id` is null.
- Fix upserts on time series metadata with security categories.
- Improved error messages when upserts fail.
- Avoid registering the same Spark metric name more than once.
- Creating events works again.
- Metadata values are truncated to 512 characters, which is now the limit set by Cognite.
- Filters on the "type" and "subtype" columns of `events` will be used to retrieve only events of matching type and subtype.
- Parallel cursors are used for reading
- String data points are now supported using the `stringdatapoints` resource type.
- First and last data points available will be used to set timestamp limits if not given, improving the performance of `datapoints` parallelization for most use cases.
- Writes for non-data points resource types work again.
- All fields for all resource types should be present. In particular, many asset fields were previously not included.
- Upsert is now supported for time series metadata, based on the time series `id`.
- A new `partitions` option can be used to control the partitions created for the `datapoints` resource type. The time interval will be split into the given number of partitions and fetched in parallel.
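For example, a read sketch splitting the time interval into 50 parallel partitions (format string and option names assumed from the README):

```scala
// Assumes an existing SparkSession named spark.
val dps = spark.read
  .format("cognite.spark.v1")
  .option("apiKey", sys.env("CDF_API_KEY"))
  .option("type", "datapoints")
  .option("partitions", "50") // time interval split into 50 parallel fetches
  .load()
```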
- `datapoints` writes work again.
- Fixed dependencies in .jar, removed "fat" jar from release.
- Fix for `3dmodelrevisionmappings` (`treeIndex` and `subtreeSize` are optional).
- New `baseUrl` option to use a different prefix than https://api.cognitedata.com for all Cognite Data Platform API calls.
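A sketch of pointing the data source at a different cluster (the host below is only an example, and the format string is assumed from the README):

```scala
// Assumes an existing SparkSession named spark.
val assets = spark.read
  .format("cognite.spark.v1")
  .option("apiKey", sys.env("CDF_API_KEY"))
  .option("type", "assets")
  .option("baseUrl", "https://greenfield.cognitedata.com") // example host
  .load()
```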
- Read-only support for files metadata.
- Initial read-only support for 3D data (should be considered an alpha feature, may not work).
- Breaking change
- Validation of the `key` column for RAW tables; null values are not allowed.
- Improved performance for assets.
- Retries on error code 500 responses.
- New `maxRetries` option for all resource types to set the number of retries.
- Improved back off algorithm for retries.
- `project` is no longer a required option; it will be retrieved from the API key.