The Data Source Sync system is a mechanism for updating metadata from a remote data source. There are currently two types of data sources supported: database (via JDBC), and XML. The sync system is controlled by a single configuration file, called syncedMetadata.xml, which is located in the .../netx/config directory of the NetX installation. Sync events are logged in exogen.log.
Sync tasks
There are currently three different types of sync tasks that the system can perform: scalar, multivalue, and valuelookup.
Scalar Sync Task
This type of task updates the metadata for a group of assets, categories, or views. The incoming data is linked to attributes in the DAM on one or more fields. A search is done for the assets, categories or views to update for each row of incoming data (either directly from a database, or from XML), based on the linked fields. If one or more objects are found that match each row, the metadata for those objects are updated from that data row.
Multivalue Sync Task
This type of task updates the group of values in a multivalued attribute (i.e. a pulldown or tag.) Note that this type of sync task doesn't update specific assets, just updates the group of valid values for a particular attribute.
Valuelookup Sync Task
This type of task updates a single attribute of a single asset, category, or view. It is useful for setting initial values on linked attributes when new assets are imported into the system.
Scalar Sync Type
When you set up the sync system, you will define one or more sync tasks. Each task defines a set of source data, a set of destination objects, and how the source data is to be matched to the destination objects. You can define several different types of sync tasks.
Lets look at a simple example:
<?xml version="1.0" encoding="UTF-8" ?>
<metadatasync version="1.0">
<sync type="scalar" log="debug" name="UpdateTeams">
<source name="DB1" type="database" jdbc="jdbc:mysql://127.0.0.1/synced_metadata" query="SELECT * FROM teams" username="sa" password="secret">
<field name="teamId" column="id"/>
<field name="teamName" column="name"/>
<field name="teamHome" column="teamhome"/>
<field name="league" column="league"/>
<field name="conference" column="conference"/>
<field name="division" column="division"/>
<callback type="success" sql="UPDATE teams SET synced = 1 WHERE id = %1" param1="teamId" frequency="row"/>
<callback type="failure" sql="UPDATE teams SET sync_failed = 1 WHERE id = %1" param1="teamId" frequency="row"/>
</source>
<destination type="DAM">
<records name="Hockey teams" type="asset">
<link field="teamId" attribute="Team Id" />
<map field="conference" attribute="Conference"/>
<map field="division" attribute="Division"/>
<map field="teamHome" attribute="Team Home"/>
<map field="league" attribute="League"/>
<map field="teamName" attribute="Team Name"/>
<setvalue attribute="Games Won" value="0"/>
</records>
</destination>
</sync>
</metadatasync>
This configuration file defines a single sync task, named “UpdateTeams”. Each sync task must contain a single source and a single destination.
In this example, we're going to get data from a database table and use it to update the metadata for some assets in our DAM. We're going to get data from the teams table in our database, which looks something like this:
In the source tag, we set the JDBC URL to connect to the database, as well as the username and password to use. The SQL query is also specified that will return the rows of data that we want to use. This can be any query that returns a result set from the database, including calling a stored procedure that returns a result set.
Inside the source tag, we define the fields that we want to take from the result set in a set of field tags. These are the values that will be available in the destination. The column attribute of the field tag contains the column name as returned by the SQL query, and the name attribute is how you will refer to this field in the rest of the configuration file (i.e. this is a name you specify, essentially a variable; it can be the same as the column name if you want.)
The last items defined inside the source tag of this example are two callbacks. Callbacks are available in the database data source type, and will be called upon the failure or success of the data update. A SQL query is defined, as well as the callback type (success or failure). Up to five parameters may be sent to the callbacks, specified inside the query as %1 - %5. The param1 – param5 attributes contain the fields used for the parameters (these must be fields defined in a field tag.) The frequency attribute controls whether the callback is called for each row, or once after the entire data set has been processed.
Currently, the only valid type of destination is DAM. There could be others defined in the future. Each destination tag contains one or more records tags. The records tag defines the mapping between database fields and attributes, as well as specifies how each row of source data will be matched to assets, categories, or views. It also specifies which type of attributes to update (asset, category, or view).
The link tag defines how source data rows are matched to assets. There can be multiple link tags, but there must be at least one link tag. This tag has two attributes, a field attribute that will specify one of the fields defined in the data source (must match value set for name in the field tag), and an attribute attribute that names the metadata attribute in the DAM that the field will be matched to. It is important to note that it is possible for more than one asset to match each row of data in the source data (it is also possible that 0 assets will match.)
In the above example, the id column in the database is linked to the Team ID attribute through the teamId field . Lets say that the data source contains five rows, like these:
When the data is processed, each row will be processed in turn. So starting with the first row, the sync system will search for all assets which have the value “35” in their Team ID attribute. If eight assets are found with this value, then all eight assets will be updated from the first data row.
We define the attributes which will be updated in these eight assets with the map tags. Each map tag maps a field to an attribute in the DAM. So for each of the eight matching assets, the attributes named in the map tags will be updated to the matching values in the source data row.
Additionally, attributes can be initialized with the setvalue tag. This sets a specified attribute to an explicit value defined in the tag (see above example).
Once all eight assets have been updated, and any appropriate callbacks have been called, the next row will be processed (the one where id = 36.)
A sync task can contain a third type of tag, which is the value_map tag. This allows you to map discrete values in the source data to new values in the destination data. For example, you might define the following value map:
<value_map name="league_map">
<map_value from="WHL" to="Western Hockey League"/>
<map_value from="NHL" to="National Hockey League"/>
<map_value from="OHL" to="Ontario Hockey League"/>
<map_value from="QMJHL" to="Quebec Major Junior Hockey League"/>
<map_value from="CHL" to="Canadian Hockey League"/>
</value_map>
<destination type="DAM">
<records name="Example" type="asset">
<link field="teamId" attribute="Team Id" />
<map field="league" attribute="League Name" value_map="league_map"/>
</records>
</destination>
In this example, values in the source data will be checked to see if they match anything in the value map, with new values being substituted if they match, before updating the destination attribute. You can define multiple mappings within a value map and each value map relates only the task in which it's defined. When configuring the destination for the data, you can then also specify which value map you want to use for each mapped attribute (express within the map element, as illustrated in the example above).
Multivalue Sync Type
The multivalue sync type allows you to update a pulldown or tag attribute with a new set of allowed values (i.e. update the domain for the attribute.) Let's have an example!
Suppose you have a pulldown attribute that displays a list of hockey league acronyms:
Now let us suppose that you have a database table with hockey league information, with this data:
You might like to periodically update the available list of leagues in your pulldown attribute from the data in the table, so that when new leagues are added, they will appear in the pulldown, and when leagues are removed, they are removed from your pulldown.
This is how to set up a multivalue sync type:
<sync type="multivalue" log="debug" name="UpdateLeagues">
<source name="DB1" type="database" jdbc="jdbc:mysql://127.0.0.1/synced_metadata" query="SELECT league_name_short FROM leagues WHERE visible = 1 ORDER BY league_name_short" username="sa" password="secret">
<field name="leagueNames" column="league_name_short"/>
<callback type="success" sql="UPDATE leagues SET synced = 1" frequency="end"/>
<callback type="failure" sql="UPDATE leagues SET sync_failed = 1" frequency="end"/>
</source>
<destination type="DAM">
<records name="Hockey leagues" type="asset">
<map field="leagueNames" attribute="League"/>
</records>
</destination>
</sync>
This database query will produce a single-column result set with multiple values. These values will be used to set the list of values allowed in the pulldown mentioned in the destination records block.
Value Lookup
A third type of sync task is available, which allows you to update a single asset with values looked up from the data source. This is useful in situations where you need to set an id field on a newly imported asset so that subsequent sync operations will update that asset.
This type of sync task must be triggered by an AutoTask, because it must have context information available, which is supplied by the AutoTask. The following is an example of a value lookup task definition.
<sync type="valuelookup" log="debug" name="GetTeamID">
<source name="Hockey DB" type="database" jdbc="jdbc:mysql://127.0.0.1/synced_metadata" query="SELECT getTeamId(%1, %2) AS team_id" username="sa" password="secret">
<param type="ASSET_FILE_BASENAME" position="1"/>
<param type="CATEGORY_NAME" position="2"/>
<field name="TeamID" column="team_id"/>
</source>
<destination type="DAM">
<records name="Imported Asset" type="asset">
<map field="TeamID" attribute="Team Id"/>
<setvalue attribute="Games Won" value="0" valueLookup="success"/>
</records>
</destination>
</sync>
The param elements in the data source definition use context data supplied by the AutoTask to build the query used to look up the desired data. Please note: the query used for this sync task type should result in a single row with a single column of data, i.e. a single value with which to update a single attribute on a specific asset. The type attribute of the param element names the context data item to be used, and the position attribute specifies the location in the query where that item should be substituted (if position is “1”, then the string “%1” will be replaced with that value in the query.)
These are the context values which are defined:
Name |
Description |
---|---|
ASSET_FILENAME |
Full asset file name |
ASSET_FILE_BASENAME |
Asset file name without extension |
ASSET_FILENAME_EXTENSION |
Asset file name extension |
ASSET_ID |
Asset ID |
ASSET_ATTRIBUTE |
Value of named attribute on the asset |
CATEGORY_NAME |
Name of category |
CATEGORY_PATH |
Full path name of category |
CATEGORY_ID |
Category ID |
CATEGORY_ATTRIBUTE |
Not currently implemented |
USER_ID |
ID of user who triggered the autotask |
DATASET_ID |
ID of dataset which triggered the autotask |
Placeholders in the SQL query should NOT be wrapped in quotes. When the placeholders are substituted with data, the type of the data is taken into account and wrapped with quotes if necessary.
The autotask job for using a ValueLookup type task is different than the one for other tasks, because you must specify which task is the one that looks up the ID value, and then list the other tasks you want to run using the new ID. Here is an example of an autotask configuration for ValueLookup:
<autotask>
<task name="ValueLookupJob Task">
<matchCriteria type="or">
<criteria type="action" value="import" />
<criteria type="action" value="resync" />
</matchCriteria>
<customJob idSyncName="lookupIdTask" initMetadata="task1,task2,task3" className="com.netxposure.products.imageportal.module.metadata.sync.job.ValueLookupJob"/>
</task>
</autotask>
Filters
There is a filtering mechanism to limit the set of assets updated by a particular sync task. There are currently two types of filters available. The first is a Solr query filter, where a Solr query is specified that limits the assets that can be matched to a specific row of data. In terms of sets, the set of assets that will be updated if there is a Solr query filter is the intersection of the Solr query result set and the set of assets that match on the linked fields for that data row.
Filters are defined in the records block. For example, for a particular sync task, you may want to limit the assets updated to the specific assets where the league = “NHL”:
<destination type="DAM">
<records name="Hockey teams" type="asset">
<filter type="solrquery" query="league:NHL"/>
<link field="teamId" attribute="Team Id" />
<map field="conference" attribute="Conference"/>
<map field="division" attribute="Division"/>
<map field="teamHome" attribute="Team Home"/>
<map field="league" attribute="League"/>
<map field="teamName" attribute="Team Name"/>
<map field="league" attribute="League Name" value_map="league_map"/>
</records>
</destination>
The other type of filter is a context filter. This type of filter will only work if the sync task is being run from an AutoTask where there is context information available (i.e. an AutoTask triggered by import/resync/reimport, etc.) This type of filter is useful, for example, if you need to give a newly imported asset initial values for a set of its attributes. For example:
<sync type="scalar" log="debug" name="InitialValues" include_in_default="false">
<source name="DB1" type="database" jdbc="jdbc:mysql://127.0.0.1/synced_metadata" query="SELECT * FROM teams WHERE id = %1" username="sa" password="secret">
<param type="ASSET_ATTRIBUTE" position="1" attribute="Team Id" datatype="integer"/>
<field name="teamId" column="id"/>
<field name="teamName" column="name"/>
<field name="teamHome" column="teamhome"/>
<field name="league" column="league"/>
<field name="conference" column="conference"/>
<field name="division" column="division"/>
</source>
<value_map name="league_map">
<map_value from="WHL" to="Western Hockey League"/>
<map_value from="NHL" to="National Hockey League"/>
<map_value from="OHL" to="Ontario Hockey League"/>
<map_value from="QMJHL" to="Quebec Major Junior Hockey League"/>
<map_value from="CHL" to="Canadian Hockey League"/>
</value_map>
<destination type="DAM">
<records name="Imported Asset Initial Values" type="asset">
<filter type="context">
<context type="ASSET_ID" />
</filter>
<link field="teamId" attribute="Team Id" />
<map field="conference" attribute="Conference"/>
<map field="division" attribute="Division"/>
<map field="teamHome" attribute="Team Home"/>
<map field="league" attribute="League"/>
<map field="teamName" attribute="Team Name"/>
<map field="league" attribute="League Name" value_map="league_map"/>
</records>
</destination>
</sync>
This task is designed to set initial values for a set of attributes after an asset has been imported. You would want to run this after a value lookup sync task to make sure that the Team Id attribute was given the correct value. The context filter ensures that the only asset updated by this task is the one with the asset id given by the context. ASSET_ID is the only context value supported currently, but this is easily expanded to support all of the context values that are availabe as query parameters.
Note the use of the ASSET_ATTRIBUTE parameter to the SQL query. For this parameter type, the name of the attribute must be given in the attribute XML attribute, and you must specify a data type in the datatype attribute so that the query can be formatted properly.
Note that only one filter per records block is allowed.
Callbacks
There are currently two types of callbacks supported. The first type is a database datasource callback, where a query can be executed on the database during or after a sync task. The other type of callback is a mechanism where a job can be run in the DAM during or after the sync task. Both types of callbacks can be configured to either run if the sync task is successful or fails. Multiple callbacks can be defined, so you can define a success callback and a failure callback, or any other set of callbacks you need.
Database Callbacks
Here is an example of database callbacks:
<source name="DB1" type="database" jdbc="jdbc:mysql://127.0.0.1/synced_metadata?useUnicode=true&characterEncoding=utf8" query="SELECT * FROM teams" username="user" password="password">
<field name="teamId" column="id"/>
<field name="teamName" column="name"/>
<field name="teamHome" column="teamhome"/>
<field name="league" column="league"/>
<field name="conference" column="conference"/>
<field name="division" column="division"/>
<callback type="success" sql="UPDATE teams SET synced = 1 WHERE id = %1" param1="teamId" frequency="row"/>
<callback type="failure" sql="UPDATE teams SET sync_failed = 1 WHERE id = %1" param1="teamId" frequency="row"/>
<callback type="success" sql="INSERT INTO sync_journal (result, synctime) VALUES ('succeeded', NOW())" frequency="end"/>
<callback type="failure" sql="INSERT INTO sync_journal (result, synctime) VALUES ('failed', NOW())" frequency="end"/>
</source>
Parameter substitution is available in the callback query. Allowed parameters are the field names defined in the field elements. Up to five parameters may be defined. Each param should be set with a param attribute, param1 - param5. The symbols %1 - %5 are used in the query to determine where the substitution should take place. The callback system determines whether the value needs to be quoted or not, so please do not include quotes in the query.
Database callbacks can be defined to run either after every row in the data source is processed, or at the end of the task after all rows are processed. Parameter substitution will not work in callbacks that run at the end of the sync task, since they are not executed in the context of a specific row of data.
DAM Callback Jobs
DAM callback jobs are jobs that are run during or after a sync task has completed. These jobs must be classes that inherit from the com.netxposure.products.imageportal.module.metadata.sync.job.callback.CallbackJob class. The jobs can be configured to run at the end of the sync task, after each row of source data is processed, or after each asset has been updated (in this case a job may be executed more than once per source data row, if that row matches multiple assets in the DAM.)
Placeholders
It is possible to create an asset placeholder if no assets match a specific row of data. The placeholder will be given the metadata as mapped for that particular data row, and will be named using the value in a specific source field. Here is an example of how to configure placholders:
<records name="Hockey teams" type="asset">
<link field="teamId" attribute="Team Id" />
<map field="conference" attribute="Conference"/>
<map field="division" attribute="Division"/>
<map field="teamHome" attribute="Team Home"/>
<map field="league" attribute="League"/>
<map field="teamName" attribute="Team Name"/>
<placeholder create="true" placeholderCategoryId="23" nameField="teamName"/>
</records>
A specific category id must be given for the placeholders to be created in.
Placeholders may only be configured for record blocks of type asset, and may only be used in scalar sync types.
XML Data Sources
In addition to using a database as a data source, the system currently supports using XML files as data sources. Two basic forms of XML data are supported. XML with element-contained data, and XML with attribute-contained data. An Xpath expression must be given to grab rows of data from the XML.
XML with element-contained data looks like this:
<?xml version="1.0"?>
<data>
<rows>
<row>
<id>1</id>
<teamhome>Anaheim</teamhome>
<name>Ducks</name>
<league>NHL</league>
<conference>Western</conference>
<division>Pacific</division>
<wins>54</wins>
<losses>20</losses>
<overtime_losses>8</overtime_losses>
<shootout_losses>0</shootout_losses>
</row>
<row>
<id>2</id>
<teamhome>Boston</teamhome>
<name>Bruins</name>
<league>NHL</league>
<conference>Eastern</conference>
<division>Atlantic</division>
<wins>54</wins>
<losses>19</losses>
<overtime_losses>9</overtime_losses>
<shootout_losses>0</shootout_losses>
</row>
</rows>
</data>
XML with attribute-contained data looks like this:
<?xml version="1.0"?>
<data>
<rows>
<row id="1" teamhome="Anaheim" name="Ducks" league="NHL" conference="Western" division="Pacific" wins="54" losses="20" overtime_losses="8" shootout_losses="0"/>
<row id="2" teamhome="Boston" name="Bruins" league="NHL" conference="Eastern" division="Atlantic" wins="54" losses="19" overtime_losses="9" shootout_losses="0"/>
</rows>
</data>
Here is an example of a task definition to sync with XML:
<sync type="scalar" log="debug" name="XML Metadata sync" include_in_default="true">
<source name="Team XML" type="xml" filepath="/opt/netx/xml/teams.xml" category_id="32" row_xpath="//row" container_type="attributes" delete_data_file="false" file_to_process="newest">
<field name="teamId" column="id"/>
<field name="teamName" column="name"/>
<field name="league" column="league"/>
<field name="conference" column="conference"/>
<field name="division" column="division"/>
</source>
<destination type="DAM">
<records name="New Team Data" type="asset">
<link field="teamId" attribute="Team Id" />
<map field="conference" attribute="Conference"/>
<map field="division" attribute="Division"/>
<map field="league" attribute="League"/>
<map field="teamName" attribute="Team Name"/>
<map field="league" attribute="League Name"/>
</records>
</destination>
</sync>
The container_type element specifies which format the XML file uses, and the value should either be “attributes” or “elements”. The row_xpath attribute contains the Xpath expression used to retrieve data rows.
There are two options for the location of the XML file. The file can be somewhere in the filesystem, in which case the path to the file must be given in the filepath attribute. Alternatively, the XML file can be an asset in the DAM. In this case, the category_id attribute must have the id of the category containing the file, and the file_to_process attribute must specify which file to process. The options are “newest”, in which case the most recently imported asset is used as the source XML file, or “all”, in which case all contents of the category are treated as XML and used to sync.
If the file should be deleted after the sync, the delete_data_file attribute should be set to true (this does not apply to assets, only to XML files contained in the filesystem.)
Pushing Data From NetX
It is possible to sync data FROM NetX to an external data destination. Currently supported data destination types are XML files and an external database via JDBC.
There are three options for choosing which data to use in a sync task where NetX is the data source: by DAM folder, based on the value of an attribute, or based on the modification date. Here are examples of all three data source configurations:
<!-- ISSUE/NOTE: Only "scalar" sync types make sense for DAM==>JDBC/XML sync -->
<sync type="scalar" log="debug" name="UpdateTeamRecords">
<!-- This will always dump metadata for all assets in the specific category (must use either categoryName or categoryId, but not both) -->
<source type="DAM" name="netx dam" objectType="asset" category_id="99" categoryName="some/category/name">
<!-- NOTE: In this context "column" refers to the attribute name -->
<!-- NOTE: The following "columns" are also available in most contexts: -->
<!-- asset.assetId -->
<!-- asset.name -->
<!-- asset.description -->
<!-- asset.attribute.attribute_name (this is not needed if the objectType is "asset" since any names will be interpreted as asset attribute names) -->
<field name="field.teamId" column="Team ID"/>
<field name="field.teamName" column="Team Name"/>
<field name="field.teamCity" column="Team City"/>
</source>
<sync type="scalar" log="debug" name="AddNewTeamsWhenAttributeChanges">
<!-- This will always dump metadata for all assets where the attribute "Send To External Database" is set to "true" -->
<source type="DAM" name="netx dam" objectType="asset" attributeName="Send To External Database" attributeValue="true">
<field name="field.teamId" column="Team ID"/>
<field name="field.teamName" column="Team Name"/>
<field name="field.teamCity" column="Team City"/>
</source>
<sync type="scalar" log="debug" name="AddNewTeamsByDate">
<!-- This will always dump metadata for all assets where the asset mod date has been update in the last 2 days -->
<!-- s - seconds -->
<!-- m - minutes -->
<!-- h - hours -->
<!-- d - days -->
<!-- w - weeks -->
<!-- M - months -->
<!-- These MUST appear in the following order in the string: Mwdhms -->
<!-- The period string may contain spaces between components, so either of the following is valid: "2d10h30m" or "2d 10h 30m" -->
<source type="DAM" name="netx dam" objectType="asset" modDatePeriod="2d">
<field name="field.teamId" column="Team ID"/>
<field name="field.teamName" column="Team Name"/>
<field name="field.teamCity" column="Team City"/>
</source>
XML Destination
For syncing data to an XML file, a path and filename must be given in the records block. Also, the same two basic XML formats that are supported in syncing from source XML are supported for syncing TO XML. Here are a couple of examples:
<destination type="xml">
<!-- NOTE: Each records block would result in its own XML file -->
<!-- NOTE: Use $TIMESTAMP in the filename to include date/time in YYYYMMDDHHSS format -->
<records name="Hockey teams" container_type="attributes" filepath="/opt/netx/xml/teams_$TIMESTAMP.xml">
<!-- NOTE: See metadata sync docs in confluence for discussion of container_type (attributes vs. elements) -->
<!-- NOTE: There is no link element, we're just dumping records -->
<!-- NOTE: In this context attribute is the name of the data field in the XML (either the XML element or the XML attribute, depending on what is specified in the container_type setting -->
<map field="teamId" attribute="team_id"/>
<map field="teamName" attribute="team_name"/>
<map field="teamCity" attribute="team_city"/>
</records>
</destination>
<destination type="xml">
<records name="Hockey teams" container_type="elements" filepath="/opt/netx/xml/teams2.xml">
<map field="teamId" attribute="team_id"/>
<map field="teamName" attribute="team_name"/>
<map field="teamCity" attribute="team_city"/>
</records>
</destination>
JDBC Database Destination
For syncing to an external database, the JDBC connection string, username, password, and query must be provided in the records block. Attribute names are used symbolically in the SQL query. Following are two examples:
<destination type="jdbc">
<!-- NOTE: Each records block would result in its own external JDBC call -->
<records name="Hockey teams" jdbc="jdbc:mysql://127.0.0.1/external_database" query="UPDATE teams SET name = :team_name, city = :team_city WHERE id = :team_id" username="sa" password="secret">
<!-- NOTE: There is no link element, we're just dumping records -->
<!-- NOTE: In this context attribute is the name of the data field in the SQL -->
<map field="field.teamId" attribute="team_id"/>
<map field="field.teamName" attribute="team_name"/>
<map field="field.teamCity" attribute="team_city"/>
</records>
</destination>
<destination type="jdbc">
<records name="Hockey teams" jdbc="jdbc:mysql://127.0.0.1/external_database" query="INSERT INTO teams (id, name, city) VALUES (:team_id, :team_name, :team_city)" username="sa" password="secret">
<map field="field.teamId" attribute="team_id"/>
<map field="field.teamName" attribute="team_name"/>
<map field="field.teamCity" attribute="team_city"/>
</records>
</destination>
Invoking The Sync System
This system is designed to be invoked from an AutoTask. The job to specify in the AutoTask is com.netxposure.products.imageportal.module.metadata.sync.job.MetadataSyncJob. Optionally a list of specific task names can be specified for specific contexts in which the job should be run. Here is an example of a set of AutoTasks:
<?xml version="1.0" encoding="UTF-8"?>
<autotask>
<task id="Import Asset Sync Metadata" name="Import Asset Sync Metadata">
<matchCriteria type="or">
<criteria type="action" value="import"/>
<criteria type="action" value="resync"/>
</matchCriteria>
<customJob syncTaskNames="getTeamId,setTeamDefaults" className="com.netxposure.products.imageportal.module.metadata.sync.job.MetadataSyncJob"/>
</task>
<task id="Morning metadata sync" name="Morning metadata sync">
<matchCriteria type="and">
<criteria hour="11" interval="daily" minute="00" type="periodic"/>
</matchCriteria>
<customJob className="com.netxposure.products.imageportal.module.metadata.sync.job.MetadataSyncJob"/>
</task>
<task id="Afternoon metadata sync" name="Afternoon metadata sync">
<matchCriteria type="and">
<criteria hour="20" interval="daily" minute="00" type="periodic"/>
</matchCriteria>
<customJob className="com.netxposure.products.imageportal.module.metadata.sync.job.MetadataSyncJob"/>
</task>
</autotask>
In this example, the first task, “Import Asset Sync Metadata”, is run when an asset is imported or resynced. A list of sync tasks is included, getTeamId, and setTeamDefaults. The getTeamId task would be a value lookup sync task that would set the Team ID attribute so that the new asset can be associated with data in future sync operations. The second task is the setTeamDefaults task, which uses the newly set Team ID to retrieve initial values for the assets metadata.
The next two tasks run a general metadata sync at specific times every day. In this example, it is done at 11am and 8pm. No specific sync tasks are specified with the job, so all of the default tasks are run (all tasks where the include_in_default
attribute is not set to false
.)
Notes
It is possible that a database data source may be used that is not using the same database platform that NetX itself uses (for example, NetX might be using SQL Server, but a sync task needs to be defined that gets data from a MySQL database.) The JDBC driver for whatever type of connection is being made from the sync system must be present in the ROOT/WEB-INF/lib directory of the NetX installation. This might mean that there are multiple JDBC driver jars in a single NetX installation, if the database platforms are different.
When I get time, I'll create an XML schema definition or doctype definition or something for the config file, which would make validation a lot easier, since its not a simple configuration.
It is possible to set a version flag on the root element, i.e.:
<metadatasync version="1.0">
It is a really good idea to do this, because the format of this file will undoubtedly change (like I'll think up better names than “scalar” and “multivalue” for those sync task types.) The file reader is written so that it will be easy to have versioned file readers.
Configuration Reference
metadatasync
This is the root element for the configuration file, and contains one or more sync element.
Attribute | Description | Required | Default |
version |
Config file version | No | 1.0 |
sync
This is the definition of a sync task. It must contain exactly one source element, one destination element, and zero or more value_map elements.
Attribute | Description | Required | Default | Values |
type |
Sync task type | Yes | scalar,multivalue,valuelookup | |
log |
Logging level | No | summary | summary,detail,debug |
name |
Name of tast | No | ||
include_in_default |
Indicates if this task should be included in the default sync call | true | true / false |
source
This is the data source definition. The list of options depends on what kind of data source it is. It will contain one or more field elements, defining the data fields taken from the source, and can optionally contain one or more param elements and one or more callback elements.
Database
Attribute | Description | Required | Values |
name |
Name of data source | No | |
type |
Type of data source | Yes | database,xml |
jdbc |
JDBC connection string | Yes | |
query |
SQL query to get data | Yes | |
username |
Name of user to connect with | Yes | |
password |
Password to connect with | Yes |
XML
Attribute | Description | Required | Values |
name |
Name of data source | No | |
type |
Type of data source | Yes | database,xml |
filepath |
Filesystem path to XML file | No | |
category_id |
ID of DAM category with XML file(s) | No | |
row_xpath |
Xpath to get data rows | Yes | |
container_type |
Format of XML file | Yes | attributes,elements |
delete_data_file |
Indicates whether or not to delete file (this only applies if the XML file is in the filesystem, NOT when the file is an asset.) | No | |
file_to_process |
Indicates which file to use in a category | No |
param
This element defines context parameters for the query in a database data source.
Attribute | Description | Required | Values |
type |
Type of parameter | Yes | See discussion |
position |
Replacement string in query | Yes | |
attribute |
Name of attribute for attribute type | No | |
datatype |
How to format data in query | No | |
skip_null |
Skip null values in response data | No | |
skip_empty_string |
Skip empty string values in response data | No |
field
This element defines the fields being retrieved from the data source.
Attribute | Description | Required |
name |
Name of the field | Yes |
column |
Name of the column in the source | Yes |
callback
In a database data source, defines callback queries to be run on success or failure.
Attribute | Description | Required | Default | Values |
type |
Type of callback | Yes | success | success, failure |
sql |
Query to run | Yes | ||
param1 |
First parameter (replaces %1 in query) | No | ||
param2 |
Second parameter (replaces %2 in query) | No | ||
param3 |
Third parameter (replaces %3 in query) | No | ||
param4 |
Fourth parameter (replaces %4 in query) | No | ||
param5 |
Fifth parameter (replaces %5 in query) | No | ||
frequency |
When callback should run | No |
In a records block inside a DAM destination, a callback indicates a job to run in the DAM after a set of assets have been updated.
placeholder
Configures placeholder creation. This is only valid for scalar sync tasks, and for asset records.
Attribute | Description | Required | Default | Values |
create |
Set to true to create placeholders | Yes | false | true / false |
placeholderCategoryId |
ID of category where placeholders are to be created | Yes | ||
nameField |
Name of source data field to use to name asset | Yes |
value_map
Defines a value map. Contains one or more map_value elements.
Attribute | Description | Required |
name |
Name of the value map | Yes |
map_value
Contains the value mappings inside a value_map.
Attribute | Description | Required |
from |
Value to change | Yes |
to |
Value to replace | Yes |
destination
Definition of data destination. This will contain one or more records elements.
Attribute | Description | Required | Default | Values |
type |
Destination type | Yes | DAM | DAM |
records
Records block definition. This defines a set of assets to update with the incoming data. This will contain one or more link elements, one or more map elements, and optionally, a single filter element.
Attribute | Description | Required | Default | Values |
name |
Name of records block | No | ||
type |
Type of records to be updated | Yes | asset | asset, category, view |
assetMatcher |
Allows override of default asset matching to use the database to match assets to data rows instead of Solr. This can be useful if a sync task is dependent on another sync task to update an attribute, causing a race condition where Solr may not be updated before the second task runs. Currently, this is only relevant if the type attribute is set to asset. | No | solr | solr, database |
filter
Defines a destination asset filter to reduce the number of assets updated by the current data row.
Attribute | Description | Required | Values |
type |
Type of filter | Yes | solr, context |
query |
Solr query in a solr filter | Yes |
context
Defines context fields in a context filter.
Attribute | Description | Required | Values |
type |
Type of context | Yes | ASSET_ID |
link
Defines the attribute to source column links that match data rows to assets.
Attribute | Description | Required |
field |
Name of field in link | Yes |
attribute |
Name of attribute in link | Yes |
map
Maps incoming data columns to attributes.
Attribute | Description | Required | Default | Values |
field |
Name of field in map | Yes | ||
attribute |
Name of attribute in map | Yes | ||
datatype |
Applies formatting to value before update | No | currency | |
value_map |
Name of value map to use on field | No | ||
default |
Default value to set attribute if the source data is null or an empty string | No | ||
match |
Regular expression; not implemented | No | ||
format |
Format of data; not implemented | No | ||
replacement |
Not implemented | No | ||
skip_null |
Skip null values in response data | No | ||
skip_empty_string |
Skip empty string values in response data | No | ||
overwriteExistingValue |
Overwrite existing attribute value | No | true | true / false |
setvalue
Allows attributes to be initialized to a specific value. This is only allowed in the context of scalar or valuelookup sync task types.
Attribute | Description | Required | Default | Views |
attribute |
Name of attribute to initialize | Yes | ||
value |
Value to initialize attribute to | Yes | ||
valueLookup |
Controls whether values are set in a valuelookup task depending on whether the value lookup succeeded or failed, or either. | No | both | success, fail, both |
overwriteExistingValue |
Overwrite existing attribute value | No | true | true / false |