What's new in Megaladata 7.1

The First Major Update of the 7th Megaladata Version

While working on this release, we paid considerable attention to requests from our users. The main changes concern security and usability improvements. In addition, new database features appeared, and the functionality of some standard components was expanded. The desktop Megaladata editions are now also available for Linux.

Changes in Megaladata Server

The server Megaladata editions now limit the number of unsuccessful password entries on the authentication page. Two new settings appeared on the Parameters page in the Administration section: Password entry attempts limit and Password entry timeout (s). On every login attempt, the server counts how many failed password entries were made for this user account during the last Password entry timeout (s) seconds. If this number is greater than or equal to Password entry attempts limit, the password is not checked, and the following error is displayed: "Maximum number of account login attempts was exceeded".
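
The check described above behaves like a sliding-window counter of failed attempts. The following Python sketch is only an illustration under assumptions: the class and method names are hypothetical, failures are kept in memory per account, and `limit` and `window_s` stand in for the Password entry attempts limit and Password entry timeout (s) settings.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class LoginAttemptLimiter:
    """Hypothetical sliding-window limiter for failed password entries."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit        # stands in for "Password entry attempts limit"
        self.window_s = window_s  # stands in for "Password entry timeout (s)"
        self.clock = clock
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def _recent_failures(self, user: str) -> int:
        # Forget failures older than the window, then count the remaining ones.
        now = self.clock()
        failures = self._failures[user]
        while failures and now - failures[0] >= self.window_s:
            failures.popleft()
        return len(failures)

    def try_login(self, user: str, password_ok: Callable[[], bool]) -> str:
        # Once the limit is reached, the password is not checked at all.
        if self._recent_failures(user) >= self.limit:
            return "Maximum number of account login attempts was exceeded"
        if password_ok():
            return "ok"
        self._failures[user].append(self.clock())
        return "wrong password"
```

Injecting `clock` keeps the sketch testable without real waiting; a production check would also need shared storage if several server processes handle logins.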

The Remember me (user) option can optionally be removed from the Login page. To do this, set the rememberuser property to false in the client.json file placed in the %ProgramFiles%\Megaladata\Client folder.
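
If you prefer to script this change, a minimal Python sketch could look like the following. It assumes that client.json is a JSON object with rememberuser as a top-level key and that any other keys must be preserved; the function name is hypothetical, and the default path is the folder given above (writing there may require administrator rights).

```python
import json
import os
from pathlib import Path
from typing import Optional


def disable_remember_me(config_path: Optional[Path] = None) -> dict:
    """Set "rememberuser": false in client.json, keeping other settings."""
    if config_path is None:
        program_files = os.environ.get("ProgramFiles", r"C:\Program Files")
        config_path = Path(program_files) / "Megaladata" / "Client" / "client.json"
    # Start from the existing settings if the file is already there.
    settings = {}
    if config_path.exists():
        settings = json.loads(config_path.read_text(encoding="utf-8"))
    settings["rememberuser"] = False  # serialized as JSON false
    config_path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return settings
```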

If a Megaladata user shows no activity for a set period of time, the session can be locked, or the server connection can be forcibly closed. New parameters for interactive web client sessions appeared on the Parameters page in the Administration section:

  • Session lock timeout (s) defines the period of user inactivity after which the session is locked. The connection to the server is not closed, but no data is transferred between the server and the client. Only service messages are exchanged, and an unlock request with authentication data is allowed. The user sees only the authentication window and can either enter the password to restore the session or close the browser tab, which closes the session as well.
  • Session disconnection timeout (s) defines the period of user inactivity after which the server connection is closed. A corresponding message is recorded in the log.

Key presses, mouse button clicks and scroll bar use count as user activity.
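
The interplay of the two timeouts can be pictured as a small state function. This Python sketch is an illustration under assumptions, not Megaladata's implementation: both timeouts are assumed to be measured from the last user activity, `None` stands for a timeout that is not configured, and all names are hypothetical.

```python
from enum import Enum
from typing import Optional


class SessionState(Enum):
    ACTIVE = "active"
    LOCKED = "locked"              # connection kept, only unlock requests allowed
    DISCONNECTED = "disconnected"  # server connection closed, event logged


def session_state(idle_s: float,
                  lock_timeout_s: Optional[float],
                  disconnect_timeout_s: Optional[float]) -> SessionState:
    """Return the state of a session that has been idle for idle_s seconds.

    idle_s is the time since the last key press, mouse click or scroll.
    A timeout of None means that timeout is not configured.
    """
    if disconnect_timeout_s is not None and idle_s >= disconnect_timeout_s:
        return SessionState.DISCONNECTED
    if lock_timeout_s is not None and idle_s >= lock_timeout_s:
        return SessionState.LOCKED
    return SessionState.ACTIVE
```

Checking the disconnection timeout first means a session idle past both limits is reported as disconnected rather than merely locked.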

Users can also lock the session themselves using the new Lock option in the Menu.

Periodic connection checks from the server side are now disabled by default: in the Administration section, the default value of the Connection check period parameter was changed to 1 (previously, it was 300 s).

In Megaladata Server for Linux, the amount of RAM used by the Megaladata application can be limited. New parameters appeared on the Parameters page in the Administration section:

  • Used memory maximum (KB) (default: 0, "not limited").
  • Memory usage ratio (default: 0, "not limited").

The minimum RAM limit is 300 MB. An increased limit is applied immediately, while a decreased limit requires restarting the application. The limit applies to virtual memory and does not affect the automatic extension of the main thread stack.
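
The rules for changing the limit can be summarized in a short Python sketch. It covers only Used memory maximum (KB) and assumes that 300 MB means 300 * 1024 KB and that 0 means "not limited"; the function name and its return values are hypothetical. How Memory usage ratio combines with this value is not described here, so it is left out.

```python
NOT_LIMITED = 0
MIN_LIMIT_KB = 300 * 1024  # assumed: 300 MB expressed in KB


def _as_bound(kb: int) -> float:
    # Treat "not limited" as an infinitely large limit for comparisons.
    return float("inf") if kb == NOT_LIMITED else float(kb)


def plan_limit_change(current_kb: int, new_kb: int) -> str:
    """Decide how a new Used memory maximum (KB) value takes effect."""
    if new_kb != NOT_LIMITED and new_kb < MIN_LIMIT_KB:
        raise ValueError(f"limit must be 0 or at least {MIN_LIMIT_KB} KB")
    if _as_bound(new_kb) >= _as_bound(current_kb):
        return "apply now"         # raising (or keeping) the limit is immediate
    return "restart required"      # lowering the limit needs an application restart
```

Note that switching from "not limited" to any finite value counts as a decrease under this reading.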

The advantage of this setting is that the OOM Killer is not triggered when the limit is reached. However, if the system runs out of memory before the limit is reached, the OOM Killer will still be activated.

Megaladata Data File

For Megaladata files, the LZ4 compression algorithm is now available in addition to LZO. Compared with LZO, LZ4 writes files up to 10% faster and reads them up to 40% faster. However, files compressed with LZ4 are about 9-10% larger than those compressed with LZO.

In the Export to Megaladata file wizard, LZ4 was added to the drop-down list of compression formats and is used by default.

The LZ4 standard allows optional checksums for each block and for the entire content. In the Export to Megaladata file node wizard, when LZ4 compression is selected, the Checksum parameter can be set to:

  • write a checksum of the compressed data after each block (default);
  • write no checksum, which makes writing about 1% faster;
  • write a checksum of the compressed data after each block and a checksum of all uncompressed data at the end of the file, which makes writing about 1.5% slower.

A Checksum parameter was added to Import from Megaladata file. It shows which checksums the file was created with (for LZ4-compressed files), together with a new checkbox. When the checkbox is selected, checksums are verified if they are present in the file. This protects the user against hidden data corruption and "Access Violation" errors if the file is damaged. Reading becomes about 2% slower if only block checksums of the compressed data are present, and up to 7% slower if both kinds of checksum are present. By default, checksums are verified (the checkbox is selected).
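
The three Checksum options and the import-side verification can be illustrated with a toy container format. This Python sketch is not the LZ4 frame format or Megaladata's file layout: it uses zlib from the standard library in place of LZ4 and CRC-32 in place of the xxHash32 checksums that the LZ4 frame format specifies, and all names are hypothetical. It only shows where each kind of checksum sits and how verifying it catches a damaged file.

```python
import struct
import zlib
from typing import List

NONE, BLOCK, BLOCK_AND_CONTENT = 0, 1, 2  # mirrors the three Checksum options


def write_container(chunks: List[bytes], mode: int) -> bytes:
    out = bytearray([mode])  # header byte: which checksums follow
    content_crc = 0
    for chunk in chunks:
        block = zlib.compress(chunk)
        out += struct.pack("<I", len(block)) + block
        if mode in (BLOCK, BLOCK_AND_CONTENT):
            out += struct.pack("<I", zlib.crc32(block))  # checksum of compressed block
        content_crc = zlib.crc32(chunk, content_crc)    # running checksum of raw data
    out += struct.pack("<I", 0)  # end mark: zero-length block
    if mode == BLOCK_AND_CONTENT:
        out += struct.pack("<I", content_crc)  # checksum of all uncompressed data
    return bytes(out)


def read_container(data: bytes, verify: bool = True) -> bytes:
    mode, pos = data[0], 1
    result = bytearray()
    while True:
        (size,) = struct.unpack_from("<I", data, pos)
        pos += 4
        if size == 0:
            break
        block = data[pos:pos + size]
        pos += size
        if mode in (BLOCK, BLOCK_AND_CONTENT):
            (stored,) = struct.unpack_from("<I", data, pos)
            pos += 4
            # Verify before decompressing, so a damaged block never reaches the decoder.
            if verify and zlib.crc32(block) != stored:
                raise ValueError("block checksum mismatch")
        result += zlib.decompress(block)
    if mode == BLOCK_AND_CONTENT and verify:
        (stored,) = struct.unpack_from("<I", data, pos)
        if zlib.crc32(bytes(result)) != stored:
            raise ValueError("content checksum mismatch")
    return bytes(result)
```

The extra work on reading (one CRC per block, plus one pass over the decompressed content) is the kind of overhead behind the slowdowns quoted above.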

Updated File Storage Manager

The file storage manager received many new functions that make file management more convenient:

  • Folders with nested files and folders can now be uploaded via drag and drop. Previously, only files could be uploaded, and the folder structure had to be created manually.
  • A new package can now be created directly from a file storage folder using the corresponding command in the manager's context menu.
  • When uploading files whose names already exist, one of the following conflict resolution options can be selected: Replace/Replace all, Skip/Skip all or Cancel.
  • Files and folders can be downloaded from the file storage as an archive.
  • ZIP archives can now be unzipped using a new command in the context menu.
  • The initial column sort order was changed from ascending to descending. This affects the following columns in the File Storage: Select file/folder, Size, Date of change and Read-only; and in the File Dialog: Size and Changed.
  • Keyboard navigation through the list of files and folders was improved: typing the first characters of a name now performs an incremental search and positions the cursor on the matching file or folder.
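
The Replace all and Skip all options turn a single answer into a policy for the rest of the upload. The following Python sketch illustrates that behavior under assumptions: `ask` stands in for the conflict dialog, storage is modeled as a dict of names to contents, files copied before a Cancel are assumed to stay, and all names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

CHOICES = {"replace", "replace_all", "skip", "skip_all", "cancel"}


def upload(files: Iterable[Tuple[str, bytes]],
           storage: Dict[str, bytes],
           ask: Callable[[str], str]) -> bool:
    """Copy files into storage, resolving name conflicts via ask().

    Returns False if the user cancelled; files copied before that are kept.
    """
    sticky: Optional[str] = None  # remembered "replace_all" / "skip_all" answer
    for name, data in files:
        if name in storage:
            choice = sticky or ask(name)
            if choice not in CHOICES:
                raise ValueError(f"unknown choice: {choice}")
            if choice == "cancel":
                return False
            if choice in ("replace_all", "skip_all"):
                sticky = choice  # do not ask again for later conflicts
            if choice in ("skip", "skip_all"):
                continue
        storage[name] = data
    return True
```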

New Features in Standard Components

New aggregation options were added to the Grouping and Table to variables components:

  1. Only: returns the value of the aggregated field if it is the only value at the group level. This is the default aggregation in the Grouping and Table to variables node wizards for non-numeric fields (String, Date/Time, Logical, Variant) in measures.
  2. Mode: available for all data kinds and types. Null values are ignored when calculating the mode. If there are several modes, the minimum value is selected:
  • for the logical data type, False;
  • for numeric data, the smallest number;
  • for strings, the smallest one in lexicographic order.
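
The selection rules for Only and Mode can be expressed compactly in Python. This is a sketch, not Megaladata's implementation: null values are modeled as `None`, and it assumes that Only also ignores nulls and returns `None` when a group holds more than one distinct value, cases the text above does not specify. Python compares strings by code point, which may differ from locale-aware ordering.

```python
from collections import Counter
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def agg_only(values: Iterable[Optional[T]]) -> Optional[T]:
    """Return the single distinct non-null value of a group, else None (assumed)."""
    distinct = {v for v in values if v is not None}
    return next(iter(distinct)) if len(distinct) == 1 else None


def agg_mode(values: Iterable[Optional[T]]) -> Optional[T]:
    """Most frequent non-null value; ties go to the minimum value."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    top = max(counts.values())
    # min() picks False before True, the smallest number,
    # and the smallest string by code-point order.
    return min(v for v, n in counts.items() if n == top)
```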

The replacement value of the Replace with "Not set" method in Imputation can now be configured: you can specify the value that replaces null values, for example, "Null data". The default value is the localized string "Not set". For workflows prepared in earlier versions, the default value remains language-dependent until the node is reconfigured.

The ISO8601ToDate function was added to Calculator. It converts a string containing a date/time in ISO 8601 format to a Date/Time value. If the string specifies an offset relative to UTC, the value is converted to the current time zone; if no offset is specified, the Date/Time remains unchanged.
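
Python's datetime module is enough to illustrate these conversion rules. The function below is a hypothetical sketch that only mirrors the described behavior of ISO8601ToDate: it assumes the result is a zone-less local time, and it relies on `datetime.fromisoformat`, which accepts the common ISO 8601 forms (including a trailing "Z" from Python 3.11 on) but not every variant of the standard.

```python
from datetime import datetime, tzinfo
from typing import Optional


def iso8601_to_date(text: str, local_tz: Optional[tzinfo] = None) -> datetime:
    """Parse an ISO 8601 string as ISO8601ToDate is described to behave.

    local_tz defaults to the system's local time zone.
    """
    value = datetime.fromisoformat(text)
    if value.tzinfo is None:
        return value  # no UTC offset in the string: keep the value unchanged
    # Shift to the local zone, then drop the offset to get local wall-clock time.
    return value.astimezone(local_tz).replace(tzinfo=None)
```

For example, with a UTC+2 local zone, "2023-05-12T17:27:37+03:00" becomes 2023-05-12 16:27:37, while "2023-05-12T17:27:37" is returned as is.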

Fields and variables in Calculator are now displayed in separate tables that can be switched. Double-clicking a variable in the expression editor shows a tooltip with its value.

The Reference node wizard can now refer to connection nodes.

A new By column sampling method was added to the partitioning settings of the Neural network (regression), Neural network (classification), Logistic regression and Linear regression components. The data set can now be split into training and test sets in advance (with the Partitioning node), and the combined set can be passed to the node with partitioning by column selected. Previously, the training and test sets had to be separated after the Partitioning node, with the training set passed to the main node and the test set to Node execution; if the combined (training + test) set was needed afterwards, a Union node had to be used. These additional nodes are no longer needed.
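
Conceptually, By column sampling lets one data set carry both subsets, with a column marking each row's role. A minimal Python sketch, assuming the marker column is called "set" and holds the hypothetical values "train" and "test" (in practice, the column and values are whatever the Partitioning node produced and the wizard is configured to use):

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_by_column(rows: List[Row], column: str = "set",
                    train_value: object = "train",
                    test_value: object = "test") -> Tuple[List[Row], List[Row]]:
    """Split one combined data set into training and test rows by a marker column."""
    train = [r for r in rows if r[column] == train_value]
    test = [r for r in rows if r[column] == test_value]
    # Rows with any other marker value end up in neither list.
    return train, test
```

Because the original rows stay in a single set, nothing has to be recombined afterwards, which is why the Union node becomes unnecessary.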

Python can now be executed in an isolated user environment. Environment variables can now be passed to the Python interpreter process. The python_run.sh script (or a similar one) can be specified as the Python interpreter; it uses the environment variables to implement the Python startup logic.

To execute Python in an isolated user environment, set the Pass variable node environment and Interpreter path parameters accordingly on the Parameters page in the Administration section. Using python_run.sh requires Linux with docker or podman installed, and a base docker image containing Python and the required packages must be prepared.
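
At the operating-system level, passing environment variables to an interpreter process works as in the Python sketch below. It only illustrates the mechanism: the variable name MEGALADATA_NODE_ID is hypothetical (the variables Megaladata actually passes are not listed here), and in the usage example the current interpreter is launched in place of a wrapper script such as python_run.sh.

```python
import os
import subprocess
from typing import Dict, List


def run_interpreter(interpreter: List[str], code: str,
                    extra_env: Dict[str, str]) -> str:
    """Start an interpreter (or wrapper script) with extra environment variables."""
    env = dict(os.environ)  # inherit the parent environment
    env.update(extra_env)   # add the node-specific variables
    result = subprocess.run(interpreter + ["-c", code], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_interpreter([sys.executable], "import os; print(os.environ['MEGALADATA_NODE_ID'])", {"MEGALADATA_NODE_ID": "node-42"})` returns "node-42" followed by a newline. A wrapper script would read the same variables to choose a container image or user environment before starting Python, and would have to forward the remaining arguments to it.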

Database Use Changes

The behavior of the Import from database wizard changed. When the wizard is opened for the first time, the table list is not displayed by default, even if the database connection is active. To display it, press the Activate button in the upper right corner of the wizard. On subsequent openings, and in the wizards of new Import from database nodes that use a Connection whose table list was already received, activation from the wizard is not required.

An execution mode that ignores errors was added to Import from database. The wizard received an editor for selecting the activation mode and an Import timeout (ms) parameter, and the Import from database node received an additional output port of variables, Execution status.

When importing in Ignore errors mode, the node is activated successfully even if errors occur during activation, and information about the errors (completion code and error text) is written to the Execution status output port.

Note that when the timeout expires, either the query execution or the filling of the output data set stops, depending on the execution stage. Thus, in Ignore errors mode the output data set may be only partially filled. Import timeout does not apply to the preview inside the wizard.
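
The difference between the normal and Ignore errors modes, including a partially filled output after a failure or timeout, can be modeled in a few lines of Python. Everything here is hypothetical: `fetch_rows` stands in for the database cursor, the status dictionary mirrors the completion code and error text described above (the code values are invented), and the timeout is checked only between rows.

```python
import time
from typing import Callable, Dict, Iterator, List, Tuple

Row = Tuple[object, ...]


def import_rows(fetch_rows: Callable[[], Iterator[Row]],
                timeout_ms: int,
                ignore_errors: bool) -> Tuple[List[Row], Dict[str, object]]:
    """Fill an output list from fetch_rows(), honoring an overall timeout.

    Returns (rows, execution_status). In normal mode errors propagate.
    """
    rows: List[Row] = []
    deadline = time.monotonic() + timeout_ms / 1000
    try:
        for row in fetch_rows():
            if time.monotonic() > deadline:
                raise TimeoutError("import timeout exceeded")
            rows.append(row)
    except Exception as exc:  # any import error, including the timeout
        if not ignore_errors:
            raise
        # Keep the rows read so far: the output may be partially filled.
        return rows, {"code": 1, "message": str(exc)}
    return rows, {"code": 0, "message": ""}
```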

A similar error-ignoring mode was also added to the Export to database node.

When configuring a connection to MS SQL, PostgreSQL, Oracle (without client), MySQL, ClickHouse or ODBC (if supported by the driver), a Connection timeout can be specified. If the connection is not established within this time, then in error-ignoring mode the corresponding error is written to the Execution status port, and in normal mode the import or export node finishes with an error.

In addition, when configuring a connection to MS SQL, ClickHouse, PostgreSQL or MySQL, a Lock timeout (s) can be specified: it sets how long to wait for a locked resource (table, row) to be released. Without this setting, Import/Export nodes can hang indefinitely.
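
The effect of a lock timeout can be demonstrated with SQLite from Python's standard library, where the `timeout` argument of `sqlite3.connect` plays a similar role. This is only an analogy: SQLite is not among the DBMS listed above, and Megaladata's Lock timeout (s) is set in the connection settings rather than in code. The sketch holds a write lock in one connection and shows the second connection giving up after the timeout instead of waiting for the lock holder.

```python
import os
import sqlite3
import tempfile
import time


def try_write_while_locked(timeout_s: float) -> str:
    """Hold a write lock in one connection and try to write from another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: autocommit, transactions started manually.
        holder = sqlite3.connect(path, isolation_level=None)
        waiter = sqlite3.connect(path, timeout=timeout_s, isolation_level=None)
        try:
            holder.execute("CREATE TABLE t (x INTEGER)")
            holder.execute("BEGIN IMMEDIATE")  # take the write lock
            holder.execute("INSERT INTO t VALUES (1)")
            start = time.monotonic()
            try:
                waiter.execute("BEGIN IMMEDIATE")  # waits up to timeout_s
                return "acquired"
            except sqlite3.OperationalError as exc:  # "database is locked"
                return f"gave up after {time.monotonic() - start:.1f} s: {exc}"
        finally:
            waiter.close()
            holder.close()  # closing discards the open transaction
```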

A PRAGMA temp_store setting was added to the SQLite connection. It defines where temporary files, tables and indices are stored (in memory or on disk). By default, they are stored on disk, even for in-memory databases, which can cause heavy disk use and corresponding wear when some workflows are executed.
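
For reference, temp_store is a standard SQLite pragma, and its effect can be checked from Python's built-in sqlite3 module. In SQLite it accepts DEFAULT (0), FILE (1) or MEMORY (2); how Megaladata's connection setting maps onto these values is an assumption here, and the function name is hypothetical.

```python
import sqlite3


def open_with_memory_temp_store(database: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite database that keeps temporary tables and indices in RAM."""
    conn = sqlite3.connect(database)
    conn.execute("PRAGMA temp_store = MEMORY")
    return conn


def current_temp_store(conn: sqlite3.Connection) -> int:
    # Reading the pragma back returns its numeric value; 2 means MEMORY.
    return conn.execute("PRAGMA temp_store").fetchone()[0]
```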

ClickHouse connections now support the DateTime64, IPv4, IPv6, LowCardinality and Bool types.

For PostgreSQL, scripts can now be executed via Import from database.

Ease of Use

The update introduces many changes aimed at making the platform easier to use:

  • In the server editions, the browser tab name changes according to the active Megaladata Studio element. By default, the name has the form: path to the active element · product name. For example, if the user creates a draft, the Workflow page is displayed, and the browser tab is named "Workflow · Package1/Module1".
  • The Main (vertical) Toolbar can now be expanded and collapsed using a new button at the bottom of the toolbar, below the "Processes" menu. In the expanded state, the toolbar width can be changed by dragging its right border. This is useful when configured visualizers with long names are displayed on the toolbar.
  • When there are many configured visualizers, scroll buttons appear on the main (vertical) toolbar. Previously, the "Configure visualizers" button scrolled together with the list of visualizers, which was inconvenient. Now the "Configure visualizers" button is always visible on the main toolbar, and only the list of visualizers scrolls.
  • When a node is dragged from the component panel to the workflow construction area, it is displayed while being dragged, and the editor attempts to connect its ports, showing the links that would be created. Holding down the ALT key while dragging disables automatic connection.
  • If a session is closed from Session Manager, the user now receives the message "Admin closed session via Session Manager". Previously, this looked like a loss of server connection.
  • The Users page in the Administration section now displays the number of registered users in addition to their list.
  • When a node is added to the workflow, it becomes selected and its setup buttons are displayed, so the user needs fewer mouse clicks to open the node configuration.
  • In the Import from text file and Export to text file wizards, the property editors that set separators (decimal separator, date separator, time separator, etc.) now show the value that will be used when parsing the file if the user doesn't change the settings, instead of "default". For example, for the decimal separator: "Not set (,)".
  • A preview of the calculation result was added to the Calculator (variables) node wizard. It shows the names, captions, types and values of the input variables used, as well as the names, captions, types and values of the calculated expressions.
  • When configuring the Table to tree node, tree node captions can now be filled from the first xsd:documentation string when the structure is loaded from an XSD schema. A Generate node captions by xsd:documentation parameter was added to the window for generating a data tree from an XSD schema. If it is set to True, each generated tree node gets a caption equal to the first non-empty string of the xsd:documentation element of the source element or attribute in the XSD schema. If there are several xsd:documentation elements, only the first one is taken into account. If there is no documentation, or the parameter is set to False, the caption is generated from the element or attribute name as before.
  • Long headers in the Table visualizer can now be displayed on multiple lines. The corresponding setting is available in the context menu and on the visualizer toolbar.
  • In the Data quality visualizer, the list of problems for some fields after statistics calculation could be too large for the screen, and no horizontal scrollbar appeared. Now all information fits on the screen after statistics calculation, and a horizontal scrollbar appears when required. In addition, the data set characteristics table can be hidden to free up space for the list of problems.
  • Preparation of the Table and Quick view visualizers was made faster.
  • The detail table in visualizers is updated faster and is visually more convenient.
  • When a single record (node) is selected in the table view of the Workflow, information about the node can now be viewed in the Property inspector.
  • Any package object can be opened using the Address line, and the object page opens in the current tab. Now the address can also be opened in a separate tab by clicking the object with the middle mouse button, or with the left button while holding down the CTRL key.
  • Message text can now be copied from the popup error window using the window's context menu.
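
The caption rule of Generate node captions by xsd:documentation can be sketched with Python's xml.etree.ElementTree. Assumptions: "first non-empty string" is read as the first non-empty line of the first xs:documentation element, only the xs:annotation of the element or attribute itself is inspected, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XS = "{http://www.w3.org/2001/XMLSchema}"


def node_caption(schema_node: ET.Element, use_documentation: bool) -> Optional[str]:
    """Caption for a tree node generated from an xs:element or xs:attribute."""
    name = schema_node.get("name")
    if not use_documentation:
        return name
    # find() returns only the first match, so later xs:documentation elements are ignored.
    doc = schema_node.find(f"{XS}annotation/{XS}documentation")
    if doc is not None:
        for line in "".join(doc.itertext()).splitlines():
            if line.strip():
                return line.strip()
    return name  # no usable documentation: fall back to the name
```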

Logging

The amount of information recorded in the log was significantly expanded.

Both successful and unsuccessful user authentication attempts are now recorded in the log. In addition to the user name, logged messages include the user's IP address and domain name. The domain name is displayed only if the "Allow domain names" checkbox is selected on the Parameters page in the Administration section.

A HEADERS parameter containing the list of HTTP headers of the current session was added to messages about unsuccessful user authentication and session initialization. The headers include user-agent, with information about the browser type and version.

When logging to a file, each message now includes a message identifier (as when logging to journald):

  • the identifier is a 128-bit number (GUID) represented as a hexadecimal string (32 lowercase characters);
  • the identifier is displayed after the event time and criticality (type).

Example record:

2023-05-12T17:27:37.475 info e688494e9da1c6488e5f505b8e65a92e (Server.exe:7280>user:1:) - Session initialized for user user{"USER_NAME": "user", "SESSION_TYPE": "Interactive session", "CLIENT_IP": "127.0.0.1", "CLIENT_HOST": "PC-X.bg.local", "HEADERS": "{host=localhost:8080,connection=Upgrade,pragma=no-cache,cache-control=no-cache,"user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",upgrade=websocket,origin=http://localhost,sec-websocket-version=13,"accept-encoding=gzip, deflate, br","accept-language=en-US,en;q=0.9",sec-websocket-key=rO8ZERq8O9UCJ/Np8j9plA==,"sec-websocket-extensions=permessage-deflate; client_max_window_bits"}"}
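
Because the identifier has a fixed position and format, file log records can be split into fields with a regular expression. The pattern below is a Python sketch inferred from the single example above, not a documented log grammar; the field names are hypothetical.

```python
import re
from typing import Dict, Optional

LOG_LINE = re.compile(
    r"^(?P<time>\S+) "           # event time, e.g. 2023-05-12T17:27:37.475
    r"(?P<level>\w+) "           # criticality (type), e.g. info
    r"(?P<id>[0-9a-f]{32}) "     # 128-bit message identifier: 32 lowercase hex chars
    r"\((?P<source>[^)]*)\) - "  # process and session context in parentheses
    r"(?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Split one file log record into fields, or return None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None
```

The message part, including the trailing parameter object, is kept as raw text, since the example shows that HEADERS is not guaranteed to be valid JSON.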

Changes in Megaladata Integrator

Megaladata Integrator tracks changes in the list of published packages through events sent by Megaladata Server. In some cases, Integrator stops responding to changes in published packages, for example, if the network connection between Server and Integrator was lost and then restored. To avoid such problems, an option to request the list of published packages periodically was added to Megaladata Integrator.

To enable it, specify the packageRefreshPeriod parameter in the Integrator configuration file; it sets the period (in seconds) for requesting the list of published packages from the server. Integrator compares this list with its own and applies any changes. The packageRefreshPeriod parameter is absent by default, so the list of packages is not requested periodically.
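
The comparison step can be pictured as a diff between two maps of package names to versions. This Python sketch is hypothetical: it assumes each published package can be identified by a name plus some version marker, which the text does not specify, and it only computes the changes rather than applying them.

```python
from typing import Dict, NamedTuple, Set


class PackageChanges(NamedTuple):
    added: Set[str]
    removed: Set[str]
    updated: Set[str]


def diff_packages(local: Dict[str, str], server: Dict[str, str]) -> PackageChanges:
    """Compare Integrator's package list with the one requested from the server."""
    added = server.keys() - local.keys()
    removed = local.keys() - server.keys()
    updated = {name for name in server.keys() & local.keys()
               if server[name] != local[name]}
    return PackageChanges(set(added), set(removed), updated)
```

Run every packageRefreshPeriod seconds, such a diff is empty in the normal case, so the extra request only matters when an event from the server was missed.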

The Megaladata section of the Megaladata Integrator configuration file Integrator.dll.config was renamed to settings.

Desktop Editions for Linux

The desktop Megaladata editions are now also available for Linux. Megaladata Desktop for Linux uses the cross-platform GTK3 interface library and can be installed on operating system distributions that support GTK3 with a minimum kernel version of 5.3 (4.11 if Python will not be used). In some cases, the xdg-open and dbus-send utilities and the value of the DESKTOP_SESSION environment variable are also used.
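
Before installing, the kernel requirement can be checked with a short Python script. It is a sketch: it reads the version from `platform.release()`, keeps only the leading numeric major.minor part, and uses the thresholds stated above (5.3 with Python, 4.11 without); it does not check for GTK3, and the function names are hypothetical.

```python
import platform
import re
from typing import Tuple


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supported(release: str, uses_python: bool = True) -> bool:
    required = (5, 3) if uses_python else (4, 11)
    return kernel_version(release) >= required  # tuple comparison: major, then minor


if __name__ == "__main__" and platform.system() == "Linux":
    print(kernel_supported(platform.release()))
```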

The Megaladata Desktop distribution kit is supplied as an archive. To start working, simply unpack the archive and run the executable file. The archive contains a README.txt file with instructions on setting up Megaladata Desktop, creating shortcuts, and other useful information. Third-party libraries, if used, may need to be installed separately.

The desktop edition also offers a full-screen mode, which is enabled by pressing the F11 key.

Other Changes

The algorithm of the Association rules component was improved. Item sets whose support exceeds the maximum support no longer get into the tree of frequent item sets used for model training. As a result, fewer trees are saved in the node configuration, and processing is much faster when the source transactions contain many elements.
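
The effect of an upper support bound can be illustrated with a brute-force item set count in Python. This is not Megaladata's algorithm, which builds a tree of frequent item sets; it only shows which item sets survive when both a minimum and a maximum support are applied. Support is taken as the fraction of transactions containing the set, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def itemsets_in_support_range(transactions: List[Set[str]], max_size: int,
                              min_support: float,
                              max_support: float) -> Dict[FrozenSet[str], float]:
    """Item sets of up to max_size items whose support lies in [min_support, max_support]."""
    counts: Counter = Counter()
    for basket in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(basket), size):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    return {items: count / n for items, count in counts.items()
            if min_support <= count / n <= max_support}
```

With the transactions {bread, milk}, {bread, eggs}, {bread, milk, eggs} and {bread}, a range of 0.5 to 0.9 drops {bread} (support 1.0) while keeping {milk} and {bread, milk} (support 0.5 each); dropping such sets early is what keeps the stored structure smaller.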

When the dimension filter panel is opened in the Cube visualizer in "Defer layout update" mode, data loading starts; if loading takes a long time, a "Cancel" button is displayed that stops the loading process. If loading is cancelled, a "Load" button appears on the dimension filter panel to start it again. Previously, a long loading process could only be stopped via the "Processes" panel.
