Megaladata: What's Under the Hood?

The article describes the technological basis of the low code Megaladata platform — its components, architecture, frontend and backend, functionality, and performance. We intend to answer the most frequent technical questions.

Megaladata is a low code platform that allows users to conduct advanced analytics with minimal coding. The platform offers an extensive set of research and analysis tools — from simple mathematical operations to neural networks. It enables an analyst to build a comprehensive data processing flow, starting with ETL and going as far as intelligent data analysis and machine learning.

Megaladata targets the processing of structured, or tabular, data. Due to the platform's high performance, it can be successfully used for big data processing.

Components of the platform

Megaladata operates as a server application for teamwork and as a desktop edition for personal analytics.

Server editions

The server editions of the platform consist of the following components:

Server: It is the essential part of the platform, the main component designed for deployment in an internal network or a cloud. It enables teamwork for several users at different workstations.
Studio: A thin client, a browser. The client web application serves for the creation of processing workflows, data visualization, server configuration, and user access control. The actual data processing is performed by the Megaladata Server.
Integrator: An optional platform component that enables publication of custom web services using REST (JSON) and SOAP (XML).

Desktop editions

The desktop editions include only one component — Desktop — with the same codebase as all the other editions of the platform. This approach lets us cover all the platform's use cases and saves us the development of two different versions of Megaladata for desktop and server applications.

Megaladata Desktop is a regular application that the user runs on their computer. It has a built-in local Megaladata server to perform computations and the Chromium Embedded Framework to display the user interface. The application does not require access to a local network or the Internet.

The edition runs on Windows 10 and above.

Deployment options

Megaladata sets no limitations for the deployment methods. The options we offer are:

Personal computer
Enterprise servers
Private cloud
Public cloud

The platform is cloud-ready. Given the cloud provider's documentation, we can prepare an image for deployment in any cloud.

The server editions can run in containers. We have tested them with the following containerization software:

Docker (20.10.16 and above) and Docker Compose (1.27.1 and above)
Podman (4.2.1 and above) and Podman Compose (1.0.3 and above)

The Linux-based distribution of Megaladata includes scripts that allow users to create images on their own. Thus, it is possible to build containers that keep not only the Megaladata platform components but also auxiliary software — database access clients, a Python interpreter with libraries installed, Megaladata packages, data files, etc.

Running in Docker or Podman helps enforce security policies, limits resource consumption (memory, CPUs, etc.), increases fault tolerance, simplifies monitoring, and allows deployment and scaling to be automated. Containers also make it possible to run the platform on outdated OS versions that lack the required libraries.

Implementation specifics

Solution stack

We use seven programming languages to develop Megaladata: Assembly, C, C++, C#, Pascal, TypeScript, and JavaScript. Our choice is motivated by the goal of high performance and economical RAM use.

The libraries we employ are Acorn, ALGLIB, Ararat Synapse, AVL Tree TypeScript, Bauglir WebSocket, Brian Gladman's SHA, Brian Gladman's AES, CEF4Delphi, ChakraCore, ClickHouse C++ client, CodeMirror, DateTimeField, Ext JS, FileSaver.js, Firebird, ICU, jsPDF, LIBLINEAR, librdkafka, libxml2, libxslt, Log4D, LZ4, MariaDB Connector/C, Mbed TLS, Metadata Reflection API, MKL, Newtonsoft.Json, NLog, NSwag, OpenSSL, PCRE2, Plotly.js, Promise Polyfill, SoapCore, SQLCipher, Swagger UI, SynLZO, TCMalloc, Tern, xxHash.

Besides this, Megaladata engages .NET Core, Apache HTTP Server, Chromium Embedded Framework, and the Source Sans Pro font family.

We have tested numerous libraries and selected the ones that ensure maximum productivity and resilient memory management. All the libraries employed in the platform are open-source, have no use restrictions, and require no license fees from the end user.

Architecture

Megaladata can function as a two-tier architecture platform where the server performs the computing and the thin client, a browser, is used for configuration, workflow design, and visualization.

The supported browsers are Chrome, Firefox, Opera, Safari, and Edge.

Megaladata Studio is the thin client, communicating with the Megaladata Server over the WebSocket protocol. The client-server connection can be established in one of two ways: directly to the Megaladata server or through a WebSocket proxy set up on a web server.

Figure 1: Communication to server without wsproxy (default)

Figure 2: Communication to server through wsproxy

The web server shown in the figures transfers static data, such as JavaScript files, style sheets, or icons. It can also serve as a proxy between the browser and the Megaladata Server if necessary.

When the Integrator component is present, it is possible to contact the Megaladata Server from the outside via APIs (SOAP+WSDL or REST JSON).

Communication between the Integrator and the Megaladata Server on top of Windows:

Figure 3: Integrator server Windows

When running on Linux, the Megaladata services can be called by contacting the Integrator directly, but there is also an option to employ the Apache HTTP Server:

Figure 4: Integrator server Linux

In the Megaladata Team edition, the Server and the Integrator must be installed on the same machine. They communicate using a Unix domain socket.

In the Standard and Enterprise editions, the Server and the Integrator components can run on the same machine or different ones. When running on the same server, they communicate through a Unix domain socket or TCP. When separate servers are used, only TCP communication applies.

The Megaladata platform has components for integration with SOAP and REST services. These components enable the most common ways to interact with web services.

Some external web services are nonstandard and use custom protocols and encryption methods that the built-in Megaladata components do not support. In that case, the user can achieve integration by writing JavaScript or Python code.

Operating systems

The server components of the analytical platform run on Windows and Linux.

Windows:

Megaladata Server: Runs on top of Windows Server 2019 and higher. It doesn't require the installation of any additional products or frameworks.
Megaladata Integrator: Runs over Microsoft IIS 8.0 and higher. ASP.NET Core 6.0 is required.

Linux:

Megaladata Server: Runs under Linux (Kernel 5.3 and higher, standard library glibc 2.28 and higher).
Megaladata Integrator: Runs under Linux based on .NET Core.

When not running in containers, the Server and the Integrator start as system services; inside a container, each runs as the PID 1 process.

Distributions tested: Debian 11.5, Fedora 36, OpenSUSE 15.4, Ubuntu 22.04.

Backend

The Megaladata server is designed as a monolithic system. Still, if a cluster of servers is used, it is possible to organize communication like the one between distributed loosely coupled components (services).

The Megaladata Server allows vertical scaling and ensures efficient resource use of one server:

Parallelization support
In-memory computing
Optimized data storage in RAM
Specialized memory management
Avoiding the intermediary abstraction layers by low-level core coding

When several Servers and Integrators are used, it is possible to build a server cluster to ensure fault tolerance and enable cold and hot swapping, horizontal scaling, and load balancing.

Megaladata Integrator employs simple task distribution and load balancing: tasks are distributed among the Megaladata Servers cyclically, taking node utilization into account. If more complex orchestration is needed, additional software is required.
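The article does not specify the exact balancing algorithm, so the following is only one plausible reading of it: a round-robin ring that skips servers that are already saturated. The class name, the max_active limit, and the server names are hypothetical and serve illustration only.

```python
from itertools import cycle


class Balancer:
    """Round-robin distribution that skips saturated servers (illustrative only)."""

    def __init__(self, servers, max_active=4):
        self.active = {name: 0 for name in servers}  # running tasks per server
        self._ring = cycle(servers)                  # endless round-robin iterator
        self._size = len(servers)
        self.max_active = max_active                 # hypothetical per-server limit

    def acquire(self):
        # Walk the ring at most once; take the first server with spare capacity.
        for _ in range(self._size):
            name = next(self._ring)
            if self.active[name] < self.max_active:
                self.active[name] += 1
                return name
        raise RuntimeError("all servers are busy")

    def release(self, name):
        # Called when a task finishes on the given server.
        self.active[name] -= 1
```

A caller would acquire() a server before sending a task and release() it afterwards. Anything beyond this, such as health checks or priorities, is the kind of orchestration for which the article recommends additional software.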

There is a possibility to build various cluster architectures — such as a single server cluster with all the Integrators connected to all the Servers.

Figure 5: Integral cluster

It can also be a cluster made of independent tiers, each including one Integrator and one Server.

Figure 6: Independent cluster

Any combination of such schemes is available.

Each component (service) in a cluster is self-contained and can handle its task independently.

Frontend

The Megaladata client is a single-page application. It uses a single HTML document as a shell for all the web pages and interacts with the user through dynamically loaded HTML, CSS, and JavaScript, usually with AJAX. A continuous connection to the Megaladata Server is essential, as all the actual processing happens on the server.

Each screen gets a unique URL that can be used to address the screen or share it with other users when necessary.

To save web traffic, we use compression and batched transfer via our own Remote Procedure Call (RPC) mechanism, which is optimized for processing large data volumes and minimizing the number of client round trips to the server.
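The idea of grouping several calls into one compressed message can be sketched as follows. The actual wire format of the Megaladata RPC is not described in this article; JSON and zlib are stand-ins chosen for the example, and the method names in the usage note are hypothetical.

```python
import json
import zlib


def pack_calls(calls):
    """Serialize several RPC calls into a single compressed message."""
    payload = json.dumps(calls, separators=(",", ":")).encode("utf-8")
    return zlib.compress(payload)


def unpack_calls(message):
    """Inverse of pack_calls: decompress and decode the list of calls."""
    return json.loads(zlib.decompress(message).decode("utf-8"))
```

For instance, pack_calls([{"method": "getRows", "params": {"offset": 0, "count": 100}}, {"method": "getStats", "params": {}}]) produces one message instead of two separate requests, and repetitive payloads compress well.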

The data for visualization is calculated and stored on the server. The browser addresses a proxy layer and transmits the parameters that define the volume of data required by the visualizer. The specialized intermediary layer connecting the browser and the server employs AJAX, i.e., transmits only a part of the data to be visualized. This approach allows minimizing the volume of the data transferred for each display option.

For example, the cube visualizer sends the information about the cells displayed on the screen. The proxy layer accesses the server core to receive the piece of data required for displaying the cube fragment demonstrated on the screen. As a result, the visualizer, having received a relatively small piece of data, shows it quickly in the browser.
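A minimal sketch of this viewport idea, assuming the cube is held on the server as a plain two-dimensional list. The function and parameter names are hypothetical; the real proxy layer and cube storage are not documented here.

```python
def visible_cells(cube, first_row, first_col, n_rows, n_cols):
    """Return only the cells of a 2-D cube that fall inside the viewport."""
    rows = cube[first_row:first_row + n_rows]
    return [row[first_col:first_col + n_cols] for row in rows]


# A 200 x 200 cube stays on the "server"; the "browser" asks for a 3 x 4 window.
cube = [[r * 1000 + c for c in range(200)] for r in range(200)]
fragment = visible_cells(cube, first_row=10, first_col=20, n_rows=3, n_cols=4)
```

Only 12 of the 40,000 cells are transferred in this example, which is why the amount of data sent to the browser depends on the screen size rather than on the cube size.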

As all the interaction with the server requires just a browser, there is a technical possibility of operating the platform from a mobile device. However, the Megaladata interface is not optimized for small screens. Working from a tablet, though, is rather convenient.

We do not provide options for restyling the platform's interface to suit corporate branding.

Functionality

Workflow design

A logical unit of the platform is a package. It consists of zip-archived XML and binary files. A package includes abstractions of all the objects available in Megaladata along with references to other packages from which some components may be imported.

For faster loading, a package file has the same information both in XML and binary form. When the package version matches the application version, the platform reads binary data, as it is much faster. If the package was created in older Megaladata versions, the platform uses XML data, converting it to a new format when needed.
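The loading rule described above can be sketched like this. The archive entry names (version.txt, package.bin, package.xml) and the version string are hypothetical; the real package layout is not given in this article.

```python
import io
import zipfile

APP_VERSION = "7.1"  # hypothetical application version


def read_package(data, app_version=APP_VERSION):
    """Pick the binary form when versions match, otherwise fall back to XML."""
    with zipfile.ZipFile(io.BytesIO(data)) as pkg:
        version = pkg.read("version.txt").decode("utf-8").strip()
        if version == app_version:
            return "binary", pkg.read("package.bin")
        return "xml", pkg.read("package.xml")


def build_demo_package(version):
    """Create a tiny in-memory archive holding both forms of the same data."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("version.txt", version)
        pkg.writestr("package.xml", "<package/>")
        pkg.writestr("package.bin", b"\x00\x01")
    return buf.getvalue()
```

Calling read_package(build_demo_package("6.0")) takes the XML branch, which corresponds to opening a package saved by an older release.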

Megaladata employs the object-oriented modeling approach: Each workflow node is not just a handler but an inheritance-supporting class. A key component of the platform is a supernode that contains other nodes. There is no limitation on supernode nesting.

There are built-in components to implement basic analytic algorithms. The analyst can also create their own derived components, usually in the form of supernodes (even with no coding), and publish them so that other analysts can use them later.
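In the platform, derived components are built visually, but the underlying idea of nodes as classes and supernodes as containers can be illustrated in code. All class names below are hypothetical and do not correspond to platform internals.

```python
class Node:
    """Base workflow node: passes rows through unchanged."""

    def run(self, rows):
        return rows


class FilterPositive(Node):
    def run(self, rows):
        return [r for r in rows if r > 0]


class Double(Node):
    def run(self, rows):
        return [r * 2 for r in rows]


class Supernode(Node):
    """A node that contains other nodes and runs them in sequence."""

    def __init__(self, *children):
        self.children = children

    def run(self, rows):
        for child in self.children:
            rows = child.run(rows)
        return rows


class CleanAndScale(Supernode):
    """A derived, reusable component; note the nested supernode inside."""

    def __init__(self):
        super().__init__(FilterPositive(), Supernode(Double(), Double()))
```

Because a Supernode is itself a Node, nesting has no fixed depth limit, and a derived component such as CleanAndScale can be reused wherever a node is expected.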

Megaladata supports two strategies of workflow design:

Upwards (bottom-up): The development starts with data import, then the data is sequentially processed using various algorithms. This approach is optimal for rapid prototyping and implementing relatively simple data transformation logic.
Downwards (top-down): First, the designer determines the supernode's inputs and outputs, describing how the supernode integrates into the workflow. Then, the processing algorithm is chosen based on the specified input and output requirements. We recommend this approach for creating reusable components.

Using code in workflows

The low code concept allows for employing programming languages to implement complex logic. Megaladata has built-in support for two popular languages — Python and JavaScript.

Python

The Python node interacts with the other nodes of a Megaladata workflow through data APIs that allow the receiving and transmitting of data at the input and output ports.

Using Python in single-threaded mode does not require writing data to files or reading it back from files, nor engaging other sources or destinations. The Data API allows the program to obtain data (variables and tables) "lazily" from the input stream and load the processing results (a table) into the output stream.
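What lazy access means in practice can be shown with an ordinary Python generator. This is not the Megaladata Data API; read_rows and first_matching are hypothetical helpers that only illustrate the principle that rows are fetched when they are requested.

```python
def read_rows(source, log):
    """Yield rows one at a time; `log` records which rows were actually read."""
    for row in source:
        log.append(row["id"])
        yield row


def first_matching(rows, predicate):
    """Stop consuming the input as soon as one row satisfies the predicate."""
    for row in rows:
        if predicate(row):
            return row
    return None


source = [{"id": i, "amount": i * 10} for i in range(1_000)]
touched = []
hit = first_matching(read_rows(source, touched), lambda r: r["amount"] >= 30)
```

Here only the first four rows are ever read from the input, although the source holds a thousand.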

When the option "Start in separate process" is selected, the analyzed data is uploaded to intermediary files. Then, an individual Python interpreter process starts for each node.

The Python node has an additional module builtin_pandas_utils that allows transforming Megaladata datasets into pandas DataFrames and vice versa.

Megaladata for Linux supports starting the Python interpreter in containers. This ensures safe operation, provided that the containers are correctly configured.

JavaScript

To receive and transfer data through the input and output ports of a JavaScript node, we have implemented a Data API identical to the one in the Python component. The Data API "lazily" transmits the data to the JavaScript node.

Unlike Python, there are no limitations on parallel processing when using JavaScript. The data handling methods stay the same, regardless of whether several JavaScript nodes run in parallel.

Besides this, the JavaScript component supports:

Fetch API: An interface for working with HTTP requests and responses that allows the JavaScript node to access web services and network resources directly.
File Storage API: An interface that enables all file operations, such as reading/writing files, creating folders, etc. The advantage of this API in the JavaScript component is the safety of these operations, both in desktop and in server platform versions. Desktop editions allow for accessing the whole file system (depending on the user rights), while server editions offer access only to an isolated file storage space and to shared folders set by the administrator. Thus, File Storage API does not require a Megaladata user to have administrator rights or perform any unsafe actions.
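The isolation offered by the server editions amounts to confining every user-supplied path to a storage root. The sketch below shows one common way to do that in Python; it is not the File Storage API itself, and resolve_in_storage is a hypothetical name.

```python
from pathlib import Path


def resolve_in_storage(root, user_path):
    """Map a user-supplied path into the storage root, rejecting escapes."""
    base = Path(root).resolve()
    # Treat leading slashes as "relative to the storage root".
    target = (base / user_path.lstrip("/\\")).resolve()
    if target != base and base not in target.parents:
        raise PermissionError(f"{user_path!r} is outside the file storage")
    return target
```

A path such as "reports/a.csv" resolves inside the root, while "../outside.txt" raises PermissionError because it would leave the storage space.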

It is possible to upload existing JavaScript libraries. The platform supports two module specifications, ECMAScript 6 and CommonJS.

Apart from the JavaScript node, the programming language can be used in the Calculator handler, which allows the user to write formulas both in Excel style and in JavaScript.

File storage

Desktop editions have access to all the objects of a workstation, taking into account the user rights, while server editions have file storage to enable operations with files and folders:

The file storage is physically located on the server where Megaladata runs.
Each user gets an isolated space in the storage.
It is possible to create shared folders; the administrator gives the access rights.

The user of a server Megaladata edition does not have access to the objects that are not in the file storage, except when using the handlers Program execution and Python, operating not in containers. As these handlers are unsafe, the rights to use them must be granted by the administrator. By default, these handlers are deactivated in server editions.

Reports

The Megaladata platform has a lot of built-in visualizers, including the cube OLAP module. The cube is a potent multidimensional analysis tool that displays data in the form of cross-tables and cross-diagrams. It supports several interactive manipulations with data — grouping, sorting, drill-down, detailing, computing of multidimensional parameters, and many more. In the cube, reviewing the data after each processing step is possible.

Megaladata also has many other visualizers for various purposes: tables, statistics, data quality assessment, cluster profiles, binary classification assessment, etc. The user can add any visualizer to the report panel to group the reports by folders.

There is also a special "Reports view" role. Upon opening a Megaladata project, the user with such a role does not see the workflows, models, connections, or other objects intended for configuring processing algorithms. Instead, such a user sees only the changeable variables and the report panel. When the user selects the necessary report, the platform performs all the computations in the background and displays a visualizer.

A user with the "Reports view" role cannot change the processing logic but can configure the representation of the calculated data. For example, they can choose the way the cube dimensions are shown or a chart type. Thus, the end user gets access to preconfigured analytical reports and has an opportunity to influence the way processing results are displayed.

If the platform's visualization tools are not enough, it is possible to integrate Megaladata with specialized BI systems — Tableau, Qlik, Power BI, and the like.

Megaladata provides no options for setting up periodic reports, like standard printed forms.

Logging and monitoring

Megaladata has a built-in dispatcher designed for viewing the open sessions and packages, monitoring and managing the activity in them, and cleaning the package pool.

The platform provides logging mechanisms that allow writing an information file with the required detail level: tracing, debugging, details, event, warning, error, or fault. The log files can include not only technical information but also the records generated during the workflow execution. Log files contain detailed information, including the user name, session number, time, node GUID, message text, exception class and text, etc.

The Megaladata editions for Windows write log files in the Log4j format (from the Apache Logging Services project). Files in this format can be parsed by many tools and libraries.

Besides writing Log4j files, the editions for Linux support journald, the systemd journal daemon. In Megaladata for Linux, this logging type is the default, as it has several advantages over plain files:

Centralized log management
Real-time log view
Filtering and search options
Converting to text and JSON
Sending logs to a third-party server
Various ready-made tools to analyze logs
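As an illustration of the filtering and JSON conversion options, the sketch below processes lines in the format produced by journalctl -o json, where each line is one JSON object and PRIORITY holds the syslog level as a string. The unit name megaladata.service and the sample messages are assumptions made for the example.

```python
import json


def errors_from_journal(lines, max_priority=3):
    """Yield (unit, message) for entries at syslog priority <= max_priority."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Syslog priorities: 0 = emergency ... 3 = error ... 7 = debug.
        if int(entry.get("PRIORITY", 6)) <= max_priority:
            yield entry.get("_SYSTEMD_UNIT", "?"), entry.get("MESSAGE", "")


# Hypothetical sample of exported journal lines.
sample = [
    '{"PRIORITY": "6", "_SYSTEMD_UNIT": "megaladata.service", "MESSAGE": "Session opened"}',
    '{"PRIORITY": "3", "_SYSTEMD_UNIT": "megaladata.service", "MESSAGE": "Node failed"}',
]
errors = list(errors_from_journal(sample))
```

The same function can be fed from a file or from a pipe, which makes it a simple starting point for a custom health check.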

Standard logs allow for building "server health" monitoring systems, applying additional tools such as Elasticsearch, Grafana, and the like.

Version control, DevOps, CI/CD

Megaladata does not provide built-in support for model repositories, DevOps, or CI/CD. To implement such functionality, we offer the Megaladata DevOps solution that provides a range of tools, preconfigured environments, scripts, and display arrangements to automate all the stages of the models' development and utilization.

Megaladata DevOps is based on a popular stack: Git, GitLab, Docker, NGINX, GitBook, and Node.js. Launching Megaladata DevOps requires adjusting the scripts to the customer's infrastructure.

Integration

Integration with external systems

Megaladata does not have data input mechanisms. The user has to import the analyzed data from external sources. The processing results can be loaded to external systems, displayed as web service results, or visualized on the screen with special tools.

The ways to integrate the platform with other systems are:

Reading and writing files
Database import and export
HTTP requests (SOAP and JSON)
Communication through our own API, SDK, and libraries

For HTTP requests, we use OpenSSL to establish TLS connections and Synapse as an HTTP client. In Megaladata under Windows, we employ Secure Channel when the authentication requires a client certificate located in the Windows certificate store.

To export data to Tableau, we use a Tableau SDK that allows the creation of .hyper and .tde files. When the data is loaded, the user can publish the file by sending a REST request to the Tableau server (through Synapse + OpenSSL).

Supported databases

To use Megaladata, it is not necessary to have databases or data warehouses. For the analytical platform, they only serve as sources or destinations to import records or upload them after any processing step.

There are no strict requirements for developing a specific data warehouse. The analyst can use an existing warehouse or other sources; Megaladata integrates with the existing IT infrastructure.

The platform supports both relational and columnar databases. It accesses most databases directly due to fast drivers that support specialized techniques such as bulk loading.

Megaladata supports the following systems:

Data source type	Name/format
Files	Excel, Megaladata Data File, XML, CSV
Relational databases	Firebird, Interbase, MS Access, MS SQL, MySQL, Oracle, PostgreSQL, SQLite
Columnar databases	BigQuery, ClickHouse
ODBC	Teradata, Hive, HP Vertica, and others
Web services	SOAP (XML + WSDL), REST services (OpenAPI)
BI systems	Tableau (Only in Megaladata for Windows)
Message brokers	Kafka

Megaladata supports an embedded database system SQLite that doesn't require deployment and administration. SQLite is a library connected to the Megaladata server. It ensures high performance and low memory consumption, as a specialized API helps to minimize the number of read/write operations. SQLite supports encryption.

It is possible to work with NoSQL databases using a REST request mechanism that Megaladata supports.

Publishing web services

Publishing custom web services requires the optional Megaladata Integrator component. A user intending to publish a web service must have access rights.

The analyst can publish any workflow node as a web service. However, supernodes are the ones that get published most often. The process does not require any coding. In the platform, two types of services are created simultaneously: XML with a WSDL description and REST JSON with an OpenAPI description. The program generates the API documentation automatically.

When utilizing the GET or POST request methods, the analyst can send parameters to the input ports of the published node that fulfills the processing logic. GET serves for transferring only variables, while POST allows users to upload variables, tables, and trees.
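A client-side POST call might be prepared as in the sketch below. The URL layout and the shape of the JSON body are assumptions made for illustration; the actual contract of a published service is defined by the OpenAPI description that the platform generates.

```python
import json
from urllib.request import Request


def build_post_request(base_url, service, variables, rows):
    """Prepare (but do not send) a POST request with variables and a table."""
    body = json.dumps({"variables": variables, "table": rows}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/{service}",  # hypothetical URL scheme
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_post_request(
    "http://example.invalid/api",  # placeholder host
    "scoring",                     # hypothetical service name
    variables={"threshold": 0.5},
    rows=[{"client_id": 1, "income": 52000}],
)
```

Passing the prepared object to urllib.request.urlopen would send it. A GET call, by contrast, could carry only the variables, typically as query parameters.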

Upon a call to the supernode, the server performs only the operations whose results need to be sent to the supernode's output port, ignoring all other nodes. If it is necessary to execute some other node not connected to the supernode's output port (for example, to export data to a database), the user has to set the execution order by drawing a link between the needed node and the output port.
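This behavior is essentially a reachability walk backwards from the output port. A minimal sketch, assuming the workflow is given as a mapping from each node to the nodes it depends on (the node names are hypothetical):

```python
def nodes_to_run(links, output):
    """Collect every node the output port depends on, directly or indirectly.

    `links` maps a node to the list of its upstream nodes.
    """
    needed, stack = set(), [output]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(links.get(node, []))
    return needed


links = {
    "output": ["score"],
    "score": ["import"],
    "export_db": ["score"],  # not linked to the output port
}
without_link = nodes_to_run(links, "output")

# Drawing a link from export_db to the output port makes it part of the run.
linked = dict(links, output=["score", "export_db"])
with_link = nodes_to_run(linked, "output")
```

In the first case export_db is skipped because nothing on the path to the output port depends on it; after the extra link is added, it is executed as well.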

This scheme is possible only for a single node, not the whole package.

Batch processing

The Megaladata server can execute the configured workflows not only in interactive mode but also in batch mode. To launch the latter, the user needs to give the server a workflow start command. It can be done in two ways:

Using the BatchLauncher utility;
Sending a web service request.

The BatchLauncher can start a whole package or a particular workflow node. It will run only those lines of the data stream that end with export nodes, as all other operations in the case of batch processing are useless.

The BatchLauncher utility can accept parameters. Upon finishing, it returns the status code. The execution errors are sent to the console and logged on the server.
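A scheduler script typically wraps such a utility and reacts to its status code. The sketch below shows the pattern only: the real BatchLauncher arguments and the meaning of its status codes are given in the product documentation and are not reproduced here, so a harmless stand-in command is used instead.

```python
import subprocess
import sys


def run_batch(command):
    """Run a batch command, echo its console output on failure, return the status code."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface whatever the utility printed so the scheduler log keeps it.
        print(result.stderr or result.stdout, file=sys.stderr)
    return result.returncode


# Stand-in for a real BatchLauncher invocation: a process that exits with code 3.
demo_command = [sys.executable, "-c", "import sys; sys.exit(3)"]
```

In a real job, demo_command would be replaced with the BatchLauncher command line, and a non-zero result could trigger a retry or an alert.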

If the user has the required rights, they can send a batch launch command both from the server and from any workstation that has access to the Megaladata server. In the second case, BatchLauncher will connect to the server over TCP to start the workflow.

To run batch processing on a schedule, the user has to employ a job scheduler, e.g., Task Scheduler in Windows or cron in Linux.

Only the server Megaladata editions support batch processing. In the desktop editions, the operation is possible only in the interactive mode.

Thus, batch processing in Megaladata can be launched by schedule using a special utility or by event through a web service call.

Performance and scaling

Performance

Megaladata is one of the world's best-performing low code platforms for advanced analytics. We have achieved this through optimization at all levels:

Our own file format. Megaladata Data File is the fastest of the data sources supported by the platform. It supports data compression/decompression during write/read, saves disk space through optimized string storage, and enables asynchronous data reading.
Quick access to databases. Most DBMS can be accessed directly by means of fast access libraries and batch read/write.
Shared memory. When operating on the same server, some databases (e.g., Firebird or MySQL) support shared memory, allowing Megaladata to access the necessary data directly.
Parallelization. Megaladata efficiently utilizes the multicore systems' resources, running all possible operations in parallel — workflows, read/write, machine learning, loops, etc. The user can set the execution order manually when needed.
MapReduce. The Loop component enables applying the MapReduce model to a multicore system: the dataset is divided into groups, which are then processed in parallel. The processing results are then merged into a single output set. When the workflow is designed correctly, the processing speed grows almost linearly as the number of cores increases.
In-memory. Megaladata performs all the computations in the memory, attempts to keep the data in RAM, and stores only unique records by default. The speed of working with RAM is optimized due to allocating and freeing large memory blocks. The platform uses data structures that can be kept in cache.
Lazy evaluation. We employ a strategy that delays the evaluation until the result is needed. It helps to save resources and increase performance, as the computations are performed only when there is an actual need for them.
Cache management. If necessary, the analyst can flexibly manage workflow data caching: upon node activation or upon data access, for the whole dataset or for selected fields.
Rapid algorithms. We use the fastest math libraries written in low-level programming languages. The data is processed by windowing and stored in the memory in special data structures matching specific algorithms. The way the algorithms are implemented ensures efficient operation in multicore systems.
Formulas and code. When formulas are used, the platform creates and caches syntax trees of expressions and performs just-in-time compilation and caching of JavaScript code and regular expressions.
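The MapReduce item in the list above follows a split, process, merge pattern that can be sketched in a few lines. This is an illustration of the model, not the Loop component: the function names are hypothetical, and a thread pool is used only to keep the example portable. In CPython, CPU-bound work would need a process pool to actually occupy several cores.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_reduce(rows, key, process, merge, workers=4):
    """Split rows into groups, process the groups concurrently, merge the results."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process, groups.values()))
    return merge(partials)


rows = [
    {"region": "N", "amount": 10},
    {"region": "S", "amount": 5},
    {"region": "N", "amount": 7},
]
totals = map_reduce(
    rows,
    key=lambda r: r["region"],
    process=lambda g: (g[0]["region"], sum(r["amount"] for r in g)),
    merge=dict,
)
```

Since the groups are independent of one another, adding workers shortens the processing step, which is the source of the near-linear scaling mentioned above.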

The asynchronous user interface does not contribute to a higher processing speed. Still, it creates the experience of comfort and high responsiveness even when the platform is performing prolonged operations — accessing external systems or complicated mathematics. The GUI does not get blocked, and the user can switch to another task instead of waiting for a long operation to be completed.

Throughput capacity

The platform's response time depends on multiple factors — hardware, processing logic complexity, data source speed, web service response time, etc. It is difficult to provide exact performance metrics without these details.

To demonstrate the platform's capacity, we will give some numbers shown in some projects utilizing the platform:

Average request processing time for 95% of loan applications in a credit pipeline (not including the response time of external services): 1-2 sec.
The throughput capacity of a decision support system for 95% of requests during peak flow: 5,000 requests per hour on one server (not including the time spent waiting for external services to respond).
Request peaks for a decision support system: more than 10,000 requests per hour to a three-server cluster.

The hardware requirements depend on the amount of data to be processed, the complexity of required algorithms, and the number of users. Megaladata utilizes hardware resources efficiently, ensuring minimum system requirements. However, calculating the necessary server characteristics for each particular project is individual. We provide the minimum system requirements and recommendations in the software documentation.

Safety

Users

The server editions of the platform support access control: users can be added locally. The Enterprise edition also offers LDAP authentication. The LDAP server may be Active Directory or OpenLDAP.

There are four available user roles:

Workflow design
Reports view
Batch processing
Administration

It is also possible to create shared folders and give users access rights. Such rights allow access to a whole folder, not an individual package or node.

Addressing external resources

The Megaladata server does not stealthily collect information in the background to send it to external resources. But the client addresses the website megaladata.com through REST API calls to get the following information:

Company news published on the Home page.
The number of the latest Megaladata version on the Home and About pages, to notify the user about any available updates.

The website megaladata.com is accessed by the browser or the desktop application, not the server.

The Internet connection is not crucial for the platform's correct operation. If there is none, the only difference is that the user will not get notifications about the company news and updates.

If your organization is interested in implementing Megaladata, please contact us.
