Coding in Megaladata: Python vs. JavaScript
Megaladata is a low-code advanced analytics platform: it is easy to use and, in most cases, does not require programming to implement complex analytical logic. We consider this one of the platform's main advantages. However, not all problems can be solved without coding, and writing a few lines is sometimes faster and more convenient than designing a visual workflow. This is why Megaladata supports two languages: JavaScript and Python.
Coding can help a Megaladata analyst in several ways:
- More opportunities: Using programming languages makes it possible to implement functionality that the standard Megaladata components do not provide.
- Adaptivity: Coding allows users to create more flexible workflows tailored to the requirements of a particular business process.
- Performance optimization: In some cases, programming helps to optimize the performance and efficiency of a high-load workflow.
- Reusability of solutions: Megaladata supports using previously created Python and JS scripts in new workflows.
Let's look at how coding integrates into Megaladata and discuss when to use each of the two languages.
Integration in the Processing Flow
To use code in Megaladata, first understand how the languages are integrated. On the platform, JavaScript and Python operate as processing nodes: the JavaScript and Python components embed scripts written in these languages directly into the workflow.
The integration is kept simple by a data API, an interface for receiving data from the input ports and transmitting it to the output ports. The data API transparently converts the data at a node's ports into native structures of the chosen language, ready for further manipulation in code.
The user can configure the output port table on the "Configure output table columns" page of the JavaScript/Python node settings wizard. Another option is to configure it dynamically during script execution; the user enables this by selecting the "Allow creating output columns in script" checkbox.
Here is a JavaScript example that receives data from an input port through the data API and generates the output table structure dynamically:
import { InputTable, OutputTable } from "builtIn/Data";

// Getting the number of columns
const ColCount = InputTable.ColumnCount;
// Getting the number of records
const RowCount = InputTable.RowCount;
// Copying the input set columns to the output set
OutputTable.AssignColumns(InputTable.Columns);
// Forming the output dataset from the input set in a loop
for (let i = 0; i < RowCount; i++) {
    // Adding a new record to the output dataset
    OutputTable.Append();
    for (let j = 0; j < ColCount; j++) {
        // Writing the input value to the same column of the new record
        OutputTable.Set(j, InputTable.Get(i, j));
    }
}
A similar example in Python:
from builtin_data import InputTable, OutputTable
from builtin_pandas_utils import to_data_frame, fill_table
# Creating pandas.DataFrame with the input set
data = to_data_frame(InputTable)
# Copying the input set columns
OutputTable.AssignColumns(InputTable.Columns)
# Filling in the output set
fill_table(OutputTable, data, with_index=False)
JavaScript
JavaScript code is executed by the ChakraCore engine. Nodes share a pool of engine instances, but each node's code runs in isolation. The maximum pool size depends on the number of logical cores in the system.
The JavaScript node uses the data API to receive data at its input ports and pass it on to its output ports. The interface reads the data lazily, that is, on demand. Working with the data stays the same regardless of whether several JS nodes run in parallel.
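The lazy loading itself happens inside the data API. A script can follow the same idea on the consuming side by reading rows only when it needs them. The following is a minimal sketch under assumptions: `table` stands for any object exposing `RowCount`, `ColumnCount`, and `Get(row, col)` (the members used in the InputTable example above), while `iterateRows`, `findFirstRow`, and `makeStubTable` are hypothetical helper names, not part of the platform.

```javascript
// Sketch only: `table` is any object with RowCount, ColumnCount and Get(row, col).
function* iterateRows(table) {
  for (let i = 0; i < table.RowCount; i++) {
    const row = [];
    for (let j = 0; j < table.ColumnCount; j++) {
      row.push(table.Get(i, j));
    }
    // Execution pauses here; the next row is read only when the consumer asks for it.
    yield row;
  }
}

// Hypothetical helper: return the first row matching a predicate, then stop reading.
function findFirstRow(table, predicate) {
  for (const row of iterateRows(table)) {
    if (predicate(row)) {
      return row;
    }
  }
  return null;
}

// Stand-in table for running the sketch outside Megaladata; it counts cell reads.
function makeStubTable(rows) {
  const stub = {
    RowCount: rows.length,
    ColumnCount: rows.length > 0 ? rows[0].length : 0,
    reads: 0,
    Get(i, j) {
      stub.reads += 1;
      return rows[i][j];
    },
  };
  return stub;
}
```

Inside a JavaScript node, one would pass `InputTable` instead of the stub. Because the generator stops as soon as the consumer stops, rows after the first match are never requested.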
The JS component also supports the following:
- Fetch API: An interface for working with HTTP requests and responses that enables direct interaction of the JavaScript node with web services and network resources. The most common use case for the Fetch API is working with REST requests.
- File Storage API: An interface for all file operations (reading and writing files, creating folders, and so on). Because the API is built into the JS component, these operations are safe in both desktop and server editions of Megaladata. Desktop editions can address the whole file system (subject to user rights). Server editions allow access only to an isolated file storage space and to shared folders configured by the administrator. Thus, the File Storage API does not require a Megaladata user to have administrator rights or to perform any unsafe actions.
It is possible to load JavaScript libraries; the platform supports two module specifications: ECMAScript 6 modules and CommonJS.
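To illustrate the REST use case mentioned for the Fetch API in the list above, here is a minimal sketch of a GET request that returns parsed JSON. It relies only on the standard Fetch interface (`ok`, `status`, `json()`). The URL, the payload, and the helper names `fetchJson` and `makeStubFetch` are hypothetical; the fetch function is passed in as a parameter so that the sketch runs without a network, and how the JavaScript node exposes its fetch function is not shown here.

```javascript
// Sketch only: `fetchImpl` is any function that follows the standard fetch contract.
async function fetchJson(fetchImpl, url) {
  const response = await fetchImpl(url, {
    method: "GET",
    headers: { Accept: "application/json" },
  });
  // fetch resolves even for HTTP error statuses, so check `ok` explicitly.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Stand-in for fetch, used only to run the sketch offline.
function makeStubFetch(status, payload) {
  return async () => ({
    ok: status >= 200 && status < 300,
    status,
    json: async () => payload,
  });
}

// Usage with the stub; the URL is a placeholder.
fetchJson(makeStubFetch(200, { rate: 1.5 }), "https://example.com/api/rates")
  .then((data) => console.log(data.rate));
```

In a real node, the fetch function provided by the platform would replace the stub, and the parsed values would be written to the output table.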
JavaScript is usable out-of-the-box and does not require any additional configuration, installations, or other actions from the user.
Apart from the JavaScript node, JavaScript can also be used in the Calculator handler, which supports formulas written either in an Excel-like syntax or in JS.
Python
To work with Python, a language interpreter must be installed on the server where Megaladata is deployed, and the Python component must be preconfigured. How the language is used in Megaladata may differ depending on the operating system.
Using Python on the platform can be risky since the language does not operate in a sandbox. With Python, it is possible to write malicious code, steal other users' data, or otherwise harm the system where Megaladata is deployed.
This is why running Python code on the platform is disabled by default. To ensure safe operation, we recommend running the Python interpreter in containers. As a result, the system administrator has to perform some pre-configuration.
Under Windows, the Python node can run either as part of a Megaladata workflow or as a separate process. Under Linux, only the separate process option is available. There are also some other differences and restrictions to take into account.
When Python runs as part of a workflow, there is no need to write data to files and read it back, or to involve other sources or destinations. The data API receives the data (variables and tables) lazily from the input port and passes the processed data (a table) to the output port, which reduces RAM usage. However, only one Python node can run on a Megaladata server at a time.
When the option "Start in a separate process" is selected, several Python nodes can run in parallel, but the analyzed data is loaded into intermediate files. A separate Python interpreter process is started for each node, and the disk read/write operations take additional time.
The Python node has an auxiliary module, builtin_pandas_utils, that converts Megaladata datasets to pandas DataFrame objects and back.
Choosing the Language
Megaladata's visual design tools are suitable for most tasks. Still, if the process requires coding, it is important to consider the specifics of how each language is integrated into the platform.
The JavaScript component in Megaladata has the following advantages:
- Runs safely in a sandbox.
- Does not require additional installation or authorization.
- Functions similarly under any OS and in any desktop or server platform edition.
- Ensures fast operation through JIT compilation and code caching.
- Always employs "lazy" techniques to address data.
Here are the strengths of using Python:
- Python is a powerful and universal programming language.
- There is a wide range of libraries for working with data, machine learning, integration, and much more.
- Many developers who work in data analysis already use Python.
Python is very popular among data scientists. However, practice shows that code is most often used not to run machine learning algorithms but to solve simple tasks: adding records based on specific field values, changing a table in a loop, parsing JSON data, and so on.
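As an illustration of such a simple task, the sketch below parses a JSON string and flattens it into table-like rows. The JSON shape, the field names (`id`, `items`, `sku`, `qty`), and the function name `ordersToRows` are hypothetical and chosen only for the example.

```javascript
// Sketch only: flattens a hypothetical list of orders into [orderId, sku, qty] rows.
function ordersToRows(jsonText) {
  const orders = JSON.parse(jsonText);
  const rows = [];
  for (const order of orders) {
    // Orders without an `items` array produce no rows.
    for (const item of order.items || []) {
      rows.push([order.id, item.sku, item.qty]);
    }
  }
  return rows;
}

// Sample input: one order with two items and one order with none.
const sample =
  '[{"id": 1, "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},' +
  ' {"id": 2, "items": []}]';
const rows = ordersToRows(sample);
console.log(rows.length); // 2
```

In a JavaScript node, each resulting row could then be written with OutputTable.Append() followed by OutputTable.Set(columnIndex, value), in the same way as in the data API example earlier in the article.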
Our general recommendation for using programming languages in Megaladata is this: Use JavaScript whenever your process requires programming, and turn to Python only when JS does not solve your problem.