Manipulate data

Data Manipulation with Octai

Data can be stored in many formats: tables, text, images, audio, video, and so on. Most of the time, the information a business needs is stored in tabular format, that is, a two-dimensional table. The horizontal elements are called rows, and the vertical elements are called columns.

Two-dimensional table
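
To make the row and column terminology concrete, here is a minimal sketch of a two-dimensional table in Python. It assumes the pandas library; the column names and values are hypothetical and are not taken from Octai.

```python
import pandas as pd

# Hypothetical sales records; names and values are illustrative only.
table = pd.DataFrame(
    {
        "order_id": [101, 102, 103],
        "customer": ["Ada", "Ben", "Cleo"],
        "amount": [25.0, 40.5, 12.0],
    }
)

n_rows, n_columns = table.shape  # number of rows, number of columns
first_row = table.iloc[0]        # one horizontal element (a row)
amount_column = table["amount"]  # one vertical element (a column)
```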

Manipulating data in Octai generally involves a series of data preprocessing steps. The common ones in ML Studio are outlined below: merging datasets, sorting and filtering, feature engineering, and adding lag features.

Merging datasets is often part of data preprocessing. To merge datasets in Octai, you generally follow these steps:

  1. Import datasets: Import both datasets into the platform, either by uploading them in a supported format (e.g., CSV, Excel, or JSON) or by connecting to a database.

Merge

  2. Identify common keys or columns: Determine which columns or keys the two datasets have in common. These serve as the basis for the merge, so it is essential that the key values and data types are consistent across both datasets.
  3. Merge the datasets: Specify the datasets to be merged, the common keys or columns to merge on, and the type of merge (inner, outer, left, or right).

Merge

  4. Review the merged dataset: After merging, review the result to confirm that the merge succeeded and the data is accurate. This may include checking for duplicate rows, missing values, or other inconsistencies.
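
The merge steps above can be sketched in Python. This is a minimal illustration that assumes the pandas library, not Octai's own merge tool; the two datasets and their column names (`customer_id`, `amount`, `region`) are hypothetical.

```python
import pandas as pd

# Hypothetical inputs; in practice these would be the imported datasets.
orders = pd.DataFrame(
    {"customer_id": ["1", "2", "2", "4"], "amount": [10.0, 20.0, 5.0, 7.5]}
)
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "region": ["North", "South", "East"]}
)

# Make the key's data type consistent in both datasets before merging.
orders["customer_id"] = orders["customer_id"].astype("int64")

# Left merge: keep every order and attach customer data where the key matches.
merged = orders.merge(
    customers,
    on="customer_id",
    how="left",               # other merge types: "inner", "outer", "right"
    validate="many_to_one",   # raises if customer_id repeats in `customers`
    indicator=True,           # adds a `_merge` column with the match status
)

# Review the result: duplicate rows, missing values, unmatched keys.
duplicate_rows = int(merged.duplicated().sum())
missing_per_column = merged.isna().sum()
unmatched = merged[merged["_merge"] == "left_only"]
```

In this sketch the order for customer 4 has no matching customer record, so it appears in `unmatched` with a missing `region`. This is the kind of inconsistency the review step is meant to catch.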

Once the datasets are merged, you can proceed with other preprocessing tasks, such as cleaning, transformation, and feature engineering, before moving on to model selection, training, and evaluation.

Sorting and filtering columns is usually also part of data preprocessing in Octai. The general steps are:

  1. Access data manipulation tools: ML Studio gives you access to visual data manipulation tools, data preprocessing modules, or SQL-like query interfaces that let you sort and filter the columns in your dataset.
  2. Filter: To filter the data, use Octai's filtering function to keep only the columns or rows that meet specific criteria. You specify the criteria by defining conditions or by applying a logical expression to the columns in question.
  3. Sort: To sort by a column, use the platform's sorting function to order the column's values in ascending or descending order.

Filtering and Sorting

  4. Review the sorted and filtered dataset: After applying the filtering and sorting operations, review the resulting dataset to confirm that the manipulation succeeded and the data is accurate.
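
The filtering and sorting steps above can be sketched as follows. The example assumes the pandas library and a hypothetical dataset; it shows the general technique, not Octai's own filter or sort function.

```python
import pandas as pd

# Hypothetical dataset; column names are illustrative only.
data = pd.DataFrame(
    {
        "product": ["A", "B", "C", "D"],
        "region": ["North", "South", "North", "East"],
        "units": [30, 5, 12, 50],
        "notes": ["", "promo", "", ""],
    }
)

# Column filter: keep only the columns of interest.
selected = data[["product", "region", "units"]]

# Row filter: keep rows that satisfy a logical expression.
condition = (selected["units"] >= 10) & (selected["region"] != "East")
filtered = selected[condition]

# Sort: order the remaining rows by `units`, largest first.
result = filtered.sort_values("units", ascending=False).reset_index(drop=True)
```

Here `result` holds products A and C, in that order. Resetting the index after sorting is optional; it simply renumbers the rows of the reviewed output.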

Once you've sorted and filtered the columns, you can proceed with other preprocessing tasks, such as feature engineering, before moving on to model selection, training, and evaluation.

To add features, use the feature flow function.

Feature Engineering

Alternatively, you can perform any of the steps above directly in a Python Script.
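
As an illustration of script-based feature engineering, here is a minimal sketch that assumes the pandas library. How Octai passes a dataset into and out of a Python Script is not described here, so the sketch builds its own hypothetical data; the column names and derived features are illustrative only.

```python
import pandas as pd

# Hypothetical input data.
sales = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-02-10"]),
        "revenue": [200.0, 90.0, 150.0],
        "units": [10, 3, 5],
    }
)

# Derive new features from the existing columns.
features = sales.assign(
    unit_price=sales["revenue"] / sales["units"],  # ratio of two columns
    month=sales["date"].dt.month,                  # calendar month (1-12)
    is_weekend=sales["date"].dt.dayofweek >= 5,    # Saturday=5, Sunday=6
)
```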

Python Script

Adding lag features is a common technique in time series analysis and forecasting, and Octai supports it. A lagged feature is created by shifting the values of a time series variable by a certain number of time steps, which allows the model to capture the temporal relationships between past observations and the target variable.

Here are some general steps to add lag features in Octai:

  1. Ensure proper sorting: Make sure your dataset is sorted in chronological order, typically by a timestamp or date column, so that the lag features are created correctly.

  2. Create lag features: In Octai, you can create lagged features.

Time Series Feature

  3. Handle missing values: Creating lag features may result in missing values at the beginning of the time series. Handle these appropriately, either by filling them with a suitable value (e.g., the mean, the median, or a fixed value) or by removing the rows that contain them.
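
The lag feature steps above can be sketched in Python. This is a minimal illustration that assumes the pandas library, not Octai's time series feature tool; the `date` and `demand` columns and the lag sizes are hypothetical.

```python
import pandas as pd

# Hypothetical daily series, deliberately out of order.
series = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-03", "2024-01-01", "2024-01-02", "2024-01-04"]
        ),
        "demand": [30.0, 10.0, 20.0, 40.0],
    }
)

# Sort chronologically so each shift refers to earlier observations.
series = series.sort_values("date").reset_index(drop=True)

# Create lag features: lag k holds the value observed k steps earlier.
lag_columns = []
for k in (1, 2):
    name = f"demand_lag_{k}"
    series[name] = series["demand"].shift(k)
    lag_columns.append(name)

# Handle missing values, option A: remove rows with incomplete lags.
dropped = series.dropna().reset_index(drop=True)

# Handle missing values, option B: fill each lag column with its own mean.
filled = series.fillna(series[lag_columns].mean())
```

Shifting by k steps leaves the first k rows of that lag column empty. Dropping those rows loses a few observations, while filling keeps every row but introduces values that were never observed, so the better choice depends on how long the series is.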