Sample

The sample function selects a smaller random subset of a larger dataset for analysis or validation. It is useful when you are working with very large datasets, performing exploratory data analysis, or creating training and testing sets for machine learning models.

The percentage operator (0-100) in the sample function sets the proportion of the dataset to select as a random sample. A value of 0 returns an empty dataset, and a value of 100 returns the entire dataset.
Here's an example of using the sample function with a percentage operator to select a random subset of your data:
Dataset:
A | B |
---|---|
1 | 2 |
2 | 4 |
3 | 6 |
4 | 8 |
5 | 10 |
Suppose you want to select a random 40% sample of the dataset. The dataset has 5 rows, so a 40% sample contains 2 rows. After applying the sample function with the percentage operator set to 40, you might get:
Sampled dataset (40%):
A | B |
---|---|
4 | 8 |
2 | 4 |
Note that the sampled dataset is random, and the specific rows selected may vary each time you apply the function.
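The behavior described above can be approximated outside Octai with a short Python sketch. This is not Octai's implementation: the helper name `sample_rows`, the optional `seed` argument, and the choice to round the row count to the nearest whole number are all assumptions made for illustration.

```python
import random


def sample_rows(rows, percentage, seed=None):
    """Return a random sample of roughly `percentage` percent of `rows`.

    Hypothetical helper for illustration; Octai's own rounding and
    randomization rules are not documented here and may differ.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    rng = random.Random(seed)  # pass a seed to make the sample repeatable
    # Assumption: the number of rows is rounded to the nearest whole number.
    k = round(len(rows) * percentage / 100)
    # Sample without replacement, so no row is selected twice.
    return rng.sample(rows, k)


# The example dataset from this page.
dataset = [
    {"A": 1, "B": 2},
    {"A": 2, "B": 4},
    {"A": 3, "B": 6},
    {"A": 4, "B": 8},
    {"A": 5, "B": 10},
]

sampled = sample_rows(dataset, 40)
print(sampled)  # two rows, chosen at random on each run
```

Calling `sample_rows(dataset, 40, seed=1)` repeatedly returns the same two rows each time, which can help when a sample needs to be reproduced later.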
In Octai, you can use built-in functions or operations to apply the sample function with a percentage operator to your data. This lets you work with a more manageable subset of your data. Because rows are chosen at random, a sufficiently large sample tends to remain representative of the full dataset, although a very small sample may not.