How to Generate Synthetic Data on the Qubrid AI Platform


Step 1: Navigate to Qubrid AI – Train | Tune | Deploy and log in with your username and password, or use the Google sign-in option. If you do not have an account, please create one (instructions here).

Step 2: After a successful login, you will be taken to the home page. Click AI Data Services to start generating synthetic data. You can always return to the home page using the Home icon at the top of the left navigation bar.

Step 3: The synthetic data generation process has three parts: starting with sample data (the data you want to replicate), tuning a model on that sample data, and generating output based on your requirements.

Start by uploading a sample dataset for synthetic data generation. For best results, make sure the uploaded file satisfies the minimum file requirements below.

Minimum File Requirements: 

  • Format: CSV
  • File size: 5 MB or less
  • Contents: numerical data only. Text is allowed in column headings; special characters are not supported (UTF-8 only).
  • Number of rows: at least 5,000
  • Table format: single table (multiple files are currently not supported)
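
The requirements above can be checked locally before uploading. The following is a minimal sketch, not part of the Qubrid AI platform: the function name check_sample_file is hypothetical, "5 MB" is assumed to mean 5 × 1024 × 1024 bytes, and "numerical" is assumed to mean any value that Python's float() accepts.

```python
import csv
import os

MAX_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" is read as 5 MiB
MIN_ROWS = 5000              # minimum number of data rows from the guide


def _is_number(cell):
    """True if the cell parses as a float (assumed meaning of 'numerical')."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def check_sample_file(path):
    """Return a list of problems found; an empty list means the file passes."""
    problems = []
    if not path.lower().endswith(".csv"):
        problems.append("file extension is not .csv")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 5 MB")

    row_count = 0
    non_numeric = False
    try:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)  # first line holds the column headings
            if header is None:
                return problems + ["file is empty"]
            for row in reader:
                if not row:  # skip blank lines
                    continue
                row_count += 1
                if not all(_is_number(cell) for cell in row):
                    non_numeric = True
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]

    if non_numeric:
        problems.append("at least one data cell is not numerical")
    if row_count < MIN_ROWS:
        problems.append(f"only {row_count} data rows; at least {MIN_ROWS} required")
    return problems
```

Calling check_sample_file on your own file returns a list of human-readable problems, so an empty list suggests the file is ready to upload. The platform may apply additional checks that this sketch does not cover.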

Step 3.1: Click the “Click to upload” link or drag and drop the file. When the upload is complete, you will see your file name on the screen. Click Next.

Step 4: Shortly after you click Next, the data preview page appears, where you can preview the sample dataset and select one or more numerical columns. These columns are used to tune the data model, and the output file will contain additional rows of data for the selected columns only.

Step 4.1: To get started, click the Columns button (top left of the table) to open the search bar. Use the search feature to find each desired column and select it with the check box next to it.

Once you have selected all the desired columns (currently limited to 100 columns), click Next.
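
If your dataset is wide, it can help to list the numerical columns locally before working through the selection screen. The sketch below is an illustration only and does not call any Qubrid AI API: numeric_columns and check_selection are hypothetical helper names, and a column is assumed to be numerical when every one of its values is accepted by float().

```python
import csv

MAX_SELECTED = 100  # column limit stated in the guide


def numeric_columns(path):
    """Return the heading of every column whose values all parse as numbers."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        numeric = [True] * len(header)
        for row in reader:
            for i, cell in enumerate(row[:len(header)]):
                if numeric[i]:
                    try:
                        float(cell)
                    except ValueError:
                        numeric[i] = False  # one bad value rules the column out
    return [name for name, ok in zip(header, numeric) if ok]


def check_selection(selected, candidates):
    """Raise ValueError if the selection is empty, too large, or not numerical."""
    if not selected:
        raise ValueError("select at least one column")
    if len(selected) > MAX_SELECTED:
        raise ValueError(f"at most {MAX_SELECTED} columns can be selected")
    unknown = [name for name in selected if name not in candidates]
    if unknown:
        raise ValueError(f"not numerical columns: {unknown}")
```

A typical use is to call numeric_columns on the sample file, pick the headings you care about, and pass both lists to check_selection before selecting the same columns on the preview page.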

Step 5: The time to complete the process will depend on the data sample and the number of rows to be generated. Please wait until the entire process is complete (a few minutes). When it finishes, you will see a new table with the generated data, performance result scores, and buttons to download the generated data and a report with all the metrics associated with the data generation process. Click “Download Report” for the performance metric report. Click “Download Data” to download the generated synthetic data CSV file.
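
After downloading the generated CSV, a quick independent sanity check is to compare basic statistics of each selected column against the original sample. The sketch below is not the platform's performance report and does not reproduce its metrics; column_stats and compare are hypothetical names, and the downloaded file is assumed to have a header row with the same column names as the sample.

```python
import csv
import statistics


def column_stats(path, columns):
    """Return {column: (mean, population standard deviation)} for the named columns."""
    values = {name: [] for name in columns}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            for name in columns:
                values[name].append(float(row[name]))
    return {
        name: (statistics.fmean(v), statistics.pstdev(v))
        for name, v in values.items()
    }


def compare(sample_path, synthetic_path, columns):
    """Return {column: (mean difference, std difference)}, synthetic minus sample."""
    sample = column_stats(sample_path, columns)
    synthetic = column_stats(synthetic_path, columns)
    return {
        name: (
            synthetic[name][0] - sample[name][0],
            synthetic[name][1] - sample[name][1],
        )
        for name in columns
    }
```

Differences close to zero suggest that the generated columns follow the sample's overall location and spread. They say nothing about correlations between columns or about privacy, so the downloaded report remains the primary source for quality metrics.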

Summary  

Qubrid AI’s approach to synthetic data generation is aimed at AI and LLM enterprises. Unlike other platforms, Qubrid AI has integrated infrastructure to fine-tune and reuse generated data for AI model training. By combining CGANs with traditional GAN techniques, Qubrid AI offers a robust solution to the challenges of real-world data while protecting user data privacy. This supports accurate and efficient model training on large production data. As the market for synthetic data is poised for significant growth, Qubrid AI aims to deliver high-quality synthetic data that meets the diverse needs of modern AI applications.

