Guide

Using ChatGPT AI for Data Analytics

Oct 11, 2024Musthaq Ahamad
Using ChatGPT AI for Data Analytics

It's been couple of years since OpenAI has released ChatGPT, a Large Language Model / AI that have conversational capabilities. Since then there has been a huge shift from how softwares are being built, procecess have being simplified, and tool are being used. From hundereds of industries that got touched by ChatGPT and other AI models, Data Analytics and Analysis is one among them.

Engineers, and Analysts have been using ChatGPT for a variety of usecases. From generating SQL queries to query large datasets, and doing analysis and generating insights. In this article we will discuss ways you can use ChatGPT to quickly analyse your datasets, even if you are non-technical and talk about limitations of ChatGPT when it comes to Data Analytics.

Benefits of using ChatGPT for Data Analytics

ChatGPT doing data analytics and graphs

Leveraging ChatGPT in the field of data analytics offers several powerful benefits, making complex tasks more manageable and accessible. Below are some of the key ways ChatGPT enhances the data analytics workflow:

1. Natural Language Queries

ChatGPT allows users to ask complex data-related questions in plain English, without needing to know SQL, Python, or other programming languages. This makes it easy for non-technical professionals to interact with data and gain insights, breaking down the barriers between data and decision-making.

2. Quick Prototyping and Ideation

ChatGPT helps users brainstorm data analysis strategies and suggests code snippets for Python, SQL, or R. Whether it’s cleaning data, running visualizations, or building models, ChatGPT speeds up the ideation process, helping analysts quickly prototype solutions.

3. Data Explanation and Insight Generation

After running a model or creating visualizations, ChatGPT can interpret and explain the results in simple language. It helps translate technical outputs—like regression coefficients or statistical tests—into actionable insights that can be shared with stakeholders.

4. Error Debugging and Troubleshooting

When coding issues arise, ChatGPT can assist in identifying and fixing errors. Simply input an error message, and ChatGPT will suggest potential fixes or alternative approaches, helping users troubleshoot faster without needing deep coding expertise.

5. Automated Reporting

ChatGPT simplifies the process of report generation by summarizing key findings from datasets and analysis results. It can create concise, clear summaries for presentations or regular reports, saving time and effort while ensuring accuracy and clarity.

Practical Use Cases for ChatGPT in Data Analytics

ChatGPT offers valuable support for a variety of tasks in data analytics. Here are some practical use cases where it can be applied:

1. Exploratory Data Analysis (EDA)

ChatGPT can help users ask the right questions about their datasets, suggest approaches for handling missing values, and summarize key statistics like mean, median, or correlation. It simplifies the initial stages of data exploration, helping users get a clear understanding of their data quickly.

2. Code Writing and Query Generation

For users who need assistance writing code, ChatGPT can generate Python scripts for data wrangling, create SQL queries to extract data, or craft R code for statistical analysis. This feature is particularly helpful for users who are less familiar with coding or are looking to speed up their workflow.

3. Generating Insights from Data

ChatGPT excels at converting complex data analysis results into easy-to-understand language. It can explain statistical tests, model outcomes, and key trends in plain English, making it easier to communicate insights with stakeholders who may not have a technical background.

4. Data Visualization Assistance

When it comes to visualizing data, ChatGPT can recommend the most appropriate graph types based on the dataset or analysis goals. It can also provide guidance on using popular visualization libraries like matplotlib, ggplot, or plotly to create effective charts and graphs.

Example Interaction Scenarios

Here are a few examples of how ChatGPT can assist in common data analytics tasks:

Scenario 1: Data Wrangling

A user has a messy dataset with missing values and duplicate entries. They ask ChatGPT for steps to clean the data. ChatGPT might suggest imputing missing values with the mean or median, removing duplicates, and standardizing formats (like converting all date fields to the same format). It can even generate Python or R code for performing these tasks, making data cleaning faster and more efficient.

Scenario 2: Building a Predictive Model

A user wants to build a linear regression model in Python. They can ask ChatGPT to provide the code to do so. ChatGPT might generate a script using libraries like pandas, scikit-learn, or statsmodels, along with an explanation of each step. Once the model is built, ChatGPT can explain the significance of the coefficients, the model's accuracy, and suggest improvements if needed.

Scenario 3: Data Interpretation

A user inputs the results of an A/B test, including a p-value. ChatGPT can interpret the results by explaining what the p-value means in plain language, whether it’s statistically significant, and what that implies for the business decision. It can also suggest follow-up actions based on the interpretation.

How to Use ChatGPT for Data Analytics with Data Stored in a Database

An example of chatGPT doing data analysus

If your data is stored in a database, ChatGPT can be an invaluable tool for interacting with it, especially when you need to generate SQL queries to extract or manipulate the data. Here’s how you can leverage ChatGPT for effective data analytics:

Step 1: Get a Dump of Your Database Schema

Before you start asking questions, it’s essential to understand the structure of your data. You can request a database administrator (DBA) or use SQL commands to get a dump of your database schema. This schema contains important details such as the tables, columns, data types, and relationships in your database.

For example, you can use the following query to get the schema details in most SQL databases:

SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'your_database_name';

This output will give you a list of all the tables and their respective columns, which is crucial when you start generating queries through ChatGPT.

Step 2: Ask ChatGPT to Generate SQL Queries

Once you have your database schema, you can begin asking ChatGPT to help you write SQL queries. You don’t need to be an SQL expert to start—just describe what you want in natural language, and ChatGPT will generate the appropriate SQL query.

Example 1: Retrieving Specific Data

Suppose you want to extract customer data, including their names and purchase totals. You can ask ChatGPT:

User: "Generate an SQL query to retrieve customer names and their total purchase amounts from the customers and orders tables. I want to join these tables based on customer ID."

ChatGPT:

SELECT c.customer_name, SUM(o.order_total) AS total_purchases
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name;

ChatGPT can explain the logic behind the query and suggest modifications, such as adding filters or sorting the results.

Example 2: Filtering Data

If you want to filter results, such as retrieving only customers who have made purchases over $500, you can ask:

User: "Modify the query to show only customers who made purchases over $500."

ChatGPT:

SELECT c.customer_name, SUM(o.order_total) AS total_purchases
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name
HAVING SUM(o.order_total) > 500;

Step 3: Ask Analytical Questions

Once you’ve retrieved the relevant data, you can ask ChatGPT more complex analytical questions that require advanced SQL queries, such as calculating averages, performing aggregations, or identifying trends.

User: "How can I find the most popular product based on the number of orders?"

ChatGPT:

SELECT p.product_name, COUNT(o.order_id) AS number_of_orders
FROM products p
JOIN orders o ON p.product_id = o.product_id
GROUP BY p.product_name
ORDER BY number_of_orders DESC
LIMIT 1;

In this case, ChatGPT generates an SQL query that joins the products and orders tables, counts the number of orders for each product, and returns the most popular one.

Step 4: Ask for Performance Optimizations

You can also ask ChatGPT to suggest performance improvements for your queries, especially if you’re dealing with large datasets or slow-running queries.

Example 4: Optimizing a Query

User: "Can you optimize this query for faster performance?" (Attach the original SQL query)

ChatGPT might suggest adding indexes, removing unnecessary subqueries, or restructuring the query to improve performance.

Step 5: Use ChatGPT to Understand Results

Once you’ve run the queries and obtained the results, ChatGPT can help you interpret them. For example, if you run an SQL query to calculate averages or other statistical measures, ChatGPT can explain what the results mean in the context of your business.

Example 5: Understanding Query Results

User: "I got an average order value of $150 from my last query. Is this good? What should I compare it with?"

ChatGPT might suggest comparing it to past performance, industry benchmarks, or customer acquisition costs, helping you derive insights from the raw data.

Limitations of ChatGPT in Data Analytics**

While ChatGPT is a powerful tool, there are key limitations when using it for data analytics:

1. No Direct Database Access

ChatGPT cannot access live databases or run queries directly. You’ll need to manually provide the schema or sample data, which can be time-consuming if you don’t already have a clear schema dump. Generating this dump requires extra effort and technical know-how, which can be a hurdle.

2. Errors in SQL or Code

While ChatGPT generates SQL queries and code, they may not always work perfectly when executed. You might encounter syntax errors or performance issues, especially for complex queries, and will need to manually fix or optimize the code after running it in the database.

3. Large Datasets

ChatGPT can’t handle large datasets directly. It can suggest query optimizations, but you’ll need separate tools like SQL or Python to process and analyze significant amounts of data.

4. Missing Business Context

ChatGPT lacks business-specific context and can miss important nuances like industry-specific terms, KPIs, or business rules. Without this context, the generated queries or insights may not fully align with your business needs, requiring human intervention to refine the analysis.

5. Limited Context and Domain Knowledge**

ChatGPT may struggle with deep, domain-specific analytics tasks and won’t always provide the best methods for specialized use cases. Human expertise is still essential to verify results and ensure accuracy.

Sequel: Your Smart Data Analyst for Seamless Analytics

An example of advanced data visualization that allows users to ask questions about their data in plain language

Sequel is a powerful platform designed to simplify and enhance your data analytics experience. Here’s how it helps:

  • Direct Database Connections: Sequel connects directly to your database, eliminating manual schema dumps and enabling real-time data querying and insights.

  • Self-Fixing Queries: One of Sequel's most powerful features is its ability to automatically troubleshoot and optimize SQL queries, fixing errors without requiring manual intervention.

  • Virtual Data Analyst: Sequel doesn’t just generate queries; it acts as a virtual data analyst by navigating datasets, understanding relationships, and providing key insights such as trends, anomalies, and performance metrics.

  • Configurable for Business Context: Sequel can be tailored to your business needs by aligning queries and insights with your specific KPIs, industry-specific metrics, and business goals. This ensures the analysis is both relevant and actionable.

  • Proactive Insights: Sequel helps you extract meaningful insights, automatically identifying key trends, opportunities, and inefficiencies that may otherwise go unnoticed.

With these features, Sequel turns complex data analytics into a seamless, user-friendly experience, empowering businesses to make data-driven decisions more efficiently.

Natural language to SQL using Sequel

Conclusion: ChatGPT vs. Sequel

While ChatGPT is a versatile tool for generating SQL queries, assisting with code, and interpreting data, it requires manual steps like schema access and error fixing, and lacks direct database connectivity. In contrast, Sequel connects directly to your database, offers self-fixing queries, and acts as a proactive virtual data analyst, providing tailored insights based on your business context.

For users who need quick, direct database analytics with minimal manual intervention, Sequel is the more efficient solution. ChatGPT, on the other hand, is great for general guidance and broader analytics tasks that require human oversight. Both tools serve different needs but can complement each other in a robust data analytics workflow.

Start exploring your data with Sequel

Save hours of time writing SQL queries. Get started for free.

Get Started For Free