top of page

Streamlit on Snowflake

What is Streamlit?

Streamlit is a low-code web framework for the Python programming language. It allows you to build an interactive web application with minimal coding. No front-end experience is needed and apps are written in pure Python. Over the past couple of years, Streamlit has become the standard for Python-based data app development with 80% adoption in the Fortune 50 and with hundreds of thousands of developers.


Why Streamlit on snowflake?

Rapid Development: Streamlit allows you to create custom web apps for machine learning and data science using Python. By leveraging Streamlit within Snowflake, we can quickly build and deploy powerful data applications without the need to move data or application code to an external system.


Secure Data Processing: Streamlit in Snowflake enables you to process and use data directly within Snowflake’s data cloud. We can build applications that interact with Snowflake data securely, without compromising on data privacy or security.


No Data Movement: With Streamlit, you can build applications that directly access Snowflake data. This eliminates the need to move data between systems, reducing complexity and ensuring data consistency.


Integrated Access Control: Streamlit apps adhere to Snowflake’s access control framework. You can define permissions based on the owner’s rights, allowing fine-grained control over who can use and interact with the app.


Resource Efficiency: By combining Streamlit and Snowflake, you leverage the full power of Python, its ecosystem of libraries, and Snowflake’s data processing capabilities. This synergy enables efficient development and deployment of data-driven applications


Methods to use Streamlit with Snowflake

Snowflake provides two options to use streamlit.

  1. Run streamlit on your machine and access the data from the snowflake with authentication

  2. Run your streamlit application entirely on snowflake


We will be discussing second method in this blog


Prerequisite:


To build dashboard with Streamlit you must have


Active snowflake account, if you don’t have snowflake account you can create one free trial account.

Python 3 (I used version 3.10.2)

A snowflake database


Dashboard:

The dashboard will be all about the Impact analysis of Variables on Sales, it will include displaying the data, descriptions , correlations and to perform the regression.

This would be a simple dashboard which we are going to create.


Creating a Snowflake database

1. Sign up for a Snowflake account at https://signup.snowflake.com


2. Check your “Welcome to Snowflake!” email for the account URL in order to access your Snowflake account. Go ahead and click on the LOG IN TO SNOWFLAKE button or copy/paste the account URL https://<your-unique-account-id>.snowflakecomputing.com) into an internet browser.


3. Go ahead and enter your Snowflake credentials.



4. Once logged in, you’ll see the Worksheets panel by default.

5. Create a new database by following the following instructions:



6. After database is created, either create a new schema or you can use the already schemas. For our demo we are going to create a new schema



7. Go to the above created schema and then create a new table by adding the data from our local or any other form you prefer.



8. Upload a new file as shown below.



9. Give it a name and click next



10. It will show the the file schema entirely, continue if you are happy with the schema properties else you can manipulate them too.



11. It will create your database schema.




Creating the Streamlit App and connecting to the database


There is a streamlit section into the projects column of snowflake. We will be using that one for creating our streamlit app



  1. First step is to create a new streamlit app by clicking on the streamlit button from left panel.

  2. After clicking on the “+ Streamlit App”, add the app title, app location(on which dataset and which schema it is stored) and app warehouse too.

  3. After entering all the details, app title , location, warehouse, you need to click on the create button for creation of the app.


























4. After successfully creating the streamlit app, you will be shown with this screen.



5. The following is a sample of regression analysi using OLS model. Paste the below code to the coding area of the streamlit app


import streamlit as st
from snowflake.snowpark.context import get_active_session
import pandas as pd
import statsmodels.api as sm
import altair as alt
import numpy as np
# Load data table
@st.cache_data
def load_data(table_name):
    ## Read in data table
    st.write(f"Here's some example data from `{table_name}`:")
    sql = f"select * from {table_name}"
    data = session.sql(sql).collect()
    return data

# Function to display data, descriptions, correlations, perform regression, and plot scatter charts
def display_data(df):
    st.write("### Data Preview", df.head())
    # Perform correlation and regression analysis on numeric columns
    numeric_df = df.select_dtypes(include=[np.number])
    correlation = numeric_df.corr()
    # Let user select a numeric column to analyze as the target variable
    target_variable = st.selectbox('Select a numeric column to analyze as the target:', numeric_df.columns)
    st.write(f"### Impact Analysis of Variables on {target_variable}")
    first_column = True
    col1, col2 = st.columns(2)
    for predictor in numeric_df.columns:
        if predictor != target_variable:
            corr_value = correlation.at[target_variable, predictor]
            # Filter to display only strong correlations
            if abs(corr_value) > 0.5:
                # Fit regression model with `predictor` as the independent and `target_variable` as the dependent variable
                X = sm.add_constant(numeric_df[predictor])  # Predictor
                Y = numeric_df[target_variable]  # Response
                model = sm.OLS(Y, X).fit()
                # Check for significant regression results
                if model.pvalues[predictor] < 0.05:
                    # Create a scatter plot using Altair
                    chart = alt.Chart(df).mark_circle(size=60).encode(
                        x=alt.X(predictor, title=predictor),
                        y=alt.Y(target_variable, title=target_variable),
                        tooltip=[predictor, target_variable]
                    ).interactive().properties(
                        width=300,
                        height=300
                    )
                    target_column = first_column and col1 or col2
                    target_column.altair_chart(chart)
                    target_column.write(f"**Impact of {predictor} on {target_variable}:**")
                    target_column.write(f"Correlation impact: {'increases' if corr_value > 0 else 'decreases'} `{corr_value:.2f}`")
                    target_column.write(f"Every unit increase in `{predictor}` typically results in {model.params[predictor]:.4f} unit {'increase' if model.params[predictor] > 0 else 'decrease'} in `{target_variable}`.")
                    target_column.write(f"This relationship accounts for {model.rsquared:.2%} of the observed variations in `{target_variable}`, indicating a {'strong' if model.rsquared > 0.5 else 'moderate'} influence.")
                    target_column.write(f"The statistical significance of this effect is strong (P-value: {model.pvalues[predictor]:.4g}). This suggests that the changes are likely not due to random fluctuations.")
                    first_column = not first_column

if __name__ =="__main__":
    # Title of the application
    st.title('Snowflake Dataset Analysis')
    #CONNECT TO STREAMLIT
    # Create a session
    session = get_active_session()
    st.success("Connected to Snowflake!")
    # Select and display data table
    table_name = "STREAMLIT_APPS.PUBLIC.ADV_SALES2"
    ## Display data table
    with st.expander("See Table"):
        df_sales = load_data(table_name)
        st.dataframe(df_sales)
    df=pd.read_csv(df_sales,header=None,engine= 'python', encoding = 'unicode_escape')
    custom_columns = ['sales','tv','radio','newspaper']
    df.columns=custom_columns
    display_data(df)


6. In case you see an error of missing packages, use the following Button to import the packages onto the snowflake-streamlit environment.


7. Find out all the required packages and import them

8. After importing the packages, we can now click on run button for out application to run. It will take around 12-15 secs to build the application.






Congratulations you have successfully created a streamlit application running on snowflake. Feel free to ask any questions.





60 views0 comments

Recent Posts

See All

Harnessing the Power of TimescaleDB

TimescaleDB is an open-source database designed to make SQL scalable for time-series data. It is engineered up from PostgreSQL and packaged

Comments


bottom of page