The Intelligent Lux accelerates EDA

Author: Pranavi Duvva
Image by Colin Behrens, Pixabay

Data Analysis

Automate visual data exploration using the Lux Python library.

Have you found it tiring to write multiple lines of code for even a basic graph during EDA? Do you wish there were interactive, recommendation-based graphs in the Jupyter notebook? If your answer is yes, there is good news: the Lux Python library is now available.

This article was inspired by Doris Jung Lin Lee’s WiCDS 2021 session.

Lux is a Python API for intelligent visual discovery that comes with an interactive Jupyter widget. It acts as an intelligent assistant that automates the visual side of exploratory data analysis. With a single click, it provides powerful abstractions for visualizations as soon as the data frame is displayed in the Jupyter notebook. Lux also has a rich language for expressing the user's intent.

The main purpose of the Lux library is to make visualizing data as easy as loading a data frame. It lets users browse the data quickly and identify important patterns and trends, and the widget suggests directions for further analysis. Lux can also create visualizations for parts of the data you do not yet have a clear idea about.

Source: Author

Lux is very easy to use with pandas and does not require you to modify your existing code. It is designed to preserve the semantics of pandas data frames: it stays in sync with pandas commands and behaves the way pandas does.

That is awesome! Let's get going with Lux, our intelligent visual assistant.

1. Lux can be installed from PyPI:
pip install lux-api

2. Lux can also be installed with conda:
conda install -c conda-forge lux-api

3. To use the widget in the Jupyter notebook, install and enable the following extensions:
jupyter nbextension install --py luxwidget
jupyter nbextension enable --py luxwidget

That’s all! We are now ready for the next step.

Let’s look at a sample dataset that will help us explore the features of the Lux library.
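Before moving on to the dataset, it can help to confirm that the installation worked. The snippet below is a minimal sketch and not part of the original article: the helper name is_installed is hypothetical, and it assumes that the lux-api package is importable as lux and that the widget package is importable as luxwidget.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: "lux" is the import name of lux-api and "luxwidget" backs the notebook widget.
for name in ("lux", "luxwidget"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either module is reported as missing, rerunning the install command in the same environment that the notebook kernel uses is usually the first thing to try.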
The Graduate Admission dataset from Kaggle is used here. It contains several parameters that are considered important when applying to a Masters program.

Data Dictionary
GRE Scores (out of 340)
TOEFL Scores (out of 120)
University Rating (out of 5)
Statement of Purpose and Letter of Recommendation Strength (out of 5)
Undergraduate GPA (out of 10)
Research Experience (either 0 or 1)
Chance of Admit (ranging from 0 to 1)

1. Importing the necessary libraries
Once the package is installed successfully, import the necessary libraries. Lux only needs to be imported into the Jupyter notebook.
https://medium.com/media/a6193862250242f58a1f140fa4719a6e/href

2. Loading the dataset and checking the summary
Let’s check the top five rows.
https://medium.com/media/2e45ab5eb657c6f262ca0a359ac01a32/href
Source: Image by Author

Checking the shape of the dataset.
https://medium.com/media/68bc0f31084164dde7ea75606929dbb6/href
(400, 9)
There are a total of 400 rows and 9 columns.

Dropping the first column, the serial number, and checking the concise summary of the dataset with info.
https://medium.com/media/68914899a926a678906375089761aef1/href
Source: Image by Author
We observe that all 8 remaining columns in the dataset are numeric.

3. Visual data exploration using Lux
Now let’s display the data frame and explore the Lux widget.
Source: Image by Author
By default, Lux displays the data frame. It then offers three tabs: Correlation, Distribution and Occurrence. Let’s look at each of them.

Correlation
Source: Image by Author
The Correlation tab shows the relationships between the quantitative variables in the dataset, ordered from the most to the least correlated.
Source: Image by Author

Distribution
Source: Image by Author
The Distribution tab shows histograms for the quantitative variables in the dataset.
It displays the most skewed variables first, followed by the less skewed ones.
Source: Image by Author

Occurrence
Source: Image by Author
The Occurrence tab shows bar charts for the categorical attributes, ordered from the least to the most evenly distributed. Our dataset did not contain any features of a categorical type, but Lux still recommended bar charts for features that might prove useful in our analysis.

4. Visualizations and recommendations based on the user’s intent
Suppose you are interested in one feature, or in several. By setting the intent, you can view the visualizations for those attributes in the widget. Lux also offers additional suggestions for analysis through two tabs, Enhance and Filter.

1. Enhance
Enhance adds one more attribute to the attributes the user has specified, so the user can see how each additional attribute affects the intended visualization. It is similar to adding color.

2. Filter
Filter lets the user view the intended attributes across different subsets of the data.

The following examples should make this clearer.

One attribute: CGPA
Set the intent with df.intent = ["CGPA"] and then display df.

1. One attribute: Enhance recommendations
Source: Image by Author
The Enhance tab fixes the intended variable “CGPA” and makes recommendations by comparing it with the other attributes.

2. One attribute: Filter recommendations
Source: Image by Author
The Filter tab fixes the variable “CGPA” and makes recommendations by comparing it across different subsets of the dataset.

Two attributes: “TOEFL Score” and “GRE Score”
Source: Image by Author

1. Two attributes: Enhance recommendations
The Enhance tab fixes the intended variables “TOEFL Score” and “GRE Score” on the x- and y-axes. It then makes recommendations by adding other attributes for comparison.

2. Two attributes: Filter recommendations
Source: Image by Author
When two attributes are specified, the Filter tab fixes the intended variable “TOEFL Score” on the x-axis and “GRE Score” on the y-axis. It then makes recommendations by comparing the two variables across different subsets of the data.
Source: Image by Author

Export Visualizations
Lux makes it easy to share your visualizations. To export them as static HTML, use the following command:
df.save_as_html("File name.html")

Conclusion
Lux, the new open-source Python library, definitely makes data exploration a lot easier. This article has shown how Lux can automate most of our visualizations with very little code, and has covered some of the library’s key features.

Project status: Lux is still in the early stages of development.

Resources
You can learn more about the Lux library at Lux-API. You can also try the hands-on exercises and tutorials on Binder.

I hope you enjoyed this article. You can also find my other articles on Medium under pranaviduvva. Thank you for reading!

The Intelligent Lux: Speed up EDA was first published on Medium.
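For reference, the workflow walked through in this article can be collected into one short sketch. This is an illustration under stated assumptions rather than the author's original notebook code: the function name explore_admissions, the CSV path passed to it and the default HTML file name are hypothetical, and the serial number is assumed to be the first column, as the article describes. The df.intent attribute and df.save_as_html call are the Lux features used in the article.

```python
def explore_admissions(csv_path: str, html_path: str = "admissions_lux.html"):
    """Load the admissions CSV, set a Lux intent, and export the recommendations."""
    # Imported inside the function so this sketch can be loaded even where
    # pandas or Lux is not installed; in a notebook these go at the top.
    import pandas as pd
    import lux  # noqa: F401  importing lux adds the widget and .intent to data frames

    df = pd.read_csv(csv_path)
    print(df.head())   # top five rows
    print(df.shape)    # the article reports (400, 9) for its copy of the data

    # Drop the serial-number column, assumed here to be the first column.
    df = df.drop(columns=[df.columns[0]])
    df.info()          # concise summary of the remaining columns

    # Tell Lux which attribute we are interested in; "CGPA" is the column
    # name used in the article's one-attribute example.
    df.intent = ["CGPA"]

    # Export the current recommendations as a static HTML file.
    df.save_as_html(html_path)
    return df


# Hypothetical usage in a notebook cell (the file name is an assumption):
# df = explore_admissions("Admission_Predict.csv")
# df  # displaying the data frame shows the Lux widget
```

For the two-attribute example, the intent would instead be set to the list of both column names, for example ["TOEFL Score", "GRE Score"], provided the columns in your copy of the dataset carry exactly those names.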
