Analysis of Texas Public School Spending and STAAR Performance

Project Overview

A1. Research Question

The research questions this project sought to answer is whether we can predict if a Texas public school district would perform well or do poorly on the STAAR test based on how they allocate or spend their budget with consideration to how racially diverse a district is. This project also sought to analyze which financial expenditure items are important predictors of whether a district will perform well or do poorly on the STAAR test. The purpose of this analysis is to provide counsel to school administrators and policymakers so that they can make data-driven decisions instead of relying on anecdotal evidence alone.

A2. Project Scope

This project’s scope was to use a Jupyter notebook to build a model that predicts whether a Texas public school district would perform well or do poorly on the STAAR test. This model took several inputs of financial expenditure data and then outputted a prediction related to that district’s performance on the STAAR test. This project examined academic year 2018-2019.

A3. Solution Overview – Tools

The Jupyter notebook that contains the model was coded in the Python programming language and relied on several key inputs from the end-user. The first input is a csv file called Enrollment Report_Statewide_Districts_Grade_Ethnicity_2018-2019 (Texas Education Agency, 2019), which contains demographic information on white and non-white population of the district. The second key input is tidy_campstaar1_2012to2019 and tidy_campstaar2_2012to2019 (Texas Education Agency, 2022), which contains information on how many students in the district performed well (meets or masters a subject) or did poorly (approaches comprehension of a subject). The third key input is the 2007-2021-summaried-peims-financial-data (Texas Education Agency, 2022), which contains information on how a district spent their budget. The notebook then automatically performed cleansing and transformation of the data to make it suitable for analysis and modeling. The notebook then constructed a model that predicted how well a district would perform or do poorly on the STAAR test.

A4. Solution Overview – Methodologies

There were four different types of methodologies used in this project: Project, Data Collection, Analytical, and Statistical. Each methodology played an important role in the planning, execution, and verification of the project.

Project Plan

            In this project, the author executed the plans without change. All goals, objectives, and deliverables listed below were completed as describe in Task 2 apart from the dates or time frame for each task. The timeline projected in Task 2 were very conservative and the actual project completed earlier than planned.

B1. Project Execution

The goal for this project was to create a model that would predict how a district would perform on a STAAR test. The objective of this goals are:

  • Determine which districts are performing or underperforming in relation to how much funding they receive, how racially diverse a district is, and how well the students in the district perform on the STAAR test overall.
  • Determine which type of expenditures contribute to the weighted performance of the district.
  • Predict how a district will perform based on how they allocate their budget.

B2. Project Planning Methodology

The project planning methodology used was PAPEM-DM (De Dios, 2020). The seven steps involved in this method include: Planning, acquisition, preparation, exploration, modeling, delivery, and maintenance.

  1. Project Planning – answered the why of the project, addressed the how, delivered a list of research questions to answer and criteria of success.
  2. Acquisition of Data – acquired the data from the TEA and Kaggle websites.
  3. Preparation of Data – prepared the data which includes cleaning the data and transforming it into a format suitable for analysis and modeling.
  4. Exploratory Data Analysis – explored the data using descriptive statistics, visualize the relationship between variables using matplotlib and seaborn. The deliverables of this phase would be a few graphs ready for presentation to stakeholders.
  5. Modeling – created, fitted, and used the model for training and inference. The deliverables were a pickle file for the trained model.
  6. Delivery of Results – delivered the results by completing Task 3 of this capstone project.
  7. Maintenance – while important, this project did not include this phase. However, the Jupyter notebook can be used as a basis for deployment into production (for inferencing) using any of the cloud technology platforms like AWS, Azure, or GCP.

B3. Project Timeline and Milestones

PlanningProject planningProject plan8/18/37 days
AcquisitionGetting the dataRaw datasets8/48/41 day
PreparationCleansing and transformationClean datasets8/58/2113 days
ExplorationEDAVisualizations8/58/2113 days
ModelingTrain and inferPickle file8/58/2113 days
DeliveryTask 3 ReportPDF Document8/228/287 days
MaintenanceDeployment to productionNot ApplicableN/AN/AN/A


C. Data Selection and Collection Process

The author downloaded two of the datasets on the TEA website and while the third data could have also downloaded from the same place, the author chose to use the Kaggle version of the data because it has already been collected in a format that would not necessitate me using web scraping to mine the information from the website.

Data selection and collection didn’t vary much from the proposed plan. The only one addition the author made was the inclusion of ethnicity data that the author downloaded from the same TEA website. This data was necessary to examine and analyze the effect of racial diversity within the districts.

One obstacle the author encountered was in the same dataset that the author added during the actual execution of the project. The author encountered several errors when reading the csv file and realized that formatting is off on the actual csv itself. The author adjusted the read_csv parameter “skiprows” and was able to bypass the header information that was needed in the analysis.

There really were no data governance issues because the dataset is already in the public domain. The one concern would have been anonymity of the students in schools or school districts that are too small that the anonymity of the students would have been impossible to mask. This issue was already taken care by the Texas Education Agency before they even published the dataset.

For example, for those districts that have small subpopulations, the values for the numerator values have been coded as -999. The author simply had to replace those values with a 0 for the dataframe to be processed cleanly. Another example of the datasets’ cleanliness is the fact that there were no missing values in any of the columns.

Therefore, cleanliness and sanitation were the obvious advantages of the datasets used. However, their limitation was the fact that when merged together, the dataset only encompasses the 2018-2019 school year. This is by design, Hence, not really that big of a limitation to warrant much concern.

D. Data Extraction and Preparation Process

The author utilized the power of pandas to read the csv files into a dataframe. The method read_csv() was appropriate because the data is in a tabular format delimited by a comma.

For data preparation or data wrangling, the author used many techniques to handle missing and duplicate data, as well as remove extraneous columns and rows. Some of these techniques include replacing null placeholder values with true numeric NaN values, drop_duplicates() and drop() columns, as well as, string methods to clean up the or standardized the columns names. The author also used locs, ilocs, and np.where to filter columns and rows. Finally, for merging the datasets, the author a combination of concat() and merge() methods to execute inner and outer joins of the data. Utilizing pandas methods made sense because of their efficiency, versatility, and simplicity.

E. Data Analysis Process

For exploratory data analysis, the author used several visualizations to examine the relationship between variables. For example, the author used a heatmap to visualize the correlation coefficients via a correlation matrix using the Pearson method. Also, a pairplot was used to examine whether any of the relationships between variables had a positive or a negative trend. Both were essential in determining and validating whether any of the variables suffered from multicollinearity. Knowing this is important because it would determine what type of machine learning algorithm can or cannot be used if the author choose not to handle the related variables individually. Finally, several violin plots were used to examine the distribution and density of each variable belonging to both passing and failing classes.

To complement the visualizations, the author implemented a few statistical tests to analyze the dataset. These tests were the Levene’s test, the Shapiro test, and the Mann-Whitney U test. The author used Levene’s and Shapiro to test for variance and normality so that the author can validate whether to use parametric or non-parametric methods of statistical testing. As a result, the author used Mann-Whitney U to test for significance.

However, statistical testing does have its own limitations. For one, qualitative factors of the study are not considered because “statistical method cannot study phenomena that cannot be expressed, captured, or measured in quantitative terms.“ For example, we considered the efficiency of districts while regarding for racial diversity, but on must be careful in assigning weights to the imbalance because it is simply not a simple black and white matter. One cannot simply assign a numerical value on racial disparity and there is no clear-cut formula that defines racial inequity neither.

In the author’s opinion, both visualizations and statistical testing techniques were both needed at the same time. Visualization, although limited to eye-balling techniques can reveal hidden patterns in the data while statistical testing can validate assumptions made by eyeballing said data.


F. Project Success

F1. Statistical Significance

Calculating the statistical significance of this project was a reasonably straight-forward task. First, I had defined a formula to define the measure of efficiency with respect to racial diversity. I called the measure E for efficiency, and E = (B/(P/S))/X where B is the total operating program budget of the district, P is the total number of passing (meets or masters a subject) students, S is the total number of students taking the test, and X is for the ratio of non-white student to white student enrollment. Based on this formula, I then calculated the first quartile of created two subsets of data based on that quartile. Those equal or below the quartile contained district data that had performed well on the STAAR test and those above the quartile contained districts that did poorly on the STAAR test. I then tested for the statistical difference between the two group samples. Since the p-value is below 0.05, the difference between the two groups were found to be statistically significant. With this, the author can reject the null hypothesis and confirm the original hypothesis that the “Payroll” expenditure (and other variables for that matter) of districts that performed well on the STAAR test is statistically different than those districts that performed poorly.

F2. Practical Significance

The practical significance of the difference in certain expenditures will result in a significant amount of money saved by each district. For example, if a school district decides to spend more on operations versus paying more for capital investments, the district would have realized more of a return on this investment in terms of higher student performance scores on the STAAR test.

F3. Overall Success

The author believes this project a success. All three criteria laid out in task two were met. First, the formula for calculating the efficiency of a district is appropriate and applicable. Second, the model has an accuracy and AUC scores of more than 70%. And third, the project lists the most important features for high-performing districts.

G. Key Takeaways

This project set out to create Jupyter notebook that contains a model that would predict whether a Texas public school district would perform well or do poorly on the STAAR test. This model needed to be based on the weighted definition of efficiency and also need to have an accuracy or AUC score of more than 70% to be considered a success. The following chart summarizes whether the Jupyter notebook accomplished its first two metric.

Criterion/MetricRequired DataSuccess
Is the formula for calculating the efficiency appropriate and applicable?E = (B/(P/S))/XYES
Does the model have an accuracy and AUC scores of more than 70%?Accuracy 90.93% / AUC 96%YES
Does the project list the most important features for high-performing districts?Bar graphYES

The table above is a straightforward way to visualize and summarize the accomplishments of each objective.

The graph above shows the Receiver-Operator curve (ROC) which visualizes just how skilled the model is (blue line) compared to an unskilled model (by chance, red line).

The graph below shows the most important features of the model and is a graphical way to convey how each different variables affects the model based on the F score.

            Based on the findings above, the author suggests that a study be conducted on how particularly racial diversity affects the efficiency of Texas public school districts as it has been shown that it does influence the model. In addition, further study needs to be conducted to determine whether having a bilingual program is an indication of positive or negative correlation on performance.


Baron, E. J. (2021). School Spending and Student Outcomes: Evidence from Revenue Limit Elections in Wisconsin. American Economic Journal: Economic Policy, 14(1), 1-39. Retrieved from

Carhart, A. E. (2016). SCHOOL FINANCE DECISIONS AND ACADEMIC PERFORMANCE: AN ANALYSIS OF THE IMPACTS OF SCHOOL EXPENDITURES ON STUDENT PERFORMANCE. Master Thesis. Retrieved from chrome-extension://efaidnbmnnnibpcajpcglclefindmkaj/

Texas Education Agency. (2022, July 31). PEIMS Financial Data Downloads. Retrieved from Texas Education Agency:

Texas Education Agency. (2022, July 31). Texas Education Agency Data 2012-2019. Retrieved from Kaggle:

Thompson, J., Young, J. K., & and Shelton, K. (2021). “Evaluating Educational Efficiency in Texas Public Schools Utilizing Data Envelopment Analysis. School Leadership Review, 16(1), Article 9. Retrieved from

Published by

Ednalyn C. De Dios

I’ve always been enamored with code and I love data science because of its inherent power to solve real problems. Having grown up in the Philippines, served in the United States Navy, and worked in the nonprofit sector, I am driven to make the world a better place. I have started and participated in numerous campaigns that aim to reduce domestic violence and child abuse in the community.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.