The Strange Death of Stack Overflow

Investigating the Flattening Volume of Activity

Joydeep Chatterjee
GoPenAI


From the site’s founding in 2008 through Q1 2014, the number of monthly questions and answers created on Stack Overflow grew steadily. Since then, volume has been flat to decreasing. Why? Does this matter? Are there opportunities for the site to grow faster?

Introduction

From the founding of Stack Overflow in 2008 to the first quarter of 2014, the number of new questions and answers posted grew steadily before leveling out and ultimately declining. The purpose of this study is to explore the trend and attempt an explanation.


from google.cloud import bigquery

# Create a "Client" object
client = bigquery.Client()

# Construct a reference to the "stackoverflow" dataset
# (DatasetReference replaces the deprecated client.dataset() helper)
dataset_ref = bigquery.DatasetReference("bigquery-public-data", "stackoverflow")

# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)

The User Base

# Since there are not enough data points for 2022, that year has been omitted
users_new_query = """
SELECT EXTRACT(YEAR FROM creation_date) AS year, COUNT(1) AS num_users_new
FROM `bigquery-public-data.stackoverflow.users`
WHERE EXTRACT(YEAR FROM creation_date) < 2022
GROUP BY year
ORDER BY year ASC
"""

# Set up the query (cancel the query if it would use too much of
# your quota, with the limit set to 10 GB)
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
users_new_query_job = client.query(users_new_query, job_config=safe_config)

# API request - run the query, and return a pandas DataFrame
users_new_results = users_new_query_job.to_dataframe()

# Create column for cumulative number of users
users_new_results['tot_users'] = users_new_results['num_users_new'].cumsum()

From the founding of the website in 2008 to 2013, the number of new users per year increased roughly quadratically. From 2013 to 2015 that growth stagnated; it accelerated from 2015 to 2017, slumped in 2018, and recovered in 2019. From 2019 to 2021 there was a sharp spike in new users.

Despite the erratic pattern of new user growth, the total number of Stack Overflow users has grown steadily and roughly exponentially overall. This still does not explain the stagnation of activity after 2014.
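One way to check the "roughly exponential" description is to look at year-over-year growth rates of the cumulative total: a roughly constant rate suggests exponential growth, while a shrinking rate suggests something slower. The following is a minimal sketch in plain Python. The helper name `yoy_growth` and the sample counts are hypothetical placeholders, not results from the dataset.

```python
from itertools import accumulate


def yoy_growth(values):
    """Year-over-year growth as fractions: 0.25 means +25% on the prior year.

    Assumes every value is positive (true for cumulative user counts).
    """
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]


# Placeholder yearly sign-up counts (illustrative only, NOT real results).
new_users = [100, 300, 600, 1000]

# Running total, equivalent to the cumsum() column built above.
total_users = list(accumulate(new_users))

growth = yoy_growth(total_users)
```

With the real data, the input could be `users_new_results['tot_users'].tolist()`.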

User Activity

# Since there are not enough data points for 2022, that year has been omitted
questions_query = """
SELECT EXTRACT(YEAR FROM creation_date) AS year, COUNT(1) AS questions
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE EXTRACT(YEAR FROM creation_date) < 2022
GROUP BY year
ORDER BY year ASC
"""

# Set up the query (cancel the query if it would use too much of
# your quota, with the limit set to 10 GB)
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
questions_query_job = client.query(questions_query, job_config=safe_config)

# API request - run the query, and return a pandas DataFrame
questions_results = questions_query_job.to_dataframe()

# Create column for cumulative number of questions
questions_results['tot_questions'] = questions_results['questions'].cumsum()

As the user base grew, the number of new questions posted grew steadily from 2008 to 2013, in lockstep with that growth. After 2013, however, growth in new questions stalled: the yearly count peaked between 2015 and 2016 and has slowly declined since.

Likewise, the number of answers posted followed much the same pattern as questions, except that 2013 was the critical year in which the upward trajectory reversed, kicking off a slow decline in site activity on that end as well.
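The query behind the answers series is not shown here. A minimal sketch of what it could look like follows, assuming the public dataset exposes a `posts_answers` table with the same `creation_date` column as `posts_questions`. The helper `yearly_count_query` is a hypothetical convenience, not part of any library.

```python
def yearly_count_query(table, count_alias, before_year=2022):
    """Build a per-year row-count query for one table in the public dataset.

    `table` and `count_alias` are interpolated directly into the SQL, so
    only pass trusted, hard-coded names.
    """
    return f"""
SELECT EXTRACT(YEAR FROM creation_date) AS year, COUNT(1) AS {count_alias}
FROM `bigquery-public-data.stackoverflow.{table}`
WHERE EXTRACT(YEAR FROM creation_date) < {before_year}
GROUP BY year
ORDER BY year ASC
"""


# Assumed table name; mirrors the questions query with a different source.
answers_query = yearly_count_query("posts_answers", "answers")

# To run it, reuse the client and job config from the earlier cells:
#   answers_query_job = client.query(answers_query, job_config=safe_config)
#   answers_results = answers_query_job.to_dataframe()
```

The same helper reproduces the questions query via `yearly_count_query("posts_questions", "questions")`.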

Preliminary Conclusion

Despite the steady growth in the total number of users on Stack Overflow, 2013 was a critical year in which actual user activity, measured by questions asked and answers provided, began a downturn. From that point on, user growth is clearly no longer positively correlated with site activity. To understand this key change in user behavior, it helps to look more closely at the user base.
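The correlation claim can be checked numerically rather than by eye. The sketch below computes a Pearson correlation and a questions-per-user ratio over a window of years. The function name `pearson` and every number are illustrative placeholders standing in for the post-2013 rows of the query results, not actual figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values (NOT real results): cumulative users keep rising
# while yearly questions drift down.
total_users = [10, 20, 30, 40]
questions = [8, 8, 7, 6]

r = pearson(total_users, questions)
questions_per_user = [q / u for q, u in zip(questions, total_users)]
```

A negative `r` over the post-2013 window, together with a falling questions-per-user ratio, would be consistent with the conclusion above. A value near +1 over the pre-2013 window would match the earlier lockstep growth.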

User Length of Membership

# Since there are not enough data points for 2022, that year has been omitted
users_act_query = """
SELECT EXTRACT(YEAR FROM last_access_date) - EXTRACT(YEAR FROM creation_date) AS years_active,
COUNT(1) AS num_users_active
FROM `bigquery-public-data.stackoverflow.users`
WHERE EXTRACT(YEAR FROM creation_date) < 2022
GROUP BY years_active
ORDER BY years_active ASC
"""

# Set up the query (cancel the query if it would use too much of
# your quota, with the limit set to 10 GB)
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
users_act_query_job = client.query(users_act_query, job_config=safe_config)

# API request - run the query, and return a pandas DataFrame
users_act_results = users_act_query_job.to_dataframe()

The overwhelming majority of users have been active for less than a year since signing up. The number of users who lasted one year is only about a third of that, and the number who lasted two years is in turn about a third of the one-year figure. Beyond that, the number of users still active after each additional year declines slowly until it is almost nonexistent compared with the overall user base.

Again, 2013 is the critical year in which the average number of years that a user has been active on Stack Overflow suddenly rose, signaling that the website seems to have lost its appeal among casual users (defined as those who have not been active for more than one year) and become the domain of more dedicated, long-term users. There may have been a change in the user policy or business model of Stack Exchange that alienated casual users. It appears that most users who signed up for an account used it only briefly before ceasing all further activity.
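The "about a third" drop-off between consecutive buckets can be read directly off the query result. The following sketch uses two hypothetical helpers, `bucket_shares` and `consecutive_ratios`, on placeholder counts ordered by `years_active`. The numbers are made up for illustration.

```python
def bucket_shares(counts):
    """Fraction of all users that falls into each years_active bucket."""
    total = sum(counts)
    return [count / total for count in counts]


def consecutive_ratios(counts):
    """Size of each bucket relative to the bucket just before it."""
    return [curr / prev for prev, curr in zip(counts, counts[1:])]


# Placeholder counts for years_active = 0, 1, 2, 3 (NOT real results).
num_users_active = [900, 300, 100, 80]

shares = bucket_shares(num_users_active)
ratios = consecutive_ratios(num_users_active)
```

With the real data, the input could be `users_act_results['num_users_active'].tolist()`, which is already ordered by `years_active` because of the `ORDER BY` clause in the query.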
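The query above groups all users together, so it cannot by itself show a change in a particular year. One possible way to get a per-year view is to average the same `years_active` expression by signup cohort. The sketch below is an assumption about how such a breakdown could be produced, not necessarily the query behind the observation. The names `users_cohort_query`, `signup_year`, and `avg_years_active` are hypothetical.

```python
# Average active span per signup cohort, using only columns that the
# earlier users query already relies on.
users_cohort_query = """
SELECT EXTRACT(YEAR FROM creation_date) AS signup_year,
       AVG(EXTRACT(YEAR FROM last_access_date)
           - EXTRACT(YEAR FROM creation_date)) AS avg_years_active,
       COUNT(1) AS num_users
FROM `bigquery-public-data.stackoverflow.users`
WHERE EXTRACT(YEAR FROM creation_date) < 2022
GROUP BY signup_year
ORDER BY signup_year ASC
"""

# To run it, reuse the client and job config from the earlier cells:
#   users_cohort_job = client.query(users_cohort_query, job_config=safe_config)
#   users_cohort_results = users_cohort_job.to_dataframe()
```

Older cohorts have had more calendar years in which to accumulate activity, so the averages are only comparable across cohorts with that in mind.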

Final Conclusion

At this point in the study, it is reasonable to conclude that a particular event or controversy slowed the growth of activity on the website by alienating casual users who could have become dedicated users. More users do not necessarily mean more activity, since the majority of users may have created an account just to ask a specific question and left once satisfied, or may have found alternatives for technical assistance. The users who remained dedicated to the website, by contrast, apparently did not mind whatever major change occurred in 2013 and sharply slowed activity.

Looking deeper, according to Stack Overflow’s Wikipedia article, in 2013 a change was made in the operation of the website to delete questions that had either been closed or had no answer, in order to reduce unnecessary memory and bandwidth use. In 2016, about 1.5 million posts were deleted permanently under those criteria. Therefore, although more questions could have been generated, only those that were answered or deemed relevant remained available to be answered, which would explain why answers dropped more sharply than questions after 2013.

Another possible contributor to the decline in activity is outlined in the Wikipedia page of Stack Exchange, the parent company of Stack Overflow. In 2013, a cloud security company called CipherCloud issued a Digital Millennium Copyright Act (DMCA) takedown of content regarding its proprietary software’s algorithm, which was ultimately restored in a censored form. This act of censorship, on what was once thought to be a community forum, may have started to erode trust between the user base and the site management and so contributed to a decline in site activity. Nevertheless, the number of users kept growing.

Stack Overflow once had a career section with job postings, as well as a skills development feature that encouraged users to challenge each other and learn, but those did not last. Given the rules implemented in 2013 to reduce unanswered or irrelevant questions, it is plausible that the volume of actual user activity declined despite the growth in total users. If the website wishes to stimulate further activity, more interactive features, such as courses, workshops, or coding challenges, might generate more user questions and subsequent answers, rather than random visitors dropping in to skim technical answers and leaving for good. Apparently, badges, upvotes, and other intangible incentives that veteran users tend to take seriously are not helping to increase engagement among new and casual users. In the age of MOOCs and development bootcamps, such features may be a way to make the website much more dynamic than a 90s-style internet forum.

The original source code for producing the visualizations in this article can be found here.

