I see you used K-Means clustering. What’s the silhouette score of the clustering? What’s the distribution of silhouette scores across the samples? Why did you decide on 10 clusters? Did you try any other clustering methods? Why did you choose K-Means over other clustering methods?
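One way to answer the first two questions is to fit K-Means over a range of k and look at both the mean silhouette and how per-sample scores are spread. This is a minimal sketch. It assumes scikit-learn and a numeric feature matrix `X`. Because the OP's features aren't in the thread, synthetic blobs stand in for them. The helper `silhouette_report` is hypothetical and only used for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score


def silhouette_report(X, k_values, random_state=0):
    """Fit K-Means for each k; return the mean silhouette and the per-sample spread."""
    report = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        per_sample = silhouette_samples(X, labels)  # one score in [-1, 1] per sample
        report[k] = {
            "mean": silhouette_score(X, labels),  # average of the per-sample scores
            "quartiles": np.percentile(per_sample, [25, 50, 75]),
            "share_negative": float(np.mean(per_sample < 0)),  # likely misassigned samples
        }
    return report


# Synthetic stand-in for the real feature matrix (assumption: 4 underlying groups).
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
report = silhouette_report(X, range(2, 11))
for k, r in report.items():
    print(f"k={k:2d} mean={r['mean']:.3f} "
          f"quartiles={np.round(r['quartiles'], 3)} negative={r['share_negative']:.1%}")
print("k with the highest mean silhouette:", max(report, key=lambda k: report[k]["mean"]))
```

A decent mean can still hide a sizeable share of negative-silhouette samples. That is why the per-sample distribution is worth reporting alongside the single score when justifying a choice like k = 10.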
> In the post-GenAI Era of 2025 onwards, with more and more of traditional, tech-savvy “Data Scientist” jobs handled by Machine Learning Engineers and AI Engineers, we, as people who stuck with the data world, need to become more like analysts and adopt a business mindset.
I call bullshit. As a data scientist working at a tech giant, this comes across as a lazy intern-level project with no statistical rigor and hunting “insights” while assuming significance. This is the *issue* with “business mindset” with no understanding of the theory and technical aspects of data science.
Anyone working in data science can tell you that GenAI is utterly useless when it comes to true data analysis. It *might* be able to give you simple statistics such as mean and variance but anything else is beyond what the current generation of GenAI is able to do.
You speak of “we, as people who stuck with the data world” yet you don’t seem to understand that the people who stuck with the data world need to be the ones who *understand* data science, not just throw some fancy-schmancy clustering together with no rigor so that product managers and clients who can’t tell K-Means clusters from chocolate nut clusters go “oooooo”.
How much did your client pay you and does he know you’re giving it away free on Reddit?
Are the features for the k-means only the TF-IDF of the post titles?
If you wanted to go a step further, you could have an AI transcribe the audio content itself and run TF-IDF on that. That would be very interesting.
Isn’t it a bit weird that you take the post with 2500+ likes and then conclude that the most mainstream, safe-bet option has lower variance? You totally neglect the possibility of reaching 2500+ likes, which seems like a relevant metric for potential success.
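A toy simulation makes the variance point concrete. The like counts below are hypothetical log-normal draws and do not come from the OP's data. In this simulation the "safe" category has the higher median and the lower spread. Even so, the "risky" one is much more likely to clear 2,500 likes.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical like counts per post (log-normal), not taken from the OP's dataset.
safe = [rng.lognormvariate(6.0, 0.3) for _ in range(5000)]
risky = [rng.lognormvariate(5.8, 1.0) for _ in range(5000)]


def summarize(likes, threshold=2500):
    """Median, spread, and the share of posts at or above a 'viral' threshold."""
    return {
        "median": statistics.median(likes),
        "stdev": statistics.stdev(likes),
        "p_over_threshold": sum(x >= threshold for x in likes) / len(likes),
    }


safe_stats, risky_stats = summarize(safe), summarize(risky)
print("safe: ", safe_stats)
print("risky:", risky_stats)
```

Which category is "better" depends on whether the goal is a reliable typical result or a shot at the upper tail. Variance alone doesn't answer that.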
The fact that msub content seems driven by the creators definitely confirms priors.
This post did NOT go well for OP damn
This analysis is so superficial. I wouldn’t say there’s even slight evidence for the insights presented here.
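On the evidence question, one simple check is a permutation test. It asks whether the difference between two clusters' mean like counts is larger than random relabeling of the posts would produce. This sketch uses only the standard library. The helper `permutation_test` and the like counts are hypothetical. With ten clusters and many pairwise comparisons, the resulting p-values would also need a multiple-comparison correction.

```python
import random
import statistics


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between samples a and b."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling under the null of no difference
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical like counts for posts in two clusters (illustrative only).
cluster_a = [120, 340, 95, 410, 220, 180, 2600, 150]
cluster_b = [200, 310, 140, 260, 190, 230, 280, 170]
print(permutation_test(cluster_a, cluster_b))
```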
Comments are going crazy, but I think it’s a person learning data science and creating a project for a friend to build up their portfolio.
I feel like this kind of thing is interesting and important to know. But you kind of have to fit yourself into whatever niche your voice is good for. If you’ve got a high pitched, soft, feminine voice then you’re going to do way better with submissive male audios than the rough dominant ones, regardless of how popular either category is.
But that said, if I did have a softer voice, I probably wouldn’t try going for the submissive audios right away, but instead build a following with the gentle boyfriend stuff THEN break into the submissive audios when / if there was more demand for my work. Based on this data anyway. So knowing the trends is still helpful even if you are limited by whatever you can perform well.
I’m way too much of an idiot to understand anything from this
Fascinating data analysis.
However, the net result is the same as the net result of AI slop: simply reheating what everyone else is cooking.
Thing is, most people want creativity and originality, and while data is useful, it’s the very definition of nothing new.
If only y’all cared about methodology as much on other posts.