Hello, my name is Tae Kim.
I am new to CBSi, working at the Irvine office as a BI Analyst.
As a new employee who didn't know much about Hack Bowl @CBSi,
I looked at CBSi's brands to get some ideas for this event.
Metrolyrics is one of CBSi's brands, offering lyrics to over 1 million songs.
Songs are sorted by genre.
I scraped the lyrics of the top 100 songs in each genre, 800 songs in total.
(Sorry if web scraping is not allowed.)
My goal is to analyze and compare lyrics by genre.
Here's a diagram of the workflow and architecture:
(A friend told me this is a very ugly color scheme).
I didn't expect to see Frozen still on the chart... nor My Anaconda.
1. A very basic sentiment analysis using Python:
Based on the positive and negative word lists from the University of Illinois at Chicago (Hu and Liu, KDD-2004).
It is interesting to compare the ratio of positive to negative words across genres. While R&B lyrics are highly positive, Metal lyrics are the opposite. Hiphop has the most words, almost twice as many as the other genres.
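For the curious, here is a minimal Python sketch of this kind of lexicon-based scoring. It is not the original script: the tokenizer, the tiny POSITIVE/NEGATIVE sets, and the sample lyrics are made-up stand-ins for the Hu and Liu word lists and the scraped lyrics. Each word is checked against the two sets, and the ratio is the share of sentiment-bearing words that are positive.

```python
import re
from collections import Counter

# Tokenizer assumption: lowercase, keep letters and apostrophes ("i'm" stays whole).
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase a lyric and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def sentiment_counts(lyrics, positive, negative):
    """Count positive and negative lexicon hits across a list of lyric strings."""
    counts = Counter()
    for text in lyrics:
        for word in tokenize(text):
            if word in positive:
                counts["positive"] += 1
            elif word in negative:
                counts["negative"] += 1
    return counts


def positive_ratio(counts):
    """Share of sentiment-bearing words that are positive (0.0 if there are none)."""
    total = counts["positive"] + counts["negative"]
    return counts["positive"] / total if total else 0.0


# Hypothetical mini-lexicon standing in for the Hu and Liu lists (illustration only).
POSITIVE = {"love", "happy", "lovely", "amazing"}
NEGATIVE = {"hate", "cry", "lonely", "demons"}

# Hypothetical sample lyrics per genre.
genres = {
    "R&B": ["oh happy day, lovely day", "I love you, I'm lonely"],
    "Metal": ["I hate the demons", "make me feel brand new"],
}

for genre, lyrics in genres.items():
    c = sentiment_counts(lyrics, POSITIVE, NEGATIVE)
    print(genre, dict(c), round(positive_ratio(c), 2))
```

With the real word lists, the same loop run over each genre's 100 lyrics would give one ratio per genre to compare.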
Let's dig in a little deeper.
What are some of the most frequent words?
2. Single-word (unigram) analysis using Pig:
Left to right: Country, Hiphop, Jazz, Metal, Rock, R&B, Pop. Common stop words are dropped.
Some of the frequent words:
- Hiphop: ain't, b!tch, @ss, fu(k
- Metal: fu(k, cut, que, hate
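The original step ran in Pig; below is a minimal plain-Python sketch of the same tokenize, filter, group, and count logic. The STOP_WORDS set and the sample lines are illustrative assumptions, not the stop-word list or data actually used.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")

# Hypothetical, tiny stop-word list; the exact list used originally is not given.
STOP_WORDS = {"the", "a", "an", "and", "i", "you", "me", "my", "to",
              "it", "in", "of", "is", "that"}


def top_unigrams(lyrics, n=3, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop-word tokens across lyric strings."""
    counts = Counter(
        word
        for text in lyrics
        for word in TOKEN_RE.findall(text.lower())
        if word not in stop_words
    )
    return counts.most_common(n)


# Hypothetical sample lyrics.
sample = ["Love me, love me, say that you love me", "I hate the rain"]
print(top_unigrams(sample))
```

Grouping by word and counting is what a Pig GROUP ... COUNT does at scale; Counter plays that role here for an in-memory sample.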
Wikipedia: N-gram
3. Bigrams using Pig.
I paired consecutive words and grouped the pairs by their first word.
e.g., word pairs grouped by "love"
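A minimal Python sketch of the pairing and grouping step, again a stand-in for the Pig script rather than the script itself. It assumes stop words are dropped before pairing (that would be consistent with pairs like (i'm, wagon) in the lists below); that assumption, the STOP_WORDS set, and the sample lines are hypothetical. Pairs are formed within each lyric line only.

```python
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z']+")

# Hypothetical stop-word list, removed before pairing (an assumption).
STOP_WORDS = {"the", "a", "an", "and", "on", "in", "of", "to", "my"}


def bigrams(tokens):
    """Consecutive word pairs: ['a', 'b', 'c'] -> [('a', 'b'), ('b', 'c')]."""
    return list(zip(tokens, tokens[1:]))


def group_by_first_word(lyrics, stop_words=STOP_WORDS):
    """Map each first word to a Counter of the words that follow it."""
    groups = defaultdict(Counter)
    for text in lyrics:
        tokens = [w for w in TOKEN_RE.findall(text.lower()) if w not in stop_words]
        for first, second in bigrams(tokens):
            groups[first][second] += 1
    return groups


# Hypothetical sample lyrics.
sample = ["I'm on the road again", "love me tender, love me true", "I'm a wheel"]
groups = group_by_first_word(sample)
print(dict(groups["love"]))  # words that follow "love"
print(dict(groups["i'm"]))   # words that follow "i'm"
```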
Full lists of unigrams, bigrams, and trigrams can be downloaded from
https://s3-us-west-1.amazonaws.com/lyric-n-gram/lyric_n-gram_by_TaeKim.zip
Some interesting bigrams grouped by "i'm":
- Country: (i'm,rent), (i'm,train),(i'm,wagon),(i'm,wheel),(i'm,boxcar),(i'm,guitar),(i'm,road.trailers),(i'm,baptized),(i'm,guitar)
- Hiphop: (i'm,club),(i'm,coco),(i'm,fu<k),(i'm,holla),(i'm,hollywoood),(i'm,motherfu<ker),(i'm,money),(i'm,ne-yo),(i'm,monopoly)
- Jazz: (i'm,baby),(i'm,back),(i'm,spain),(i'm,top),(i'm,seat)
- Metal: (i'm,fu<k),(i'm,abandoned),(i'm,motherf<cker),(i'm,wolf),(i'm,beast)
- Pop: (i'm,crying),(i'm,kissed),(i'm,amazing),(i'm,darling),(i'm,denying),(i'm,falling),(i'm,boss),(i'm,cool)
- R&B: (i'm,hair),(i'm,babe),(i'm,shuck),(i'm,smile),(i'm,jiving),(i'm,lonely),(i'm,loving),(i'm,pretender),(i'm,touch),(i'm,honey)
- Rock: (i'm,scar),(i'm,sh!t),(i'm,storm),(i'm,fu<k),(i'm,shooting),(i'm,smoother),(i'm,tonight?),(i'm,champagne),(i'm,love),(i'm,mean)
Dig deeper.
4. Trigrams using Hadoop MapReduce.
Here are the results for trigrams (the most frequent three-word sequences):
I assume that "words were like little toy guns" repeats in one of the top 100 Country songs. An interesting finding is that Country lyrics do not repeat themselves much compared to other genres.
"twerking like Miley" was used 25 times. Uhhh... OK.
I don't know much about Metal, but these are some of the most frequently repeated trigrams among this week's top 100 Metal songs.
Pop in general is highly repetitive. Can you see the lyrics of "Shake It Off" by Taylor Swift and "Uptown Funk" ft. Bruno Mars painted here? The lyrics of these popular songs are so repetitive that we can instantly recognize where these trigrams come from. (Do you hear the songs playing in your head?)
You can literally put these trigrams together and they form a sentence or even a poem. R&B lyrics seem very narrative. Remember that R&B has the highest ratio of positive words?
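The trigram job ran on Hadoop MapReduce. The sketch below only simulates the three phases (map, shuffle, reduce) in a single Python process to show the data flow; it is not Hadoop code. It counts trigrams within each lyric line. The repeated_share function is a hypothetical, rough repetitiveness score (not something from the original analysis) that could put a number on observations like "Pop is highly repetitive" versus "Country does not repeat itself much".

```python
import re
from collections import defaultdict
from itertools import chain

TOKEN_RE = re.compile(r"[a-z']+")


def mapper(line):
    """Map phase: emit ((w1, w2, w3), 1) for every trigram in one lyric line."""
    tokens = TOKEN_RE.findall(line.lower())
    for i in range(len(tokens) - 2):
        yield tuple(tokens[i:i + 3]), 1


def shuffle(pairs):
    """Shuffle/sort phase: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reducer(key, values):
    """Reduce phase: sum the counts for one trigram."""
    return key, sum(values)


def count_trigrams(lines):
    """Run map -> shuffle -> reduce over all lines and return {trigram: count}."""
    mapped = chain.from_iterable(mapper(line) for line in lines)
    return dict(reducer(key, values) for key, values in shuffle(mapped))


def repeated_share(trigram_counts):
    """Hypothetical score: fraction of trigram occurrences that repeat an earlier one.

    0.0 means every trigram appears once; values near 1.0 mean heavy repetition.
    """
    total = sum(trigram_counts.values())
    distinct = len(trigram_counts)
    return (total - distinct) / total if total else 0.0


# Hypothetical sample lyric lines.
lines = ["la la la la", "feel it coming", "feel it coming"]
counts = count_trigrams(lines)
print(counts[("la", "la", "la")])  # 2
print(repeated_share(counts))      # 0.5
```

On a real cluster the mapper and reducer would run as separate tasks and the framework would do the shuffle; keeping them as pure functions makes that split easy to see.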
Thank you for reading and for the opportunity to explore!!
A research paper from HP Labs mentions that determining genre based solely on lyrics can be challenging without acoustic information. I don't want to jump to conclusions from a one-week sample, but the lyrics in each of these genres seem to have certain (albeit very subtle) distinguishing characteristics.
If I continue to build the dataset, it would be fascinating to see whether it captures shifts in cultural trends, since music reflects the culture of its time and generation (e.g., "twerking like Miley").
Please feel free to ask me any questions or provide suggestions.
Please also note that some of these findings may be subjective and may contain a few errors.
----
Update: Five years after I participated in the hackathon, here are the latest results as of April 2020:
Pop
"like you do": 35 times
"ants go marching": 30 times (I had to look this up to double-check which song triggered this. "The Ants Go Marching" is ranked #37!! I guess my house isn't the only one with kids' songs playing nonstop due to COVID-19.)
"I want your": 24 times
"in love with": 23 times
"gonna miss me": 19 times
"with your body": 15 times
Hiphop (genre with the most trigrams)
"pull me up": 46 times
"I got the": 33 times
"I like it": 29 times
"in the club": 26 times
"You want me": 24 times
"get a taste": 19 times
R&B
"feel it coming": 60 times
"day lovely day": 44 times
"you love me": 37 times
"oh happy day": 27 times
"if you love": 22 times
"I love you": 10 times
Rock
"yeah yeah yeah": 23 times
"let it be": 19 times
"sink back into": 18 times
"need a girl": 15 times
"where my demons": 12 times
"ring of fire": 11 times
Country
"la la la": 47 times (a song titled "Whiskey Lullaby" repeats this)
"all night long": 19 times
"that old time": 18 times
"I'll be your": 18 times
"old time religion": 18 times
"never find you": 13 times
"I love you": 11 times
Jazz
"la la la": 51 times (a song titled "Loving You" repeats this)
"to da Lord": 19 times
"killing me softly": 17 times
"and love me": 14 times
"I love you": 12 times
Metal (genre with the fewest trigrams)
"make me feel": 21 times
"feel brand new": 14 times
"run away run": 14 times
"why do they": 12 times
"I don't care": 11 times
----
www.bigdata.nyc