Affirmations and Activities Modeling to Exercise Neuroplasticity And Insight Generation in Cognitive Domain
Anisha Rao, MSDA Student, Edward Montoya, MSDA Student, Lina Devakumar Louis, MSDA Student, Himanshee, MSDA Student, and Nilisha Makam Prashantha, MSDA Student
Abstract—This project is a prototype that suggests affirmations based on mood and feeling sentiment, and activities based on mood, to exercise neuroplasticity in individual users. Moods and feelings are the inputs from users. Users are also asked to fill out a survey that is fed into a data warehouse. Using a star schema design, data warehousing is implemented over the user surveys to generate insights with a use case in the cognitive domain. We used domain-specific datasets such as 'Mental Health Survey', 'Positive Affirmations', and 'Mood Activities' to collate mood-activity, mood-affirmation, and affirmation-feeling-sentiment relationships. We also used a few subsidiary datasets to improve the accuracy of the project. The prototype is a mood-lifter application that works as a self-help tool for mental health, named All About You. A business user investing in the prototype would wish to know the type of users they are catering to and the prevalent trends. Our visualization aims to provide meaningful insights from the surveys collected from users. The survey is time-series data and is anonymous to protect the privacy of our users.
Index Terms—Moods, Activities, Mental Health, Positive Affirmations, MongoDB, MySQL, Polyglot Persistence, Python, AWS Glue, Redshift, QuickSight, MongoDB Atlas
1 INTRODUCTION
This project demonstrates polyglot persistence via a self-help application prototype that is based on neuroplasticity. The prototype uses MySQL, MongoDB, and S3 data stores to suggest affirmations and activities associated with moods and feeling sentiment scores. MongoDB aggregation pipelines and SQL functionality have been applied to achieve the various goals. Redshift functionality has been used to implement a star schema design for data warehousing. The data warehouse acts as an analytical resource for the cognitive domain.
1.1 Problem Statement
Nearly one in five U.S. adults suffer from mental health issues (52.9 million in 2020). Self-affirmations have been shown to decrease stress (Sherman et al., 2009; Critcher and Dunning, 2015), have been used effectively to help people increase physical activity (Cooke et al., 2014), and have helped people eat healthily (Epton and Harris, 2008). Certain activities can help us feel better and change the way we think or perceive life in that moment. On a day-to-day basis, every individual faces ups and downs in their mental health journey. Our project aims to function as a mood-lifter application that works as a self-help tool for mental health. It is an app that lets users record their mood and suggests positive affirmations and activities that help them lift their mood and feel better.
1.2 Motivation
An individual's way of thinking, feeling, or mood may affect their ability to function each day. Each person has a different way of mitigating problems and issues, i.e., how these problems affect them internally. Over the last decade there has been a substantial increase in the need and demand for mental health services across the nation. The demand has arguably grown since the beginning of the global pandemic. For instance, a Pew Research Center survey found that about a fifth of U.S. adults (21%) are experiencing high levels of psychological distress, including nearly three in ten (28%) among those who say the pandemic has changed their lives. There exists a growing demand for more mental health techniques and technologies to help people mitigate negative thoughts and feelings. Using survey data from Kaggle and AWS services, we analyzed how mental health affects human lives and daily chores. That is why we believe our project has the potential to help people be more mindful by offering suggested activities for their respective moods, something we feel has the potential to improve their overall well-being.
1.3 Goal
The goal of the project is to provide affirmation and activity suggestions to users by taking three feelings and their mood as input. We also aim to collect surveys from users and model them in a data warehouse to gain meaningful insights.
1.4 Datasets Used
The datasets used in the project are described in Section 5.1.
2 LITERATURE SURVEY
A literature survey was conducted to gather the information related to our project:
1) Sowndarya Palanisamy and P. SuvithaVani discussed the two types of databases: SQL and NoSQL. The paper also covers the ACID properties followed by SQL databases and the CAP theorem used by NoSQL databases, as well as comparisons, limitations, and the querying capabilities of each.
2) Katarina Grolinger, Wilson A. Higashino, Abhinav Tiwari, and Miriam A. M. Capretz, in their paper, discuss how NoSQL and NewSQL are alternatives for handling huge volumes of data. The paper provides perspective on both kinds of databases, the need for flexible schemas over RDBMS, and scalability, and discusses in detail the various data models for NoSQL and NewSQL.
3) Erik Swensson, in his paper, discussed AWS, its various tools, and the advantages of AWS for big data analytics.
3 SYSTEM REQUIREMENTS
3.1 Functional Requirements
- Data Pre-processing
- Data Modeling
- Aggregation pipelines using MongoDB functionalities
- Analysis using SQL Statements
- Sentiment Score calculation using SQL Statements
- Data Warehousing for analytics
- Data Visualization
3.2 Software Requirements
- OS: Windows, macOS
- Language: SQL, Python
- IDE/tool: Jupyter Notebook, SQL Workbench, MongoDB Compass
- Database: MySQL, MongoDB
- Data Lake: AWS S3
- Data Warehouse: Redshift, MongoDB Atlas
- Data Visualization: QuickSight, MongoDB Charts
- ETL Tool: AWS Glue
4 SYSTEM DESIGN
4.1 System Architecture
The system architecture is displayed in Fig. 1. It gives an overview of the project with its different subsections.
Figure 1. System Architecture
4.2 Data Modeling: ER Diagram
For the MySQL database we modeled Mood, Activities, Affirmations, and Tags, as seen in Fig. 2.
4.3 Data Warehouse: Star Schema Design
We modeled the Redshift data warehouse using the star schema design (Fig. 3).
4.4 Sequence Diagram
The sequence diagrams represent the interactions between the different project objects. We show two sequence diagrams, one for the user view (Fig. 4) and one for the business view (Fig. 5).
4.5 Data Flow Diagram
The data flow diagram represents how the data flows between the entities and processes.
Figure 2. ERD
Figure 3. Star Data Model
5 PROJECT MODULES
5.1 Data Source Finalization
We required four datasets for the purpose of the project:
- Dataset of affirmations for users depending on mood: We finalized the dataset from the link below. The dataset contains tags, so logic had to be built to link tags to moods. Link: Positive Affirmations
- Dataset for activity suggestion based on mood: We finalized the dataset from the link below. Link: Daily Mood Tracker
- Positive and negative word list: This dataset was required to calculate word scores. Link: Opinion Lexicon
- Mental health survey, for checking the relevance of the prototype.
Figure 4. User Sequence diagram
Figure 5. Business User Sequence diagram
5.2 Data Preprocessing
5.2.1 Data Cleaning for Mood Logger
Steps Applied using Python:
- Values in the "mood" and "activities" columns were changed to uppercase to ensure uniformity among all the same values of the columns.
- Filtered the required columns, "mood" and "activities".
- The "activities" column held multiple values in every cell, separated by '—'. The split() and explode() functions were used to split the column values and reassign them to different rows.
- strip() was used on the "activities" column values to truncate the spaces left after the split.
- Removed an existing anomaly in the dataset with the value 'DOTA 2' (a pandas sketch of these steps follows the link below).
Link:Data prep mood logger-2.ipynb
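The exact steps live in the notebook linked above; as an illustration only, a minimal pandas sketch of this cleaning might look like the following (the file names are placeholders, while the column names and the 'DOTA 2' anomaly follow the text).

import pandas as pd

# Minimal sketch of the Mood Logger cleaning steps described above.
# File names are placeholders; "mood"/"activities" columns follow the text.
df = pd.read_csv("mood_logger_raw.csv")

# Keep only the required columns and normalise casing for consistent grouping.
df = df[["mood", "activities"]]
df["mood"] = df["mood"].str.upper()
df["activities"] = df["activities"].str.upper()

# Each "activities" cell holds several values; split on the delimiter used in
# the raw file and explode so every (mood, activity) pair becomes its own row.
DELIMITER = "—"
df["activities"] = df["activities"].str.split(DELIMITER)
df = df.explode("activities")

# Trim the stray spaces left after the split and drop the known anomaly.
df["activities"] = df["activities"].str.strip()
df = df[df["activities"] != "DOTA 2"]

df.to_csv("mood_logger_clean.csv", index=False)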
5.2.2 Data Cleaning for Affirmations:
Steps Applied using Python:
Figure 6. Data Flow Diagram
- Removal of affirmations with the tags 'money' and 'love'.
- Values in the "affirmation" column were changed to uppercase to ensure uniformity among all the same values of the column.
5.2.3 Data Cleaning for the Words Dataset:
- Added a few negative words, such as "low", "down", and "meh", to the negative words dataset.
- Added a few neutral words, such as "neutral", "ordinary", and "nothing".
- Assigned a +1 score to positive words, -1 to negative words, and 0 to neutral words (a pandas sketch of this scoring step follows this list).
- Removed the word "naive" from the positive word list, as it is already present in the negative words dataset.
- Removed an existing anomaly, "nai've", from the negative word list.
- Values in the "words" column were changed to uppercase to ensure uniformity among all the same values of the column.
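As a rough illustration of the scoring step above, a minimal pandas sketch might look like the following; the file names are placeholders, and the real word lists come from the Opinion Lexicon dataset.

import pandas as pd

# Sketch of the word-score assignment described above (file names assumed).
positive = pd.read_csv("positive-words.txt", names=["word"], comment=";").dropna()
negative = pd.read_csv("negative-words.txt", names=["word"], comment=";",
                       encoding="latin-1").dropna()

# Manual additions and fixes from the cleaning steps above.
negative = pd.concat([negative, pd.DataFrame({"word": ["low", "down", "meh"]})])
neutral = pd.DataFrame({"word": ["neutral", "ordinary", "nothing"]})
positive = positive[positive["word"] != "naive"]   # already in the negative list

# Assign +1 / -1 / 0 scores and build a single uppercase word-score table.
positive["score"], negative["score"], neutral["score"] = 1, -1, 0
word_score = pd.concat([positive, negative, neutral], ignore_index=True)
word_score["word"] = word_score["word"].str.upper()
word_score.to_csv("word_score.csv", index=False)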
5.3 Mood-Activity Combination Generation through MongoDB Functionalities
5.3.1 Cleaned Dataset – CSV File "Mood Logger":
This was done through pandas.
5.3.2 Database and Collection Creation in MongoDB:
For the result dataset, we decided to use the aggregation pipeline feature offered by MongoDB to perform further analysis. Through MongoDB Compass, a database called "myDB" was created with a collection within it called "Mood Logger". The CSV file from the above step was loaded as documents into this collection. Corresponding JSON file: Mood Logger.json
5.3.3 Aggregation Pipeline to Find the Most Frequent Activities per Mood:
We wanted to perform an analysis to get the most frequently occurring activities for each mood. The following aggregations were done to achieve the required result.
- Found the count of each activity for every mood.
- Pushed each activity and its count into "activity count".
- Unwound "activity count" using $unwind.
- Using $max, found the maximum count for each mood, with mood as the _id, and pushed all details into a separate field called "grp".
- Using maxCount, filtered the associated set from "grp". Corresponding Python file: Mood Logger Analysis3.ipynb (a pymongo sketch of the pipeline follows this list).
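The actual stages appear in Fig. 7 and the linked notebook; as a rough sketch only, an equivalent pymongo pipeline might look like the following (the connection string is a placeholder, and the stage expressions may differ slightly from the original).

from pymongo import MongoClient

# Sketch of the aggregation described above (connection details assumed).
client = MongoClient("mongodb://localhost:27017")
coll = client["myDB"]["Mood Logger"]

pipeline = [
    # 1. Count each (mood, activity) pair.
    {"$group": {"_id": {"mood": "$mood", "activity": "$activities"},
                "count": {"$sum": 1}}},
    # 2. Regroup by mood, pushing every activity/count pair and keeping the
    #    maximum count seen for that mood.
    {"$group": {"_id": "$_id.mood",
                "activity_count": {"$push": {"activity": "$_id.activity",
                                             "count": "$count"}},
                "maxCount": {"$max": "$count"}}},
    # 3. Unwind so each activity/count pair can be compared to maxCount.
    {"$unwind": "$activity_count"},
    # 4. Keep only the most frequent activity (or activities) for each mood.
    {"$match": {"$expr": {"$eq": ["$activity_count.count", "$maxCount"]}}},
]

agg_result = list(coll.aggregate(pipeline))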
Figure 7. Aggregation Pipeline to find most frequent Activities for Moods
5.3.4 Conversion of the Aggregation Result to CSV:
- Saved the result of the above operations into agg result.
- Converted the aggregation result, agg result, into a dataframe.
- Converted the dataframe into a CSV file using the to_csv() function.
- Obtained the Result1.csv file (a short sketch of this conversion follows this list). Corresponding Python file: Mood Logger Analysis3.ipynb
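A minimal sketch of this conversion, continuing from the agg_result list in the previous sketch (the flattened column layout may differ from the original notebook):

import pandas as pd

# Flatten the aggregation output and persist it as Result1.csv.
df = pd.json_normalize(agg_result)
df.to_csv("Result1.csv", index=False)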
5.3.5 Visualization of the Aggregation Pipeline in MongoDB Atlas:
We passed the aggregation pipeline to Atlas to generate a Chart for the Mood-Activity Analytics.
Figure 8. Aggregation Pipeline Result from Mongo Atlas
5.4 Normalization
- Prepared data for the moods, activities, and mood-activity tables by extracting the mood and activity columns from Result1.csv into new sheets: moods and activities.
- Removed duplicates from both and added new moodID and activityID columns in each sheet, with a unique number for each row.
- Mapped the moodID and actID columns back onto Result1.csv using VLOOKUP().
- Extracted the mapped moodID and actID columns into a new file, moods-activity (a pandas equivalent of these spreadsheet steps is sketched after this list).
- From the cleaned affirmations and tags dataset, we took the distinct tags and assigned a unique tagID to each.
- Then, using Excel, we did a VLOOKUP and mapped the tagIDs to their respective affirmations.
- These steps also ensured 3NF status for the DB.
- Using the mysql package in Python, the tables were populated with the respective CSV files.
- Before uploading the affirmations and tags CSV, we also found each affirmation's positivity score using a query.
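The ID assignment and VLOOKUP mapping above were carried out in Excel; a rough pandas equivalent of the same normalization steps might look like the following (file and column names are assumptions based on the text).

import pandas as pd

# Sketch: pandas equivalent of the spreadsheet normalization described above.
result = pd.read_csv("Result1.csv")          # assumed columns: mood, activity

# Dimension sheets: distinct values with surrogate keys.
moods = result[["mood"]].drop_duplicates().reset_index(drop=True)
moods["moodID"] = moods.index + 1
activities = result[["activity"]].drop_duplicates().reset_index(drop=True)
activities["activityID"] = activities.index + 1

# Bridge sheet: map the IDs back onto the result rows (the VLOOKUP step).
mood_activity = (result.merge(moods, on="mood")
                       .merge(activities, on="activity")
                       [["moodID", "activityID"]])

moods.to_csv("moods.csv", index=False)
activities.to_csv("activities.csv", index=False)
mood_activity.to_csv("moods_activity.csv", index=False)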
5.5 Creation of database in MySQL based on ER-Diagram and Population with Normalized Data
5.5.1 Database Creation
Please find the tables and columns in the reverse-engineered ERD in Fig. 9. The '.sql' files for database creation are available on GitHub.
Link:SQL Files
5.5.2 Database Population:
We used CSV files to populate the tables through Python API functionality, as sketched below. For the user table, we used the interface we created to populate the table. Corresponding Python file: populating tables with new cleaned data.ipynb
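The population notebook is linked above; as an illustration only, a minimal mysql.connector sketch of loading one of the CSV files might look like the following (credentials, the database name, and the table and column names are assumptions; the real names follow the ERD in Fig. 9).

import mysql.connector
import pandas as pd

# Sketch of populating the mood table from its cleaned CSV (names assumed).
conn = mysql.connector.connect(host="localhost", user="root",
                               password="***", database="all_about_you")
cursor = conn.cursor()

moods = pd.read_csv("moods.csv")
rows = list(moods[["moodID", "mood"]].itertuples(index=False, name=None))
cursor.executemany("INSERT INTO mood (moodID, mood_name) VALUES (%s, %s)", rows)

conn.commit()
cursor.close()
conn.close()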
Figure 9. Reverse engineered ERD for All about you DB
Figure 10. SQL snippet of DB and Tables Creation
5.6 Sentiment Score Calculation Logic for Affirmations and Feelings using SQL Functionalities
- Affirmations: Each affirmation was broken down into words. The tokenized sentence was then passed as a tuple in a SQL statement to find the associated score for each word from the word score table, and the total was computed using the aggregate SUM() function.
- Feelings: The three feelings entered by the user were passed as a tuple in a SQL statement to find the associated score for each word from the word score table, and the total was computed using the aggregate SUM() function (a sketch of this scoring query follows this list).
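The actual query appears in Fig. 11; the following is a rough Python sketch of the scoring logic just described, under assumed credentials and an assumed word_score table with columns word and score.

import mysql.connector

# Sketch of the word-score lookup described above; credentials and the
# word_score table/columns are assumptions (the real query is in Fig. 11).
conn = mysql.connector.connect(host="localhost", user="root",
                               password="***", database="all_about_you")
cursor = conn.cursor()

affirmation = "I AM CALM AND AT PEACE"
words = tuple(affirmation.upper().split())

# One placeholder per word; SUM() adds up the matching word scores.
placeholders = ", ".join(["%s"] * len(words))
cursor.execute("SELECT COALESCE(SUM(score), 0) FROM word_score "
               f"WHERE word IN ({placeholders})", words)
affirmation_score = cursor.fetchone()[0]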
Figure 11. Query for Sentiment Score calculation for Affirma- tions
Figure 12. Positivity Score for affirmation after population to DB
5.7 UI Creation
We created a simple interface using tkinter package in Python to input user entries and display results.
The simple user interface requires the user to log in if they are already registered. If not already registered, they can select the "New User Registration" option, which takes them to the registration page.
Figure 13. User-Registration page
5.8 UI Integration with DB
- We built the UI logic in Python using a Jupyter Notebook. The Python code creates a connection between Python and MySQL (where the data is stored). This connection is created by importing mysql.connector and using the mysql.connector.connect function; a representative connection sketch is shown after this list.
- For the 'user' table we have a constraint on Gender, and we also hash the password with MD5 to protect the user's password.
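The original notebook snippet is not reproduced in this text; a minimal sketch of such a connection and a registration insert, with placeholder credentials and assumed user-table columns, might look like the following. (The MD5 hashing could equally be done with MySQL's MD5() function inside the INSERT statement.)

import hashlib
import mysql.connector

# Sketch of the UI-to-database connection described above (credentials,
# database name, and user-table columns are placeholders).
conn = mysql.connector.connect(host="localhost", user="root",
                               password="***", database="all_about_you")
cursor = conn.cursor()

# New-user registration: the password is stored as an MD5 hash.
username, gender, password = "demo_user", "F", "s3cret"
pwd_md5 = hashlib.md5(password.encode()).hexdigest()
cursor.execute("INSERT INTO user (username, gender, password) VALUES (%s, %s, %s)",
               (username, gender, pwd_md5))
conn.commit()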
Figure 14. User Registered Backend View
5.9 Logic Integration
5.9.1 Score Calculation of Feelings Entered by User:
Figure 15. Score – Feeling Mapping
- The three feelings entered by the user are sent as a tuple in a SQL statement to calculate a score from the word score table.
- The resulting sentiment score of the feelings is used in a WHERE clause to find the sentiment of the feeling, such as positive, negative, or neutral, from the score feeling table.
Figure 16. Query to calculate score for 3 Feelings entered by User and to Fetch Feeling Sentiment
5.9.2 Affirmation Suggestion Based on Mood and Feelings Sentiment Score:
- Using the mood entered by the user, a SQL statement is written to find the associated tag from the mood tag table.
- Using the returned tag, another SQL statement is written to fetch the associated affirmations and affirmation scores from the affirmation tag table.
- For a negative feelings score, we keep only the equally positive affirmations. For a neutral or positive feelings score, all associated affirmations are kept.
- The affirmation list is shuffled using the random package in Python, and the first affirmation from the list is suggested to the user.
- In the referenced snippet, where the user enters 'gloomy', 'dull', and 'annoyed' as feelings, '-3' is the score of the user's feeling sentiment. Hence we filter to affirmations with a score of 3, to give an equally positive suggestion.
- Also, as seen in the referenced snippet, since the user selects the 'AWFUL' mood, the tag 'HAPPY' is selected and only the associated affirmations are fetched. An end-to-end sketch of this flow follows this list.
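As an end-to-end illustration of the flow just described, a rough sketch might look like the following; the table and column names (word_score, mood_tag, affirmation_tag) and the credentials are assumptions, and the actual queries appear in Figs. 16 and 21.

import random
import mysql.connector

# Sketch of the suggestion flow described above (schema names assumed).
conn = mysql.connector.connect(host="localhost", user="root",
                               password="***", database="all_about_you")
cursor = conn.cursor()

mood = "AWFUL"
feelings = ("GLOOMY", "DULL", "ANNOYED")

# 1. Sentiment score of the three feelings (e.g. -3 for this example).
cursor.execute("SELECT COALESCE(SUM(score), 0) FROM word_score "
               "WHERE word IN (%s, %s, %s)", feelings)
feeling_score = cursor.fetchone()[0]

# 2. Tag associated with the mood (e.g. AWFUL -> HAPPY).
cursor.execute("SELECT tagID FROM mood_tag WHERE mood = %s", (mood,))
tag_id = cursor.fetchone()[0]

# 3. Affirmations for that tag; for a negative feeling score keep only the
#    equally positive ones, otherwise keep them all.
cursor.execute("SELECT affirmation, score FROM affirmation_tag WHERE tagID = %s",
               (tag_id,))
rows = cursor.fetchall()
if feeling_score < 0:
    rows = [r for r in rows if r[1] == -feeling_score]

# 4. Shuffle the candidates and suggest the first one.
random.shuffle(rows)
suggestion = rows[0][0] if rows else None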
Figure 17. User Input
5.9.3 Activities Suggestion Based on Mood:
- Using the mood entered by the user, another SQL statement is written to fetch the associated activities from the mood activity table.
- All the results are displayed in the UI.
Figure 18. Feeling Sentiment, Tag, Affirmation and Activities Result
Figure 19. Feeling score and Affirmation score from Notebook Output for Verification of Suggestion
- As seen in the referenced snippet, since the user selects the 'AWFUL' mood, the associated activities are fetched.
5.10 User Survey and Population of survey to MongoDB as Time Series Data
- Another aspect of our project is that we collect a survey from our users to analyse industry behavior towards mental health and to derive insights for the future scope of the application. The front end of the survey form is built into the user interface, which loads every session's records into the MongoDB collection 'Survey data'.
- To achieve this, we used the MongoDB GUI, MongoDB Compass, and populated the 'Survey data' collection using Python, as sketched below.
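As an illustration, a minimal pymongo sketch of loading one session's survey record might look like the following; the connection string and field names are placeholders, and a timestamp field is assumed so that the records can be treated as time-series data.

from datetime import datetime, timezone
from pymongo import MongoClient

# Sketch of inserting one survey record into the 'Survey data' collection
# (connection string and field names are placeholders).
client = MongoClient("mongodb://localhost:27017")
survey = client["myDB"]["Survey data"]

record = {
    "timestamp": datetime.now(timezone.utc),   # keeps the data time-series friendly
    "age": 29,
    "works_in_tech": True,
    "sought_treatment": False,
}
survey.insert_one(record)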
5.11 ETL for Survey Data
Refer to Fig. 22 (AWS Glue job).
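The job itself was authored visually in AWS Glue Studio. As a rough, script-style illustration only, a Glue job that reads the survey CSV from S3, renames and types a few columns, and writes to Redshift might look like the following; the S3 paths, Glue connection name, target table, and column mappings are all placeholders.

from awsglue.context import GlueContext
from awsglue.transforms import ApplyMapping
from pyspark.context import SparkContext

# Rough sketch of an S3-to-Redshift Glue ETL job (all names are placeholders;
# the real job was built visually in Glue Studio, Fig. 22).
glue_context = GlueContext(SparkContext.getOrCreate())

survey = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://all-about-you/survey/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Rename and type the columns needed by the star schema.
mapped = ApplyMapping.apply(
    frame=survey,
    mappings=[("age", "string", "age", "int"),
              ("gender", "string", "gender", "string"),
              ("treatment", "string", "treatment", "string")],
)

glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="redshift-connection",       # placeholder Glue connection
    connection_options={"dbtable": "survey_fact", "database": "dev"},
    redshift_tmp_dir="s3://all-about-you/temp/",
)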
5.12 Data Warehousing for Analytics on Transformed Data
Based on the star schema that we modeled, we performed the analytics shown in the Redshift queries in Fig. 23.
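As an illustration only, a typical star-schema query against the warehouse, joining the survey fact table to a dimension (for example to produce the gender distribution shown in Fig. 25), might look like the following; the connection details and the table and column names are assumptions.

import psycopg2

# Illustration of a star-schema query against Redshift (all names assumed;
# the real queries appear in Fig. 23).
conn = psycopg2.connect(host="example.redshift.amazonaws.com", port=5439,
                        dbname="dev", user="awsuser", password="***")
cur = conn.cursor()
cur.execute("""
    SELECT d.gender, COUNT(*) AS respondents
    FROM survey_fact f
    JOIN user_dim d ON f.user_key = d.user_key
    GROUP BY d.gender
    ORDER BY respondents DESC;
""")
for gender, respondents in cur.fetchall():
    print(gender, respondents)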
5.13 Data Visualization in QuickSight for Analytics Performed
Figures 25 and 26 show the visualizations produced from the results returned by the above queries for the business user.
6 TESTING
We tested our project using the types of testing described below.
Figure 20. Contrary Mood & Tags Mapping to Link Affirma- tions to Mood
Figure 21. Code snip with queries to fetch required sugges- tions from DB
6.1 Unit Testing
We employed unit testing to test the different aspects of our project separately before integrating them together.
6.2 Sanity Testing
We applied sanity testing to fix bugs in our logic integration by logging test cases for user entries. Please find a snippet of our test cases and the logged results in Fig. 27. We used the 'FAILURE' cases to fix bugs in our logic. Through this testing we also added additional words to our word-score table and refined our logic along the way.
7 CONCLUSION
The proposed prototype demonstrated polyglot persistence by using a MySQL database to model affirmations, tags, activities, and moods in order to suggest relevant affirmations to users, and by using MongoDB to store the survey data generated from the app prototype. Since we modeled the survey results as time-series data, scaling in MongoDB will be beneficial as the data grows.
Figure 22. AWS Glue Job
For suggesting affirmations and activities to users, on the other hand, relational DB functionality was required to model the data most efficiently. S3 was used to stage data for the Redshift data warehouse, which in turn provided business users with meaningful, cognitive-domain-specific insights.
Leveraging aggregation pipelines proved to be a great approach for simplifying what would otherwise have required multiple queries for the same purpose. Multiple stages were packed within the same pipeline to generate the most frequent activity for a particular mood.
The implementation of sentiment analysis using SQL functionality resulted in better affirmation suggestions for users, hence improving the efficiency.
We made use of AWS Glue, as it is serverless, to perform ETL on the survey data before using it in the data warehouse. The star schema design for the data warehouse allowed us to efficiently query the transformed data to gain meaningful insights. Using Redshift improves the accessibility of the insights, as it sits on top of AWS.
Figure 23. Redshift Queries
For instance, we made use of QuickSight to view the insights, thereby giving business users a clear picture.
8 FUTURE SCOPE
The project has broad future scope. The following are the future directions for the project:
- One of the most important concerns when dealing with health data is that it is very personal. If not secured properly, it can be exposed to cyber attacks and data corruption. To avoid this, a future goal of the project is to implement data encryption according to health industry standards.
- In the future we can create a user dashboard to show the user all of the moods recorded for them in the system. Using the dashboard, we can also track user activities.
- The survey data a user enters is stored in MongoDB. To perform analytics, we currently upload it manually and store it in the S3 data lake. In the future, we can build a pipeline to automate this process.
Figure 24. Redshift Queries
Figure 25. Gender distribution
A.1 Presentation Skills
An interactive session in which each of our team members will present. The overall duration of the presentation will be 20 minutes, including Q&A.
A.2 Code Walkthrough
Refer to Project Modules
A.3 Discussion / Q&A
An interactive presentation of our project, case study, and slides, with Q&A throughout the session.
Figure 26. Tech Company Distribution
Figure 27. Test Cases for Sanity Testing
A.4 Demo
Presenting an interactive demo for 5 minutes where any volunteer from our class can test the app.
A.5 Version Control
We are tracking the code developed by each individual in a Git repository. Git: All-About-You
A.6 Significance to the real world
It is well known that many people suffer from mental health issues but are hesitant to address the problem. Our project, All About You, is an initiative to bring a smile to each of our faces, as we believe happy people make a happy world.
A.7 Lessons learned
All of the lessons learned below are included in the report and presentation:
- The importance of polyglot persistence, as we used both MySQL and NoSQL data stores for our project (refer to the Project Modules section).
- Learned AWS tools such as Glue Studio for ETL and Redshift for warehousing (refer to the Project Modules section).
- Learned how star schema modelling helps in analytics.
A.8 Innovation
- Chose the cognitive domain and integrated the technical functionality of RDBMS and NoSQL. Performed an aggregation pipeline on the mood-activity data to find the most frequent activity for each mood.
- Integrating the application with AWS was a completely new paradigm.
A.9 Teamwork
Throughout the project we ensured an equal distribution of the workload. All planning involved equal participation from all teammates.
Figure 28.
A.10 Technical difficulty
Below are the challenges that we faced:
- Data cleaning for the moods-activity table, as the data was not well structured.
- Performing sentiment analysis using SQL queries was one of the major challenges. To stick to the course learning, we came up with a score calculation logic to calculate the score of each word the user enters and the score of the affirmations.
- Working out how to derive the user's feeling from the words they enter, and how to model it, was one of the difficulties we faced.
- Integrating the entire application with the data warehousing functionality resulted in the addition of a user survey page, where the user can enter the mental health survey.
A.11 Practiced pair programming?
We practiced pair programming using Google Colab, Zoom calls, Google Meet, and screen sharing.
A.12 Practiced agile / scrum (1-week sprints)?
- We made use of Trello to track each team member's tasks. Trello: All-About-You
- We also made use of Slack to communicate if we had any blockers. Slack: All-About-You
- We incorporated an agile methodology where each one of us would drive the sprint call and conduct the sprint retrospective to ensure work was completed on time.
A.13 Used Grammarly / other tools for language?
Used Grammarly for proof-reading.
A.14 Slides
Created the presentation for the term report, with an interactive session and Q&A, using Canva.
A.15 Report
Followed the IEEE format and checked for completeness, language, and plagiarism.
A.16 Used Unique Tools
Used LaTeX for writing the report, Lucidchart for creating the DFD and flowcharts, Visual Paradigm for creating the system architecture and use case diagrams (refer to the Project Modules section), and Canva for slides.
A.17 Performed substantial analysis using database techniques
A.18 Used a new database or data warehouse tool not covered in the HW or class
Used Redshift as a Data Warehouse Service to perform analytics on mental health survey data.
A.19 Used appropriate data modeling techniques
- ER model for the MySQL database (refer to 4.2)
- Star schema model for Redshift (refer to 4.3)
A.20 Used ETL tool
Used AWS Glue to perform ETL operations from the S3 bucket (refer to 5.11).
A.21 Demonstrated how Analytics support business decisions
We created visualizations using AWS QuickSight for the business user to understand the mental health survey data and how it affects daily human life (refer to 5.12 and 5.13 for more visuals and queries).
A.22 Used RDBMS
Used the MySQL database (refer to 5.5).
A.23 Used Data Warehouse
Used Redshift (Refer 5.12)
A.24 Includes DB Connectivity / API calls
Used Python as the programming language to connect the UI to the database using mysql.connector.
A.25 Elevator pitch video
Refer to Elevator pitch submission
A.26 MOM
Refer Figure ;
REFERENCES
[1] Sowndarya Palanisamy and P. SuvithaVani.
[2] Katarina Grolinger, Wilson A. Higashino, Abhinav Tiwari, and Miriam A. M. Capretz.
[3] Erik Swensson.