For churn we'll take an approach similar to [[Pecan]]'s, but using data available from Mixpanel. We could do the same with SQL too.
1. We need two inputs from the user to create the model
	1. Classification - The event(s) they consider activity. We can fetch a list of events from the Mixpanel API and show it to the user, who can then choose one or more events that count as user activity.
	2. Time Frame - We ask the user which day they want to check retention on, e.g. D7, D14, D30 or D60. The model will try to predict whether a user will still be active on the nth day after they join.
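As a sketch of how collecting these two inputs could look in code — note the `events/names` endpoint path and the basic-auth scheme are assumptions on my part, not confirmed Mixpanel API details:

```python
import base64
import json
import urllib.request

MIXPANEL_API = "https://mixpanel.com/api/2.0"  # assumed base URL for the query API


def list_event_names(api_secret: str, limit: int = 100) -> list:
    """Fetch candidate 'activity' events to show the user.

    The `events/names` endpoint and auth scheme here are assumptions.
    """
    url = f"{MIXPANEL_API}/events/names?type=general&limit={limit}"
    req = urllib.request.Request(url)
    token = base64.b64encode(f"{api_secret}:".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def build_model_config(activity_events: list, horizon_days: int) -> dict:
    """Combine the two user inputs (activity events + time frame) into one config."""
    assert horizon_days in (7, 14, 30, 60), "supported horizons: D7/D14/D30/D60"
    return {"activity_events": activity_events, "horizon_days": horizon_days}
```

The config dict is what the weekly training job below would read.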
2. Using this information we create a model that retrains on a weekly interval.
	1. To train our model we look at all users who joined at least n days ago, so the full retention window has already elapsed. For example, if today is June 1st and we are modelling D30 retention, we look at all users who joined on or before May 2nd. We can fetch these users with the Mixpanel Engage API.
	2. We then classify each user as retained or not. This can be done by calling the Mixpanel Activity Feed API and getting the list of events the user performed on a given date.
	3. Finally we assemble the list of attributes to use for model training. By default Mixpanel gives us profile attributes for each user. Beyond these, we can also include frequently occurring event attributes: using the Activity Feed API we can look at the events the user performed on their first day and add those as attributes as well.
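The three sub-steps above can be sketched as small pure functions (the actual Mixpanel API calls are left out; the event/field names are illustrative):

```python
from datetime import date, timedelta


def training_cutoff(today: date, horizon_days: int) -> date:
    """Users who joined on or before this date have a full observation window."""
    return today - timedelta(days=horizon_days)


def label_retained(user_events: list, activity_events: set,
                   joined: date, horizon_days: int) -> int:
    """1 if the user performed any chosen activity event on day n after joining.

    `user_events` is a list of (event_name, event_date) tuples, e.g. as pulled
    from the Activity Feed API.
    """
    target_day = joined + timedelta(days=horizon_days)
    return int(any(name in activity_events and when == target_day
                   for name, when in user_events))


def first_day_attributes(user_events: list, joined: date) -> dict:
    """Event attributes: one-hot flags for events performed on the join day."""
    return {f"did_{name}": 1 for name, when in user_events if when == joined}
```

For June 1st and D30, `training_cutoff` gives May 2nd — the latest join date for which day 30 has already passed.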
3. Sample Information - At this point we should have a set of users with the columns below. Based on Pecan, we would need around 1,000 users to actually train an effective model.
- User ID
- Date Joined
- Classification
- Attributes - Name, Age, Gender, City etc.
- Event Attributes - RSVPed, Watched Stream, Made a purchase etc.
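Concretely, a single training row might look like the dict below (all values and attribute names here are made up for illustration; the real ones come from the Engage profile and first-day events):

```python
sample_row = {
    "user_id": "u_1042",           # hypothetical ID
    "date_joined": "2024-05-01",
    "classification": 1,           # 1 = retained on day n, 0 = churned
    # Profile attributes from Mixpanel Engage
    "name": "Asha",
    "age": 29,
    "gender": "F",
    "city": "Pune",
    # First-day event attributes (one-hot)
    "did_RSVPed": 1,
    "did_Watched Stream": 0,
    "did_Made a purchase": 0,
}
```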
4. We will divide this set 90/10 into training and testing sets, try a few classification algorithms, and evaluate them on the test set.
	- I don't have full clarity on this part. I know how to run this locally on my computer, but how it would work on the server — saving and deploying the model somewhere — is still something I need to learn about.
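The split-and-evaluate loop itself is straightforward; a minimal stdlib sketch (in practice the classifiers we'd try would come from something like scikit-learn, which isn't shown here):

```python
import random


def train_test_split(rows: list, test_frac: float = 0.1, seed: int = 42):
    """Shuffle and split rows into 90% train / 10% test."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, test_rows: list) -> float:
    """Fraction of test rows where the model's prediction matches the label."""
    correct = sum(predict(r) == r["classification"] for r in test_rows)
    return correct / len(test_rows)
```

Each candidate algorithm would be trained on the 90% split and scored with `accuracy` (or a metric less sensitive to class imbalance) on the 10% split.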
5. Once we have a trained model we can show the user a few tables, potentially in the predictive analysis tab.
	1. Users likely to churn - In the first version it may be hard to cohort these users and show ARPU. Instead we'll show a list of 10-20 users and let the user download a CSV containing a few user attributes along with the churn prediction we have made.
	2. New user churn rate - Of all the users who joined in the last few weeks, what is their expected churn percentage?
	3. Churn Correlation Table - Factors that lead to or correlate with churn and are potential causes. This is similar to Pecan's column importance table.
	4. Churn Prevention Table - Factors that prevent churn, or are negatively correlated with it, and are potential remedies. Again similar to the column importance table.
6. We will then allow the user to predict churn for a specific cohort. In this dashboard they can filter in two ways: predict churn for users who performed a given action, or for users with a given attribute. The time frame (n days), unfortunately, cannot be changed here; changing it means training a new model with a new input.
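The cohort filter could be a thin layer over the trained model; a minimal sketch, assuming the per-user rows carry the one-hot `did_<event>` flags and profile attributes from earlier, and `predict_churn` is whatever trained model we end up with:

```python
def cohort_churn_rate(rows: list, predict_churn,
                      event: str = None, attribute: tuple = None):
    """Filter users by a performed action and/or a profile attribute,
    then return the average predicted churn for that cohort.

    `attribute` is a (name, value) pair, e.g. ("city", "Pune").
    Returns None if no user matches the filter.
    """
    cohort = [
        r for r in rows
        if (event is None or r.get(f"did_{event}") == 1)
        and (attribute is None or r.get(attribute[0]) == attribute[1])
    ]
    if not cohort:
        return None
    return sum(predict_churn(r) for r in cohort) / len(cohort)
```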