Logistic Regression — Part III — Titanic Disaster Survival Prediction

In this article we will explore the Titanic dataset using logistic regression and classification metrics.

Let's see how to do logistic regression in Python using LogisticRegression() from sklearn.

I have taken the Titanic dataset from Kaggle: https://www.kaggle.com/c/titanic/data

Here I have skipped the data pre-processing steps except encoding; I will cover data pre-processing in a separate post.

• Remove the Cabin column: most of its values are null, and a column whose values would be mostly imputed carries little information.
• Encode the categorical columns. I apply the same encoding to the validation dataset (test.csv) as well.
• Drop the original columns and concatenate the encoded columns.
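The steps above could look roughly like the sketch below. It is only an illustration: the tiny DataFrames stand in for train.csv and test.csv, the choice of one-hot encoding with pd.get_dummies is an assumption (the original encoding method is not shown), and the helper name preprocess and the list categorical_cols are hypothetical.

```python
import pandas as pd

# Made-up stand-ins for train.csv / test.csv; the values are illustrative only.
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
    "Cabin": [None, "C85", None, None],
})
test = pd.DataFrame({
    "Pclass": [3, 2],
    "Sex": ["male", "female"],
    "Embarked": ["Q", "S"],
    "Cabin": [None, None],
})

categorical_cols = ["Sex", "Embarked"]  # assumed list of categorical columns


def preprocess(df, columns=None):
    """Hypothetical helper: drop Cabin, one-hot encode, concatenate."""
    # Remove Cabin, which is mostly null.
    df = df.drop(columns=["Cabin"])
    # Encode the categorical columns (one-hot encoding is an assumption).
    encoded = pd.get_dummies(df[categorical_cols], columns=categorical_cols, dtype=int)
    # Drop the original columns and concatenate the encoded ones.
    df = pd.concat([df.drop(columns=categorical_cols), encoded], axis=1)
    # Optionally align to a reference layout; categories unseen here become 0.
    if columns is not None:
        df = df.reindex(columns=columns, fill_value=0)
    return df


train_encoded = preprocess(train)
feature_cols = [c for c in train_encoded.columns if c != "Survived"]
test_encoded = preprocess(test, columns=feature_cols)

print(train_encoded.head())
print(test_encoded.head())
```

Aligning test_encoded to feature_cols keeps both datasets in the same column layout even when a category (here Embarked = C) never appears in the smaller file.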

Model — Using LogisticRegression:

Finally, we predict the Survived values for the test data using the predict() method.
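A minimal sketch of the fit-and-predict flow is shown here. The data is synthetic rather than the encoded Titanic features, and the 80/20 hold-out split, random_state, and max_iter values are assumptions made for illustration, not the settings used for the results in this article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded Titanic features (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Hold out 20% of the rows for evaluation (assumed split).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the model on the training rows.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# predict() returns the predicted class label (0 or 1) for each row.
y_pred = model.predict(X_val)
print(y_pred[:10])
```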

Metrics

Packages to import for Error Metrics:
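The metrics discussed in this section all live in sklearn.metrics. The exact import list used originally is not shown, so the following is an assumed set covering the four metrics below.

```python
# Metric functions from scikit-learn used in the sections that follow.
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
```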

1. Confusion Matrix

A confusion matrix tabulates the actual classes against the predicted classes. All the correct predictions fall on the main diagonal.
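As a small illustration on made-up labels (not the Titanic predictions), the matrix can be computed as follows. In scikit-learn, rows are actual classes and columns are predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Toy labels, invented for illustration.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)

# For a binary problem, ravel() unpacks the four cells in this order.
tn, fp, fn, tp = cm.ravel()
print("correct predictions (diagonal):", tn + tp)
```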

2. Classification Accuracy

This metric measures the ratio of correct predictions to the total number of predictions. The higher the accuracy, the better the model performs.

Output: 0.8044692737430168
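To show how such a score is computed, here is a sketch on made-up labels rather than the Titanic split, so its result differs from the output reported in this section.

```python
from sklearn.metrics import accuracy_score

# Toy labels, invented for illustration.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

# Library call: fraction of predictions that match the true labels.
acc = accuracy_score(y_true, y_pred)

# The same ratio computed by hand: correct predictions / total predictions.
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(acc, manual)
```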

3. ROC Curve & AUC Score

The ROC (Receiver Operating Characteristic) curve plots the true positive rate (y-axis) against the false positive rate (x-axis) across classification thresholds.

predict_proba(…) returns an array of class probabilities, one column per class. pred_prob[:, 1] selects the second column, which is the predicted probability of the positive class (Survived = 1).

AUC Score: 0.88

The ROC curve looks reasonably good. With more pre-processing, we may be able to increase the AUC score. To learn more about the ROC curve, please see Logistic Regression Part II — Cost Function & Error Metrics.
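The ROC and AUC computation can be sketched as follows. This uses synthetic data, not the Titanic features, so the resulting AUC will not match the score reported here. The variable names other than pred_prob are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.8, size=300) > 0).astype(int)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# One row per sample, one column per class: [P(class 0), P(class 1)].
pred_prob = model.predict_proba(X_val)
pos_prob = pred_prob[:, 1]  # probability of the positive class

# False/true positive rates at each threshold, and the area under the curve.
fpr, tpr, thresholds = roc_curve(y_val, pos_prob)
auc = roc_auc_score(y_val, pos_prob)
print("AUC Score:", round(auc, 2))
```

The fpr and tpr arrays can be passed to any plotting library to draw the curve, with fpr on the x-axis and tpr on the y-axis.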

4. Classification Report

This is a summary of metrics for each class.

The report lists precision, recall, and F1-score separately for classes 0 and 1.
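For illustration, here is how such a report can be produced on made-up labels (not the Titanic predictions). The output_dict=True form is included only to show how individual numbers can be read back programmatically.

```python
from sklearn.metrics import classification_report

# Toy labels, invented for illustration.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

# Text table with precision, recall, F1-score and support per class.
print(classification_report(y_true, y_pred))

# The same numbers as a nested dict keyed by class label.
report = classification_report(y_true, y_pred, output_dict=True)
print(report["1"]["precision"], report["1"]["recall"])
```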

Conclusion:

In this article we have seen how to predict discrete values (class labels) using logistic regression.

To know more about Logistic Regression:

Thank you! 👍

Like to support? Just click the heart icon ❤️.

Happy Programming!🎈
