Data Science Posts

  • Thesis: Machine Learning Applied To Build ‘Nowcasting’ Models to Predict Irish Rainfall

    Rain forecasting plays a significant part in our daily lives. Like long-term forecasting, short-term forecasting (nowcasting) is important as well. When planning activities such as outdoor sports, the daily commute or even tours, it is essential to check for rain first. Nowcasting is also of interest to airports, since extreme conditions, particularly strong winds or storms, are undesirable during take-off and landing.

  • Complete Analysis of Classification Algorithms

    Abstract: In this analysis, we work with data on individuals suffering from back pain. Several clinical parameters, in combination, determine the type of back pain, which is a binary outcome. We will model the outcome using several classification techniques: logistic regression, classification trees, bagging, random forest, boosting and support vector machines.

  • Logistic Regression and Performance Assessment

    Let us explore the Titanic dataset and use logistic regression to model the survival of passengers on the Titanic. The dataset includes 1313 rows, one for each person who boarded the Titanic. Of its 10 columns, we are interested in the passengers’ Age, Gender, Class and Survival State. Of these four variables, Gender, Class and Survival State are categorical and Age is numeric.

  • Clustering Analysis and Performance Assessment

    Let us explore the US Congress dataset and apply clustering algorithms to examine whether the data contain groups of similar observations. The dataset records how 435 members of Congress voted on sixteen key votes in 1984, along with each member's political party. The data contain many NA values and are binary (“y” for a ‘Yes’ vote and “n” for a ‘No’ vote). In this analysis, we converted the y/n values to 1/0, since binary 1/0 data are easier to work with.

  • Association Rule Mining in R

    The dataset: A dataset recording the disabilities of 21574 elderly people in the United States of America was collected as part of the National Long Term Care Survey (NLTCS). For each person, disability in sixteen tasks of daily living was recorded. Six of the tasks are categorized as activities of daily living (ADLs) and ten as instrumental activities of daily living (IADLs).

  • Mathematica: Exploratory Data Analysis on Data Scientists

    A big-picture view of the state of data scientists and machine learning engineers.
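
As a rough illustration of the logistic regression post, here is a minimal Python sketch that uses a tiny synthetic dataset rather than the actual Titanic data. It fits a logistic regression by batch gradient descent on the mean log-loss, then assesses performance with a confusion matrix at a 0.5 threshold. The feature encoding (`is_female`, `is_first_class`), the learning rate and the epoch count are hypothetical choices made for this sketch, not the post's method. In practice a library fit, such as R's `glm` with `family = binomial`, would be used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit intercept b and weights w by batch gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual: predicted probability minus the 0/1 label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w

def predict_proba(b, w, X):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def confusion_matrix(y_true, probs, threshold=0.5):
    """Count TP/FP/TN/FN after thresholding the predicted probabilities."""
    cm = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for yt, prob in zip(y_true, probs):
        if prob >= threshold:
            cm["tp" if yt == 1 else "fp"] += 1
        else:
            cm["fn" if yt == 1 else "tn"] += 1
    return cm

def summarize(cm):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    pos, neg = cm["tp"] + cm["fn"], cm["tn"] + cm["fp"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / (pos + neg),
        "sensitivity": cm["tp"] / pos if pos else float("nan"),
        "specificity": cm["tn"] / neg if neg else float("nan"),
    }

# Hypothetical, synthetic rows: [is_female, is_first_class]; label 1 = survived.
X = [[1, 0], [1, 1], [1, 1], [0, 0], [0, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0]
b, w = fit_logistic(X, y)
cm = confusion_matrix(y, predict_proba(b, w, X))
print(summarize(cm))
```

On real data the metrics would be computed on a held-out test set rather than on the training rows, as done here for brevity.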
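
For the clustering post, the y/n to 1/0 conversion can be sketched in Python as below. This assumes that missing votes arrive as `None`, `'?'` or `'NA'` (an assumption about the raw file) and keeps them missing rather than guessing a value. The `mismatch_distance` helper is a hypothetical choice of dissimilarity that ignores missing votes. It is one possible input to a hierarchical or k-medoids clustering, not necessarily what the post used.

```python
from typing import List, Optional, Sequence

def encode_vote(v: Optional[str]) -> Optional[int]:
    """Map 'y' -> 1 and 'n' -> 0; anything else (None, '?', 'NA') stays missing."""
    if v is None:
        return None
    return {"y": 1, "n": 0}.get(v.strip().lower())

def encode_member(votes: Sequence[Optional[str]]) -> List[Optional[int]]:
    return [encode_vote(v) for v in votes]

def mismatch_distance(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """Share of disagreeing votes, counted only on issues where both members voted."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("nan")
    return sum(x != y for x, y in shared) / len(shared)

m1 = encode_member(["y", "n", "?", "y"])
m2 = encode_member(["y", "y", "n", "NA"])
print(m1, m2, mismatch_distance(m1, m2))  # [1, 0, None, 1] [1, 1, 0, None] 0.5
```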
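
For the association rule post, the three standard rule measures (support, confidence and lift) can be sketched in Python as below. The sketch assumes each person is represented as the set of tasks they have difficulty with. The records, task names and thresholds are made up for illustration. This is a brute-force enumeration of one-item rules, not the Apriori algorithm, and it is not the R code from the post.

```python
from itertools import permutations
from typing import List, Set, Tuple

def support(records: List[Set[str]], items: Set[str]) -> float:
    """Fraction of records that contain every item in `items`."""
    return sum(items <= r for r in records) / len(records)

def confidence(records: List[Set[str]], lhs: Set[str], rhs: Set[str]) -> float:
    """Estimated P(rhs | lhs) = support(lhs | rhs) / support(lhs); lhs must occur at least once."""
    return support(records, lhs | rhs) / support(records, lhs)

def lift(records: List[Set[str]], lhs: Set[str], rhs: Set[str]) -> float:
    """Confidence relative to the baseline support of rhs; > 1 suggests positive association."""
    return confidence(records, lhs, rhs) / support(records, rhs)

def pair_rules(records: List[Set[str]], min_support: float,
               min_confidence: float) -> List[Tuple[str, str, float, float, float]]:
    """Brute-force {a} -> {b} rules passing both thresholds (min_support must be > 0)."""
    items = sorted(set().union(*records))
    rules = []
    for a, b in permutations(items, 2):
        s = support(records, {a, b})
        if s < min_support:
            continue
        c = confidence(records, {a}, {b})
        if c >= min_confidence:
            rules.append((a, b, s, c, lift(records, {a}, {b})))
    return rules

# Hypothetical toy records: tasks each (fictional) person has difficulty with.
records = [
    {"bathing", "dressing"},
    {"bathing", "dressing", "eating"},
    {"bathing"},
    {"dressing"},
    set(),
]
for rule in pair_rules(records, min_support=0.3, min_confidence=0.5):
    print(rule)
```

On a dataset the size of the NLTCS, enumerating every candidate itemset this way becomes expensive, which is why level-wise algorithms such as Apriori prune itemsets that fall below the support threshold.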