Exercise 1 (Linear regression)

Let's consider course performance data from "Theoretical Foundations of Computer Science" (theory of computation). Students could take the course either in a problem-based way or in a traditional way.
Data consist of the following attributes:

Transform the data into numeric form. You can decide the coding yourself, e.g. grades could be 0, 0.75, 1.0, 1.25, ..., 2.75, 3.00. Use Gnumeric (or Excel) for modelling the data.
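
If you prefer to script the coding step instead of doing it in a spreadsheet, it could look roughly like the sketch below. The label strings "traditional" and "problem-based" and the helper names encode_mode and grade_scale are assumptions made for illustration; the actual labels depend on the data files.

```python
# Hypothetical label-to-number coding; the real labels depend on the data files.
MODE_CODES = {"traditional": 0.0, "problem-based": 1.0}


def encode_mode(label):
    """Map a course-mode label to a number, ignoring case and surrounding spaces."""
    return MODE_CODES[label.strip().lower()]


def grade_scale(lowest_pass=0.75, highest=3.0, step=0.25):
    """Return the example grade codes: 0 for a fail, then lowest_pass..highest."""
    n_steps = round((highest - lowest_pass) / step)
    return [0.0] + [lowest_pass + i * step for i in range(n_steps + 1)]
```

Any other consistent coding works equally well, as long as the same coding is used for both data sets.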

a) Compute the correlation coefficients between all pairs of attributes. Which attributes have the strongest influence on the final grades?
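
As a cross-check for the spreadsheet results, the pairwise correlations can also be computed with a few lines of code. The sketch below assumes that the Pearson correlation coefficient is what is wanted here; the function names pearson and correlation_matrix are made up for this example, and the columns are expected as a mapping from attribute name to a list of numeric values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either attribute is constant.
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every ordered pair of attributes."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}
```

The entries of interest for this question are those where one of the two attributes is the final grade; values close to 1 or -1 indicate a strong linear relationship.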

b) Construct a linear regression model for predicting the final result FR as accepted or failed (FR more or less than 0.75). Exclude exam points, because they are not known during the course. Select only those attributes which you consider most important. Randomly select 1/5 of the data as your test set and use the rest as the training set. What are the classification rates?
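
The whole procedure (random 1/5 split, least-squares fit, thresholding at 0.75, classification rate) can be sketched as follows. This is only one possible realisation: all function names are invented for the example, the fit solves the normal equations directly, and treating a value of exactly 0.75 as accepted is an assumption.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and hold out test_fraction of them as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def fit_linear(xs, ys):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    design = [[1.0] + list(x) for x in xs]  # prepend a constant column for b0
    k = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in design) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(design, ys))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("attributes are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(coeffs, x):
    """Predicted FR for one attribute vector x."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], x))


def classification_rate(coeffs, xs, ys, threshold=0.75):
    """Fraction of rows whose predicted accepted/failed class matches the true one."""
    # Assumption: a value of exactly `threshold` counts as accepted.
    hits = sum((predict(coeffs, x) >= threshold) == (y >= threshold)
               for x, y in zip(xs, ys))
    return hits / len(xs)
```

Here each row would be a pair (attribute values, FR). Fit the coefficients on the training rows only, then report classification_rate separately for the training set and the test set; a large gap between the two suggests overfitting to the selected attributes.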

Data

You can combine data from two sets:

tfcs 2003
tfcs 2004