Creating the target for our model

Now, we'll create the target for our model. This is the column that tells the model whether each IPO should have been invested in or not. We will say that we should invest in any IPO that has a 2.5% or greater return on day one. The threshold is arbitrary, but it seems a reasonable bar for an investment to be worth our attention:

# 1 = invest (first-day return of 2.5% or more), 0 = skip
y = df['1st Day Open to Close % Chg'].apply(lambda x: 1 if x >= .025 else 0)
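Before moving on, it can help to see how the labels split between the two classes. The following is a minimal sketch on made-up numbers, not the real IPO data: the `toy` DataFrame, `y_toy`, and `class_share` are hypothetical names introduced for illustration, and the returns are assumed to be stored as fractions (0.025 meaning 2.5%). It applies the rule described in the text (2.5% or greater) as a vectorised comparison rather than with apply:

```python
import pandas as pd

# Toy data standing in for the real IPO DataFrame (values are invented).
toy = pd.DataFrame({'1st Day Open to Close % Chg': [0.10, 0.025, 0.01, -0.05]})

# Label a row 1 when the first-day return is 2.5% or greater, else 0.
y_toy = (toy['1st Day Open to Close % Chg'] >= 0.025).astype(int)

# Fraction of rows that fall into each class.
class_share = y_toy.value_counts(normalize=True)

print(y_toy.tolist())
print(class_share)
```

If one class turns out to be much rarer than the other on the real data, that is worth keeping in mind when judging the model's accuracy later.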

Now that we have set our target column, we need to set up our predictor variables. We'll again use patsy for this:

# The formula is split across two adjacent string literals, which Python joins into one
X = dmatrix("Q('Opening Gap % Chg') + C(Q('Month'), Treatment) + C(Q('Day of Week'), Treatment)"
            " + Q('Mgr Count') + Q('Lead Mgr') + Q('Offer Price') + C(Q('Star Rating'), Treatment)",
            df, return_type="dataframe")

Let's discuss what's going on in that code. X here is our design matrix, that is, the matrix that contains our predictor variables. We have included the factors we discussed earlier that could have some impact on first-day performance: the size of the opening gap, the month and day of the IPO, the offering price, the lead manager, the number of managers, and finally, the star rating that IPOScoop.com provides in advance of the IPO's listing.

To explain the Qs and Cs in the formula: Q quotes a column name that cannot be written directly in a formula, such as one with white space in it, and C indicates that the referenced column should be treated as a categorical feature and dummy-coded. Treatment selects patsy's treatment coding, in which one level of each categorical column serves as the reference level and the remaining levels each get their own indicator column.
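To see what Q and C do to the resulting columns, here is a small sketch on a made-up DataFrame. The `toy` data and the `X_toy` name are hypothetical and only two of the predictors are used; the column names are borrowed from the formula above. With three distinct star ratings and an intercept in the model, we would expect treatment coding to produce two indicator columns, with the remaining rating acting as the reference level:

```python
import pandas as pd
from patsy import dmatrix

# Toy data (values are invented); both column names contain a space,
# so they must be wrapped in Q() inside the formula.
toy = pd.DataFrame({
    'Offer Price': [10.0, 15.0, 20.0, 12.0],
    'Star Rating': [1, 2, 3, 1],
})

# C(...) tells patsy to treat the numeric star rating as categorical;
# Treatment asks for dummy coding against a reference level.
X_toy = dmatrix("Q('Offer Price') + C(Q('Star Rating'), Treatment)",
                toy, return_type="dataframe")

# Inspect the generated column names and the matrix shape.
print(X_toy.columns.tolist())
print(X_toy.shape)
```

Without the C wrapper, patsy would treat the star rating as a single numeric column, which implies that the step from one star to two has the same effect as the step from two to three.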
