Smoking decisions and information: empirical evidence

Quote from “Do health changes affect smoking?”

Our initial intuition is that individuals may not know with certainty the health consequences
of smoking: their beliefs about smoking’s dangers will determine their consumption.
These beliefs will be updated using information from both one’s own health
developments when smoking, and from those amongst other smokers. However, a negative
relationship between own past health developments and current smoking is also consistent
with a Grossman model of health demand (in which all parameters are known with certainty)
when there are health shocks. We cannot distinguish empirically between these two

The paper seems to recognize that it is difficult to distinguish the Grossman model from the learning model empirically.

Quote from “Do smokers respond to health shocks?”

The conventional wisdom of risk communication (Fischhoff, 1989, Slovic et al. 1985) holds that indirect
experience from public information programs or news media causes people to believe the events at risk can happen,
but the net effects on behavior are smaller than what might be expected. One explanation is that people believe their
personal experience would be better than the “average” conditions reported in these programs.

The paper also uses a strict definition to separate smoking-related health shocks from general health shocks:

 To ensure that health shocks are serious health events and hence powerful “shocks,” we include only reports of heart
attack, congestive heart failure, and stroke requiring that the person report at least three days in the hospital between
waves 1 and 2.

For general health shocks, we use the onset of serious medical conditions that are specifically collected by HRS
and that are not linked conclusively to smoking. General medical shocks include the onset of diabetes that resulted in
a hospitalization.


Current smokers react only to smoking-related shocks, whereas the other groups revise their longevity expectations in response to both types of health shocks.

Other work

As with the chi-square analysis, our primary hypothesis is that the smoking-related ($SS_{t-1}$) and general health ($GS_{t-1}$) events provide new information inducing a revision of a respondent’s subjective longevity beliefs. Our unobservable indicator of the risk equivalent of new information, $r_t$, is hypothesized to be a function of these
measures as in equation 2.

There are important differences between this model and Viscusi’s analysis. In his case, the measure of prior subjective
beliefs, $P_{t-1}$, was not observed because he had a single cross section. Equally important, because his analysis relied
on a single cross section, the only source for the updating effect hypothesized to underlie equation (3) was the difference in the information available to different demographic groups. That is, one might hypothesize that young adults with higher levels of education have greater information about smoking than do those adults who did not complete high school. Comparing the two groups, the differences in their education would be hypothesized to reflect different amounts of information. Unfortunately, there is no basis for discriminating between this explanation and one that suggests that other variables correlated with education are different.
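To make the updating step concrete, the sketch below uses a Viscusi-style weighted-average rule. It is an assumption for illustration, since these notes do not reproduce the paper's equations 2 and 3: the updated belief is a weighted average of the prior $P_{t-1}$ and the risk equivalent $r_t$ of the new information, with weights `gamma` (informational content of the prior) and `xi` (informational content of the new information). The linear mapping from the shock indicators to $r_t$ and all numerical values are hypothetical.

```python
def risk_equivalent(ss, gs, base, beta_ss, beta_gs):
    """HYPOTHETICAL linear map from shock indicators to r_t, clipped to [0, 1].

    ss, gs: 1 if a smoking-related / general health shock occurred, else 0.
    base: belief implied by the new information when no shock occurs.
    """
    r = base + beta_ss * ss + beta_gs * gs
    return min(max(r, 0.0), 1.0)


def update_belief(prior, r_t, gamma, xi):
    """Viscusi-style update: weighted average of the prior P_{t-1} and r_t.

    gamma and xi are the informational weights of the prior and of the new
    information; a larger xi moves the belief further toward r_t.
    """
    if gamma < 0 or xi < 0 or gamma + xi == 0:
        raise ValueError("weights must be non-negative and not both zero")
    return (gamma * prior + xi * r_t) / (gamma + xi)


# A respondent with P_{t-1} = 0.6 experiences a smoking-related shock (illustrative values).
r = risk_equivalent(ss=1, gs=0, base=0.6, beta_ss=-0.3, beta_gs=-0.1)
posterior = update_belief(prior=0.6, r_t=r, gamma=3.0, xi=1.0)
print(round(posterior, 3))  # 0.525
```

In this form the Viscusi cross-section problem is visible: with one observation per person, `prior` is unobserved, so differences in beliefs across groups cannot be attributed to `r_t` rather than to other group differences.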




Living Rationally Under the Volcano? An Empirical Analysis of Heavy Drinking and Smoking

In Table 2, it would be interesting to see how health status evolves for each transition.

Key part of the model:

First, and most importantly, an individual does not know how his health
status will evolve over time. In particular, the relationship between smoking and drinking
habits and mortality and morbidity status is stochastic. An individual who engages in heavy
drinking and smoking does not necessarily experience bad health outcomes, but rather has
a higher probability of experiencing negative health shocks in the future. As individuals
experience negative health shocks they will update their beliefs about the remaining life
expectancy and may change the behavior.

From my reading of the paper, however, the health transition process is not discussed explicitly, and the parameters of the stochastic process appear to be known. My goal is to make these parameters unknown and let individuals learn about them through Bayesian updating. The key question is how adding this element changes the model's predictions and its out-of-sample performance.
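A minimal sketch of the proposed change, assuming (hypothetically) that the per-period probability of a negative health shock for a heavy drinker or smoker is an unknown parameter with a Beta prior, and that each period either brings a shock or not. With this conjugate Beta-Bernoulli setup, Bayesian updating reduces to counting shocks and shock-free periods. The prior values are illustrative, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShockBelief:
    """Beta(a, b) belief about the unknown per-period probability of a health shock."""
    a: float
    b: float

    def mean(self) -> float:
        return self.a / (self.a + self.b)

    def update(self, shock: bool) -> "ShockBelief":
        # Conjugate Beta-Bernoulli step: a shock adds one to a, a shock-free period to b.
        if shock:
            return ShockBelief(self.a + 1, self.b)
        return ShockBelief(self.a, self.b + 1)


def belief_path(prior: ShockBelief, history: list) -> list:
    """Posterior mean shock probability after each observed period."""
    belief, path = prior, []
    for shock in history:
        belief = belief.update(shock)
        path.append(belief.mean())
    return path


# Prior mean 0.1; two healthy periods, then a shock (hypothetical history).
print(belief_path(ShockBelief(1.0, 9.0), [False, False, True]))
```

Shock-free periods lower the perceived risk and a shock raises it; in the full model this belief would feed into expected remaining life and hence into the drinking and smoking choices. One could keep a separate belief for each behavior state so that the agent learns how risk depends on his own habits.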


Notes about Learning in Monopoly with Investment

Prof. Marc Santugini has very detailed notes on learning in monopoly with investment. In this model, firms face uncertain productivity shocks and do not know the distribution of the shocks either. The distribution depends on a parameter theta. In each period, a shock is drawn from this distribution; the realized shocks provide information about the true distribution, and firms learn about it in a Bayesian way.

The model is very similar to the one I proposed for learning in the health context. In my model, individuals face uncertainty about their health status, and medical expenditures play the role of the realized productivity shocks: they provide information about the distribution of health status. Individuals make decisions about healthy and risky behaviors, which resemble the investment decisions in the monopoly model. The difference is that decisions in the health model affect the distribution of health status, but only in a deterministic way. In this case, the change to the model is quite trivial, so the contribution should lie in the estimation of the model.
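The Bayesian learning step can be sketched with a conjugate normal model. This is an assumption for illustration and not necessarily the specification in Prof. Santugini's notes: shocks are drawn as x_t ~ N(theta, sigma2) with sigma2 known, and the prior on theta is N(mu, tau2). In the health version, x_t would hypothetically be the realized medical expenditure (or a transformation of it).

```python
import random


def update_theta(mu, tau2, x, sigma2):
    """One conjugate step: prior theta ~ N(mu, tau2), observe x ~ N(theta, sigma2).

    Returns the posterior mean and variance of theta.
    """
    precision = 1.0 / tau2 + 1.0 / sigma2
    post_mu = (mu / tau2 + x / sigma2) / precision
    return post_mu, 1.0 / precision


def learn(shocks, mu0, tau2_0, sigma2):
    """Apply the update period by period; returns the belief path [(mean, var), ...]."""
    mu, tau2, path = mu0, tau2_0, []
    for x in shocks:
        mu, tau2 = update_theta(mu, tau2, x, sigma2)
        path.append((mu, tau2))
    return path


# Simulate shocks around a true theta that the agent does not know.
rng = random.Random(0)
true_theta, sigma2 = 2.0, 1.0
shocks = [rng.gauss(true_theta, sigma2 ** 0.5) for _ in range(200)]
path = learn(shocks, mu0=0.0, tau2_0=1.0, sigma2=sigma2)
print(path[0], path[-1])  # variance after t draws is 1 / (1/tau2_0 + t/sigma2)
```

Because the posterior variance shrinks deterministically with the number of draws, learning is fastest early on, which is where behavior (or investment) responses to new information would be largest.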


Notes about Factor Analysis

Coming from an economics background, I know little about the methods often used in the public health literature. Recently I have run into factor analysis in a number of papers. One, by Sarah Cattan, discusses how psychological traits could explain the gender wage gap. Another, by Prof. Lange, discusses the evolution of latent health over the life course.

I was quite surprised by the idea that one can identify parameters governing the correlation between unobservables and observables. The notes here give a very clear discussion of the idea behind factor analysis and some of the method's key features.
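The identification idea can be illustrated with the classic covariance-ratio argument. Assume (for this sketch only, not necessarily as in the papers mentioned) three measures M_j = lambda_j * f + e_j of one latent factor f, mutually independent errors, and the normalization lambda_1 = 1. Every off-diagonal covariance then equals lambda_j * lambda_k * Var(f), so ratios of covariances pin down the loadings and the factor variance even though f itself is never observed.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def simulate(n, loadings, var_f, noise_sd, seed=0):
    """Draw measures M_j = lambda_j * f + e_j with a latent factor f and independent noise."""
    rng = random.Random(seed)
    sd_f = var_f ** 0.5
    measures = [[] for _ in loadings]
    for _ in range(n):
        f = rng.gauss(0.0, sd_f)  # latent factor: never passed to identify()
        for j, lam in enumerate(loadings):
            measures[j].append(lam * f + rng.gauss(0.0, noise_sd))
    return measures


def identify(m1, m2, m3):
    """Recover lambda_2, lambda_3 and Var(f) from covariances, normalizing lambda_1 = 1.

    Uses Cov(M_j, M_k) = lambda_j * lambda_k * Var(f) for j != k.
    """
    c12, c13, c23 = cov(m1, m2), cov(m1, m3), cov(m2, m3)
    return {"lambda2": c23 / c13, "lambda3": c23 / c12, "var_f": c12 * c13 / c23}


m1, m2, m3 = simulate(100_000, loadings=[1.0, 0.8, 1.5], var_f=2.0, noise_sd=1.0)
estimates = identify(m1, m2, m3)
print(estimates)  # close to lambda2 = 0.8, lambda3 = 1.5, var_f = 2.0
```

The normalization matters: without fixing lambda_1 (or Var(f)), scaling f up and all loadings down leaves every covariance unchanged.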

Some thoughts on modelling how information alters risky and healthy behavior

# Update June 3, 2014:

I think the PSID satisfies the data requirements listed below.

# Modelling idea

The idea is that individuals do not know their health status accurately; they hold a belief about its distribution. A negative health shock reveals information about that distribution. A health shock could result from a routine checkup or from an acute disease. Each period, individuals make decisions about risky and healthy behaviors to maximize total utility.

The model requires repeated observations on risky and healthy behaviors. Risky behaviors could include smoking, drinking, unsafe sexual behavior and drug use. Healthy behaviors could include exercise and a healthy diet. I would also need to know when a negative health shock arrives, which means knowing when the individual visited a hospital or was diagnosed with some disease.

Caveat: if I have a stomach illness, would I reduce my unsafe sexual behavior? Probably not. This means I can identify exactly which types of behavior change in response to which shocks, so this should be a benefit rather than a caveat. It also suggests that the model's power depends crucially on how detailed the observed shocks are.
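Before any structural modelling, a quick descriptive check on a panel such as the PSID is whether a behavior changes around the first observed health shock. The sketch below assumes a hypothetical long-format layout with one record per person and wave; running it separately by shock type and by behavior is one way to exploit the point made in the caveat. It is a descriptive comparison, not the learning model itself.

```python
from collections import defaultdict


def change_around_first_shock(records):
    """Mean behavior after minus mean behavior before each person's first health shock.

    records: iterable of (person_id, year, behavior, shock) tuples, a HYPOTHETICAL
    layout, e.g. behavior = cigarettes per day and shock = 1 if a hospital visit or
    new diagnosis is observed that year. The shock year itself is excluded; people
    with no shock, or with no observations on one side of it, are dropped.
    """
    by_person = defaultdict(list)
    for pid, year, behavior, shock in records:
        by_person[pid].append((year, behavior, shock))

    changes = {}
    for pid, rows in by_person.items():
        shock_years = [year for year, _, shock in rows if shock]
        if not shock_years:
            continue
        t0 = min(shock_years)
        before = [b for year, b, _ in rows if year < t0]
        after = [b for year, b, _ in rows if year > t0]
        if before and after:
            changes[pid] = sum(after) / len(after) - sum(before) / len(before)
    return changes


panel = [  # made-up biennial records for two people
    (1, 2001, 20, 0), (1, 2003, 20, 0), (1, 2005, 15, 1), (1, 2007, 10, 0), (1, 2009, 5, 0),
    (2, 2001, 10, 0), (2, 2003, 12, 0),
]
print(change_around_first_shock(panel))  # {1: -12.5}
```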

# Things to do

1. Read the literature on risky behavior. What explains when people engage in risky behaviors?

2. Write a clean proposal: describe what you want to do and how you want to do it.

3. Read Grossman 1978 to understand exactly how the health stock is modelled.

Model of health care demand

Kowalski’s approach has more merits: it lets me model the non-linear feature directly. I can still introduce types, as in Einav et al., so that different types of individuals have different parameters, and I do not have to use exactly the same utility function as Kowalski.

Simple as the model is, I think the paper's merit would lie in the dataset itself.

Prof. Lange raised the question of hospital choice, and he is right: this could be where I extend the model.

Essentially, there are two periods. In the first period, the individual chooses a hospital; in the second, he chooses spending. The model is solved by backward induction.
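The two-period structure can be solved by backward induction as sketched below. Everything specific here is a hypothetical placeholder: each hospital is summarized by a price per unit of care and an access cost, the health shock is assumed to be realized after the hospital is chosen, and second-period utility is taken to be shock * log(1 + m) - price * m for spending m on a discrete grid.

```python
import math


def best_spending(price, shock, grid):
    """Stage 2: given the hospital's price and the realized shock, choose spending m.

    HYPOTHETICAL utility: shock * log(1 + m) - price * m.
    """
    best_m, best_v = None, -math.inf
    for m in grid:
        v = shock * math.log1p(m) - price * m
        if v > best_v:
            best_m, best_v = m, v
    return best_m, best_v


def best_hospital(hospitals, shocks, probs, grid):
    """Stage 1: choose the hospital with the highest expected stage-2 value net of access cost.

    hospitals maps a name to (price, access_cost); shocks and probs describe the
    shock distribution, which is realized only after the hospital is chosen.
    """
    best_h, best_ev = None, -math.inf
    for name, (price, access_cost) in hospitals.items():
        # Backward induction: plug the optimal stage-2 value into the stage-1 problem.
        ev = sum(p * best_spending(price, s, grid)[1] for s, p in zip(shocks, probs))
        ev -= access_cost
        if ev > best_ev:
            best_h, best_ev = name, ev
    return best_h, best_ev


grid = range(0, 11)
hospitals = {"county": (1.0, 0.0), "city": (0.5, 0.4)}  # made-up hospitals
print(best_hospital(hospitals, shocks=[1.0, 4.0], probs=[0.7, 0.3], grid=grid))
```

In estimation, the same two functions would be evaluated at candidate parameters and the implied hospital shares and spending distribution compared with the data.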

I should probably start describing the hospital choice and expenditure data now. The model itself should not be too hard to write up, but the programming may take a while.

To-do list:

1. Clean up the CHNS data. Write programs in a clear and organized way. Document the code and the data well. Estimated time: one week.

2. Write out the models clearly. Estimated time: two to three days.

3. Program the model and get the results.



The Response of Drug Expenditure to Non-Linear Contract Design: Evidence from Medicare Part D

Notes:

1. In Kowalski (2012), the choice is the optimal amount of medical care. This contrasts with the approach in this paper: Einav et al. let individuals choose whether to consume medical care given theta (the cost of medical care) and omega (the consequences of not buying care). In principle, Kowalski's approach has more flexibility to match the data. Either way, the key is to describe the joint process of shocks and purchase behavior in the utility function.

2. The model is also estimated by the simulated method of moments.
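To fix ideas about the simulated method of moments, here is a toy example; the model and moments are hypothetical and are not those of Einav et al. The steps are: draw common random numbers once and hold them fixed across parameter values, simulate moments at each candidate parameter, measure the distance to the data moments with an identity weighting matrix (an assumption; efficient SMM would use an estimated optimal weighting matrix), and minimize over a grid. The share of zero spenders is used as a moment because it is one of the targets discussed for Kowalski (2014).

```python
import random


def make_draws(n, seed):
    """Common random numbers: (uniform, standard normal) pairs held fixed across theta."""
    rng = random.Random(seed)
    return [(rng.random(), rng.gauss(0.0, 1.0)) for _ in range(n)]


def simulate_moments(theta, draws):
    """HYPOTHETICAL toy model: spending is zero with probability p, otherwise
    log spending is mu + z. Moments: share of zeros, mean log positive spending."""
    p, mu = theta
    zeros, logs = 0, []
    for u, z in draws:
        if u < p:
            zeros += 1
        else:
            logs.append(mu + z)
    mean_log = sum(logs) / len(logs) if logs else 0.0
    return zeros / len(draws), mean_log


def smm_objective(theta, data_moments, draws):
    """Squared distance between data and simulated moments (identity weighting matrix)."""
    sim = simulate_moments(theta, draws)
    return sum((d - s) ** 2 for d, s in zip(data_moments, sim))


def estimate(data_moments, draws, p_grid, mu_grid):
    """Grid search for the theta that minimizes the SMM objective."""
    candidates = [(p, mu) for p in p_grid for mu in mu_grid]
    return min(candidates, key=lambda th: smm_objective(th, data_moments, draws))


# Fake "data" generated from theta = (0.3, 1.0), then recovered by SMM.
data_moments = simulate_moments((0.3, 1.0), make_draws(20_000, seed=1))
sim_draws = make_draws(20_000, seed=2)
p_grid = [i / 20 for i in range(2, 11)]  # 0.10, 0.15, ..., 0.50
mu_grid = [i / 4 for i in range(0, 9)]   # 0.00, 0.25, ..., 2.00
theta_hat = estimate(data_moments, sim_draws, p_grid, mu_grid)
print(theta_hat)
```

Holding the draws fixed makes the objective a deterministic function of theta, which is what allows a grid search (or an optimizer) to compare candidate parameters meaningfully.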


Estimating the Tradeoff Between Risk Protection and Moral Hazard (Kowalski 2014)

The presentation of this paper is here.

Notes:

  • 1. How is health or medical care modelled in this paper? Why, and what are the merits and drawbacks?

The paper models the demand for medical care, not health directly, using a specific functional form. The relationship between health care demand and health shocks is pinned down entirely by the utility function.

  • 2. What is the timing of the model?

First period: observe the health shock and choose among the insurance plans.

Second period: choose the optimal amount of care, Q, conditional on the insurance plan and the shock.

  • 3. How is normalization done?

Not clear.

  • 4. What are the estimated parameters?

Two parameters govern the distribution of shocks and two enter the utility function; the remaining parameters are the coefficients on individual characteristics.

  • 5. How is the stochastic element modelled, and why?

The only source of stochastic variation is the health shock.

  • 6. How does this link to your model?

The relative merits of modelling health versus health care demand are not clear to me. Modelling health requires specifying each individual's initial health stock and how health care and shocks jointly affect next period's health stock. (Review other papers that model health directly.)

Other elements of the paper are very close to what I have in mind. Of course, such a simple model cannot capture many important elements of the decision to get care: dynamics, patient-doctor interactions and the supply side are all ignored.

A brief description of what I have in mind:

In the first period, the individual observes the shocks. In the second period, he chooses how much to spend conditional on the shocks and the insurance plan. Individual characteristics should enter the utility function.

  • 7. How well does the model perform?

The targets are based on health care spending. From the table in the slides, the model seems to under-predict the share of people with zero care and to over-predict the share of people who spend a lot.

  • 8. Why does the model match the targets so poorly, and how could it be improved?








CPS: Hourly Wage Rate

In the CPS, hourly wages are measured in every month except for March. In these months, questions on wages are asked of only about one-quarter of the entire sample. The CPS sample is divided into rotation groups. Households are interviewed a total of 8 times over 16 months: when they enter the sample, they are interviewed once a month for 4 consecutive months, then they leave the sample for 8 months, and finally they return for another 4 months. Each cohort of households is called a rotation group. Households in the last month of either the first or the second 4-month round form the outgoing rotation groups. In the Basic CPS, questions on earnings at the main job are asked only of people in the outgoing rotation groups.

The hourly wage is obtained through a somewhat complicated process. Respondents are first asked the easiest way to report their total earnings: on an hourly, weekly, or annual basis. Those who say hourly are asked to report their hourly rate of pay, and then how many hours per week they work at this rate of pay. Workers not paid on an hourly basis are asked on what basis they are paid (weekly, monthly, or yearly) and then how much they make in that pay period. This information is then translated into a value for weekly pay. These considerations boil down to the following specific advice on how to compute wages from the Basic CPS:

  1. Include in your sample only people in the outgoing rotation groups.  In other words, use HRMIS = 4 or 8 as a criterion for selecting the sample.
  2. Include in your sample only workers. One easy way to do this is to use PEERNPER > 0.
  3. To compute hourly wages for nonhourly workers, use PTERNWA divided by PEERNHRO.
  4. For hourly workers, use PTERNHLY. Note that for many of these workers it is possible to compute PTERNWA/PEERNHRO and that in some cases there will
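Steps 1 to 4 can be sketched as a single function over one record. This is a minimal sketch under stated assumptions: records are dicts keyed by the CPS variable names above, PEERNPER == 1 is taken to identify workers paid by the hour, PTERNHLY and PTERNWA are assumed to be already in dollars, and topcoding is ignored. Check all of these against the data dictionary for your survey year. The example records are made up.

```python
def cps_hourly_wage(rec):
    """Hourly wage for one Basic CPS record, or None if the record is excluded.

    ASSUMPTIONS: PEERNPER == 1 identifies hourly workers; PTERNHLY and PTERNWA
    are in dollars (verify implied decimals in the data dictionary); topcoding
    is ignored.
    """
    # Step 1: outgoing rotation groups only (month in sample 4 or 8).
    if rec["HRMIS"] not in (4, 8):
        return None
    # Step 2: workers only.
    if rec["PEERNPER"] <= 0:
        return None
    # Step 4: hourly workers report their hourly rate directly.
    if rec["PEERNPER"] == 1:
        wage = rec["PTERNHLY"]
        return wage if wage > 0 else None
    # Step 3: non-hourly workers: weekly earnings divided by usual weekly hours.
    hours = rec["PEERNHRO"]
    if hours <= 0:
        return None
    return rec["PTERNWA"] / hours


records = [  # made-up example records
    {"HRMIS": 4, "PEERNPER": 2, "PTERNWA": 800.0, "PEERNHRO": 40, "PTERNHLY": -1},
    {"HRMIS": 8, "PEERNPER": 1, "PTERNWA": 620.0, "PEERNHRO": 40, "PTERNHLY": 15.5},
    {"HRMIS": 3, "PEERNPER": 1, "PTERNWA": 500.0, "PEERNHRO": 40, "PTERNHLY": 12.5},
]
print([cps_hourly_wage(r) for r in records])  # [20.0, 15.5, None]
```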

Stata Tips: Using Infix

When working with large datasets like the CPS or SIPP, the merged file often amounts to a few gigabytes, but often we need only a few of its variables. The -infix- command reads in only the desired variables and can significantly reduce the size of the data in memory.

To read data into Stata using -infix-, you type infix followed by the first variable name and its column range. Notice that you pair each variable you want to read in with the columns in which its values are recorded, and that you do this all in one -infix- command. The syntax for the command is along the lines of the following:

infix variablename1 firstcolumn-lastcolumn variablename2 firstcolumn-lastcolumn … using filename

Also keep in mind that Stata assumes that the variables you are creating are numeric. If any of your variables contain string values (letters or symbols), you must specify this in the command by inserting “str” in front of the name of that variable.
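For comparison, the same idea (read only selected columns of a fixed-width file, numeric by default, strings only when flagged) can be sketched in plain Python. This is an analogue, not Stata's implementation; the column positions below are made up for illustration, and the real ones come from the CPS or SIPP data dictionary.

```python
def read_fixed_width(lines, spec):
    """Parse fixed-width records, keeping only the variables listed in spec.

    spec maps a variable name to (start_col, end_col, kind): columns are 1-based
    and inclusive, as in Stata's -infix-, and kind is "num" or "str". Lines are
    processed one at a time, so an open file object can be passed in directly.
    """
    rows = []
    for line in lines:
        row = {}
        for name, (start, end, kind) in spec.items():
            raw = line[start - 1:end].strip()
            if kind == "str":
                row[name] = raw
            else:
                row[name] = float(raw) if raw else None  # blank numeric field -> missing
        rows.append(row)
    return rows


# Hypothetical layout: id in columns 1-3 (string), month in sample in 5-6, wage in 7-11.
spec = {"id": (1, 3, "str"), "mis": (5, 6, "num"), "wage": (7, 11, "num")}
print(read_fixed_width(["ABC  4 12.5", "XYZ  8  7.0"], spec))
```

Only the listed variables are kept in each row, which is the point of -infix- when the raw file is several gigabytes wide.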