Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Ivan Png <iplpng@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Identifying first observation in each panel after regression
Date: Mon, 4 Jun 2012 18:43:23 -0400

What I don't understand: why

. by gvkey , sort : gen flag = 1 if _n == 1

works when I invoke it before the regression (it then picks up the first observation of each company), but not when I invoke it after the regression (it misses many companies). I used exactly the same command in both cases.

On 4 June 2012 18:31, Nick Cox <njcoxstata@gmail.com> wrote:
> Which bit don't you understand?
>
> On Mon, Jun 4, 2012 at 11:16 PM, Ivan Png <iplpng@gmail.com> wrote:
>> Dear Nick--
>>
>> Many thanks for your hint. I found the solution. I execute
>> . by gvkey , sort: gen flag = 1 if _n == 1
>> before the regression.
>>
>> Then, after the regression, I execute
>> . gen regsample = 1 if e(sample)
>>
>> And, to identify the first observation of each company in the
>> regression sample, I use
>> regsample == 1 & flag == 1
>>
>> However, I still don't understand the reason it works.
>>
>>
>> On 4 June 2012 14:24, Nick Cox <njcoxstata@gmail.com> wrote:
>>> What code do you mean by "the code below"?
>>>
>>> I suspect there's something else up with your dataset that leads to
>>> what you see. Examine the data omitted by
>>>
>>> . edit if !e(sample)
>>>
>>> after your -xtreg- command.
>>>
>>> Nick
>>>
>>> On Mon, Jun 4, 2012 at 6:44 PM, Ivan Png <iplpng@gmail.com> wrote:
>>>> Many thanks, Nick. Incidentally, thanks for the yeoman service to all
>>>> STATAlisters.
>>>>
>>>> The discrepancy I found was by using xtreg to run a fixed-effects
>>>> regression on the sample. xtreg reported 2773 companies. Yet, when I
>>>> used the code below on the regression sample, I got only 1048
>>>> companies. So, the only reason I could think of was that the flag
>>>> identified only companies that were present in year 1.
>>>
>>> On 4 June 2012 13:21, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>>>
>>>>> Your code looks fine to me, so I have difficulty understanding why you think it doesn't work.
>>>>>
>>>>> The -sort- on the second command is unnecessary given the previous command, but I don't see that it will change the sort order.
>>>>>
>>>>> You can check logic in terms of this example:
>>>>>
>>>>> . webuse grunfeld
>>>>>
>>>>> . su year
>>>>>
>>>>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>>>>> -------------+--------------------------------------------------------
>>>>>         year |       200      1944.5    5.780751       1935       1954
>>>>>
>>>>> . drop if year == 1935 & mod(company, 2)
>>>>> (5 observations deleted)
>>>>>
>>>>> . tab year
>>>>>
>>>>>        year |      Freq.     Percent        Cum.
>>>>> ------------+-----------------------------------
>>>>>        1935 |          5        2.56        2.56
>>>>>        1936 |         10        5.13        7.69
>>>>>        1937 |         10        5.13       12.82
>>>>>        1938 |         10        5.13       17.95
>>>>>        1939 |         10        5.13       23.08
>>>>>        1940 |         10        5.13       28.21
>>>>>        1941 |         10        5.13       33.33
>>>>>        1942 |         10        5.13       38.46
>>>>>        1943 |         10        5.13       43.59
>>>>>        1944 |         10        5.13       48.72
>>>>>        1945 |         10        5.13       53.85
>>>>>        1946 |         10        5.13       58.97
>>>>>        1947 |         10        5.13       64.10
>>>>>        1948 |         10        5.13       69.23
>>>>>        1949 |         10        5.13       74.36
>>>>>        1950 |         10        5.13       79.49
>>>>>        1951 |         10        5.13       84.62
>>>>>        1952 |         10        5.13       89.74
>>>>>        1953 |         10        5.13       94.87
>>>>>        1954 |         10        5.13      100.00
>>>>> ------------+-----------------------------------
>>>>>       Total |        195      100.00
>>>>>
>>>>> . bysort company (year) : gen first = _n == 1
>>>>>
>>>>> . l company year if first
>>>>>
>>>>>      +----------------+
>>>>>      | company   year |
>>>>>      |----------------|
>>>>>   1. |       1   1936 |
>>>>>  20. |       2   1935 |
>>>>>  40. |       3   1936 |
>>>>>  59. |       4   1935 |
>>>>>  79. |       5   1936 |
>>>>>      |----------------|
>>>>>  98. |       6   1935 |
>>>>> 118. |       7   1936 |
>>>>> 137. |       8   1935 |
>>>>> 157. |       9   1936 |
>>>>> 176. |      10   1935 |
>>>>>      +----------------+
>>>>>
>>>>> Nick
>>>>> n.j.cox@durham.ac.uk
>>>>>
>>>>> Ivan Png
>>>>>
>>>>> I am analyzing an unbalanced panel of company data, organized by
>>>>> company (gvkey) and year. I want to create a flag to the first
>>>>> observation of each company in the panel. I tried
>>>>>
>>>>> . sort gvkey year
>>>>> . by gvkey , sort: gen flag = 1 if _n == 1
>>>>>
>>>>> However, this only flagged flag = 1 if a company was present in year 1
>>>>> of the panel. It missed any company that appeared in later years.
>>>>>
>>>>> I searched statalist and found this:
>>>>> http://www.stata.com/statalist/archive/2005-04/msg00334.html
>>>>>
>>>>> But it doesn't work. I'd be grateful for any relevant help.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/

--
Best wishes
Ivan Png
Skype: ipng00
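
The thread does not settle why the flag behaves differently before and after the regression, so the following is only a sketch of one way to flag the first observation of each panel within the estimation sample. It uses the Grunfeld data from Nick's example rather than the poster's gvkey panel. The model, the lagged regressor, and the variable names regsample, first_all and first_reg are illustrative assumptions, not anything taken from the original analysis.

```stata
* Sketch only: Grunfeld data from the thread's example, not the gvkey panel.
webuse grunfeld, clear
xtset company year

* Assumed model: L.invest is missing in each company's first year,
* so that observation is excluded from the estimation sample.
xtreg invest L.invest mvalue kstock, fe

* Copy e(sample) immediately; a later estimation command replaces it.
gen byte regsample = e(sample)

* First observation of each company in the full dataset.
bysort company (year): gen byte first_all = _n == 1

* First observation of each company within the estimation sample:
* group by company and regsample, order by year, keep only in-sample rows.
bysort company regsample (year): gen byte first_reg = regsample & _n == 1

* Compare the two flags side by side.
list company year first_all first_reg if first_all | first_reg, sepby(company)
count if first_reg
```

If the first observation of a company in the full data is not in the estimation sample, a condition such as regsample == 1 & flag == 1 skips that company, whereas first_reg still marks its earliest in-sample year. Whether that is what happened in the poster's data cannot be told from the thread.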

**Follow-Ups**:
- **Re: st: Identifying first observation in each panel after regression** *From:* Nick Cox <njcoxstata@gmail.com>

**References**:
- **st: Identifying first observation in each panel after regression** *From:* Ivan Png <iplpng@gmail.com>
- **Re: st: Identifying first observation in each panel after regression** *From:* Nick Cox <njcoxstata@gmail.com>
