Replacing Myths with Facts: Quantitative Data Analysis Methodologies

American Community Survey Study Methodology

The analysis of sex ratios uses the 2011 5-year pooled sample of American Community Survey (ACS) data from the Integrated Public Use Microdata Series (IPUMS). These data are publicly available from IPUMS. The sample is restricted to parents who are in at least one of the following categories:

  1. Both parents are US-born, white race (reference group)
  2. Both parents are born in India
  3. Both parents are born in South Korea
  4. Both parents are born in China (excluding Hong Kong)
  5. The combination of groups (2) – (3)
  6. Both parents are US-born, Asian race
  7. Both parents are Asian of any background and birthplace

Groups (2) – (7) comprise the comparison groups in the report. The Asian countries and racial backgrounds exclude Pacific Islanders, western Asia, and former Soviet republics.

Parents with multiple races are excluded.  Only two-parent households are included, and one parent must be identified as the head of the household. 

Furthermore, households must have the following characteristics:

  1. No children were born outside the United States so that the analysis does not pick up the effects of other countries’ policies (e.g., China’s one-child policy)
  2. No children are stepchildren or adopted children
  3. All children are 12 years of age or younger
  4. No twins or multiple births (except in the overall analysis)
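
The household restrictions above can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the record layout and the field names `born_in_us`, `relationship`, `age`, and `multiple_birth` are hypothetical and do not correspond to actual ACS/IPUMS variable names.

```python
def household_qualifies(children, allow_multiple_births=False):
    """Return True if every child in the household meets the restrictions.

    `children` is a list of dicts with hypothetical keys:
    born_in_us (bool), relationship (str), age (int), multiple_birth (bool).
    """
    for child in children:
        if not child["born_in_us"]:
            return False  # restriction 1: no children born outside the US
        if child["relationship"] != "biological":
            return False  # restriction 2: no stepchildren or adopted children
        if child["age"] > 12:
            return False  # restriction 3: all children aged 12 or younger
        if child["multiple_birth"] and not allow_multiple_births:
            return False  # restriction 4: no twins or multiple births
    return True


# Made-up example household with two US-born biological children.
example = [
    {"born_in_us": True, "relationship": "biological", "age": 7, "multiple_birth": False},
    {"born_in_us": True, "relationship": "biological", "age": 3, "multiple_birth": False},
]
print(household_qualifies(example))
```

Setting `allow_multiple_births=True` mirrors the exception for the overall analysis, which does include twins.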

Ratios are calculated for birth parities 1–3, with birth parities 2 and 3 subdivided into groups based on the sex of the previous child(ren). In total, there are six birth-parity groups: first birth, second birth after girl, second birth after boy, third birth after two boys, third birth after two girls, and third birth after a mix of a girl and boy (any order). All ratios use household weights because the unit of analysis is a household-child.
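
The grouping and weighting described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the sex ratio is taken to be weighted boys per weighted girl, and the input layout (a household weight paired with the children's sexes from oldest to youngest) is hypothetical rather than the actual IPUMS file structure.

```python
from collections import defaultdict


def parity_group(previous_sexes):
    """Label a birth by parity and the sexes ('M'/'F') of older siblings."""
    n = len(previous_sexes)
    if n == 0:
        return "first birth"
    if n == 1:
        return "second birth after " + ("boy" if previous_sexes[0] == "M" else "girl")
    if n == 2:
        boys = previous_sexes.count("M")
        if boys == 2:
            return "third birth after two boys"
        if boys == 0:
            return "third birth after two girls"
        return "third birth after a girl and a boy"
    return None  # parities above 3 are not analysed


def weighted_sex_ratios(households):
    """Return weighted boys per girl for each birth-parity group.

    `households` is a list of (household_weight, sexes) pairs, where `sexes`
    lists the children's sexes from oldest to youngest.
    """
    boys = defaultdict(float)
    girls = defaultdict(float)
    for weight, sexes in households:
        for k, sex in enumerate(sexes):
            group = parity_group(sexes[:k])
            if group is None:
                continue
            if sex == "M":
                boys[group] += weight
            else:
                girls[group] += weight
    # Groups with no girls are skipped to avoid dividing by zero.
    return {g: boys[g] / girls[g] for g in girls if girls[g] > 0}


# Made-up households: (weight, sexes of children in birth order).
sample = [(100.0, ["M", "F"]), (50.0, ["F", "F", "M"]), (150.0, ["F"])]
print(weighted_sex_ratios(sample))
```

Each child contributes its household's weight to exactly one of the six groups, which reflects the household-child unit of analysis.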

The number of surplus Asian American girls provided in the report is calculated by taking the sex ratio of reference-group (white American) children for a given birth-parity group and applying it to the comparison group (Asian Americans). The number of surplus girls is the difference between the observed number of girls and the number of girls that would have been born under the reference sex ratio. The overall number of surplus girls includes twins, unlike the individual birth-parity group analyses.
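
A minimal sketch of this calculation for a single birth-parity group is given below. The text does not spell out how the reference ratio is applied, so two choices here are assumptions: the reference group's share of girls is applied to the comparison group's total births, and the surplus is signed as observed minus expected, so a positive value means more girls than the reference ratio would predict. The counts are made up.

```python
def surplus_girls(ref_boys, ref_girls, comp_boys, comp_girls):
    """Observed girls minus girls expected under the reference sex ratio.

    Assumption: the reference share of girls is applied to the comparison
    group's total number of births. Counts may be weighted.
    """
    ref_girl_share = ref_girls / (ref_boys + ref_girls)
    expected_girls = ref_girl_share * (comp_boys + comp_girls)
    return comp_girls - expected_girls


# Made-up counts: reference group 105 boys per 100 girls,
# comparison group 1,000 boys and 1,000 girls.
print(surplus_girls(105, 100, 1000, 1000))
```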

[Equation image: Alexander Persaud's ACS equation]

National Center for Health Statistics Study Methodology

We use data from the U.S. National Center for Health Statistics (NCHS). The NCHS collects annual data on childbirths across all states, including the gender and birth order of the newborn children and parental age, race, and education. Very detailed information is also collected on the mother’s reproductive health, use of prenatal and postnatal care, and newborn health.

Using the NCHS data, we employ a difference-in-differences (DID) method to evaluate the association between sex-selective abortion bans in Illinois (IL) and Pennsylvania (PA) and sex ratios at birth. Our analysis is conducted for the overall population (i.e., all races) and separately for all Asians. Using NCHS data for 1979-1988 (i.e., 5 years before and after the 1984 ban in IL), we compare the sex ratios in IL with those in its neighboring states of Wisconsin, Iowa, Missouri, Kentucky, and Indiana. The change in sex ratios in IL from the pre-ban to the post-ban period is compared with the change in sex ratios over the same period in IL’s neighboring states. The change in IL relative to its neighbors therefore represents the DID association of the ban with sex ratios.

We repeat the above analysis for PA and its neighboring states of New York, New Jersey, Delaware, Maryland, West Virginia and Ohio. Using NCHS data from 1984-1993 (i.e. 5 years before and after the 1989 ban in PA), we use a similar DID method to evaluate the association between the ban and the sex ratios of newborn children in the overall population and separately for all Asians.
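
The comparison described above, for either state, reduces in its simplest form to a difference of two changes. The sketch below computes that unadjusted quantity from four pairs of birth counts, using the share of male births as the outcome. It omits the covariates and the inference procedure used in the regression, and all counts are made up for illustration.

```python
def male_share(boys, girls):
    """Share of births that are male."""
    return boys / (boys + girls)


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Unadjusted difference-in-differences in the male share of births.

    Each argument is a (boys, girls) pair of birth counts.
    """
    change_treated = male_share(*treated_post) - male_share(*treated_pre)
    change_control = male_share(*control_post) - male_share(*control_pre)
    return change_treated - change_control


# Made-up counts: ban state before/after, neighboring states before/after.
estimate = did_estimate(
    treated_pre=(520, 480),
    treated_post=(510, 490),
    control_pre=(515, 485),
    control_post=(515, 485),
)
print(round(estimate, 4))
```

A negative value would mean the male share of births fell in the ban state relative to its neighbors; a value near zero would mean no relative change.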

Our DID linear probability regression model has the following form:

[Equation image: Arindam Nandi's NCHS equation]

for the i-th child in the s-th state and year t. The gender of the child (whether male) is denoted by G. The binary variables treat_s and post_t denote treatment status (i.e., whether the state is IL or PA) and the post-ban period (i.e., t ≥ 1984 for the IL analysis and t ≥ 1989 for the PA analysis), respectively. The regression includes parental education and the live birth order of the child as covariates (denoted by X). u_ist is the i.i.d. error term of the model. The estimated coefficient β3 measures the DID association between the ban and sex ratios. Standard errors are clustered at the state level and corrected using the wild bootstrap method (1,000 repetitions).
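
For readers without access to the equation image, a conventional DID linear probability specification consistent with the symbols defined above is sketched below. The outcome G, the treatment and post-ban indicators, the covariates X, the error term, and the role of β3 as the coefficient of interest come from the text. The coefficient names β0, β1, β2, and γ, and the exact way the covariates enter, are assumptions.

```latex
G_{ist} = \beta_0
        + \beta_1\,\mathrm{treat}_s
        + \beta_2\,\mathrm{post}_t
        + \beta_3\,(\mathrm{treat}_s \times \mathrm{post}_t)
        + X_{ist}'\gamma
        + u_{ist}
```

In a specification of this form, β3 captures the change in the probability of a male birth in the ban state, from the pre-ban to the post-ban period, over and above the change in the neighboring states.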


[1] The NCHS data are publicly available from

[2] A newborn baby is defined as Asian in our analysis if both parents reported themselves as Asian.

[3] A. C. Cameron, J. B. Gelbach, and D. L. Miller, “Bootstrap-based improvements for inference with clustered errors,” Rev. Econ. Stat., vol. 90, no. 3, pp. 414–427, Aug. 2008.