This data set reports the publication output (number of articles and number of citations received) of 200 scientists from the start of their careers to 2000. Most of the variables are derived from the Microsoft Academic Graph (MAG) data set; a few are randomly generated.

Usage

data(base_pub, package = "fixest")

Format

base_pub is a data frame with 4,024 observations and 10 variables. There are 200 distinct scientists and 51 distinct years, the last being 2000.

  • author_id: scientist identifier

  • year: current year

  • affil_id: affiliation ID of the scientist's current affiliation

  • affil_name: affiliation name of the scientist's current affiliation (character)

  • field: field name of the scientist (character), time invariant

  • nb_pub: number of publications of the scientist for the current year

  • nb_cites: number of citations received by the scientist's publications of the current year. Counts citations from articles published up to 2020.

  • birth_year: birth year of the scientist (this is randomly generated)

  • is_woman: 1 if the scientist is a woman, 0 otherwise (this is randomly generated)

  • age: current age of the scientist (formally year - birth_year)
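
The variables above can be explored with a few lines of base R. The following is a minimal sketch, assuming an installed fixest version that ships base_pub; the object names n_authors, n_years, age_ok and pubs_by_field are illustrative and not part of the package. It prints the figures stated in this section so they can be compared with the documentation rather than asserting them.

```r
# Assumes a fixest version that includes the base_pub data set.
data(base_pub, package = "fixest")

# Dimensions: the documentation states 4,024 rows and 10 columns.
dim(base_pub)

# Panel size: number of distinct scientists and distinct years.
n_authors <- length(unique(base_pub$author_id))
n_years <- length(unique(base_pub$year))
c(authors = n_authors, years = n_years)

# age is documented as year - birth_year; tabulate where this holds.
age_ok <- base_pub$age == base_pub$year - base_pub$birth_year
table(age_ok, useNA = "ifany")

# Total number of publications per field, using base R only.
pubs_by_field <- aggregate(nb_pub ~ field, data = base_pub, FUN = sum)
head(pubs_by_field)
```

Because birth_year, is_woman and age are randomly generated, any pattern involving these variables is only suitable for illustration.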

Source

The source of this data set is the Microsoft Academic Graph (MAG), extracted in 2020. MAG is now a defunct project; similar data can be found on OpenAlex.

The variables birth_year, is_woman and age were randomly generated. All other variables have been created from the raw MAG files.