Prepare a dataset file#

How to properly format a dataset#

PyORBIT requires datasets in its own specific format, and it can be very picky about it!

In addition to the usual columns (time, measurement, measurement error), a few additional columns must be specified in order to include instrument-specific parameters in your model. These parameters are:

  • jitter: a value added in quadrature to the error estimates, useful if the error estimates are underestimated or if their estimation did not include additional sources of (white) noise

  • offset: the baseline value of the dataset on top of which our model is added, for example, the systematic radial velocity of the star or the average value of the Full Width Half Maximum of the spectral lines

  • any other signal of instrumental origin, for example a trend in the RV or the FWHM due to instrumental problems during a specific time interval

A generic input data file must have this structure:

  • 1st column: time of the observation

  • 2nd column: independent measurement (e.g. RV)

  • 3rd column: error associated with the measurement

  • 4th column: flag to activate the jitter parameter(s)

  • 5th column: flag to activate the offset parameter(s)

  • 6th column: flag to activate the subset modeling

Tip

It is almost always a good idea to include a jitter term; the offset column may or may not be required depending on the kind of dataset, and the last column applies only in specific cases.

Flags must be expressed as integer numbers, following the Python convention of counting from zero: 0 activates a flag, -1 deactivates it. A generic radial velocity dataset should look like this:

  2456000.010493  4764.73  1.20  0  0  -1
  2456000.719975  4766.58  1.35  0  0  -1
  2456001.967132  4779.52  1.23  0  0  -1
  ..............  ........ ....  .  .  ..
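As a sketch, a file in this six-column format can be written and read back with numpy; the file name and values below are purely illustrative.

```python
import numpy as np

# Hypothetical example: build and save an RV dataset in the six-column
# format described above (time, RV, error, jitter flag, offset flag,
# subset flag). File name and values are illustrative only.
data = np.array([
    [2456000.010493, 4764.73, 1.20, 0, 0, -1],
    [2456000.719975, 4766.58, 1.35, 0, 0, -1],
    [2456001.967132, 4779.52, 1.23, 0, 0, -1],
])
np.savetxt("RV_dataset.dat", data,
           fmt="%14.6f %9.2f %5.2f %2d %2d %3d")

# Reading it back: the first three columns are floats, the flags integers
loaded = np.loadtxt("RV_dataset.dat")
times, rvs, errs = loaded[:, 0], loaded[:, 1], loaded[:, 2]
flags = loaded[:, 3:].astype(int)
```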

Flags can be used to divide the dataset into groups with different jitter or offset parameters, which are not necessarily correlated. A common case is the change in the RV offset of HARPS observations after the intervention in 201X. A new offset parameter can be assigned to the observations after the intervention simply by increasing the value of the flag by one (+1).

  2456000.010493  4764.73  1.20  0  0  -1
  2456000.719975  4766.58  1.35  0  0  -1
  2456001.967132  4779.52  1.23  0  1  -1
  2456002.447132  4779.52  1.23  0  1  -1
  2456002.337132  4779.52  1.23  0  1  -1
  ..............  ........ ....  .  .  ..

In the example above, we have decided to use the same jitter term regardless of the intervention.
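A pre/post-intervention split like the one above can also be assigned programmatically; the intervention epoch in this sketch is an assumed value, not a real instrument date.

```python
import numpy as np

# Illustrative only: assign offset flag 0 before a (hypothetical)
# intervention epoch and 1 after it, while keeping a single jitter
# parameter for the whole dataset.
intervention_epoch = 2456001.0  # assumed value, not a real date

times = np.array([2456000.010493, 2456000.719975,
                  2456001.967132, 2456002.447132])
offset_flag = np.where(times < intervention_epoch, 0, 1)
jitter_flag = np.zeros_like(offset_flag)  # same jitter for all points
# offset_flag -> [0, 0, 1, 1]
```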

Generally speaking, PyORBIT will assume that the number of parameters is equal to the maximum value of the flag plus one, so take care to increase the flags sequentially and without gaps. Follow these guidelines for a simple and happy life:

  • Flags must be given in consecutive order starting from zero (Python notation).

  • The inactive flags must be set to -1.

  • All the observations that share the same flag value will have the corresponding parameter in common within the dataset.

  • Different parameters will be used for measurements with different values of the corresponding flag.

  • Flags in different columns are independent.

  • Flags in different files are independent.
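The guidelines above can be checked with a small sanity function; this is a hypothetical helper, not part of PyORBIT, built only on the "maximum flag plus one" rule stated above.

```python
import numpy as np

# Hypothetical sanity check for one flag column: active flags must be
# consecutive integers starting from zero; inactive flags are -1.
def check_flag_column(flags):
    flags = np.asarray(flags, dtype=int)
    active = np.unique(flags[flags >= 0])
    # number of parameters PyORBIT will allocate: max(flag) + 1
    n_params = active.max() + 1 if active.size else 0
    is_valid = np.array_equal(active, np.arange(n_params))
    return is_valid, n_params

print(check_flag_column([0, 0, 1, 1, -1]))  # (True, 2): two offsets
print(check_flag_column([0, 2, 2, -1]))     # (False, 3): flag 1 is missing
```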

Warning

If a column is missing, PyORBIT will assume that the corresponding flag is deactivated. However, columns are not labelled, so it is not possible to deactivate the jitter column by removing it without deactivating the offset column as well.

Dealing with several datasets of the same type#

If you are dealing with observations of the same type from different sources (for example, RVs taken with different instruments) you can follow two roads:

  1. Put everything together in a single file, taking care to set the jitter and offset flags properly

  2. Write a file for each dataset

Many codes expect you to follow the first road. PyORBIT can work with both, although in some cases you have to use different files when different models must be employed (for example, photometric transits observed with different instruments, thus requiring different limb darkening parameters). In general, my advice is to use one file per dataset: it makes the configuration file more self-explanatory and, in the long term, it will make your life much easier - especially when you look back at the analysis after some time!
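Following the second road, a combined table can be split into one file per instrument, resetting the flags in each file; the instrument labels, column layout, and file names below are assumptions for this sketch.

```python
import numpy as np

# Sketch: split a combined RV table into one file per instrument,
# with jitter/offset flags set to 0 and the subset flag to -1 in each
# file. Instrument names and file names are illustrative.
times = np.array([2456000.01, 2456000.72, 2456001.97])
rvs   = np.array([4764.73, 4766.58, 4779.52])
errs  = np.array([1.20, 1.35, 1.23])
instruments = np.array(["HARPS", "HARPS", "ESPRESSO"])

for name in np.unique(instruments):
    sel = instruments == name
    block = np.column_stack([times[sel], rvs[sel], errs[sel],
                             np.zeros(sel.sum()), np.zeros(sel.sum()),
                             -np.ones(sel.sum())])
    np.savetxt(f"RV_{name}.dat", block,
               fmt="%14.6f %9.2f %5.2f %2d %2d %3d")
```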

Exceptions to standard formatting#

Central transit times#

For central time of transit (Tcent) files, required by TTV analysis, the structure is slightly different. The first column identifies the number of the transit, which helps to account for missing T0s (honestly, I don’t remember right now what happens if you start from a random number…)

  0   2454959.70736   0.00145   0   -1   -1
  1   2454968.99347   0.00225   0   -1   -1
  2   2454978.28014   0.00202   0   -1   -1
  .   .............   .......   .   ..   ..

Warning

Always set the 5th (offset) and 6th column to -1 (or don’t include them at all) to avoid unphysical solutions (drifts and jumps in time are not allowed)
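Transit numbers consistent with the first column above can be obtained from a linear ephemeris; the period in this sketch is an assumed value chosen to match the example times, not a real planet's ephemeris.

```python
import numpy as np

# Sketch: assign transit numbers from a linear ephemeris, so that
# missing transits are numbered correctly. Tref is the first transit
# of the example above; the period is an assumed illustrative value.
Tref = 2454959.70736   # reference transit time (days, BJD-TDB)
period = 9.28673       # assumed orbital period in days

Tcent = np.array([2454959.70736, 2454968.99347, 2454978.28014])
transit_number = np.rint((Tcent - Tref) / period).astype(int)
# transit_number -> [0, 1, 2]
```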

Astrometry#

Working on it!

Ancillary data#

Some models require one or more additional datasets to work. These datasets are used as independent variables: they are not compared to a model, and they do not enter the calculation of the likelihood. For example, when correlating variations of the flux with the position of the star on the CCD, the latter is the independent variable. These datasets do not require jitter or offset terms, so the structure is more relaxed, but the inclusion of a header with the appropriate dataset names - detailed in the documentation of each model - is a fundamental requirement.

# time flux flux_err xoff yoff bg contam smear deltaT roll_angle
9052.138151     1.000289     0.000259     0.230865     -1.670593     0.015518     0.023160     0.000013     0.669403     194.377112
9052.138846     1.000069     0.000258     0.447083     -1.553406     0.015485     0.023059     0.000012     0.648865     192.682123
9052.139541     1.000413     0.000259     0.459320     -1.494080     0.015407     0.023058     0.000012     0.628357     191.009003
9052.140235     1.000144     0.000258     0.791229     -0.604126     0.015293     0.023094     0.000012     0.566956     189.346915
9052.140930     1.000356     0.000259     0.864838     -0.110840     0.015305     0.023187     0.000012     0.566956     187.684830
9052.141625     1.000824     0.000259    -0.025452     -0.497986     0.015294     0.023091     0.000012     0.587402     186.012203
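A file with a commented header line like the one above can be read with named columns; this sketch writes a two-row version of the example (hypothetical file name) and reads it back by column name.

```python
import numpy as np

# Sketch: write a small ancillary file with a commented header, then
# read it back with named columns. Values are taken from the example
# above; the file name is hypothetical.
header = "time flux flux_err xoff yoff bg contam smear deltaT roll_angle"
rows = np.array([
    [9052.138151, 1.000289, 0.000259, 0.230865, -1.670593,
     0.015518, 0.023160, 0.000013, 0.669403, 194.377112],
    [9052.138846, 1.000069, 0.000258, 0.447083, -1.553406,
     0.015485, 0.023059, 0.000012, 0.648865, 192.682123],
])
np.savetxt("ancillary_data.dat", rows, header=header)

# genfromtxt parses the commented header line into field names
data = np.genfromtxt("ancillary_data.dat", names=True)
print(data["xoff"])  # access any column by its header name
```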

Standard units for input data and model parameters#

  • Time: day. The code assumes that all epochs, times of observation, and timescales are expressed in BJD-TDB, although no check is enforced. You can subtract a constant from the epochs/times of observation without consequences (e.g., use BJD_TDB - 2450000, Kepler BJD, TESS BJD…), just be sure to do it consistently across all the datasets and the parameters in the configuration file.

  • Radial Velocities (RV): meter/second. If you use kilometers/second, the fit may still work but all the derived values will be meaningless.

  • Inverse Bisector Span (BIS): meter/second. As for the RVs.

  • Full Width Half Maximum (FWHM) of the Cross-Correlation Function: kilometer/second. This is the standard measurement unit for this quantity.

  • Stellar mass, radius, density: Solar units. Note: some catalogs report the stellar density as g/cm3 or kg/m3, so be sure to convert it accordingly.

  • Planetary radius, semimajor axis of the orbit: stellar radii. These parameters are conventionally called scaled planetary radius and scaled semimajor axis, and within PyORBIT they are denoted with a _Rs subscript.

  • Angles of any kind: degrees (starting from PyORBIT version 9).

  • Physical planetary masses and radii are respectively denoted by _Me and _Re subscripts when in Earth units, and by _Mj and _Rj subscripts when in Jupiter units.
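As a sketch of the conventions above, here are two typical conversions into the expected units; the catalog density value is illustrative, and the mean solar density used for the conversion is a standard physical constant.

```python
# Illustrative unit conversions into PyORBIT's expected units.

# RVs must be in m/s: convert an assumed catalog value given in km/s
rv_kms = 4.76473
rv_ms = rv_kms * 1000.0

# Stellar density must be in solar units: convert an assumed catalog
# value given in g/cm^3 using the mean solar density (~1.408 g/cm^3)
rho_sun_cgs = 1.408
rho_star_cgs = 2.10
rho_star_solar = rho_star_cgs / rho_sun_cgs

# Times can be shifted by a constant, as long as it is done consistently
t_bjd = 2456000.010493
t_shifted = t_bjd - 2450000
```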