Prepare a dataset file#
How to properly format a dataset#
PyORBIT accepts its specific dataset format, and it can be very picky about it!
In addition to the usual columns (time, measurement, measurement error), a few additional columns must be specified in order to include instrument-specific parameters in your model. These parameters are:
jitter: a value added in quadrature to the error estimates, useful if the error estimates are underestimated or if their estimation did not include additional sources of (white) noise
offset: the baseline value of the dataset on top of which our model is added, for example, the systematic radial velocity of the star or the average value of the Full Width Half Maximum of the spectral lines
any other signal of instrumental origin, for example a trend in the RV or the FWHM due to instrumental problems during a specific time interval
A generic input data file must have this structure:
1st column: time of the observation
2nd column: independent measurement (e.g. RV)
3rd column: error associated with the measurement
4th column: flag to activate the jitter parameter(s)
5th column: flag to activate the offset parameter(s)
6th column: flag to activate the subset modeling
Tip
It is usually a good idea to include a jitter term; the offset column may or may not be required depending on the kind of dataset, while the last column applies only in specific cases.
Flags must be expressed as integer numbers, following the Python convention of counting from zero: 0 activates a flag, -1 deactivates it. A generic radial velocity dataset should look like this:
2456000.010493 4764.73 1.20 0 0 -1
2456000.719975 4766.58 1.35 0 0 -1
2456001.967132 4779.52 1.23 0 0 -1
.............. ........ .... . . ..
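As an illustration, a file with this structure can be parsed with a few lines of NumPy. This is only a sketch of the column layout for inspection purposes; PyORBIT handles the parsing internally, and the variable names below are mine:

```python
import numpy as np

# Sketch: parsing a PyORBIT-style RV dataset (whitespace-separated, no header).
# PyORBIT reads these files itself; this only illustrates the column layout.
rows = """\
2456000.010493 4764.73 1.20 0 0 -1
2456000.719975 4766.58 1.35 0 0 -1
2456001.967132 4779.52 1.23 0 0 -1
""".splitlines()

data = np.loadtxt(rows)
time, rv, rv_err = data[:, 0], data[:, 1], data[:, 2]
jitter_flag, offset_flag, subset_flag = data[:, 3:6].astype(int).T
```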
Flags can be used to divide the dataset into groups with different jitter or offset parameters, and the two groupings need not coincide. A common case is the change in the RV offset of HARPS observations after the intervention in 201X. A new offset parameter can be assigned to the observations taken after the intervention simply by increasing the value of the flag by one (+1).
2456000.010493 4764.73 1.20 0 0 -1
2456000.719975 4766.58 1.35 0 0 -1
2456001.967132 4779.52 1.23 0 1 -1
2456002.447132 4779.52 1.23 0 1 -1
2456002.337132 4779.52 1.23 0 1 -1
.............. ........ .... . . ..
In the example above, we have decided to use the same jitter term regardless of the intervention.
Generally speaking, PyORBIT will assume that the number of parameters is equal to the maximum value of the flag plus one, so pay attention to increasing the flag sequentially and without jumps. Follow these guidelines for a simple and happy life:
Flags must be given in consecutive order starting from zero (Python notation).
The inactive flags must be set to -1.
All the observations that share the same flag value will have the corresponding parameter in common within the dataset.
Different parameters will be used for measurements with different values of the corresponding flag.
Flags in different columns are independent.
Flags in different files are independent.
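To make the max-plus-one rule concrete, here is a minimal sketch; the `n_parameters` helper is illustrative, not a PyORBIT function:

```python
import numpy as np

# Sketch of the flag-counting rule: the number of parameters associated with a
# flag column is max(flag) + 1, and a column of all -1 values requires none.
def n_parameters(flags):
    return max(int(np.max(flags)) + 1, 0)

# Flags from the HARPS-like example above: one jitter term throughout,
# two offsets (before/after the intervention), subset modeling disabled.
jitter_flags = np.array([0, 0, 0, 0, 0])
offset_flags = np.array([0, 0, 1, 1, 1])
subset_flags = np.array([-1, -1, -1, -1, -1])

print(n_parameters(jitter_flags))  # 1
print(n_parameters(offset_flags))  # 2
print(n_parameters(subset_flags))  # 0
```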
Warning
If a column is missing, PyORBIT will assume that the corresponding flag is deactivated. However, columns are not labelled, so it is not possible to deactivate the jitter column by removing it without deactivating the offset column as well.
Dealing with several datasets of the same type#
If you are dealing with observations of the same type from different sources (for example, RVs taken with different instruments) you can follow two roads:
Put everything together in a single file, taking care of setting the jitter and offset flags properly
Write a file for each dataset
Many codes expect you to follow the first road. PyORBIT can work with both, although in some cases you have to use different files when different models must be employed (for example, photometric transits observed with different instruments, thus requiring different limb darkening parameters). In general, my advice is to use a file for each dataset: it will make the configuration file more self-explanatory and in the long term it will make your life much easier - especially when you are looking back at the analysis after some time!
Exceptions to standard formatting#
Central transit times#
For central time of transit (Tcent) data files, required by TTV analysis, the structure is slightly different. The first column identifies the number of the transit, to take into account missing T0s. This number will help in identifying missing transits (but honestly I don't remember right now what happens if you start from a random number…)
0 2454959.70736 0.00145 0 -1 -1
1 2454968.99347 0.00225 0 -1 -1
2 2454978.28014 0.00202 0 -1 -1
. ............. ....... . .. ..
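The transit-number column can be derived from a linear ephemeris. A short sketch; the reference time and period below are illustrative values consistent with the example rows, not quantities taken from this page:

```python
# Sketch: assigning the transit-number column from a linear ephemeris, so that
# missing transits are accounted for. Tref and period are illustrative values.
Tref = 2454959.70736   # time of a reference transit (first row above)
period = 9.28666       # orbital period in days (assumed)

tcents = [2454959.70736, 2454968.99347, 2454978.28014]
transit_numbers = [round((tc - Tref) / period) for tc in tcents]
print(transit_numbers)  # [0, 1, 2]
```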
Warning
Always set the 5th (offset) and 6th (subset) columns to -1 (or don't include them at all) to avoid unphysical solutions (drifts and jumps in time are not allowed)
Astrometry#
Working on it!
Ancillary data#
Some models require one or more additional datasets to work. These datasets are used as independent variables; as such, they don't need to be compared to a model and they do not enter the calculation of the likelihood. For example, when correlating variations of the flux with the position of the star on the CCD, the latter is the independent variable. These datasets do not require jitter or offset terms, so the structure is more relaxed, but the inclusion of a header with the appropriate dataset names - detailed in the documentation of each model - is a fundamental requirement.
# time flux flux_err xoff yoff bg contam smear deltaT roll_angle
9052.138151 1.000289 0.000259 0.230865 -1.670593 0.015518 0.023160 0.000013 0.669403 194.377112
9052.138846 1.000069 0.000258 0.447083 -1.553406 0.015485 0.023059 0.000012 0.648865 192.682123
9052.139541 1.000413 0.000259 0.459320 -1.494080 0.015407 0.023058 0.000012 0.628357 191.009003
9052.140235 1.000144 0.000258 0.791229 -0.604126 0.015293 0.023094 0.000012 0.566956 189.346915
9052.140930 1.000356 0.000259 0.864838 -0.110840 0.015305 0.023187 0.000012 0.566956 187.684830
9052.141625 1.000824 0.000259 -0.025452 -0.497986 0.015294 0.023091 0.000012 0.587402 186.012203
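A sketch of how such a headed file can be read with NumPy; the parsing shown is mine, and the column names must match those expected by the model, as detailed in each model's documentation:

```python
import io
import numpy as np

# Sketch: reading an ancillary file whose first line is a commented header.
# genfromtxt skips the '#' line by default; we recover the names from it.
text = """\
# time flux flux_err xoff yoff
9052.138151 1.000289 0.000259 0.230865 -1.670593
9052.138846 1.000069 0.000258 0.447083 -1.553406
"""

names = text.splitlines()[0].lstrip('#').split()
data = np.genfromtxt(io.StringIO(text), names=names)
print(data['flux'])
```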
Standard units for input data and model parameters#
Time: day. The code assumes that all epochs, times of observation, and timescales are expressed in BJD-TDB, although no check is enforced. You can subtract a constant from the epochs/times of observation without consequences (e.g., use BJD_TDB - 2450000, Kepler BJD, TESS BJD…), just be sure to do it consistently on all the datasets and the parameters in the configuration file.
Radial Velocities (RV): meter/second. If you use kilometer/second, the fit may still work, but all the derived values will be meaningless.
Inverse Bisector Span (BIS): meter/second. As for the RVs.
Full Width Half Maximum (FWHM) of the Cross-Correlation Function: kilometer/second. This is the standard measurement unit for this quantity.
Stellar mass, radius, density: Solar units. Note: some catalogs report the stellar density as g/cm3 or kg/m3, so be sure to convert it accordingly.
Planetary radius, semimajor axis of the orbit: stellar radii. These parameters are conventionally called scaled planetary radius and scaled semimajor axis, and within PyORBIT they are denoted with a _Rs subscript.
Angles of any kind: degrees (starting from PyORBIT version 9).
Physical planetary masses and radii are respectively denoted by the _Me and _Re subscripts when in Earth units, and by the _Mj and _Rj subscripts when in Jupiter units.
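As a quick check of the stellar density conversion mentioned above, a minimal sketch; the catalog density is a made-up example value, and the Solar constants are nominal CGS values:

```python
import math

# Sketch: converting a catalog stellar density from g/cm^3 to Solar units.
# Nominal Solar mass and radius in CGS; the Sun's mean density is ~1.41 g/cm^3.
M_sun_g = 1.98892e33    # Solar mass in grams
R_sun_cm = 6.957e10     # Solar radius in centimeters
rho_sun = M_sun_g / (4.0 / 3.0 * math.pi * R_sun_cm**3)

rho_catalog = 0.95                 # example density in g/cm^3 (made up)
rho_solar = rho_catalog / rho_sun  # value to use in Solar units
```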