The primary key for a parent table is determined heuristically from the top-level fields in the document. When sampling, the driver attempts to find the first outermost simple column to designate as the primary key. Columns are then evaluated using the following rules to determine the most viable candidate:
If sampling reveals a duplicate value, the column is not considered a good candidate.
If sampling reveals a null value, the column is not considered a good candidate.
If sampling reveals certain statistical patterns in the content of the data, the column may be discarded as a candidate.
If no top-level column is available, nested columns inside objects may be considered.
If the search runs out of candidates, a best-case candidate will be selected.
Note that this is just an overview of the rules employed by the driver. Additional and more subtle interactions occur when the driver encounters complex types or unusual data structures or values.
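The rules above can be illustrated with a minimal Python sketch. This is not the driver's implementation: the function names (iter_simple_paths, lookup, pick_primary_key), the dotted-path notation for nested columns, and the penalty score used to choose the best-case candidate are all assumptions made for illustration. The statistical-pattern rule is not modeled, because the overview does not say which patterns are checked.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple

# Assumption: a "simple" column holds one of these scalar types.
SIMPLE_TYPES = (str, int, float, bool)


def iter_simple_paths(
    doc: Dict[str, Any], prefix: str = "", depth: int = 0
) -> Iterator[Tuple[int, str]]:
    """Yield (depth, dotted_path) for each scalar field, in document order."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: descend one level.
            yield from iter_simple_paths(value, path + ".", depth + 1)
        elif isinstance(value, SIMPLE_TYPES):
            yield depth, path


def lookup(doc: Dict[str, Any], path: str) -> Optional[Any]:
    """Return the scalar at a dotted path; missing or non-scalar counts as null."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current if isinstance(current, SIMPLE_TYPES) else None


def pick_primary_key(sample: List[Dict[str, Any]]) -> Optional[str]:
    """Pick a primary key column from sampled documents (hypothetical heuristic)."""
    # Record each candidate path with its depth, in first-seen order.
    depth_of: Dict[str, int] = {}
    for doc in sample:
        for depth, path in iter_simple_paths(doc):
            depth_of.setdefault(path, depth)

    # Stable sort: outermost columns first, document order within a depth.
    candidates = sorted(depth_of, key=lambda p: depth_of[p])

    best: Optional[str] = None
    best_penalty: Optional[int] = None
    for path in candidates:
        values = [lookup(doc, path) for doc in sample]
        nulls = sum(v is None for v in values)
        non_null = [v for v in values if v is not None]
        duplicates = len(non_null) - len(set(non_null))
        penalty = nulls + duplicates
        if penalty == 0:
            return path  # no nulls and no duplicates in the sample
        if best_penalty is None or penalty < best_penalty:
            best, best_penalty = path, penalty

    # No clean candidate: fall back to the least problematic column.
    return best
```

In this sketch, a top-level column with unique, non-null sampled values is chosen before any nested column, and a column such as info.sku is returned only when every top-level column contains a duplicate or a null. When no column is clean, the column with the fewest nulls and duplicates is returned, which is one possible reading of "best-case candidate".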