Scientific Functions

tfm_utils.create_matrix(matrix_file, bg=[0.25, 0.25, 0.25, 0.25], mat_type='counts', log_type='nat')

Create a Matrix object from a JASPAR-formatted motif count matrix file.

This function also converts it to a log-odds (position weight) matrix if necessary.

Parameters
  • matrix_file (str) – White-space delimited string of row-concatenated motif matrix.

  • bg (list of floats) – Background nucleotide frequencies for [A, C, G, T].

  • mat_type (str) – Type of motif matrix provided. Options are: “counts”, “pfm”, “pwm”. “counts” is for raw count matrices for each base at each position. “pfm” is for position frequency matrices (frequencies already calculated). “pwm” is for position weight matrices (also referred to as position-specific scoring matrices).

  • log_type (str) – Base to use for log. Default is to use the natural log. “log2” is the other option. This will affect the scores and p-values.

Returns

Matrix in pwm format.

Return type

m (tfm_utils Matrix)
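
The conversion from counts to a log-odds matrix can be illustrated with a minimal, standalone sketch. It does not call tfm_utils and is not the library's implementation. The function name counts_to_log_odds and the pseudocount handling are assumptions made for illustration, since the source does not say how zero counts are treated.

```python
import math


def counts_to_log_odds(counts, bg=(0.25, 0.25, 0.25, 0.25),
                       log_type="nat", pseudocount=1.0):
    """Convert a 4 x L count matrix (rows A, C, G, T) to log-odds scores.

    Hypothetical helper: the pseudocount scheme is an assumption, used
    here only to avoid taking the log of zero.
    """
    log = math.log2 if log_type == "log2" else math.log
    length = len(counts[0])
    pwm = [[0.0] * length for _ in range(4)]
    for pos in range(length):
        # Column total plus the pseudocount mass shared by the four bases.
        total = sum(row[pos] for row in counts) + pseudocount
        for base in range(4):
            freq = (counts[base][pos] + pseudocount * bg[base]) / total
            # Log-odds of the observed frequency against the background.
            pwm[base][pos] = log(freq / bg[base])
    return pwm


counts = [
    [8, 0, 1],  # A
    [0, 9, 1],  # C
    [1, 0, 7],  # G
    [1, 1, 1],  # T
]
pwm_nat = counts_to_log_odds(counts)
pwm_log2 = counts_to_log_odds(counts, log_type="log2")
```

The two results differ only by a constant factor (ln 2), which is why the choice of log_type changes the scale of the scores.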

tfm_utils.read_matrix(matrix, bg=[0.25, 0.25, 0.25, 0.25], mat_type='counts', log_type='nat')

Create a Matrix object from a string of space-delimited counts.

Break the string into 4 rows corresponding to A, C, G, and T. This function also converts it to a log-odds (position weight) matrix if necessary.

Parameters
  • matrix (str) – White-space delimited string of row-concatenated motif matrix.

  • bg (list of floats) – Background nucleotide frequencies for [A, C, G, T].

  • mat_type (str) – Type of motif matrix provided. Options are: “counts”, “pfm”, “pwm”. “counts” is for raw count matrices for each base at each position. “pfm” is for position frequency matrices (frequencies already calculated). “pwm” is for position weight matrices (also referred to as position-specific scoring matrices.)

  • log_type (str) – Base to use for log. Default is to use the natural log. “log2” is the other option. This will affect the scores and p-values.

Returns

Matrix in pwm format.

Return type

m (tfm_utils Matrix)
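
To make the "row-concatenated" input concrete, here is a small sketch of splitting such a string into four rows in A, C, G, T order. The helper split_rows is hypothetical and only mirrors the description above. It is not part of tfm_utils.

```python
def split_rows(matrix):
    """Split a whitespace-delimited, row-concatenated string into rows.

    Hypothetical helper: the first quarter of the values is taken as the
    A row, then C, G, and T, as described for read_matrix.
    """
    values = [float(tok) for tok in matrix.split()]
    if not values or len(values) % 4 != 0:
        raise ValueError("expected a non-empty value count divisible by 4")
    length = len(values) // 4
    return {
        base: values[i * length:(i + 1) * length]
        for i, base in enumerate("ACGT")
    }


# A 3-position motif: 12 values, read as 4 rows of 3.
rows = split_rows("8 0 1  0 9 1  1 0 7  1 1 1")
print(rows["A"])  # [8.0, 0.0, 1.0]
```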

tfm_utils.score2pval(matrix, req_score, mem_thresh=2.0)

Determine the p-value for a given score for a specific motif PWM.

Parameters
  • matrix (tfm_utils Matrix) – Matrix in pwm format.

  • req_score (float) – Requested score for which to determine the p-value.

  • mem_thresh (float) – Amount of memory, in GB, that must remain free on the system. Once free memory drops below this threshold, the closest p-value approximation is returned instead of the exact p-value. This should occur only rarely, with very long and degenerate motifs, and helps ensure that such outliers do not exhaust system memory. The threshold is checked only after each pass, and each pass is more time- and memory-intensive than the last, so changing this value isn’t recommended unless accuracy to the 8th decimal place is really necessary.

Returns

The calculated p-value corresponding to the score.

Return type

pv (float)
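
As a rough illustration of what a score p-value means, the sketch below enumerates every word of the motif's length and sums the background probability of the words that score at least req_score. This assumes the usual definition of a PWM p-value, which the source does not spell out. The name brute_force_pvalue is hypothetical, and the exhaustive enumeration is only practical for very short motifs. It is not the algorithm used by score2pval.

```python
import itertools
import math


def brute_force_pvalue(pwm, req_score, bg=(0.25, 0.25, 0.25, 0.25)):
    """Probability that a random background word scores >= req_score.

    pwm is a 4 x L list of rows in A, C, G, T order (assumed layout).
    """
    length = len(pwm[0])
    pval = 0.0
    # Each word is a tuple of base indices (0=A, 1=C, 2=G, 3=T).
    for word in itertools.product(range(4), repeat=length):
        score = sum(pwm[base][pos] for pos, base in enumerate(word))
        if score >= req_score:
            pval += math.prod(bg[base] for base in word)
    return pval


# Toy 2-position motif that prefers "AC".
pwm = [
    [1.0, -1.0],   # A
    [-1.0, 1.0],   # C
    [-1.0, -1.0],  # G
    [-1.0, -1.0],  # T
]
print(brute_force_pvalue(pwm, 2.0))  # 0.0625 (only "AC" reaches 2.0)
print(brute_force_pvalue(pwm, 0.0))  # 0.4375 (7 of 16 words)
```

The number of words grows as 4^L, which gives some intuition for why long, degenerate motifs are the cases where memory and time become a concern.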

tfm_utils.pval2score(matrix, pval, mem_thresh=2.0)

Determine the score for a given p-value for a specific motif PWM.

Parameters
  • matrix (tfm_utils Matrix) – Matrix in pwm format.

  • pval (float) – p-value for which to determine the score.

  • mem_thresh (float) – Amount of memory, in GB, that must remain free on the system. Once free memory drops below this threshold, the closest p-value approximation is returned instead of the exact p-value. This should occur only rarely, with very long and degenerate motifs, and helps ensure that such outliers do not exhaust system memory. The threshold is checked only after each pass, and each pass is more time- and memory-intensive than the last, so changing this value isn’t recommended unless accuracy to the 8th decimal place is really necessary.

Returns

The calculated score corresponding to the p-value.

Return type

score (float)
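
The inverse direction can be sketched the same way. The example below assumes that the score wanted is the lowest threshold whose p-value does not exceed pval. That interpretation, the name brute_force_score, and the exhaustive enumeration are all assumptions for illustration, not a description of how pval2score works internally.

```python
import itertools
import math


def brute_force_score(pwm, pval, bg=(0.25, 0.25, 0.25, 0.25)):
    """Lowest score threshold whose p-value is <= pval, or None.

    pwm is a 4 x L list of rows in A, C, G, T order (assumed layout).
    """
    length = len(pwm[0])
    mass = {}  # score -> total background probability of words with it
    for word in itertools.product(range(4), repeat=length):
        score = sum(pwm[base][pos] for pos, base in enumerate(word))
        # Grouping by exact float equality is fine for this toy matrix;
        # real scores would need rounding or binning.
        mass[score] = mass.get(score, 0.0) + math.prod(bg[b] for b in word)
    best = None
    tail = 0.0
    # Walk from the highest score down, accumulating the upper tail.
    for score in sorted(mass, reverse=True):
        tail += mass[score]
        if tail > pval:
            break
        best = score
    return best


pwm = [
    [1.0, -1.0],   # A
    [-1.0, 1.0],   # C
    [-1.0, -1.0],  # G
    [-1.0, -1.0],  # T
]
print(brute_force_score(pwm, 0.1))   # 2.0
print(brute_force_score(pwm, 0.01))  # None: no threshold is that strict
```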

Utility Functions

tfm_utils.toPWM(df: pandas.core.frame.DataFrame)

Converts DataFrames in PPM (position probability matrix) or PCM (position count matrix) format to PWM (position weight matrix) format.

Parameters

df (pd.DataFrame) – Takes a dataframe with appropriate columns or rows in any of the following forms

Returns:

tfm_utils.orient_df(df)

tfm_utils.df_to_matrix(df, bg=[0.25, 0.25, 0.25, 0.25])

Converts a dataframe into the space-delimited form required by the rest of the library.

Parameters

df (pandas.DataFrame) – A dataframe with the PWM values in it. Must have columns named “A”, “C”, “T”, and “G”, or rows with the same names.

Returns

All values of the df in the correct order and space delimited

Return type

values (str)
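
The kind of flattening described here can be sketched without pandas. In the example below a plain dict stands in for the dataframe's named columns, and the output order (all A values, then C, G, T) is an assumption taken from the row order described for read_matrix. The helper rows_to_string is hypothetical and is not the library's df_to_matrix.

```python
def rows_to_string(columns):
    """Join per-base value lists into one space-delimited string.

    Hypothetical helper: `columns` maps "A", "C", "G", "T" to equal-length
    lists, standing in for a dataframe's named columns or rows.
    """
    missing = [base for base in "ACGT" if base not in columns]
    if missing:
        raise KeyError(f"missing base(s): {missing}")
    # Emit values base by base in A, C, G, T order, whatever the input order.
    return " ".join(str(v) for base in "ACGT" for v in columns[base])


values = rows_to_string({
    "A": [8, 0, 1],
    "C": [0, 9, 1],
    "T": [1, 1, 1],
    "G": [1, 0, 7],
})
print(values)  # 8 0 1 0 9 1 1 0 7 1 1 1
```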