API Documentation

Calibration

class inquant_tools.Calibration.Calibration(predictions_dict={})

Bases: object

load_mzml(mzml_path=<class 'str'>)

Loads the mzML file and extracts spectra data.

Parameters:

mzml_path (str) – The path to the mzML file to be loaded.

exp

An object to store the loaded mzML data.

Type:

MSExperiment

spectra_data

A dictionary to store spectra data, including measured m/z, retention time, peaks, and charge.

Type:

dict

Return type:

None

weight(grid_point, return_adjustment=<class 'bool'>)

Calculates the weights of the grid points relative to the given grid point.

Parameters:
  • grid_point (list) – A list containing the grid point coordinates for a data point.

  • return_adjustment (bool) – If True, returns the adjustment factor. If False, returns the weights for each grid point.

Returns:

If return_adjustment is True, returns the adjustment factor. If return_adjustment is False, returns a list of weights for each grid point.

Return type:

list or float
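The weighting scheme is not specified above. A minimal sketch, assuming bilinear interpolation: a point expressed in fractional grid coordinates (rt, m/z) spreads its weight over the four surrounding grid nodes, and the adjustment is the weighted average of the deviations stored at those nodes. The names bilinear_weights and adjustment, and a dev_matrix keyed by (rt_index, mz_index), are hypothetical and not part of inquant_tools.

```python
import math


def bilinear_weights(grid_point):
    """Split a fractional grid coordinate (rt, mz) across its four
    surrounding integer grid nodes (assumed bilinear interpolation)."""
    rt, mz = grid_point
    rt0, mz0 = math.floor(rt), math.floor(mz)
    frt, fmz = rt - rt0, mz - mz0  # fractional offsets in [0, 1)
    return {
        (rt0, mz0): (1 - frt) * (1 - fmz),
        (rt0 + 1, mz0): frt * (1 - fmz),
        (rt0, mz0 + 1): (1 - frt) * fmz,
        (rt0 + 1, mz0 + 1): frt * fmz,
    }


def adjustment(grid_point, dev_matrix):
    """Weighted average of the deviations stored at the surrounding nodes.

    Nodes missing from the (hypothetical) dev_matrix are skipped and the
    remaining weights are renormalized; with no known node, returns 0.0.
    """
    weights = bilinear_weights(grid_point)
    known = {node: w for node, w in weights.items() if node in dev_matrix}
    total = sum(known.values())
    if total == 0:
        return 0.0
    return sum(w * dev_matrix[node] for node, w in known.items()) / total
```

For the point (2.25, 7.5), the nodes (2, 7) and (2, 8) each receive a weight of 0.375 and the nodes (3, 7) and (3, 8) each receive 0.125.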

calibration(mzML_id, train_size=0.1, rt_grids=50, mz_grids=200, set_training_seed=None)

Calibrates the predictions based on the mzML file and the specified parameters.

Parameters:
  • mzML_id (str) – The ID of the mzML file to be calibrated.

  • train_size (float, optional) – The fraction of data to be used for training (default is 0.10).

  • rt_grids (int, optional) – The number of bins for the retention time grid (default is 50).

  • mz_grids (int, optional) – The number of bins for the mass-to-charge ratio grid (default is 200).

  • set_training_seed (int or None, optional) – The seed for random number generation (default is None).

dev_matrix

A dictionary to store the deviation matrix for calibration.

Type:

dict

total_weight

A dictionary to store the total possible weight for each grid point.

Type:

dict

grid_zero

An array to store the zero point of the grid.

Type:

np.array

grid_basis

An array to store the basis of the grid.

Type:

np.array

Return type:

None
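To illustrate how the attributes above could fit together, the following sketch bins a random training subset of (retention time, m/z, ppm deviation) points into an rt_grids by mz_grids grid defined by grid_zero (the lower corner) and grid_basis (the cell width along each axis), and stores the median deviation per cell. The per-cell median and the function name build_dev_matrix are assumptions; the actual calibration may aggregate differently.

```python
import random
import statistics


def build_dev_matrix(points, train_size=0.1, rt_grids=50, mz_grids=200, seed=None):
    """points: list of (rt, mz, ppm_deviation) tuples (hypothetical input format)."""
    rng = random.Random(seed)  # seed plays the role of set_training_seed
    n_train = max(1, int(len(points) * train_size))
    train = rng.sample(points, n_train)

    rts = [p[0] for p in points]
    mzs = [p[1] for p in points]
    grid_zero = (min(rts), min(mzs))
    # Width of one grid cell along each axis; fall back to 1.0 for a zero range.
    grid_basis = ((max(rts) - min(rts)) / rt_grids or 1.0,
                  (max(mzs) - min(mzs)) / mz_grids or 1.0)

    cells = {}
    for rt, mz, dev in train:
        # Clamp so the maximum value lands in the last cell, not one past it.
        cell = (min(int((rt - grid_zero[0]) / grid_basis[0]), rt_grids - 1),
                min(int((mz - grid_zero[1]) / grid_basis[1]), mz_grids - 1))
        cells.setdefault(cell, []).append(dev)

    dev_matrix = {cell: statistics.median(devs) for cell, devs in cells.items()}
    return dev_matrix, grid_zero, grid_basis
```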

get_ppm_tolerance()

Returns the maximum ppm tolerance from the list of ppm tolerances. This is the recommended m/z tolerance for quantification in INQuant.

Returns:

max_ppm_tolerance – The maximum ppm tolerance from all the predictions.

Return type:

float
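The ppm error of a measured m/z relative to its theoretical value is (measured − theoretical) / theoretical × 10^6. A minimal sketch, assuming the list of ppm tolerances holds one such error per prediction, so that the recommended tolerance is the largest absolute value. The helper names ppm_error and max_ppm_tolerance are hypothetical.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def max_ppm_tolerance(pairs):
    """Largest absolute ppm error over (measured, theoretical) m/z pairs."""
    return max(abs(ppm_error(measured, theoretical)) for measured, theoretical in pairs)
```

For example, a peak measured at 500.005 against a theoretical 500.0 has an error of about 10 ppm.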

write_calibrated_mzml_file(file_path, mzml_name)

Writes the calibrated spectra to a new mzML file and the calibrated predictions to a CSV file.

Parameters:
  • file_path (str) – The path where the calibrated mzML files will be saved.

  • mzml_name (str) – The ID of the mzML file to be saved.

Returns:

  • new_predictions (str) – The path of the new calibrated predictions CSV file.

  • new_mzml (str) – The path of the new mzML file with calibrated spectra.

write_calibrated_predictions_file(file_path, predictions_name=<class 'str'>)

Writes the calibrated predictions to a CSV file.

Parameters:
  • file_path (str) – The path where the calibrated predictions CSV file will be saved.

  • predictions_name (str) – The name of the predictions file to be saved.

Returns:

new_predictions – The path of the new calibrated predictions CSV file.

Return type:

str

INQuant

class inquant_tools.INQuant.INQuant(predictions_file, mzml_file_list, proteome_file=None, script_purpose=None, ppm_tolerance=50, empty_values='', confidence_filter=0.95, experiment_name='', output_file_path='', intensity_tolerance=0.2, rt_tolerance=0.01)

Bases: object

INQuant_close()

Exit function that writes the status file for the experiment. Call it explicitly when the class is used without a with-statement.

time_flag(name)

Records the time taken since the start of the script and appends the result to the timer counts. If a time_flag is set anywhere within the with-statement, the status file will show how long the script took to reach the time_flag.

Parameters:

name (str) – The name associated with the time flag. This will be used in the status file to indicate the specific point in the script.

Returns:

This method does not return any value. It appends the time information to the timer_counts list attribute.

Return type:

None
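A minimal sketch of the time_flag pattern, assuming timer_counts stores (name, elapsed seconds) pairs and that the status output is produced when the with-block exits. StatusTimer is a hypothetical stand-in; it prints to the console instead of writing the real status file.

```python
import time


class StatusTimer:
    """Hypothetical stand-in for the time_flag / status-file pattern."""

    def __init__(self):
        self.start = time.perf_counter()
        self.timer_counts = []

    def time_flag(self, name):
        # Record the elapsed time since the start under the given name.
        elapsed = time.perf_counter() - self.start
        self.timer_counts.append((name, round(elapsed, 3)))

    def status_lines(self):
        return [f"{name}: {seconds:.3f} s" for name, seconds in self.timer_counts]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # The real class writes its status file here; this sketch only prints.
        for line in self.status_lines():
            print(line)
        return False  # do not suppress exceptions
```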

run(noise_boundary=0.05, noise_ms_level=2, mbr=False, mbr_tolerance=0.9, cleavage_length=4, quant_method='mean', top_n_peptides=5, normalize_abundance='median', write_psm_table=True, psm_file_name='', write_peptide_table=True, peptide_file_name='', write_protein_table=True, protein_file_name='', write_ungrouped_protein_table=False, ungrouped_protein_file_name='', write_individual_quantification_files=False, individual_quant_file_name='', overwrite_all=False)

Performs the full quantification process by running the individual steps of the class in a single call. It should be called inside a with-statement so that resources are managed properly and the status file is written on exit.

Parameters:
  • noise_boundary (float, optional) – The boundary for noise calculation. Default is 0.05.

  • noise_ms_level (int, optional) – The MS level for noise calculation. Default is 2.

  • mbr (bool, optional) – Whether to run the MBR (Match Between Runs) algorithm. Default is False.

  • mbr_tolerance (float, optional) – The minimum confidence for MBR matching. Only predictions with a higher confidence will have MBR matches. Default is 0.9.

  • cleavage_length (int, optional) – The length of the cleavage site used for alignment, that is, the number of amino acids shown before and after the peptide sequence in the protein when the protein position is output. Default is 4.

  • quant_method (str, optional) – The protein quantification method to use. Options are ‘mean’ and ‘median’. Default is ‘mean’.

  • top_n_peptides (int, optional) – The number of top peptides used to quantify each protein. Default is 5.

  • normalize_abundance (str, optional) – The method for normalizing abundance values. Options include ‘median’, ‘mean’, ‘tic’, or ‘false’. Default is ‘median’.

  • write_protein_table (bool, optional) – Whether to write the protein table to a file. Default is True.

  • protein_file_name (str, optional) – The name of the protein table file. Default is ‘[experiment_name]_protein_table.csv’.

  • write_peptide_table (bool, optional) – Whether to write the peptide table to a file. Default is True.

  • peptide_file_name (str, optional) – The name of the peptide table file. Default is ‘[experiment_name]_peptide_table.csv’.

  • write_psm_table (bool, optional) – Whether to write the PSM (Peptide-Spectrum Match) table to a file. Default is True.

  • psm_file_name (str, optional) – The name of the PSM table file. Default is ‘[experiment_name]_psm_table.csv’.

  • write_ungrouped_protein_table (bool, optional) – Whether to write the ungrouped protein table to a file. Default is False.

  • ungrouped_protein_file_name (str, optional) – The name of the ungrouped protein table file. Default is ‘[experiment_name]_ungrouped_protein_table.csv’.

  • write_individual_quantification_files (bool, optional) – Whether to write individual quantification files for each mzML file. Default is False.

  • individual_quant_file_name (str, optional) – The name of the individual quantification file. Default is ‘[experiment_name]_quantification_[mzml_file_id].csv’.

  • overwrite_all (bool, optional) – Whether to overwrite all existing files without prompting. Default is False, which will ask for confirmation before overwriting any file.

Raises:

ValueError – If the top_n_peptides parameter is less than 1, a ValueError will be raised.

load_predictions(mbr=False, mbr_tolerance=0.9) → dict

Loads predictions into the class and assigns file IDs for all mzml files in the experiment. If only one mzml file is provided, the file ID will always be ‘mzml’. This function handles multiple mzml files and attempts to match their identifiers to those in the predictions file. Additionally, if mbr (Match Between Runs) is enabled, it processes missing data by imputing values from previous replicates.

Parameters:
  • mbr (bool, optional, default is False) – Flag indicating whether to run mbr (Match Between Runs). If True, missing values for certain replicates will be imputed based on available data from other replicates.

  • mbr_tolerance (float, optional, default is 0.9) – The minimum confidence for MBR matching. Only predictions with a higher confidence will have MBR matches.

Returns:

sorted_predictions_dict – The sorted predictions. The method also updates the class attributes mzml_file_dict, sorted_predictions_df, and sorted_predictions_dict. If mbr is True, the MBR matches are also added to sorted_predictions_dict.

Return type:

dict

Raises:

Break – If there is an error matching mzML files with prediction IDs, or another issue arises, the process stops and the Break exception is raised.
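A sketch of the file-ID assignment described above, assuming prediction IDs are matched to mzML files by file name without extension. assign_file_ids is hypothetical, and it raises ValueError where the real method raises Break.

```python
from pathlib import Path


def assign_file_ids(mzml_files, prediction_ids):
    """Map prediction file IDs to mzML paths (hypothetical matching by file stem)."""
    if len(mzml_files) == 1:
        # With a single mzML file, the file ID is always 'mzml'.
        return {"mzml": mzml_files[0]}
    stems = {Path(f).stem: f for f in mzml_files}
    mapping = {}
    for pid in sorted(set(prediction_ids)):
        if pid not in stems:
            raise ValueError(f"No mzML file matches prediction ID {pid!r}")
        mapping[pid] = stems[pid]
    return mapping
```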

make_psm_table(predictions_dict={}, noise_boundary=0.05, noise_ms_level=2) → dict

Creates a Peptide-Spectrum Match (PSM) table from the predictions data. It depends on load_predictions, which must be run first because it generates the mzML file IDs. Quantification uses the settings given when the class was initialized.

Parameters:
  • predictions_dict (dict) – Dictionary containing the predictions data. The keys are mzml file IDs and the values are DataFrames containing the predictions for each file.

  • noise_boundary (float, optional, default is 0.05) – Boundary value for noise calculation. This value defines the threshold below which signals are considered noise.

  • noise_ms_level (int, optional, default is 2) – MS level to be used for noise calculation. Determines which level of the mass spectrometry data (MS1 or MS2) will be considered for noise analysis.

noise_variables

String containing the noise calculation parameters used in the experiment.

Type:

str

Returns:

psm_dict – Dictionary containing the PSM data with the calculated quantification values.

Return type:

dict

Notes

  • The noise calculation is based on the provided boundary and MS level, which helps in filtering out irrelevant signals.

  • The noise level for each experiment is recorded in the status file, which is written when the class is used in a with-statement or when INQuant_close() is called.
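A minimal sketch of a noise threshold, assuming the noise level is the noise_boundary quantile (lower nearest rank) of all peak intensities pooled from the spectra at noise_ms_level. The function name and the (ms_level, intensities) input format are hypothetical; the actual noise calculation may differ.

```python
def noise_level(spectra, noise_boundary=0.05, noise_ms_level=2):
    """spectra: iterable of (ms_level, intensities) pairs (hypothetical format).

    Returns the intensity below which peaks are treated as noise.
    """
    pooled = sorted(
        intensity
        for level, intensities in spectra
        if level == noise_ms_level
        for intensity in intensities
    )
    if not pooled:
        raise ValueError("no peaks at the requested MS level")
    # Lower nearest-rank quantile at the noise boundary.
    index = int(noise_boundary * (len(pooled) - 1))
    return pooled[index]
```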

make_peptide_table(psm_dict={}) → dict

Pivots the quantification data into a wide format, where the abundance of each peptide in each experiment is moved to a correspondingly named column. The entries are sorted so that identical peptides whose mass-to-charge ratios lie within the specified tolerance are combined.

Parameters:

psm_dict (dict) – A dict containing the quantification data for the peptides. This dict should include columns such as ‘sequence’, ‘charge’, and ‘abundance’ to be pivoted into the wide format.

peptide_dict

Dictionary containing peptide information.

Type:

dict

Returns:

peptide_dict – A dictionary containing the quantification data for the peptides in a wide format. The keys are the peptide sequences and the values are dictionaries with the peptide data and corresponding abundance values for each experiment.

Return type:

dict

Notes

  • The function combines identical peptides whose mass-to-charge ratios are within tolerance.

  • This function also updates the self.peptide_dict attribute of the class with the new data.
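A sketch of the long-to-wide pivot, assuming each PSM carries a sequence, charge, m/z, file ID and abundance, that the tolerance is expressed in ppm relative to the first m/z of a group, and that abundances from the same file are summed when entries are combined. pivot_psms and the abundance_<file_id> column naming are hypothetical.

```python
def pivot_psms(psms, ppm_tolerance=10.0):
    """psms: list of dicts with 'sequence', 'charge', 'mz', 'file_id', 'abundance'.

    Returns one row per peptide with an 'abundance_<file_id>' key per file.
    """
    rows = []
    # Sorting puts identical peptides with close m/z values next to each other.
    for psm in sorted(psms, key=lambda p: (p["sequence"], p["charge"], p["mz"])):
        last = rows[-1] if rows else None
        same = (
            last is not None
            and last["sequence"] == psm["sequence"]
            and last["charge"] == psm["charge"]
            and abs(psm["mz"] - last["mz"]) / last["mz"] * 1e6 <= ppm_tolerance
        )
        if not same:
            last = {"sequence": psm["sequence"], "charge": psm["charge"], "mz": psm["mz"]}
            rows.append(last)
        col = f"abundance_{psm['file_id']}"
        last[col] = last.get(col, 0.0) + psm["abundance"]
    return rows
```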

normalizer(peptide_dict={}, type='median') → dict

Normalizes the quantification data using the specified method. The default method is ‘median’, where the column with the most data points is used as the baseline. The other available methods are ‘mean’ and ‘tic’ (Total Ion Current).

Parameters:
  • peptide_dict (dict) – A dictionary containing the quantification data for the peptides in wide format.

  • type (str, optional, default='median') –

    The normalization method to use. Options are:

    • ‘median’ : Normalize using the median of the data.

    • ‘mean’ : Normalize using the mean of the data.

    • ‘tic’ : Normalize using the Total Ion Current method.

Returns:

normalized_dict – The normalized peptide dict with additional values for the normalized abundance values. The new values are named with a suffix ‘_normalized’ to indicate the normalization.

Return type:

dict

Notes

  • The normalization method adjusts the abundance values to correct for systematic biases.

  • The ‘tic’ method sums all intensities in a sample and normalizes each peptide’s abundance by the total sum.

  • This method also updates the self.peptide_dict attribute with the normalized values.
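A sketch of the three normalization methods on a wide table stored as one list per sample column, where None marks a missing value. For ‘median’ and ‘mean’, each column is scaled so its statistic matches that of the baseline column with the most data points; for ‘tic’, each value is divided by its column total. The function name and the exact scaling are assumptions based on the description above.

```python
import statistics


def normalize(table, method="median"):
    """table: {sample_name: [abundance or None, ...]} (hypothetical wide layout)."""
    cols = {name: [v for v in vals if v is not None] for name, vals in table.items()}
    if method == "tic":
        # Divide each value by the total intensity of its sample.
        factors = {name: 1.0 / sum(vals) for name, vals in cols.items()}
    elif method in ("median", "mean"):
        stat = statistics.median if method == "median" else statistics.mean
        # Baseline: the column with the most non-missing data points.
        baseline_name = max(cols, key=lambda name: len(cols[name]))
        baseline = stat(cols[baseline_name])
        factors = {name: baseline / stat(vals) for name, vals in cols.items()}
    else:
        raise ValueError(f"unknown method {method!r}")
    return {
        f"{name}_normalized": [None if v is None else v * factors[name] for v in vals]
        for name, vals in table.items()
    }
```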

load_proteome(proteome_file='')

Loads a proteome file (in FASTA format) into a dictionary, where the key is a tuple of the protein’s ID and description, and the value is the protein sequence.

Parameters:

proteome_file (str) – The file path to the proteome file in FASTA format. The file should contain protein sequences, with each entry beginning with a ‘>’ symbol followed by the protein’s ID and description, and then the sequence itself on the next lines.

proteome_dict

Dictionary containing information from the proteome file. The keys are tuples of the protein’s ID and description, and the values are the protein sequences.

Type:

dict

proteome_description_dict

Dictionary containing the protein descriptions from the proteome file. The keys are tuples of the protein’s ID and description, and the values are the protein descriptions.

Type:

dict

Returns:

The function updates the class attributes self.proteome_dict and self.proteome_description_dict with the loaded protein sequences and their corresponding descriptions.

Return type:

None

Notes

  • The FASTA format should be well-structured, where each protein entry starts with a ‘>’ symbol, followed by an identifier and description (separated by spaces), and the sequence appears on the following lines.

  • If the proteome file contains multiple sequences, each will be parsed and stored in the dictionary.
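A minimal FASTA parser producing the dictionary layout described above, assuming the header is split at the first space into ID and description. parse_fasta is a hypothetical helper that takes the file contents as a string rather than a path.

```python
def parse_fasta(text):
    """Return {(protein_id, description): sequence} from FASTA-formatted text."""
    proteome = {}
    key, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous entry before starting a new one.
            if key is not None:
                proteome[key] = "".join(chunks)
            protein_id, _, description = line[1:].partition(" ")
            key, chunks = (protein_id, description), []
        else:
            # Sequences may span several lines; join them at the end.
            chunks.append(line)
    if key is not None:
        proteome[key] = "".join(chunks)
    return proteome
```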

compute_alignments(peptide_dict=None, cleavage_length=4) → dict

Aligns the peptide sequences against the proteome in the supplied FASTA file. Updates the peptide table with the number of proteins each peptide aligns to, the corresponding protein accessions, and the protein matches. Initializes the self.protein_dict attribute, which stores the protein information used to build the protein table.

Parameters:
  • peptide_dict (dict or path to csv/xlsx file, optional) – A dictionary representing the peptide table. If not provided, the function uses the class attribute self.peptide_dict. Otherwise, it attempts to load the peptide table from the specified file path.

  • cleavage_length (int, default=4) – The length of the cleavage site used for alignment. This parameter is used to output the protein position, with cleavage length being the number of amino acids before and after the peptide sequence in the protein.

protein_dict

Dictionary containing information about the proteins that have peptide matches.

Type:

dict

Returns:

  • peptide_dict (dict) – The updated peptide table with additional information about protein matches, including the number of protein matches, their accessions, and specific matching proteins.

  • protein_dict (dict) – Dictionary containing information about the proteins that have peptide matches. The keys are tuples of the protein’s ID and description, and the values are dictionaries with protein data: accession number, description, peptides aligned, alignment coverage and the length of the protein.

Notes

  • The alignment is performed by comparing the peptide sequences against the protein sequences in the provided proteome (FASTA file).

  • For each peptide, the number of protein matches, their accessions, and the specific matching proteins are recorded in the self.peptide_dict.

  • The FASTA format should contain the protein sequences in a valid format with ‘>’ headers denoting protein IDs/descriptions followed by the sequence.
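A sketch of exact-match alignment with flanking residues, assuming alignment means exact substring search and that cleavage_length residues are reported on each side of the peptide. align_peptide and its (protein_key, 1-based start, flanked sequence) output are hypothetical.

```python
def align_peptide(peptide, proteome, cleavage_length=4):
    """Return (protein_key, start, flanked_sequence) for every exact match.

    proteome: {(protein_id, description): sequence}; start is 1-based.
    """
    hits = []
    for key, sequence in proteome.items():
        start = sequence.find(peptide)
        while start != -1:
            end = start + len(peptide)
            before = sequence[max(0, start - cleavage_length):start]
            after = sequence[end:end + cleavage_length]
            hits.append((key, start + 1, f"{before}.{peptide}.{after}"))
            # Continue searching for further (possibly overlapping) matches.
            start = sequence.find(peptide, start + 1)
    return hits
```

The number of protein matches for a peptide is then the number of distinct protein keys among the hits.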

make_protein_table(psm_dict=None, peptide_dict=None, protein_dict=None, quant_method='mean', top_n_peptides=5) → dict

Groups proteins into clusters based on their shared peptides and properties.

This function analyzes the relationship between peptides and proteins, and groups proteins that share similar peptides. The grouping also takes into account quantification methods and the top N peptides per protein.

Parameters:
  • psm_dict (dict, optional) – A dictionary containing the PSM data. If not provided, the function uses the class attribute self.psm_dict.

  • peptide_dict (dict, optional) – A dictionary containing peptide information. If not provided, the function uses the class attribute self.peptide_dict.

  • protein_dict (dict, optional) – A dictionary containing protein information. If not provided, the function uses the class attribute self.protein_dict.

  • quant_method (str, optional) – The protein quantification method to use for the groups. Options are ‘mean’ (default) and ‘median’.

  • top_n_peptides (int, optional) – The number of peptides to include in the protein quantification. Default is 5.

grouped_protein_dict

A dictionary containing the grouped protein information, including the principal protein, description, protein group, number of peptides, unique peptides, PSMs, and coverage percentage.

Type:

dict

psm_columns

A list of the output columns and column order for the PSM table.

Type:

list

peptide_columns

A list of the output columns and column order for the peptide table.

Type:

list

protein_columns

A list of the output columns and column order for the protein table.

Type:

list

Returns:

grouped_protein_dict – A dictionary containing the grouped protein information, including the principal protein, description, protein group, number of peptides, unique peptides, PSMs, and coverage percentage. Also updates the class attributes self.psm_dict, self.peptide_dict, self.grouped_protein_dict and self.protein_dict.

Return type:

dict

Notes

  • The protein groups are created based on peptide alignment, with quantification methods applied to summarize protein abundances.
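A sketch of top-N protein quantification for one sample, assuming the top peptides are the most abundant ones and that quant_method is applied to their abundances. protein_abundance is hypothetical; the real method also performs protein grouping, which is not shown.

```python
import statistics


def protein_abundance(peptide_abundances, quant_method="mean", top_n_peptides=5):
    """Summarize a protein from its peptides' abundances in one sample."""
    if top_n_peptides < 1:
        raise ValueError("top_n_peptides must be at least 1")
    # Rank the available (non-missing) peptide abundances, highest first.
    values = sorted((v for v in peptide_abundances if v is not None), reverse=True)
    top = values[:top_n_peptides]
    if not top:
        return None
    if quant_method == "mean":
        return statistics.mean(top)
    if quant_method == "median":
        return statistics.median(top)
    raise ValueError(f"unknown quant_method {quant_method!r}")
```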

write_files(write_protein_table=True, protein_file_name='', write_peptide_table=True, peptide_file_name='', write_psm_table=True, psm_file_name='', write_ungrouped_protein_table=False, ungrouped_protein_file_name='', write_individual_quantification_files=False, individual_quant_file_name='', overwrite_all=False)

Writes the output files for the experiment. Options include the protein table, peptide table, PSM table, ungrouped protein table, and individual quantification files. If overwrite_all is False, the function prompts for confirmation before overwriting any existing file.

Parameters:
  • write_protein_table (bool, optional) – Whether to write the protein table to a file. Defaults to True.

  • protein_file_name (str, optional) – Specific file name for the protein table. Defaults to an empty string, in which case the function uses the default name for the protein file.

  • write_peptide_table (bool, optional) – Whether to write the peptide table to a file. Defaults to True.

  • peptide_file_name (str, optional) – Specific file name for the peptide table. Defaults to an empty string, in which case the function uses the default name for the peptide file.

  • write_psm_table (bool, optional) – Whether to write the psm (Peptide Spectrum Match) table to a file. Defaults to True.

  • psm_file_name (str, optional) – Specific file name for the psm table. Defaults to an empty string, in which case the function uses the default name for the psm file.

  • write_ungrouped_protein_table (bool, optional) – Whether to write the ungrouped protein table to a file. Defaults to False.

  • ungrouped_protein_file_name (str, optional) – Specific file name for the ungrouped protein table. Defaults to an empty string, in which case the function uses the default name for the ungrouped protein file.

  • write_individual_quantification_files (bool, optional) – Whether to write individual quantification files for each mzML file. Defaults to False.

  • individual_quant_file_name (str, optional) – Specific prefix for individual quantification files. Defaults to an empty string, in which case the function uses the default prefix for the quantification files.

  • overwrite_all (bool, optional) – Whether to overwrite existing files. If set to True, all existing files will be overwritten without prompt. Defaults to False, in which case the function will prompt the user for confirmation before overwriting any files.

Returns:

This function does not return any value. It writes the output files for the experiment as specified in the parameters.

Return type:

None

Notes

  • If the overwrite_all parameter is set to True, the function will automatically overwrite any existing files without confirmation.

  • The function handles multiple output files, depending on the specific options provided by the user.

  • If any file names are not specified, default names will be used.
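A sketch of default file naming and the overwrite prompt, assuming defaults of the form [experiment_name]_<table>_table.csv as listed for run(). resolve_output_path and its ask callback are hypothetical; ask is injected so the interactive prompt can be replaced in non-interactive use.

```python
import os


def resolve_output_path(output_dir, experiment_name, table, file_name="",
                        overwrite_all=False, ask=input):
    """Return the path to write to, or None if the user declines to overwrite.

    `table` is e.g. "psm", "peptide", "protein" or "ungrouped_protein";
    an empty file_name falls back to the default name.
    """
    name = file_name or f"{experiment_name}_{table}_table.csv"
    path = os.path.join(output_dir, name)
    if os.path.exists(path) and not overwrite_all:
        # Only prompt when an existing file would actually be overwritten.
        answer = ask(f"{path} already exists. Overwrite? [y/n] ")
        if answer.strip().lower() != "y":
            return None
    return path
```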