CANDOCK Modules

Abstract

These scripts are designed to run various parts of candock in a modular fashion. For example: they can be used to generate the fragments for docking without actually doing any docking. These modular can be run independently of each other, and dependencies between modules are taken care of automatically ( IE binding site identification will take place before docking of fragments ). The list of modules is given below:

find_centroids Identifies the binding site of a protein.
prep_fragments Determines which bonds to cut for all ligands.
make_fragments Produces a PDB file for each of the fragments.
dock_fragments Docks the given fragments to a protein.
link_fragments Links the docked fragments together to form the original ligands.
extract_result Extracts all the important parts and makes a PyMOL session.
design_ligands Designs new ligands

General Usage

The modules can be used several different ways and all of these ways are controlled by the same set of variables.

Submission Script Variables

MCANDOCK_PATH (/depot/gchopra-class/apps/candock)

Path containing the candock directories.

MCANDOCK_MOD_PATH (/depot/gchopra-class/apps/candock/modules/)

Path containing all of the the scripts! It’s recommended that one exports this variable in their .bash_profile file. Otherwise, the script will attempt to determine this automatically, which may not work in all cases. You have been warned.

MCANDOCK_VER (v0.3.3)

Version of candock to use.

MCANDOCK_NCPU (8)

Number of CPUs to use.

Important Program Variables

CANDOCK_receptor (receptor.pdb)

File containing the protein that one wishes to dock to.

CANDOCK_ligands (ligands.mol2)

File containing all the ligands that one wishes to use.

CANDOCK_centroid (site.cen)

File to read the binding site from. If this file is not present or is empty, then the binding site will be determined automatically and saved to this file.

CANDOCK_prep (prepared_ligands.pdb)

File containing all the ligands that have been tagged with which fragments they consist of. If this file does not exist or is empty, then the ligands in $ligands will be tagged as such and saved to this file.

CANDOCK_top_seeds_dir (top_seeds)

Directory containing the docked fragments. If it does not exist, then the fragments given in $prep will be docked to the binding site given in $bsite.

CANDOCK_iterative (0)

Flag controlling rigid vs flexible docking. If set to 0, then rigid docking will occur. If set to any value greater than 0, then flexible docking will occur.

CANDOCK_top_percent (0.05)

Number (as a percentage) of each of the highest scored seeds to use. Default is 0.02 which means 2%.

CANDOCK_max_iter (100)

Maximum number of iterations to preform while linking.

CANDOCK_max_possible_conf (All conformations)

Maximum of clustered confirmations to link.

How to run CANDOCK Jobs

There are three ways to use the modules. Each way has advantages and disadvantages and the correct

Function Invocation

Each module exists as both a Bash script and a Bash function. This method is the quickest and dirtiest way of using the modules. It should not be used for long jobs and is best for quickly checking the modules work properly with a given set of variables. To use, start by using the following to load the functions in to the Shell environment source $MCANDOCK_MOD_PATH/load_variables.sh. Now you can invoke a module by simply typing the module name as you would a program name. For example, one could simply type bsite to do binding site identification. To change a variable, simply say varname=varvalue. To change the receptor, for example, use the following: CANDOCK_receptor=1aaq.pdb.

Script Invocation

This method is the recommended method for invoking a module when using a Lab or local machine. Do not use it for jobs that are to be run on the cluster. To invoke a module, simply add $MCANDOCK_MOD_PATH to your $PATH and type the module name. For example, just type prep_fragments.sh and your off and running! To change a variable, you must export it first. Do so like the following example for ligands: export CANDOCK_ligands=new_drug.mol2.

PBS Submission

If you’re using the modules on an RCAC cluster, you must use this method if your jobs is to run for more than a few minutes. It is the most convoluted, but the most powerful. To start, jobs must be submitted using qsub and the full path to each module must be given. Variables are given to qsub as comma separated equalities using the -v option. For example: qsub -v var1=value1,var2=value2. Do not use spaces to separate values. Alternately, one can export variables and use qsub -V. If using a different $MCANDOCK_MOD_PATH than the default, ensure that this variable is exported properly as PBS blocks the scripts ability to determine it’s location. For an interactive job, export $MCANDOCK_MOD_PATH before using the script.

So, the above was really confusing ~and badly written~ - so now there’s an easier way! (tm). The submit_candock_module.sh command simplifies things by running qsub for you. To run dock_fragments, for example, use submit_candock_module.sh dock_fragments. You can pass qsub arguments as well, thus submit_candock_module.sh dock_fragments -v var1=value1 -l myarg=whatever is valid. Note that -V is passed automatically, so make sure your environment is setup properly!

Examples

Create a seeds data base for FLT3

The following will create a seeds database on the standby queue with 20 cores.

cd $RCAC_SCRATCH

mkdir flt3
cd flt3

dlrcsb.pl 4xuf > 4xuf.pdb
grep '^ATOM' 4xuf.pdb | grep ' A ' > receptor.pdb

cp /depot/gchopra/data/seeds_data_base/cando-ligands-3d.mol2 ./ligands.mol2

submit_candock_module.sh dock_fragments

Dock molecules to a single receptor

The following will docking a small number of ligands to a single protein. It will have 200 hours to complete and will run in the gchopra queue.

cd $RCAC_SCRATCH

mkdir working_dir
cd working_dir

cp /path/to/my/protein.pdb ./receptor.pdb
cp /path/to/my/drugs.mol2 ./ligands.mol2

submit_candock_module.sh link_fragments

Dock molecules to several receptors

cd $RCAC_SCRATCH

mkdir working_dir
cd working_dir

mkdir structures compounds seeds_database docking

cd structures

# Do this for all pdb codes you want to dock against
dlrcsb.pl 1PDB > 1pdb.pdb
dlrcsb.pl 2PDB > 2pdb.pdb
# .....
basename -s .pdb -a *.pdb > ../all.lst
# You can also place your own binding sites here, named
# 1pdb.cen, 2pdb.cen, etc

cd ../compounds
cp /path/to/directory/with/your/ligands.mol2 .

# If there's a few ligands ( <100ish )
prep_fragments.sh
# A lot of fagmenets
submit_candock_module.sh prep_fragments

cd ../seeds_database
dock_multiple.sh dock_fragments
# wait for all jobs to finish

cd ../docking
mkdir -p top_0.02/all_conf
cd top_0.02/all_conf
dock_multiple.sh link_fragments

All Variables

Starting Input Files:

Option Name	Default	Description
receptor	receptor.pdb	Receptor filename
ligand	ligands.mol2	Ligand filename

Probis (binding site indentification) Options:

Option Name	Default	Description
neighb	false	Allow only ligands that are in the similar regions according to REMARKs
num_bsites	3	Maximum number of predicted (or given) binding sites to consider for docking
centroid	_None	Filename for reading and writing centroids
names	bslibdb/data/names	Directory with ligand names
lig_clus_file	ligand_clusters.pdb	Ligand clusters found by ProBiS are outputted to this file
nosql	probis.nosql	NoSql-formatted ProBiS alignments output file
probis_min_z_score	2.5	Minimium z-score of ligands to be considered in clustering
probis_clus_rad	3.0	Cluster radius for predicted ligands byprobis
srf_file	probis.srf	File for storing the protein surface calculated by probis.
z_scores_file	z_scores.pdb	Binding site z-scores are outputted to this file
probis_min_pts	10	The minimum number of points (for predicted ligands) required to form a cluster
json	probis.json	Json-formatted ProBiS alignments outputfile
bio	bslibdb/data/bio	Directory with ProBiS-ligands bio database
bslib	bslibdb	Read binding sites library from this directory
centro_clus_rad	3.0	Cluster radius for centroid centers
jsonwl	probis_with_ligands.json	Json-formatted ProBiS alignments with transposed ligands output file

Ligand Fragmention Options:

Option Name	Default	Description
seeds	seeds.txt	Read unique seeds from this file, if itexists, and append new unique seeds if found
max_num_ligands	10	Maximum number of ligands to read in one chunk
prep	prepared_ligands.pdb	Prepared small molecule(s) are outputted to this filename
seeds_pdb	seeds.pdb	File to save full seeds into.

Fragment Docking Options:

Option Name	Default	Description
top_seeds_dir	_None	Directory for saving top docked seeds
top_seeds_file	top_seeds.pdb	Top seeds output file
clus_rad	2.0	Cluster radius for docked seeds
num_univec	256	Number of unit vectors evenly distributed on a sphere for conformation generation
gridpdb_hcp	gridpdb_hcp.pdb	Grid pdb hcp file for output
clusterfile	clustered_seeds.txt	Clustered representative docked-seed conformations output file
excluded	0.8	Excluded radius
conf_spin	10	Spin degrees for conformation generation
grid	0.375	Grid spacing
max_frag_radius	16.0	Maximum fragment radius for creating the initial rotamers
interatomic	8.0	Maximum interatomic distance

Scoring Function Arguments:

Option Name	Default	Description
step	0.01	Step for spline generation of non-bonded knowledge-based potential [0.0-1.0]
obj_dir	obj	Output directory for objective functionand derivatives
comp	reduced	Atom types used in calculating reference state ‘reduced’ or ‘complete’(‘reduced’ includes only those atom types present in the specified receptorand small molecule, whereas ‘complete’ includes all atom types)
potential_file	potentials.txt	Output file for potentials and derivatives
func	radial	Function for calculating scores ‘radial’ or ‘normalized_frequency’
dist	data/csd_complete_distance_distributions.txt	Select one of the interatomic distance distribution file(s) provided with thisprogram
cutoff	6	Cutoff length [4-15].
scale	10.0	Scale non-bonded forces and energy for knowledge-based potential [0.0-1000.0]
ref	mean	Normalization method for the reference state (‘mean’ is averaged over all atomtype pairs, whereas ‘cumulative’ is a summation for atom type pairs)

Forcefield and Minimization Options:

Option Name	Default	Description
gaff_dat	data/gaff.dat	Gaff DAT forcefield input file
max_iter	10	Maximum iterations for minimization during linking
pos_tol	0.00000000001	Minimization position tolerance in Angstroms - only for KB
gaff_xml	data/gaff.xml	Gaff XML forcefield and ligand topologyoutput file
mini_tol	0.0001	Minimization tolerance
update_freq	10	Update non-bond frequency
water_xml	data/tip3p.xml	Water XML parameters (and topology) input file
fftype	kb	Forcefield to use ‘kb’ (knowledge-based) or ‘phy’ (physics-based)
max_iter_final	100	Maximum iterations for final minimization
amber_xml	data/amber10.xml	Receptor XML parameters (and topology) input file

Fragment Linking Options:

Option Name	Default	Description
docked_clus_rad	2.0	Cluster radius between docked ligand conformations
spin	60	Spin degrees to rotate ligand. Allowed values are 5, 10, 15, 20, 30, 60, 90
max_allow_energy	0.0	Maximum allowed energy for seed conformations
docked_dir	docked	Docked ligands output directory
max_clique_size	3	Maximum clique size for initial partialconformations generation
cuda	1 (Implicit)	(=false) Enable cuda iterative linker during linking
max_possible_conf	-1	Maximum number of possible conformations to link (-1 means unlimited)
link_iter	1000	Maximum iterations for linking procedure
upper_tol_seed_dist	2.0	Upper tolerance on seed distance for getting initial conformations of dockedfragments
lower_tol_seed_dist	2.0	Lower tolerance on seed distance for getting initial conformations of dockedfragments
max_num_possibles	200000	Maximum number of possibles conformations considered for clustering
tol_seed_dist	2.0	Tolerance on seed distance in-between linking
top_percent	0.05	Top percent of each docked seed to extend to full molecule
iterative	1 (Implicit)	(=false) Enable iterative minimization during linking
clash_coeff	0.75	Clash coefficient for determining whether two atoms clash by eq. dist12 s< C * (vdw1 + vdw2)

Automated Design Options:

Option Name	Default	Description
change_terminal_atom	_None	Change non-hydrogen atoms that terminate chains to given atoms. Multiple atoms can be given.
target_dir	targets (Implicit)	Directory containing PDB files. These are docked against and labeled as targets.
antitarget_dir	atargets (Implicit)	Directory containing PDB files. These are docked against and labeled as antitargets
force_seed	_None	Force addition of a certain seed from seeds.pdb. Multiple seeds can be given
add_single_atoms	_None	Change hydrogens to given atoms. Multiple atoms can be given.
target_linking	true	Should the ligands be linked for target
antitarget_linking	true	Shoutd the ligands be linked for antitargets
seeds_till_bad	-1	Number of times a seed must be present in the top_seeds for antitargets until it is removed from the good list
fragment_bag	fragment_bag.mol2 (Implicit)	Additional fragments to be added to seeds.pdb
seeds_to_avoid	50	Number of seeds from seeds.pdb to be considered for removal from determined from seeds_to_add
seeds_till_good	-1	Number of times a seed must be present in the top_seeds for targets until it is considered for addition
fragment_mol	fragment_mol.mol2 (Implicit)	Additional fragments to be added to seeds.pdb without rotatable bonds beingcut.
seeds_to_add	50	Number of seeds from seeds.pdb to be considered for addition to the ligands in prepared_ligands.pdb