Frowns Documentation

Frowns Version 0.2

Introduction
Frowns is a chemoinformatics toolkit geared toward rapid development of chemistry-related algorithms. It is written almost entirely in Python, with a small portion written in C++.

Frowns is loosely based on the PyDaylight API that Andrew Dalke wrote to wrap the Daylight C API. In some cases, programs written using PyDaylight will also work under Frowns with a few minor changes. A good overview of PyDaylight is available at http://www.daylight.com/meetings/mug2000/Dalke/overview/

The Daylight web site at http://www.daylight.com is a good place to learn what Smarts and Smiles are.

Frowns Features

Smiles parser
Smarts substructure searching
SD file parser with SD field manipulations
Depiction for SD files with coordinates
Molecule Fingerprint generation
Several forms of Ring Detection available
Simple aromaticity perception
Full source code
Really bad depiction of arbitrary molecules! (requires AT&T's GraphViz)

Missing Features

Recursive Smarts searches (coming soon!)
Stereochemistry (this actually exists but Frowns can’t canonicalize stereochemistry yet)

Tutorial By Example
All examples are located in the frowns\examples directory. The frowns\test directory, which contains the regression test suite used to validate Frowns, is also a good place to browse.

The first step is creating a molecule. In most cases this is accomplished by loading a smiles string.

Example 1, Loading a smiles string

from frowns import Smiles
mol = Smiles.smilin("c1ccccc1")

The code in example 1 loads a benzene molecule. Of course we would like to inspect the generated molecule to see if the atoms and bonds are correct.

print mol.cansmiles()
print "atoms"
for atom in mol.atoms:
    print atom.symbol, atom.hcount, atom.aromatic

print "bonds"
for bond in mol.bonds:
    print bond.bondorder, bond.bondtype

This generates the following output:

c1ccccc1
atoms
C 1 1
C 1 1
C 1 1
C 1 1
C 1 1
C 1 1
bonds
2 4
1 4
2 4
1 4
2 4
1 4

Molecule properties

Name	Property
cansmiles()	Canonical string representation of the molecule
arbsmiles()	Arbitrary string representation of the molecule
atoms	List of atoms in the molecule
bonds	List of bonds in the molecule
handle	Unique integer ID for each object created. Used for consistency with PyDaylight

Atom properties

Name	Property
bonds	list of bonds to which this atom is connected
oatoms	list of atoms to which this atom is connected
hcount	number of hydrogens attached to the atom
implicit_hcount	number of implicitly placed hydrogens to balance valence
mass	mass of atom
charge	charge of atom
symbol	atomic symbol of atom
equiv_class	Equivalence class of atom (See canonicalization section)
symclass	Symmetry class of atom (See canonicalization section)
symorder	Symmetry order of atom (See canonicalization section)
xatom(bond)	Return the atom on the other end of bond or None if one doesn't exist.
handle	Unique integer ID for each object created. Used for consistency with PyDaylight

Bond Properties

Name	Property
bondorder	Bond order 1,2,3 (single, double, triple)
bondtype	Bond type 1,2,3,4 (single, double, triple, aromatic)
symbol	symbol of the bond
xatom(atom1)	Return the atom on the other end of this bond from atom1, or None if one doesn't exist.
handle	Unique integer ID for each object created. Used for consistency with PyDaylight
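
The xatom behavior described in the tables above can be illustrated with a small sketch. The ToyAtom and ToyBond classes are hypothetical stand-ins written for this illustration; they are not the Frowns classes and only mimic the documented "other end or None" behavior.

```python
# Toy stand-ins for illustration only; these are not the Frowns classes.
class ToyAtom(object):
    def __init__(self, symbol):
        self.symbol = symbol


class ToyBond(object):
    def __init__(self, atom1, atom2):
        self.atoms = (atom1, atom2)

    def xatom(self, atom):
        """Return the atom at the other end of this bond, or None
        if `atom` is not one of the bond's two atoms."""
        first, second = self.atoms
        if atom is first:
            return second
        if atom is second:
            return first
        return None


c = ToyAtom("C")
n = ToyAtom("N")
o = ToyAtom("O")
bond = ToyBond(c, n)

other = bond.xatom(c)      # the nitrogen at the far end
missing = bond.xatom(o)    # None: the oxygen is not on this bond
```

Walking a molecule graph is mostly a matter of calling xatom on each bond of the current atom, which is why the None case matters for defensive code.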

Example 2, Finding duplicates
Now that canonicalization code exists, finding duplicate smiles strings is trivial using Python's dictionary:

from frowns import Smiles

listOfSmiles = ["CCN", "NCC", "CCC"]

duplicates = {}
for smile in listOfSmiles:
    mol = Smiles.smilin(smile)
    canonicalString = mol.cansmiles()
    # two inputs with the same canonical string are the same molecule
    if canonicalString in duplicates:
        print "found duplicate molecule", smile
    else:
        duplicates[canonicalString] = 1

print len(duplicates), "unique molecules found"

(Currently stereochemistry is not canonicalized so this information is lost.)
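
A variation on the same idea keeps track of which inputs collapse to the same canonical string. The sketch below is plain Python with no Frowns dependency. The toy_canonical function is a hypothetical stand-in that only handles unbranched chains of one-letter atoms; with Frowns the key would presumably come from Smiles.smilin(smile).cansmiles() instead.

```python
def group_by_canonical(smiles_list, canonical):
    """Map each canonical string to the list of inputs that produced it."""
    groups = {}
    for smile in smiles_list:
        groups.setdefault(canonical(smile), []).append(smile)
    return groups


def toy_canonical(smile):
    # Stand-in for real canonicalization: an unbranched chain and its
    # reverse describe the same molecule, so pick the smaller string.
    return min(smile, smile[::-1])


groups = group_by_canonical(["CCN", "NCC", "CCC"], toy_canonical)

# Keep only the canonical strings that were produced more than once.
duplicates = {}
for key, members in groups.items():
    if len(members) > 1:
        duplicates[key] = members
```

Storing the original strings rather than a flag makes it possible to report every member of a duplicate group, not just the second and later occurrences.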

Example 3, Not trusting the frowns ring detection code or the aromaticity perception code.
Frowns is designed to be relatively easy to modify. A list of standard transformations is applied when reading a molecule; these compute the set of smallest rings and perceive aromaticity.

These can be turned off or replaced by user-defined functions. All transformation functions must behave as follows:

molecule = transform(molecule)

In practice, mainly for reasons of speed, the molecule is transformed in place and a new molecule is not created. This really should become the more pythonic transform(molecule).
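
The contract above can be sketched as a simple loop. Everything in this block is hypothetical and written for illustration: ToyMolecule, the two transform functions and apply_transforms are not part of Frowns, and the real readers may apply their transforms differently.

```python
class ToyMolecule(object):
    # Stand-in for a parsed molecule; not the Frowns class.
    def __init__(self):
        self.log = []


def find_rings(molecule):
    molecule.log.append("rings")    # modify the molecule in place...
    return molecule                 # ...and return the same object


def perceive_aromaticity(molecule):
    molecule.log.append("aromaticity")
    return molecule


def apply_transforms(molecule, transforms):
    """Apply each transform in order, honoring the
    molecule = transform(molecule) contract."""
    for transform in transforms:
        molecule = transform(molecule)
    return molecule


mol = ToyMolecule()
result = apply_transforms(mol, [find_rings, perceive_aromaticity])
untouched = apply_transforms(ToyMolecule(), [])
```

Because each transform returns the molecule it was given, the caller works the same way whether a transform mutates in place or builds a new object.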

Each molecule reader has a transforms parameter that specifies which transforms are applied to the molecule after the parsing process.

from frowns import Smiles

mol = Smiles.smilin("c1ccccc1")
print mol.cansmiles()

mol = Smiles.smilin("c1ccccc1", transforms=[])
print mol.cansmiles()

The output is the following:

c1ccccc1
[c]1[c][c][c][c][c]1

Notice that the molecule without transforms has atoms in brackets, [c]. This indicates that the aromatic carbon doesn't have any hydrogens attached. Remember that smiles strings can leave out implicitly derived information such as hydrogen counts. Because we haven't transformed the molecule, these hydrogens were never added. The brackets indicate an atom whose hydrogen count is zero, which doesn't match the implicit hydrogen count needed to balance valence.
Basically, there should be a hydrogen, but since there isn't, smiles indicates this using brackets.

Several perception routines are available to play with. They are located in the frowns.perception package.

Perception Routine	Description
frowns.perception.sssr.sssr	Fast ring detection code using Figueras' algorithm
frowns.perception.RingDetection.sssr	Slower ring detection code using spans, similar to Babel's ring detection code. Much slower than Figueras but overcomes some limitations of Figueras' algorithm for complex molecules.
frowns.perception.BasicAromaticity.aromatize	Aromatize molecules using simple rules.

Some of the intricacies of writing transforms are (will be...) detailed in the advanced topics section at the end of the tutorial. One of the problems is that Smiles does not specify the difference between single bonds and aromatic bonds. If we look at the bond types of the molecule generated with no transforms:

print "benzene with no transforms"
mol = Smiles.smilin("c1ccccc1", transforms=[])
print mol.cansmiles()

print "atoms"
for atom in mol.atoms:
    print "\tsymbol %s hcount %s aromatic %s"%(
        atom.symbol, atom.hcount, atom.aromatic)

print "bonds"
for bond in mol.bonds:
    print "\tbondorder %s, bondtype %s fixed %s"%(
        bond.bondorder, bond.bondtype, bond.fixed)

we notice that the bonds are all single!

benzene with no transforms
[c]1[c][c][c][c][c]1
atoms
    symbol C hcount 0 aromatic 1
    symbol C hcount 0 aromatic 1
    symbol C hcount 0 aromatic 1
    symbol C hcount 0 aromatic 1
    symbol C hcount 0 aromatic 1
    symbol C hcount 0 aromatic 1
bonds
    bondorder 1, bondtype 1 fixed 0
    bondorder 1, bondtype 1 fixed 0
    bondorder 1, bondtype 1 fixed 0
    bondorder 1, bondtype 1 fixed 0
    bondorder 1, bondtype 1 fixed 0
    bondorder 1, bondtype 1 fixed 0

For the purposes of writing transforms, a bond has a special attribute, "fixed". If fixed is zero, the bond came in unspecified. For example, a bond that is not fixed can have a bondtype of aromatic but a bondorder of single or double. If a bond is fixed, its bondorder can't change. This can be seen in the following example, where the bonds in the benzene ring are specified:

mol = Smiles.smilin("c1-c=c-c=c-c=1", transforms=[])
...

The output is now:

benzene with no transforms but bonds fully specified
[c]1[c]=[c][c]=[c][c]=1
atoms
    symbol C hcount 0 aromatic 1
    symbol C hcount 0 aromatic 1
    symbol C hcount 0 aromatic 1
    symbol C hcount 0 aromatic 1
    symbol C hcount 0 aromatic 1
    symbol C hcount 0 aromatic 1
bonds
    bondorder 1, bondtype 1 fixed 1
    bondorder 2, bondtype 2 fixed 1
    bondorder 1, bondtype 1 fixed 1
    bondorder 2, bondtype 2 fixed 1
    bondorder 1, bondtype 1 fixed 1
    bondorder 2, bondtype 2 fixed 1

So this basically means that the bondtype is free to become aromatic, but the bond orders have been specified, so don't change them. Of course, your transform routines can do whatever you like!
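
A transform that respects the fixed attribute might look like the sketch below. This is a toy under stated assumptions, not the Frowns aromaticity code: ToyBond and aromatize_ring are hypothetical, the ring is assumed to be even-membered and already known to be aromatic, and the alternation is only chemically sensible when the ring's bonds are either all fixed or all unspecified.

```python
AROMATIC = 4  # bondtype value for aromatic bonds, per the bond table


class ToyBond(object):
    # Stand-in with the three attributes discussed; not the Frowns class.
    def __init__(self, bondorder, bondtype, fixed):
        self.bondorder = bondorder
        self.bondtype = bondtype
        self.fixed = fixed


def aromatize_ring(ring_bonds):
    """Mark every bond in the ring aromatic. Alternate double/single
    bond orders, but only on bonds that are not fixed."""
    for position, bond in enumerate(ring_bonds):
        bond.bondtype = AROMATIC
        if not bond.fixed:
            bond.bondorder = 2 if position % 2 == 0 else 1
    return ring_bonds


# "c1ccccc1" with no transforms: six unspecified single bonds.
unspecified = aromatize_ring([ToyBond(1, 1, 0) for _ in range(6)])

# "c1-c=c-c=c-c=1" with no transforms: bond orders given and fixed.
orders = [1, 2, 1, 2, 1, 2]
specified = aromatize_ring([ToyBond(o, o, 1) for o in orders])
```

In both cases every bondtype ends up aromatic; only the unspecified ring has its bond orders rewritten.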

Example 4, Reading an SD File
Frowns can also read in simple SD files. SD files are fairly complex beasts in that the same attribute can be defined in multiple places. For example, the charge on an atom can be defined in the atom line, on a separate "M CHG" line, or on both! In general, don't expect things beyond the MOL file standard to be parsed correctly.

from frowns import MDL

reader = MDL.mdlin(open("../test/data/bad.sdf"))

while 1:
    mol = reader.next()
    if not mol:
        break
    print mol.cansmiles()
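
The read-until-false loop above can be wrapped in a generator so that calling code uses an ordinary for loop. This is a sketch, not part of Frowns: iter_molecules is a hypothetical helper, and FakeReader is a stand-in that only mimics the next() behavior shown above (a false value once the input is exhausted).

```python
def iter_molecules(reader):
    """Yield molecules from a reader whose next() returns a false
    value once the file is exhausted."""
    while True:
        mol = reader.next()
        if not mol:
            break
        yield mol


class FakeReader(object):
    # Hypothetical stand-in for an SD file reader, for illustration only.
    def __init__(self, items):
        self._items = list(items)

    def next(self):
        if self._items:
            return self._items.pop(0)
        return None


names = [mol for mol in iter_molecules(FakeReader(["mol-a", "mol-b"]))]
empty = list(iter_molecules(FakeReader([])))
```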

Molecules read in from mol files also have additional attributes. Note that these additional attributes are subject to change in future versions of frowns as the API for them is better understood.

Mol files also have fields that frowns keeps around in the molecule's fields attribute.

for key, value in mol.fields.items():
    print "%s: %s"%(key, value)

Atoms have x, y and z attributes that hold their coordinates from the mol file:

for atom in mol.atoms:
    print atom.x, atom.y, atom.z

Atoms and bonds have a special attribute "_line" that holds a copy of the line of the file they were generated from. This is currently used as a hack for doing things like extracting the smallest non-connected fragment from a mol file. These attributes might disappear when Frowns' stereochemistry is up to snuff.

    for atom in mol.atoms:
        print atom._line

    for bond in mol.bonds:
        print bond._line
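
Finding the fragments themselves is a graph problem. The sketch below shows one way to split a molecule into connected fragments and pick the smallest, using plain atom indices. It does not use "_line" or any Frowns API; connected_fragments is a hypothetical helper written for this illustration.

```python
def connected_fragments(atom_count, bonds):
    """Return the connected components of a molecule graph as sorted
    lists of atom indices. `bonds` is a list of (i, j) index pairs."""
    neighbors = {}
    for i in range(atom_count):
        neighbors[i] = []
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    seen = set()
    fragments = []
    for start in range(atom_count):
        if start in seen:
            continue
        # depth-first walk collecting everything reachable from start
        seen.add(start)
        stack = [start]
        fragment = []
        while stack:
            atom = stack.pop()
            fragment.append(atom)
            for other in neighbors[atom]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        fragments.append(sorted(fragment))
    return fragments


# A three-atom chain (atoms 0-2) plus a lone counter-ion (atom 3).
fragments = connected_fragments(4, [(0, 1), (1, 2)])
smallest = min(fragments, key=len)
```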

Example 5, Depicting molecules in SD Files
If a molecule read from an SD file has 2D coordinates, it can be displayed. Frowns currently uses Tkinter to make cross-platform GUIs. Tkinter has plenty of problems, but it runs on most platforms.

Aromatic rings are now displayed!

from Tkinter import *
from frowns.Depict.MoleculeDock import MoleculeDock
from frowns import MDL

# read in a molecule
reader = MDL.mdlin(open("../test/data/bad.sdf"))
mol = reader.next()

# create the moleculedock widget and place it
# into a tk window
tk = top = Tk()
m = MoleculeDock(top)
m.pack(fill=BOTH, expand=1)

# add some molecules
m.addMolecule(mol)
m.addMolecule(mol)
m.addMolecule(mol)
m.addMolecule(mol)

mainloop()

Example 5b, Depicting arbitrary molecules
If a molecule does not have coordinates specified, the depictor attempts to generate them. This currently requires the AT&T GraphViz graph rendering programs to be installed.

While these renderings are not great, they are not awful either. Remember, you get what you pay for.

from Tkinter import *
from frowns.Depict.MoleculeDock import MoleculeDock
from frowns import Smiles

# smiles strings of the molecules to depict
smiles = ["c1ccccc1C=O", "c1ccc2cc1cccc2",
          "CCN", "CCCC(=O)NNCC"]

# create the moleculedock widget and place it
# into a tk window
tk = top = Tk()
m = MoleculeDock(top)
m.pack(fill=BOTH, expand=1)

for smile in smiles:
    mol = Smiles.smilin(smile)
    m.addMolecule(mol)

mainloop()

Example 6, Smarts Searches
Frowns supports the smarts language for searching molecules (see http://www.daylight.com/ for details on writing Smarts searches).

from frowns import Smiles
from frowns import Smarts

mol = Smiles.smilin("CCN")
pattern = Smarts.compile("CCN")

# simple match
match = pattern.match(mol)
assert match
index = 1
for path in match:
    print "match", index
    print "\tatoms", path.atoms
    print "\tbond", path.bonds
    index = index + 1

Here match contains a list of all paths that match the query. A path is a collection of atoms and bonds in the target molecule that satisfy the match criteria. In this case there is only one possible matching path:

match 1
    atoms (Atom(0), Atom(1), Atom(2))
    bond (Bond(3), Bond(5))

Similar to the PyDaylight toolkit, when an atom is printed you will see something like "Atom(0)". The 0 here is the handle of the atom object. It distinguishes this particular atom from all other atoms: there can only ever be one Atom(0). This might be a point of confusion in some cases, but it is really used just to be consistent with PyDaylight.

We can create a more complicated expression. This one matches a carbon single or aromatically bonded to any atom:

pattern = Smarts.compile("C*")
match = pattern.match(mol)
assert match
index = 1
for path in match:
    print "match", index
    print "\tatoms", path.atoms
    print "\tbond", path.bonds
    index = index + 1

There are three possible matches in this case: (CC), (CC) and (CN).

match 1
    atoms (Atom(0), Atom(1))
    bond (Bond(3),)
match 2
    atoms (Atom(1), Atom(0))
    bond (Bond(3),)
match 3
    atoms (Atom(1), Atom(2))
    bond (Bond(5),)
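
Matches 1 and 2 cover the same two atoms in opposite order. When only the distinct atom sets matter, symmetric matches can be collapsed as in the sketch below. It works on plain tuples of atom handles rather than Frowns path objects, and unique_matches is a hypothetical helper, not a Frowns function.

```python
def unique_matches(paths):
    """Keep the first path for each distinct set of atoms.
    Each path is a tuple of atom handles."""
    seen = set()
    unique = []
    for path in paths:
        key = frozenset(path)   # order-independent identity of the match
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique


# Atom handles from the three "C*" matches listed above.
paths = [(0, 1), (1, 0), (1, 2)]
unique = unique_matches(paths)
```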

Finally we can try some logic expressions such as "not a nitrogen atom single bonded to not a carbon atom"

pattern = Smarts.compile("[!N]-[!C]")
match = pattern.match(mol)
assert match
index = 1
for path in match:
    print "match", index
    print "\tatoms", path.atoms
    print "\tbond", path.bonds
    index = index + 1

This only has one match in the molecule (CN)

match 1
    atoms (Atom(1), Atom(2))
    bond (Bond(5),)

Example 7, Fingerprint creation and use
Frowns can generate molecule fingerprints that are similar to Daylight fingerprints. These fingerprints are really a hash function that is applied to a molecule. The generated hashes are used to quickly throw away compounds that cannot match a particular query without actually having to perform the query.

A hash is a binary vector stored as a list of integers. For a given query pattern, if its binary vector is completely contained within the binary vector of a molecule, then the pattern might exist as a substructure of the molecule. If it is not contained, the pattern cannot be a substructure of the molecule.

import frowns.Fingerprint
from frowns import Smiles

pattern = "CCN"
targets = ["CCN", "CCNCC", "c1cccc1CCN", "CC"]

pattern_molecule = Smiles.smilin(pattern)
pfp = frowns.Fingerprint.generateFingerprint(pattern_molecule)

for target in targets:
    mol = Smiles.smilin(target)
    molfp = \
          frowns.Fingerprint.generateFingerprint(mol)
    # pfp must be "in" molfp for test to pass
    if pfp in molfp:
        print "%s hits target %s"%(pattern, target)
    else:
        print "%s does not hit target %s"%(pattern, target)

The output from this example is

CCN hits target CCN
CCN hits target CCNCC
CCN hits target c1cccc1CCN
CCN does not hit target CC
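
The containment test behind "pfp in molfp" can be illustrated with plain integers. This is a sketch of the general bit-vector idea only; the contains function and the example bit patterns are made up for illustration and do not reflect how Frowns stores or compares its fingerprints internally.

```python
def contains(pattern_fp, molecule_fp):
    """True if every bit set in pattern_fp is also set in molecule_fp.
    Both fingerprints are equal-length lists of integers."""
    if len(pattern_fp) != len(molecule_fp):
        raise ValueError("fingerprints must have the same length")
    for p, m in zip(pattern_fp, molecule_fp):
        # any pattern bit missing from the molecule rules it out
        if (p & m) != p:
            return False
    return True


pattern_fp = [0b0101, 0b0000]
superset_fp = [0b1101, 0b0010]   # has every pattern bit, plus extras
disjoint_fp = [0b0100, 0b0010]   # missing the lowest pattern bit

possible_hit = contains(pattern_fp, superset_fp)
ruled_out = contains(pattern_fp, disjoint_fp)
```

A True result only means the molecule is still a candidate; a full substructure search is needed to confirm the match.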

Example 8, Examining Molecule Cycles
The following example shows how to examine the smallest set of smallest rings located in a molecule:

from frowns import Smiles

mol = Smiles.smilin("c1ccccc1CCC2CC2")

index = 0
for cycle in mol.cycles:
    print "cycle", index
    print "\t", cycle.atoms
    print "\t", cycle.bonds
    index = index + 1

In this case there are two cycles, a six-membered ring and a three-membered ring.

cycle 0
    [Atom(5), Atom(4), Atom(3), Atom(2), Atom(1), Atom(0)]
    [Bond(11), Bond(9), Bond(7), Bond(5), Bond(3), Bond(12)]
cycle 1
    [Atom(10), Atom(9), Atom(8)]
    [Bond(22), Bond(20), Bond(23)]
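
Once the cycles are known, it is easy to ask which atoms are in rings and which are not. The sketch below uses the atom handles from the output above as plain integers. ring_membership is a hypothetical helper, and treating the handles as running from 0 to 10 is an assumption made for this example.

```python
def ring_membership(cycles):
    """Map each atom handle to the indices of the cycles containing it.
    `cycles` is a list of lists of atom handles."""
    membership = {}
    for index, atoms in enumerate(cycles):
        for atom in atoms:
            membership.setdefault(atom, []).append(index)
    return membership


# Atom handles copied from the output above.
cycles = [[5, 4, 3, 2, 1, 0], [10, 9, 8]]
sizes = [len(atoms) for atoms in cycles]
membership = ring_membership(cycles)

# Assumes the eleven atoms have handles 0 through 10.
chain_atoms = [atom for atom in range(11) if atom not in membership]
```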

To Do
Using Fingerprints and the java clustering calculator
Rolling your own molecule
Canonicalizing Any Subgraph
Breadth First Walks
The power of Canonical Lists