drupal

PostNuke to Drupal Conversions: Basic Conversion Reference

Book page
This is drawn from http://www.phrixus.net/migration, which at some point drew on one of my Drupal comments and on my PostNuke-forums-to-Drupal-forums script.

That said, here's the original post:


Recently this website was migrated from PHP-Nuke to Drupal. Importing the data from one CMS to another presented a number of problems since the database tables are quite different.

The biggest difference is that Drupal treats everything as a 'node' and therefore uses one table for most entries. PHP-Nuke has separate tables for most sections of the site. Migrating the data from PHP-Nuke to Drupal requires that the table IDs be changed to avoid conflicts. The trick is to keep all of the IDs relative to each other so the comments and other entries match up properly.
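The offset idea can be sketched in a few lines of Python. This is a model of the arithmetic only and is not part of the migration scripts. The `Comment` record and `shift_ids` helper are hypothetical; they carry the same three IDs that the poll-comment query in step 3 remaps. Adding one constant to every ID, and to every reference to that ID, keeps a comment thread intact. A parent of 0 means "no parent", so it has to stay 0.

```python
from typing import List, NamedTuple


class Comment(NamedTuple):
    cid: int  # comment ID
    pid: int  # parent comment ID; 0 means top-level
    nid: int  # ID of the node (story, poll, ...) the comment belongs to


def shift_ids(comments: List[Comment], cid_offset: int, nid_offset: int) -> List[Comment]:
    """Shift every ID, and every reference to an ID, by a fixed offset."""
    return [
        Comment(
            cid=c.cid + cid_offset,
            # 0 is "no parent", not a real ID, so it is left alone
            # (the same job IF(c.pid, c.pid + offset, 0) does in the SQL).
            pid=c.pid + cid_offset if c.pid else 0,
            nid=c.nid + nid_offset,
        )
        for c in comments
    ]


# A reply (cid 2) to a top-level comment (cid 1) on node 3.
thread = [Comment(cid=1, pid=0, nid=3), Comment(cid=2, pid=1, nid=3)]
print(shift_ids(thread, cid_offset=368, nid_offset=87))
# [Comment(cid=369, pid=0, nid=90), Comment(cid=370, pid=369, nid=90)]
```

The reply still points at its parent after the shift (pid 369 matches cid 369). This is why every column that refers to another ID must get the same offset as that ID.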

The following snippets are the MySQL code I used to migrate Phrixus from PHP-Nuke to Drupal. Each snippet is a separate file and they should be executed in the order in which they appear.

Executing these scripts as a file requires copying the contents into a file and running the following command:

$ mysql -p drupal < file.sql


Assumptions
  • The database names are 'drupal' and 'nuke'
  • The user doing the migration has full access to both databases.
  • The drupal database is empty aside from the entries created by database.mysql


Credits: Much of the following code was based on the PostNuke to Drupal migration scripts (pn2drupal.sql and pn2drupal_forums.sql) created by David Poblador Garcia and kitt (from drupal.org).

  1. Users
    The first step is to migrate all of the users.

    -- User Migration: PHP-Nuke to Drupal --

    -- Delete existing data --
    DELETE FROM drupal.users;

    INSERT INTO drupal.users
        ( uid, name, pass, mail, timestamp, status, init, rid )
      SELECT
        user_id, username, user_password, user_email, user_lastvisit, 1,
        user_email, 2
      FROM
        nuke.nuke_users ;


  2. Stories
    The next step is to migrate all of the stories and their associated comments.

    -- Story Migration --

    -- Delete existing vocabulary and terms --
    DELETE FROM drupal.vocabulary;
    DELETE FROM drupal.term_data;
    DELETE FROM drupal.term_hierarchy;

    INSERT INTO drupal.vocabulary VALUES
        ( 1, "Content", "Articles, blogs, and other short entry-based content",
          0, 1, 0, 1, "blog,poll,story", 0 );

    INSERT INTO drupal.term_data
        ( tid, vid, name, description, weight )
      SELECT
        topicid, 1, topicname, topictext, 0
      FROM
        nuke.nuke_topics;
     
    INSERT INTO drupal.term_hierarchy
        ( tid, parent )
      SELECT
        topicid, 0
      FROM
        nuke.nuke_topics;


    -- Migrate Stories --

    -- Delete existing nodes --
    DELETE FROM drupal.node;
    DELETE FROM drupal.term_node;

    INSERT INTO drupal.node
        ( nid, type, title, uid, created, comment, promote, teaser, body, changed )
      SELECT
        s.sid, "story", s.title, u.user_id, UNIX_TIMESTAMP(s.time), 2, 1,
        s.hometext,
        CONCAT(s.hometext, "", s.bodytext), UNIX_TIMESTAMP()  -- changed is an integer Unix timestamp
      FROM
        nuke.nuke_stories s, nuke.nuke_users u
      WHERE
        s.informant=u.username;
     
    INSERT INTO drupal.term_node
        ( nid, tid )
      SELECT
        s.sid, s.topic
      FROM
        nuke.nuke_stories s;

    -- Migrate Story Comments --

    DELETE FROM drupal.comments;

    INSERT INTO drupal.comments
        ( cid, pid, nid, uid, subject, comment, hostname, timestamp )
      SELECT
        c.tid, c.pid, c.sid, u.user_id, c.subject, c.comment, c.host_name,
        UNIX_TIMESTAMP(c.date)
      FROM
        nuke.nuke_comments c, nuke.nuke_users u
      WHERE c.name=u.username;


  3. Polls
    The next step is to migrate the polls. Since polls in Drupal are also considered nodes, some id offsets need to be set before this script is run.

    -- Migrate Polls --

    -- Make sure new polls don't conflict with existing NIDs --
    -- Use the following query to set the variable --
    -- SELECT MAX(sid) FROM nuke.nuke_stories --
    SET @POLL_NID_OFFSET=87;

    -- Make sure poll comments don't conflict with any existing CIDs --
    -- Use the following query to set the variable --
    -- SELECT MAX(tid) FROM nuke.nuke_comments  --
    SET @POLL_CID_OFFSET=368;


    -- delete any existing data --
    DELETE FROM drupal.poll;
    DELETE FROM drupal.poll_choices;

    INSERT INTO drupal.node
        ( nid, type, title, score, votes, uid, status, created, comment, promote,
          moderate, users, teaser, body, changed, revisions, static )
      SELECT
        pollID+@POLL_NID_OFFSET, "poll", pollTitle, 1, 1, 0, 1, timeStamp, 2, 1, 0,
        "", pollTitle, "", UNIX_TIMESTAMP(), "", 0  -- changed is an integer Unix timestamp
      FROM
        nuke.nuke_poll_desc;


    -- Migrate Polls --
    INSERT INTO drupal.poll
        ( nid, runtime, voters, active )
      SELECT
        pollID+@POLL_NID_OFFSET, timeStamp, voters, 1
      FROM
        nuke.nuke_poll_desc;

    INSERT INTO drupal.poll_choices
        ( chid, nid, chtext, chvotes, chorder )
      SELECT
        0, pollID+@POLL_NID_OFFSET, optionText, optionCount, voteID
      FROM
        nuke.nuke_poll_data;


    -- Migrate Poll Comments --

    INSERT INTO drupal.comments
        ( cid, pid, nid, uid, subject, comment, hostname, timestamp )
      SELECT
        c.tid + @POLL_CID_OFFSET, IF(c.pid, c.pid+@POLL_CID_OFFSET, 0),
        c.pollID+@POLL_NID_OFFSET, u.user_id, c.subject, c.comment, c.host_name,
        UNIX_TIMESTAMP(c.date)
      FROM
        nuke.nuke_pollcomments c, nuke.nuke_users u
      WHERE c.name=u.username;


  4. Forums
    The next step is the forums. This was the most difficult script to create since it alters so many different tables (nodes, comments, term_data, term_hierarchy, and vocabulary).

    -- Migrate Forums --

    -- Make sure new forum containers don't conflict with existing TIDs --
    -- Use the following query to set the variable --
    -- SELECT MAX(tid) FROM drupal.term_data --
    SET @FORUM_CONTAINER_OFFSET=5;

    -- Make sure new forums don't conflict with existing TIDs --
    -- Use the SUM of the following two queries to set the variable --
    -- SELECT MAX(tid) FROM drupal.term_data --
    -- SELECT MAX(cat_id) FROM nuke.nuke_bbcategories --
    -- (MAX(cat_id) rather than COUNT(*), in case category IDs have gaps) --
    SET @FORUM_TERM_OFFSET=7;

    -- Make sure new forum topics don't conflict with existing NIDs --
    -- Use the following query to set the variable --
    -- SELECT MAX(nid) FROM drupal.node --
    SET @FORUM_NID_OFFSET=101;

    -- Make sure new forum comments don't conflict with existing CIDs --
    -- Use the following query to set the variable --
    -- SELECT MAX(cid) FROM drupal.comments --
    SET @FORUM_CID_OFFSET=418;

    -- Create a new vocabulary ID for forums --
    -- Use the following query to set the variable --
    -- SELECT MAX(vid)+1 FROM drupal.vocabulary --
    SET @FORUM_VID=2;


    -- delete existing data --
    DELETE FROM drupal.forum;
    DELETE FROM drupal.vocabulary WHERE vid=@FORUM_VID;

    -- Create the Forums --

    INSERT INTO drupal.vocabulary
        VALUES ( @FORUM_VID, "Forums", "Topics for forums", 0, 1, 0, 1,
                 "forum", 0 ) ;


    INSERT INTO drupal.term_data
        ( tid, vid, name, description, weight )
      SELECT
        cat_id + @FORUM_CONTAINER_OFFSET, @FORUM_VID, cat_title, cat_title, 0
      FROM
        nuke.nuke_bbcategories;


    INSERT INTO drupal.term_hierarchy
        ( tid, parent )
      SELECT
        cat_id + @FORUM_CONTAINER_OFFSET, 0
      FROM
        nuke.nuke_bbcategories;


    INSERT INTO drupal.term_data
        ( tid, vid, name, description, weight )
      SELECT
        forum_id + @FORUM_TERM_OFFSET, @FORUM_VID, forum_name, forum_desc, 0
      FROM
        nuke.nuke_bbforums;


    INSERT INTO drupal.term_hierarchy
        ( tid, parent )
      SELECT
        forum_id + @FORUM_TERM_OFFSET, cat_id + @FORUM_CONTAINER_OFFSET
      FROM
        nuke.nuke_bbforums;

       
    -- Add the forum topics (posts become comments to these) --

    INSERT INTO drupal.node
        ( nid, type, title, uid, status, created, comment, promote, moderate,
          users, teaser, body, changed, revisions, static )
      SELECT
        t.topic_id + @FORUM_NID_OFFSET, "forum", t.topic_title,
        t.topic_poster, 1, t.topic_time, 2, 1, 0, "",
        t.topic_title, t.topic_title, UNIX_TIMESTAMP(), "", 0  -- changed is an integer Unix timestamp
      FROM
        nuke.nuke_bbtopics t;


    INSERT INTO drupal.forum
        ( nid, tid )
      SELECT
        topic_id + @FORUM_NID_OFFSET, forum_id + @FORUM_TERM_OFFSET
      FROM
        nuke.nuke_bbtopics;

    INSERT INTO drupal.term_node
        ( nid, tid )
      SELECT
        topic_id + @FORUM_NID_OFFSET, forum_id + @FORUM_TERM_OFFSET
      FROM
        nuke.nuke_bbtopics;


    -- Add the forum posts as comments --

    INSERT INTO drupal.comments
        ( cid, pid, nid, uid, subject, comment, timestamp )
      SELECT
        c.post_id + @FORUM_CID_OFFSET, 0, c.topic_id + @FORUM_NID_OFFSET,
        c.poster_id, t.post_subject, t.post_text, c.post_time
      FROM
        nuke.nuke_bbposts c, nuke.nuke_bbposts_text t
      WHERE
        c.post_id=t.post_id;


  5. Journals
    PHP-Nuke has a journals module that I regrettably made use of. The journal section of PHP-Nuke was not well designed, especially with regard to the database schema. Because of this, the migration to Drupal was not as smooth as it could have been, despite being a very easy query to execute. The problem is that the PHP-Nuke journal table uses VARCHAR for its date fields instead of DATE. While it may be possible to salvage these dates, I gave up after trying numerous queries. The following script migrates all of the journal content but sets a static date of Jan 01, 2003 for all journals.

    -- Migrate Journals to Personal Blog Entries --

    -- Make sure new journals (blogs) don't conflict with existing NIDs --
    -- Use the following query to set the variable --
    -- SELECT MAX(nid) FROM drupal.node --
    SET @JOURNAL_NID_OFFSET=179;

    INSERT INTO drupal.node
        ( nid, type, title, uid, status, created, comment, promote, moderate,
          users, teaser, body, changed, revisions, static )
      SELECT
        j.jid + @JOURNAL_NID_OFFSET, "blog", j.title, u.user_id, 1,
        UNIX_TIMESTAMP('2003-01-01'), 2, 1, 0, "", j.title, j.bodytext,
        UNIX_TIMESTAMP('2003-01-01'),"", 0
      FROM
        nuke.nuke_journal j, nuke.nuke_users u
      WHERE
        j.status='yes'
      AND
        j.aid=u.username;


  6. Private Messages
    The migration of private messages requires the use of the privatemsg module in Drupal.

    -- Migrate Private Messages --

    -- delete existing data --
    DELETE FROM drupal.privatemsg;

    INSERT INTO drupal.privatemsg
        ( id, author, recipient, subject, message, timestamp )
      SELECT
        p.privmsgs_id, p.privmsgs_from_userid, p.privmsgs_to_userid,
        p.privmsgs_subject, t.privmsgs_text, p.privmsgs_date
      FROM
        nuke.nuke_bbprivmsgs p, nuke.nuke_bbprivmsgs_text t
      WHERE
        t.privmsgs_text_id = p.privmsgs_id ;



  7. Sequences and Database Fixes
    The last step is to update the sequences table so new entries can be created and to fix some of the migration discrepancies that occurred.

    -- Fix some Nuke/Drupal discrepancies --

    -- Set the Drupal site admin username/uid here --
    SET @SITE_ADMIN='david';
    SET @SITE_ADMIN_NUKE_UID=2;

    -- Get the max IDs for various tables in order to update drupal.sequences --
    -- Use the following queries to set the variables --

    -- SELECT MAX(uid) FROM drupal.users --
    SET @MAX_UID=57;

    -- SELECT MAX(nid) FROM drupal.node --
    SET @MAX_NID=256;

    -- SELECT MAX(cid) FROM drupal.comments --
    SET @MAX_CID=947;

    -- SELECT MAX(vid) FROM drupal.vocabulary --
    SET @MAX_VID=2;

    -- SELECT MAX(tid) FROM drupal.term_data --
    SET @MAX_TID=16;
     

    -- PHP-Nuke has UID 1 as 'Anonymous'. Replace with the drupal site admin --
    DELETE FROM drupal.users WHERE uid='1';
    UPDATE drupal.users SET uid='1' WHERE name=@SITE_ADMIN;
    UPDATE drupal.node SET uid=1 WHERE uid=@SITE_ADMIN_NUKE_UID;
    UPDATE drupal.comments SET uid=1 WHERE uid=@SITE_ADMIN_NUKE_UID;
    UPDATE drupal.privatemsg SET author=1 WHERE author=@SITE_ADMIN_NUKE_UID;
    UPDATE drupal.privatemsg SET recipient=1 WHERE recipient=@SITE_ADMIN_NUKE_UID;
       
     
    -- Add the UID 0 so the drupal Anonymous user works properly --
    INSERT INTO drupal.users (uid,rid) VALUES (0,1);
       

    -- Update the sequences table so new entries can be created --
    INSERT INTO drupal.sequences (name, id) VALUES ('users_uid', @MAX_UID);
    INSERT INTO drupal.sequences (name, id) VALUES ('node_nid', @MAX_NID);
    INSERT INTO drupal.sequences (name, id) VALUES ('comments_cid', @MAX_CID);
    INSERT INTO drupal.sequences (name, id) VALUES ('vocabulary_vid', @MAX_VID);
    INSERT INTO drupal.sequences (name, id) VALUES ('term_data_tid', @MAX_TID);
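Each offset above is a hand-copied result of a MAX() query run after the previous step. The same values can be derived up front from the PHP-Nuke tables alone, under three assumptions:

- the scripts run in the order given,
- the Drupal database is otherwise empty,
- every source table has at least one row.

The Python helper below is a sketch of that idea and does not run inside MySQL. The `SourceMaxima` field names are hypothetical, and the comments list the queries you would run to fill them in. For the forum term offset it uses MAX(cat_id) instead of COUNT(*), so gaps in category IDs cannot cause collisions.

Because the user joins skip some rows, these values can come out larger than the highest ID that actually lands in Drupal. That is harmless, since offsets and sequence values only need to be at least as large as the highest ID in use.

```python
from typing import Dict, NamedTuple


class SourceMaxima(NamedTuple):
    """Highest IDs in the PHP-Nuke tables (field names are hypothetical)."""
    story_sid: int        # SELECT MAX(sid) FROM nuke.nuke_stories
    comment_tid: int      # SELECT MAX(tid) FROM nuke.nuke_comments
    topic_id: int         # SELECT MAX(topicid) FROM nuke.nuke_topics
    poll_id: int          # SELECT MAX(pollID) FROM nuke.nuke_poll_desc
    pollcomment_tid: int  # SELECT MAX(tid) FROM nuke.nuke_pollcomments
    bbcat_id: int         # SELECT MAX(cat_id) FROM nuke.nuke_bbcategories
    bbforum_id: int       # SELECT MAX(forum_id) FROM nuke.nuke_bbforums
    bbtopic_id: int       # SELECT MAX(topic_id) FROM nuke.nuke_bbtopics
    bbpost_id: int        # SELECT MAX(post_id) FROM nuke.nuke_bbposts
    journal_jid: int      # SELECT MAX(jid) FROM nuke.nuke_journal


def derive_offsets(m: SourceMaxima) -> Dict[str, int]:
    """Compute the SET values used by steps 3 to 7, in script order."""
    poll_nid = m.story_sid                     # nodes so far: stories
    poll_cid = m.comment_tid                   # comments so far: story comments
    forum_container = m.topic_id               # terms so far: story topics
    forum_term = forum_container + m.bbcat_id  # above every container tid
    forum_nid = poll_nid + m.poll_id           # above every poll nid
    forum_cid = poll_cid + m.pollcomment_tid   # above every poll comment cid
    journal_nid = forum_nid + m.bbtopic_id     # above every forum topic nid
    return {
        "POLL_NID_OFFSET": poll_nid,
        "POLL_CID_OFFSET": poll_cid,
        "FORUM_CONTAINER_OFFSET": forum_container,
        "FORUM_TERM_OFFSET": forum_term,
        "FORUM_NID_OFFSET": forum_nid,
        "FORUM_CID_OFFSET": forum_cid,
        "FORUM_VID": 2,                        # vocabulary 1 is "Content"
        "JOURNAL_NID_OFFSET": journal_nid,
        "MAX_NID": journal_nid + m.journal_jid,
        "MAX_CID": forum_cid + m.bbpost_id,
        "MAX_VID": 2,
        "MAX_TID": forum_term + m.bbforum_id,
    }


def as_set_statements(values: Dict[str, int]) -> str:
    """Render the values as MySQL SET lines to paste into the scripts."""
    return "\n".join(f"SET @{name}={value};" for name, value in values.items())


# Made-up maxima, just to show the shape of the output.
demo = SourceMaxima(story_sid=10, comment_tid=40, topic_id=3, poll_id=2,
                    pollcomment_tid=5, bbcat_id=2, bbforum_id=4, bbtopic_id=6,
                    bbpost_id=20, journal_jid=7)
print(as_set_statements(derive_offsets(demo)))
```

The printed SET lines would replace the hard-coded ones in steps 3 to 7. @MAX_UID is not covered here and still comes from SELECT MAX(uid) FROM drupal.users.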


Hopefully these scripts will be useful to others facing a similar situation. Just as a note, these scripts do not come with any warranty and are not guaranteed to work. That said, they did work for my migration and, with minimal tweaking, should at least make a PHP-Nuke to Drupal migration easier.


End of post from other site.

PostNuke to Drupal Conversion: References

Book page

I managed my first conversion from PN to Drupal because of the help of others. In particular, the following helped me greatly:

A Drupal node discussing the conversion process, http://drupal.org/node/5871, references two other pages:

PostNuke to Drupal Conversion Process

Book page

Having done the conversion from PostNuke (723) to Drupal (4.4) for Gold Country Paddlers, I've been asked to do another conversion. Since the site is *significantly* larger than the GCP site (contrast: GCP was about 1000 entries when I converted it, the new conversion will be over 3800000 entries), I'll be running test cases and adjustable SQL scripts to do the conversion.

One of the requirements is that the site be offline for as short a time as possible during the final backend conversion. Some module development will also have to be done, as not all the PN modules are available in Drupal.

That said, here is the journey.

Starting a Freecycle module

Blog

I started my Freecycle module for drupal. You can see an example of it working on my site, though it's in a state of flux and may not be working at any given point.

Freecycle is a growing, grassroots movement that reduces landfill trash by promoting the free exchange of used yet still usable goods. In other words, "One man's trash is another man's treasure."

The basic concept is that some used goods can be used by other people. Rather than throwing out usable goods, an owner can post the item to a list, offering it to others. If another person has need of the goods, s/he can respond requesting the used goods.

Part of the problem I have with the process is the difficulty with selecting one person to give the item to, or asking for an item (oooo! pick me! I want it! I need it. I hate sob-story emails from strangers.). When I post items (and I've posted a lot to my local group), I often get a flood of emails. I then have to figure out which person I should give the item to, arrange for pickup, wait to see if they pick up (no-shows are a big deal), and re-offer if the items aren't picked up. I think the "Sorry, already taken." emails after the first n emails are received (where n depends on how much I think someone really wants the items and will be likely to pick them up) suck the most.

This Freecycle module will alleviate some of those issues by having people sign up online. I'll be able to configure how many emails I accept before automatically terminating the list, provide a giveaway/pickup status for unclaimed items, limit how many items someone can pick up (by email address, IP address, etc.) and provide feedback (à la eBay) about no-shows.

Nothing like scratching an itch for the common good.

Drupal page rendering process

From John VanDyk's drupal install.

A Walk Through Drupal's Page Serving Mechanism or Tiptoeing Sprightly Through the PHP

This is a commentary on the process Drupal goes through when serving a page. For convenience, we will choose the following URL, which asks Drupal to display the first node for us. (A node is a thing, usually a web page.)

http://127.0.0.1/drupal/?q=node/1

A visual companion to this narration can be found on John's site. Before we start, let's dissect the URL. I'm running on an OS X machine, so the site I'm serving lives at /users/user/sites/. The drupal directory contains a checkout of the latest Drupal CVS tree. It looks like this:

CHANGELOG.txt
cron.php
CVS/
database/
favicon.ico
includes/
index.php
INSTALL.txt
LICENSE.txt
MAINTAINERS.txt
misc/
modules/
phpinfo.php
scripts/
themes/
tiptoe.txt
update.php
xmlrpc.php

So the URL above requests the root directory / of the Drupal site. Apache translates that into index.php. One variable/value pair is passed along with the request: the variable 'q' is set to the value 'node/1'.
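Here is a small Python model of how the query string breaks down. It is not Drupal code, which reads the value from PHP's $_GET. The function name `drupal_path` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def drupal_path(url: str) -> str:
    """Return the value of the 'q' query parameter, or '' when it is absent."""
    params = parse_qs(urlsplit(url).query)
    return params.get("q", [""])[0]


path = drupal_path("http://127.0.0.1/drupal/?q=node/1")
print(path)             # node/1
print(path.split("/"))  # ['node', '1']
```

The path portion of the URL only selects index.php. Everything Drupal needs to pick the content arrives in q.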

So, let's pick up the show with the execution of index.php, which looks very simple and is only a few lines long.

Let's take a broad look at what happens during the execution of index.php.

- First, the includes/bootstrap.inc file is included, bringing in all the functions that are necessary to get Drupal's machinery up and running.
- There's a call to drupal_page_header(), which starts a timer, sets up caching, and notifies interested modules that the request is beginning.
- Next, the includes/common.inc file is included, giving access to a wide variety of utility functions such as path formatting functions, form generation and validation, etc.
- The call to fix_gpc_magic() checks the status of "magic quotes" so that all escaped quotes enter Drupal's database consistently.
- Drupal then builds its navigation menu and sets the variable $status to the result of that operation.
- In the switch statement, Drupal checks for cases in which a Not Found or Access Denied message needs to be generated.
- Finally, it calls drupal_page_footer(), which notifies all interested modules that the request is ending.

Drupal closes up shop and the page is served. Simple, eh?

Let's delve a little more deeply into the process outlined above.

The first line of index.php includes the includes/bootstrap.inc file, and including it also executes the code towards the end of bootstrap.inc. First, it destroys any previous variable named $conf. Next, it calls conf_init(). This function allows Drupal to use site-specific configuration files if it finds them. It returns the name of the site-specific configuration file; if no such file is found, it sets the variable $config to the string 'conf' and returns that instead. Next, it includes the named configuration file. Thus, in the default case it will include 'conf.php'. The code in conf_init would be easier to understand if the variable $file were instead called $potential_filename. Likewise $conf_filename would be a better choice than $config.

The selected configuration file (normally /includes/conf.php) is now parsed, setting the $db_url variable, the optional $db_prefix variable, the $base_url for the website, and the $languages array (default is "en"=>"english").

The database.inc file is now parsed, with the primary goal of initializing a connection to the database. If MySQL is being used, the database.mysql.inc file is brought in; if Postgres is being used, the PEAR database abstraction layer is used. Although the global variables $db_prefix, $db_type, and $db_url are set, the most useful result of parsing database.inc is a global variable called $active_db, which contains the database connection handle.

Now that the database connection is set up, it's time to start a session by including the includes/session.inc file. Oddly, in this include file the executable code is located at the top of the file instead of the bottom. The code tells PHP to use Drupal's own session storage functions (located in this file) instead of the default PHP session code. A call to PHP's session_start() function thus calls Drupal's sess_open() and sess_read() functions. The sess_read() function creates a global $user object and sets the $user->roles array appropriately. Since I am running as an anonymous user, the $user->roles array contains one entry, 1 => anonymous user.

We have a database connection, a session has been set up...now it's time to get things set up for modules. The includes/module.inc file is included but no actual code is executed.

The last thing bootstrap.inc does is to set up the global variable $conf, an array of configuration options. It does this by calling the variable_init() function. If a per-site configuration file exists and has already populated the $conf variable, this populated array is passed in to variable_init(). Otherwise, the $conf variable is null and an empty array is passed in. In both cases, a populated array of name-value pairs is returned and assigned to the global $conf variable, where it will live for the duration of this request. It should be noted that name-value pairs in the per-site configuration file have precedence over name-value pairs retrieved from the "variable" table by variable_init().
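The precedence rule can be modelled as a dictionary merge in Python. This is a sketch of the behaviour described above, not Drupal's variable_init(). The `merge_conf` name and the "green" value are hypothetical.

```python
from typing import Any, Dict, Optional


def merge_conf(site_conf: Optional[Dict[str, Any]], variable_table: Dict[str, Any]) -> Dict[str, Any]:
    """Combine stored variables with per-site overrides; the overrides win."""
    merged = dict(variable_table)   # start from the rows of the "variable" table
    merged.update(site_conf or {})  # a null $conf behaves like an empty array
    return merged


stored = {"theme_default": "xtemplate", "cache": 0}
print(merge_conf({"theme_default": "green"}, stored))
# {'theme_default': 'green', 'cache': 0}
```

This ordering means a value hard-coded in a per-site configuration file cannot be changed from the admin pages, which only write to the "variable" table.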

We're done with bootstrap.inc! Now it's time to go back to index.php and call drupal_page_header(). This function has two responsibilities:

- First, it starts a timer if $conf['dev_timer'] is set; that is, if you are keeping track of page execution times.
- Second, if caching has been enabled, it retrieves the cached page, calls module_invoke_all() for the 'init' and 'exit' hooks, and exits. If caching is not enabled or the page is not being served to an anonymous user (or in several other special cases, like when feedback needs to be sent to a user), it simply returns control to index.php.

Back at index.php, we find an include statement for common.inc. This file is chock-full of miscellaneous utility goodness, all kept in one file for performance reasons. But in addition to putting all these utility functions into our namespace, common.inc includes some files on its own. They include theme.inc, for theme support; pager.inc, for paging through large datasets (it has nothing to do with calling your pager); and menu.inc. In menu.inc, many constants are defined that are used later by the menu system.

The next inclusion that common.inc makes is xmlrpc.inc, with all sorts of functions for dealing with XML-RPC calls. Although one would expect a quick check of whether or not this request is actually an XML-RPC call, no such check is done here. Instead, over 30 variable assignments are made, apparently so that they will be ready if this request turns out to be an XML-RPC call. Moving these assignments into an xmlrpc_init() function that runs only for XML-RPC requests might help performance here.

A small tablesort.inc file is included as well, containing functions that help behind the scenes with sortable tables. Given the paucity of code here, a performance boost could be gained by moving these into common.inc itself.

The last include done by common.inc is file.inc, which contains common file handling functions. The constants FILE_DOWNLOADS_PUBLIC = 1 and FILE_DOWNLOADS_PRIVATE = 2 are set here, as well as the FILE_SEPARATOR, which is \ for Windows machines and / for all others.

Finally, with includes finished, common.inc sets PHP's error handler to the error_handler() function in the common.inc file. This error handler creates a watchdog entry to record the error and, if any error reporting is enabled via the error_reporting directive in PHP's configuration file (php.ini), it prints the error message to the screen. Drupal's error_handler() does not use the last parameter $variables, which is an array that points to the active symbol table at the point the error occurred. The comment "// set error handler:" at the end of common.inc is redundant, as it is readily apparent what the function call to set_error_handler() does.

The Content-Type header is now sent to the browser as a hard coded string: "Content-Type: text/html; charset=utf-8".

If you remember that the URL we are serving ends with /drupal/?q=node/1, you'll note that the variable q has been set. Drupal now parses this out and checks for any path aliasing for the value of q. If the value of q is a path alias, Drupal replaces the value of q with the actual path that the value of q is aliased to. This sleight-of-hand happens before any modules see the value of q. Cool.
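Here is a Python sketch of the alias swap. A plain dictionary stands in for Drupal's alias lookup. The `resolve_alias` name and the "about" alias are hypothetical.

```python
from typing import Dict


def resolve_alias(q: str, aliases: Dict[str, str]) -> str:
    """Return the internal path behind an alias, or q unchanged if it is not one."""
    return aliases.get(q, q)


aliases = {"about": "node/1"}  # hypothetical alias -> internal path
print(resolve_alias("about", aliases))   # node/1
print(resolve_alias("node/1", aliases))  # node/1
```

Both URLs end up at the same internal path. Because the swap happens before modules run, a module only ever has to handle node/1.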

Module initialization now happens via the module_init() function. This function runs require_once on the admin, filter, system, user and watchdog modules. The filter module defines FILTER_HTML* and FILTER_STYLE* constants while being included. Next, other modules are include_once'd via module_list().

In order to be loaded, a module must (1) be enabled (that is, the status column of the "system" database table must be set to 1), and (2) not be excluded by Drupal's throttle mechanism when load is high. The throttle check works in two steps. First, it determines whether the module is eligible for exclusion by looking at the throttle column of the "system" database table. Then, if the module is eligible, it looks at $conf["throttle_level"] to see whether the load is high enough to exclude the module.

Once all modules have been include_once'd and their names added to the $list local array, the array is sorted by module name and returned. The returned $list is discarded because the module_list() invocation is not part of an assignment (i.e., it is simply module_list() and not $module_list = module_list()). The strategy here is to keep the module list inside a static variable called $list inside the module_list() function. The next time module_list() is called, it will simply return its static variable $list rather than rebuilding the whole array. We see this when we follow the final objective of module_init(): to send all modules the "init" callback.

To see how the callbacks work, let's step through the init callback for the first module. First, module_invoke_all() is called and passed the string naming the callback to be called. This string could be anything; it is simply a symbol that all modules have agreed to abide by, by convention. In this case it is the string "init".

The module_invoke_all() function now steps through the list of modules it got from calling module_list(). The first one is "admin", so it calls module_invoke("admin","init"). The module_invoke() function simply puts the two together to get the name of the function it will call. In this case the name of the function to call is "admin_init". If a function by this name exists, the function is called and the returned result, if any, ends up in an array called $return which is returned after all modules have been invoked. The lesson learned here is that if you are writing a module and intend to return a value from a callback, you must return it as an array.
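The name-based dispatch can be sketched in Python. A dictionary stands in for PHP's global function namespace, and the two callbacks in it are hypothetical. The paragraph above does not say exactly how results are combined. The sketch assumes they are merged into one list, which is consistent with the "return it as an array" lesson.

```python
from typing import Any, Callable, Dict, List

# Stand-in for PHP's global function namespace (hypothetical callbacks).
FUNCTIONS: Dict[str, Callable[[], Any]] = {
    "admin_init": lambda: ["admin ready"],
    "blog_init": lambda: None,  # runs, but returns nothing
}


def module_invoke(module: str, hook: str) -> Any:
    """Call {module}_{hook} if such a function exists, else return None."""
    func = FUNCTIONS.get(f"{module}_{hook}")
    return func() if func is not None else None


def module_invoke_all(modules: List[str], hook: str) -> List[Any]:
    """Invoke hook in every module and merge the list results."""
    results: List[Any] = []
    for module in modules:
        ret = module_invoke(module, hook)
        if ret is not None:
            results.extend(ret)  # merging is why callbacks must return arrays
    return results


print(module_invoke_all(["admin", "blog", "user"], "init"))  # ['admin ready']
```

The "user" module has no user_init here, so it is silently skipped. Missing hooks are not errors, which is what lets a module implement only the hooks it cares about.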

Back to common.inc. There is a check for suspicious input data. To find out whether or not the user has permission to bypass this check, user_access() is called. This retrieves the user's permissions and stashes them in a static variable called $perm. Whether or not a user has permission for a given action is determined by a simple substring search for the name of the permission (e.g., "bypass input data check") within the $perm string. Our $perm string, as an anonymous user, is currently "0access content, ". Why the 0 at the beginning of the string? Because $perm is initialized to 0 by user_access().
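The substring test, and why it can be fooled, can be shown in Python. This is a model of the check as described here, not Drupal's user_access(). The helper names are hypothetical, as is the "administer node" permission used to show the pitfall.

```python
from typing import Set


def naive_user_access(perm: str, permission: str) -> bool:
    """Substring test, as described above."""
    return permission in perm


def exact_permissions(perm: str) -> Set[str]:
    """Split a '0perm one, perm two, ' string into exact permission names."""
    body = perm[1:] if perm.startswith("0") else perm  # drop the leading 0
    return {p.strip() for p in body.split(",") if p.strip()}


perm = "0access content, "
print(naive_user_access(perm, "access content"))           # True
print(naive_user_access(perm, "bypass input data check"))  # False

# Pitfall: a permission name contained in a longer one also matches.
longer = "0administer nodes, "
print(naive_user_access(longer, "administer node"))    # True
print("administer node" in exact_permissions(longer))  # False
```

Splitting the string into a set costs a little more up front, but each check then asks an exact question.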

The actual check for suspicious input data is carried out by valid_input_data(), which lives in common.inc. It simply goes through an array it's been handed (in this case the $_REQUEST array) and checks all keys and values for the following "evil" strings: javascript, expression, alert, dynsrc, datasrc, data, lowsrc, applet, script, object, style, embed, form, blink, meta, html, frame, iframe, layer, ilayer, head, frameset, xml. If any of these are matched, watchdog records a warning and Drupal dies (in the PHP sense). I wondered why both the keys and values of the $_REQUEST array are examined. This seems very time-consuming. Also, would it die if my URL ended with "/?xml=true" or "/?format=xml"?
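The question about "/?xml=true" can be answered for a naive version of the check. The Python sketch below assumes plain, case-insensitive substring matching over keys and values, which is how the paragraph above describes it. The real valid_input_data() may use stricter patterns (for example, requiring a leading "<"), in which case its answers differ. The `looks_suspicious` name is hypothetical.

```python
from typing import Dict

EVIL = ("javascript", "expression", "alert", "dynsrc", "datasrc", "data",
        "lowsrc", "applet", "script", "object", "style", "embed", "form",
        "blink", "meta", "html", "frame", "iframe", "layer", "ilayer", "head",
        "frameset", "xml")


def looks_suspicious(request: Dict[str, str]) -> bool:
    """True if any key or value contains one of the EVIL words."""
    for key, value in request.items():
        for text in (key, value):
            lowered = text.lower()
            if any(word in lowered for word in EVIL):
                return True
    return False


print(looks_suspicious({"q": "node/1"}))      # False
print(looks_suspicious({"xml": "true"}))      # True: the key contains "xml"
print(looks_suspicious({"format": "rss"}))    # True: "format" contains "form"
```

Under this naive reading, both URLs would be rejected, and so would any parameter named "format". This is the kind of false positive that substring matching invites.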

The next step in common.inc's executable code is a call to locale_init() to set up locale data. If the user is not an anonymous user and has a language preference set up, the two-character language key is returned; otherwise, the key of the single-entry global array $language is returned. In our case, that's "en".

The last gasp of common.inc is to call init_theme(). You'd think that for consistency this would be called theme_init() (of course, that would be a namespace clash with a callback of the same name). This finds out which themes are available and which one the user has selected, and then include_once's the chosen theme. If the user's selected theme is not available, the value at $conf["theme_default"] is used. In our case, we are an anonymous user with no theme selected, so the default xtemplate theme is used. Thus, the file themes/xtemplate/xtemplate.theme is include_once'd. The inclusion of xtemplate.theme calls include_once("themes/xtemplate/xtemplate.inc") and creates a new object called xtemplate as a global variable. Inside this object is an xtemplate object called "template" with lots of attributes. Then there is a nonfunctional line where SetNullBlock is called. A comment indicates that someone is aware that this doesn't work.

Now we're back to index.php! A call to fix_gpc_magic() is in order. The "gpc" stands for Get, Post, Cookie: the three places that unescaped quotes may be found. If deemed necessary by the status of the boolean magic_quotes_gpc directive in PHP's configuration file (php.ini), slashes will be stripped from $_GET, $_POST, $_COOKIE, and $_REQUEST arrays. It seems odd that the function is not called fix_gpc_magic_quotes, since it is the "magic quotes" that are being fixed, not the magic. In my distribution of PHP, the magic_quotes_gpc directive is set to "Off", so slashes do not need to be stripped.
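The slash-stripping step can be sketched in Python. This is an approximation of PHP's stripslashes() applied recursively to request data, not Drupal's fix_gpc_magic(). The `strip_slashes` and `unescape_gpc` names are hypothetical. One simplification: PHP turns backslash-zero into a NUL byte, and this sketch does not.

```python
import re
from typing import Any


def strip_slashes(text: str) -> str:
    """Approximate PHP's stripslashes(): drop each backslash, keep the next char."""
    return re.sub(r"\\(.?)", r"\1", text, flags=re.DOTALL)


def unescape_gpc(data: Any, magic_quotes_gpc: bool) -> Any:
    """Recursively strip slashes from request values when magic quotes are on."""
    if not magic_quotes_gpc:
        return data  # nothing was escaped, so nothing to undo
    if isinstance(data, dict):
        return {key: unescape_gpc(value, True) for key, value in data.items()}
    if isinstance(data, list):
        return [unescape_gpc(item, True) for item in data]
    if isinstance(data, str):
        return strip_slashes(data)
    return data


request = {"name": "O\\'Reilly", "tags": ['say \\"hi\\"']}
print(unescape_gpc(request, magic_quotes_gpc=True))
# {'name': "O'Reilly", 'tags': ['say "hi"']}
print(unescape_gpc(request, magic_quotes_gpc=False) is request)  # True
```

The recursion matters because PHP request arrays can be nested (for example, form fields named like edit[title]), and magic quotes escape every level.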

The next step is to set up menus. I'm not sure why we're setting up menus for an anonymous user, but let's go ahead and follow the logic anyway. We jump to menu_execute_active_handler() in menu.inc. This sets up a $_menu array consisting of items, local tasks, path index, and visible arrays. Then the system realizes that we're not going to be building any menus for an anonymous user and bows out. The real meat of the node creation and formatting happens here, but it is complex enough for a separate commentary. Back in index.php, the switch statement doesn't match either case and we approach the last call in the file, to drupal_page_footer() in common.inc. This takes care of caching the page we've built if caching is enabled (it's not) and calls module_invoke_all() with the "exit" callback symbol.

Although you may think we're done, PHP's session handler still needs to tidy up. It calls sess_write() in session.inc to update the session database table, then sess_close() which simply returns 1.

We're done.

Frustrations with Drupal 4.4

Rule #1 when using the latest and greatest software: don't.

I upgraded this site to the latest and greatest Drupal at some point and totally messed up the site. First my green theme didn't work, then all the nodes died because I invoked some filters. As near as I can tell, the filter.module can't be working at all.

Sigh.

Rule #2 when using the latest and greatest software: See Rule #1.

Bah.

Tutorial on writing modules.

This is my first tutorial written for the Drupal site. It took me over a day to do, but it was a good learning experience. I can't say it's perfect, but at least fewer people will have to suffer through my pain after this gets through the moderation queue on the site. And, frustratingly enough, it looks like crap on my site. Argh.

Introduction

This tutorial describes how to create a module for Drupal-CVS (i.e. Drupal version > 4.3.1). A module is a collection of functions that link into Drupal, providing additional functionality to your Drupal installation. After reading this tutorial, you will be able to create a basic block module and use it as a template for more advanced modules and node modules.

This tutorial will not necessarily prepare you to write modules for release into the wild. It does not cover caching, nor does it elaborate on permissions or security issues. Use this tutorial as a starting point, and review other modules and the Drupal handbook and Coding standards for more information.

This tutorial assumes the following about you:

  • Basic PHP knowledge, including syntax and the concept of PHP objects
  • Basic understanding of database tables, fields, records and SQL statements
  • A working Drupal installation
  • Drupal administration access and webserver access

This tutorial does not assume you have any knowledge about the inner workings of a Drupal module. This tutorial will not help you write modules for Drupal 4.3.1 or before.

Getting Started

To focus this tutorial, we'll start by creating a block module that lists links to content such as blog entries or forum discussions that were created one week ago. The full tutorial will teach us how to create block content, write links, and retrieve information from Drupal nodes.

Start your module by creating a PHP file and saving it as 'onthisdate.module'.

<?php

?>

As per the Coding standards, use the longhand <?php tag, and not <? to enclose your PHP code.

All functions in your module are named {modulename}_{hook}, where "hook" is a well-defined function name. Drupal calls these functions to get specific data, so having these well-defined names means Drupal knows where to look.
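To see why the naming convention matters, here is a minimal sketch of name-based hook dispatch in plain PHP. It only illustrates the idea behind Drupal's module_invoke_all(); invoke_all_hooks() and othermodule_perm() are hypothetical names, and Drupal's real implementation differs.

```php
<?php
// Hypothetical sketch of hook dispatch (NOT Drupal's implementation):
// for each module, build "{module}_{hook}" and call it if it exists.
function invoke_all_hooks(array $modules, string $hook, ...$args): array {
    $results = array();
    foreach ($modules as $module) {
        $function = $module . '_' . $hook;   // e.g. "onthisdate_perm"
        if (function_exists($function)) {
            $result = $function(...$args);
            if (is_array($result)) {
                // Collect every module's answer into one list.
                $results = array_merge($results, $result);
            }
        }
    }
    return $results;
}

function onthisdate_perm() {
    return array("access onthisdate", "administer onthisdate");
}

// Hypothetical second module, to show results being merged.
function othermodule_perm() {
    return array("administer othermodule");
}

// "missing" has no missing_perm() function, so it is silently skipped.
$perms = invoke_all_hooks(array('onthisdate', 'othermodule', 'missing'), 'perm');
echo implode(', ', $perms), "\n";
```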

Telling Drupal about your module

The first function we'll write tells Drupal about your module: its name and description. The hook name for this function is 'help', so start with the onthisdate_help function:

function onthisdate_help($section) {

}

The $section variable provides context for the help: where in Drupal or the module help is being requested. The recommended way to process this variable is with a switch statement; you'll see this code pattern in other modules.

/* Commented out until bug fixed */
/*
function onthisdate_help($section) {
  switch($section) {
    case "admin/system/modules#name":
      return "onthisdate";
      break;
    case "admin/system/modules#description":
      return t("Display a list of nodes that were created a week ago.");
      break;
  }
}
*/

You will eventually want to add other cases to this switch statement to provide real help messages to the user. In particular, output for "admin/help#onthisdate" will display on the main help page accessed by the admin/help URL for this module (/admin/help or ?q=admin/help).

The t() function in the second case is used to provide localized content to the user. Any string that presents information to the user should be wrapped in a t() call so that it can later be translated.

Note: This function is commented out in the code above. This is on purpose: the current version of Drupal CVS won't display the module name, and won't enable the module properly when it is installed. Until this bug is fixed, comment out your help function, or your module may not work.
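Once the help function works again, a version with an "admin/help#onthisdate" case might look like the sketch below. The help text for that case is made up, and the t() defined here is only a pass-through stand-in so the snippet runs outside Drupal; inside Drupal the real t() already exists and the stand-in is skipped.

```php
<?php
// Stand-in for Drupal's t() so this runs outside Drupal (assumption).
if (!function_exists('t')) {
    function t($string) { return $string; }
}

function onthisdate_help($section) {
    switch ($section) {
        case "admin/system/modules#name":
            return "onthisdate";
        case "admin/system/modules#description":
            return t("Display a list of nodes that were created a week ago.");
        case "admin/help#onthisdate":
            // Hypothetical help text shown on the admin/help page.
            return t("The On This Date block lists content created exactly one week ago.");
    }
    // Unknown sections fall through and return NULL (no help).
}

echo onthisdate_help("admin/system/modules#description"), "\n";
```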

Telling Drupal who can use your module

The next function to write is the permissions function. Here, you can tell Drupal who can access your module. At this point, give permission to anyone who can access site content or administer the module.

function onthisdate_perm() {
  return array("administer onthisdate");
}

If your module needs finer control over permissions, you may want to define a new permission set. You can do this by adding strings to the array that is returned:

function onthisdate_perm() {
  return array("access onthisdate", "administer onthisdate");
}

You'll need to adjust who has permission to view your module on the administer » accounts » permissions page. We'll use the user_access() function to check access permissions later.

Be sure your permission strings are unique to your module. If they are not, the permissions page will list the same permission multiple times.
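A quick way to spot clashes is to group permission strings by the modules that declare them. The sketch below is hypothetical (duplicate_permissions() and "othermodule" are made-up names); it works on plain arrays shaped like the return values of the _perm hooks.

```php
<?php
// Hypothetical check: find permission strings declared by more than one module.
// $perms_by_module maps module name => array returned by its _perm hook.
function duplicate_permissions(array $perms_by_module): array {
    $owners = array();
    foreach ($perms_by_module as $module => $perms) {
        foreach ($perms as $perm) {
            $owners[$perm][] = $module;
        }
    }
    // Keep only strings that more than one module claims.
    $dupes = array();
    foreach ($owners as $perm => $modules) {
        if (count($modules) > 1) {
            $dupes[$perm] = $modules;
        }
    }
    return $dupes;
}

$dupes = duplicate_permissions(array(
    'onthisdate'  => array('access onthisdate', 'administer content'),
    'othermodule' => array('administer content'),
));
print_r($dupes);
```

A module-specific word in each string ("onthisdate" here) is the simplest way to keep that result empty.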

Announce we have block content

There are several types of modules: block modules and node modules are two. Block modules create abbreviated content that is typically (but not always, and not required to be) displayed along the left or right side of a page. Node modules generate full page content (such as blog, forum, or book pages).

We'll create block content to start, and discuss node content later. A module can generate content for blocks and also for a full page (the blog module is a good example of this). The hook for a block module is appropriately called "block", so let's start our next function:

function onthisdate_block($op='list', $delta=0) {
  
}

The block function takes two parameters: the operation and the offset, or delta. We'll just worry about the operation at this point. In particular, we care about the specific case where the block is being listed in the blocks page. In all other situations, we'll display the block content.

function onthisdate_block($op='list', $delta=0) {

  // listing of blocks, such as on the admin/system/block page
  if ($op == "list") {
    $block[0]["info"] = t("On This Date");
    return $block;
  } else {
  // our block content
  }
}

Generate content for a block

Now, we need to generate the 'onthisdate' content for the block. In here, we'll demonstrate a basic way to access the database.

Our goal is to get a list of content (stored as "nodes" in the database) created a week ago. Specifically, we want the content created between midnight and 11:59pm on the day one week ago. When a node is first created, the time of creation is stored in the database. We'll use this database field to find our data.

First, we need to calculate the time (in seconds since the Unix epoch; see http://www.php.net/manual/en/function.time.php for more information on this format) for midnight a week ago and 11:59pm a week ago. This part of the code is Drupal-independent; see the PHP website (http://php.net/) for more details.

function onthisdate_block($op='list', $delta=0) {

  // listing of blocks, such as on the admin/system/block page
  if ($op == "list") {
    $block[0]["info"] = t("On This Date");
    return $block;
  } else {
  // our block content

    // Get today's date
    $today = getdate();

    // calculate midnight one week ago
    $start_time = mktime(0, 0, 0, 
                         $today['mon'], ($today['mday'] - 7), $today['year']);

    // we want items that occur only on the day in question, so calculate 1 day
    $end_time = $start_time + 86400;  // 60 * 60 * 24 = 86400 seconds in a day
    ...
  }
}
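The date arithmetic can be tried outside Drupal. This sketch (the function name day_window() is hypothetical) shows that mktime() normalizes a negative day of the month into the previous month, and it computes the end of the day with a second mktime() call instead of adding 86400, so a 23- or 25-hour day at a daylight-saving change is still covered. UTC is assumed so the output is repeatable.

```php
<?php
date_default_timezone_set('UTC'); // assumption: fixed zone for a repeatable example

// Return array(start, end): midnight $days_ago days before $now, and the
// following midnight. Query with created >= start AND created < end.
function day_window(int $now, int $days_ago = 7): array {
    $today = getdate($now);
    $start = mktime(0, 0, 0, $today['mon'], $today['mday'] - $days_ago, $today['year']);
    $end   = mktime(0, 0, 0, $today['mon'], $today['mday'] - $days_ago + 1, $today['year']);
    return array($start, $end);
}

// 3 March 2004 at 15:30: one week earlier falls in (leap-year) February.
list($start, $end) = day_window(mktime(15, 30, 0, 3, 3, 2004));
echo date('Y-m-d H:i', $start), ' to ', date('Y-m-d H:i', $end), "\n";
// 2004-02-25 00:00 to 2004-02-26 00:00
```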

The next step is the SQL statement that will retrieve the content we'd like to display from the database. We're selecting content from the node table, which is the central table for Drupal content. We'll get all sorts of content types with this query: blog entries, forum posts, etc. For this tutorial, this is okay. For a real module, you would adjust the SQL statement to select specific types of content (by adding the 'type' column and a WHERE clause checking the 'type' column).

Note: the table name is enclosed in curly braces: {node}. This is necessary so that your module will support database table name prefixes. You can find more information on the Drupal website by reading the Table Prefix (and sharing tables across instances) page in the Drupal handbook.

  $query = "SELECT nid, title, created FROM {node} WHERE created >= %d AND created < %d";

Drupal uses database helper functions to perform database queries. This means that, for the most part, you can write your database SQL statement and not worry about the backend connections.

We'll use db_query() to get the records (i.e. the database rows) that match our SQL query, and db_fetch_object() to look at the individual records:


  // get the links
  $queryResult = db_query($query, $start_time, $end_time);

  // content variable that will be returned for display
  $block_content = '';

  while ($links = db_fetch_object($queryResult)) {
    $block_content .= '<a href="' . url('node/view/' . $links->nid ) . '">' . 
                       $links->title . '</a><br />';
  }

  // check to see if there was any content before setting up the block
  if ($block_content == '') {
    /* No content from a week ago.  If we return nothing, the block 
     * doesn't show, which is what we want. */
    return;
  }

  // set up the block
  $block['subject'] = 'On This Date';
  $block['content'] = $block_content;
  return $block;
}

Notice the actual URL is wrapped in the url() function. This adjusts the URL to match the installation's URL configuration: either clean URLs (http://sitename/node/view/2) or query-string URLs (http://sitename/?q=node/view/2).
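The two URL styles can be illustrated with a tiny stand-in. make_url() is hypothetical and is not how url() is implemented; the base URL is an assumption, whereas Drupal reads it from the site's own configuration.

```php
<?php
// Hypothetical stand-in for url(), only to show the two URL styles.
function make_url(string $path, bool $clean_urls, string $base = 'http://sitename/'): string {
    // Clean URLs put the path directly after the base; otherwise it goes in ?q=
    return $clean_urls ? $base . $path : $base . '?q=' . $path;
}

echo make_url('node/view/2', true), "\n";  // http://sitename/node/view/2
echo make_url('node/view/2', false), "\n"; // http://sitename/?q=node/view/2
```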

Also, we return an array that has 'subject' and 'content' elements. This is what Drupal expects from a block function. If you do not include both of these, the block will not render properly.

You may also notice the bad coding practice of combining content with layout. If you are writing a module for others to use, you will want to provide an easy way for others (in particular, non-programmers) to adjust the content's layout. An easy way to do this is to include a class attribute in your link, and not necessarily include the <br /> at the end of the link. Let's ignore this for now, but be aware of this issue when writing modules that others will use.
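One way to loosen the coupling, sketched in plain PHP: emit a list with a class attribute that a theme's CSS can style, instead of hard-coded <br /> tags, and escape titles with htmlspecialchars(). The render_links() name, the "onthisdate-link" class, and the array-of-nodes input are assumptions for illustration; inside Drupal you would build the href with url().

```php
<?php
// Hypothetical helper: render node links as a styleable list.
// $nodes is an array of array('nid' => int, 'title' => string).
function render_links(array $nodes): string {
    $items = array();
    foreach ($nodes as $node) {
        $href  = htmlspecialchars('?q=node/view/' . $node['nid'], ENT_QUOTES);
        $title = htmlspecialchars($node['title'], ENT_QUOTES); // avoid raw HTML in titles
        $items[] = '<li><a class="onthisdate-link" href="' . $href . '">' . $title . '</a></li>';
    }
    // Empty string when there is nothing to show, so the block stays hidden.
    return $items ? '<ul>' . implode('', $items) . '</ul>' : '';
}

echo render_links(array(array('nid' => 2, 'title' => 'Fish & Chips'))), "\n";
```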

Putting it all together, our block function looks like this:

function onthisdate_block($op='list', $delta=0) {

  // listing of blocks, such as on the admin/system/block page
  if ($op == "list") {
    $block[0]["info"] = t("On This Date");
    return $block;
  } else {
  // our block content

    // content variable that will be returned for display
    $block_content = '';

    // Get today's date
    $today = getdate();

    // calculate midnight one week ago
    $start_time = mktime(0, 0, 0, 
                         $today['mon'], ($today['mday'] - 7), $today['year']);

    // we want items that occur only on the day in question, so calculate 1 day
    $end_time = $start_time + 86400;  // 60 * 60 * 24 = 86400 seconds in a day

    $query = "SELECT nid, title, created FROM {node} WHERE created >= %d AND created < %d";

    // get the links; db_query() fills the %d placeholders from the extra arguments
    $queryResult = db_query($query, $start_time, $end_time);

    while ($links = db_fetch_object($queryResult)) {
      $block_content .= '<a href="'.url('node/view/'.$links->nid).'">'. 
                        $links->title . '</a><br />';
    }

    // check to see if there was any content before setting up the block
    if ($block_content == '') {
      // no content from a week ago, return nothing.
      return;
    }

    // set up the block
    $block['subject'] = 'On This Date';
    $block['content'] = $block_content;
    return $block;
  }
}

Installing, enabling and testing the module

At this point, you can install your module and it'll work. Let's do that, and see where we need to improve the module.

To install the module, you'll need to copy your onthisdate.module file to the modules directory of your Drupal installation. The file must be installed in this directory or a subdirectory of the modules directory, and must have the .module name extension.

Log in as your site administrator, and navigate to the modules administration page to get an alphabetical list of modules. In the menus: administer » configuration » modules, or via URL:

    http://.../admin/system/modules or http://.../?q=admin/system/modules

Note: You'll see one of three things for the 'onthisdate' module at this point:

  • You'll see the 'onthisdate' module name and no description
  • You'll see no module name, but the 'onthisdate' description
  • You'll see both the module name and the description

Which of these three you see depends on the state of the CVS tree, your installation, and the help function in your module. If you have a description and no module name, and this bothers you, comment out the help function for the moment. You'll then have the module name, but no description. For this tutorial, either is okay, as you will just enable the module and won't use the help system.

Enable the module by selecting its checkbox and saving your configuration.

Because the module is a blocks module, we'll need to also enable it in the blocks administration menu and specify a location for it to display. Navigate to the blocks administration page: admin/system/block or administer » configuration » blocks in the menus.

Enable the block by selecting the enabled checkbox for 'On This Date' and saving your blocks. Be sure to adjust the location (left/right) if you are using a theme that limits where blocks are displayed.

Now, head to another page (say, the modules page). In some themes, blocks are rendered after the page content, so you won't see the change until you go to a new page.

If you have content that was created a week ago, the block will display with links to the content. If you don't have content, you'll need to fake some data. You can do this by creating a blog entry, forum topic or book page and adjusting the "Authored on:" date to be a week ago.

Alternately, if your site has been around for a while, you may have a lot of content created on the day one week ago, and you'll see a large number of links in the block.

Create a module configuration (settings) page

Now that we have a working module, we'd like to make it better. If we have a site that has been around for a while, content from a week ago might not be as interesting as content from a year ago. Similarly, if we have a busy site, we might not want to display all the links to content created last week. So, let's create a configuration page for the administrator to adjust this information.

The configuration page uses the 'settings' hook. We would like only administrators to be able to access this page, so we'll do our first permissions check of the module here:

function onthisdate_settings() {
  // only administrators can access this page
  if (!user_access("administer onthisdate")) {
    return message_access();
  }
}

If you want to tie your module's permissions to those of another module, you can use that module's permission string. The "access content" permission is a good one to check whether the user can view the content on your site:

  ... 
  // check the user has content access
  if (!user_access("access content")) {
    return message_access();
  }
  ...

We'd like to configure how many links display in the block, so we'll create a form for the administrator to set the number of links:

function onthisdate_settings() {
  // only administrators can access this page
  if (!user_access("administer onthisdate")) {
    return message_access();
  }

  $output = form_textfield(t("Maximum number of links"), "onthisdate_maxdisp",
             variable_get("onthisdate_maxdisp", "3"), 2, 2,
             t("The maximum number of links to display in the block."));

  return $output;
}

This function uses several powerful Drupal form handling features. We don't need to worry about creating an HTML text field or the form, as Drupal will do so for us. We use variable_get to retrieve the value of the system configuration variable "onthisdate_maxdisp", which has a default value of 3. We use the form_textfield function to create the form and a text box of size 2, accepting a maximum length of 2 characters. We also use the translation function t(). There are other form functions that will automatically create HTML form elements for us; for now, we'll just use form_textfield.

Of course, we'll need to use the configuration value in our SQL SELECT. Because different databases have slightly different ways of limiting the amount of data returned, Drupal provides a database independent function to query the database: db_query_range. Get the saved maximum number and use db_query_range():

  $limitnum = variable_get("onthisdate_maxdisp", 3);

  $query = "SELECT nid, title, created FROM {node} WHERE created >= %d AND created < %d";

  // get the links, limited to the maximum number (starting at row 0):
  $queryResult = db_query_range($query, $start_time, $end_time, 0, $limitnum);
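To illustrate why a database-independent helper is useful, the sketch below appends a row limit in two SQL dialects: MySQL's "LIMIT offset, count" and PostgreSQL's "LIMIT count OFFSET offset". add_range() is a hypothetical name; this is not how db_query_range() is implemented, just the kind of difference it hides from your module.

```php
<?php
// Hypothetical helper: append a row range using the given driver's dialect.
function add_range(string $sql, string $driver, int $from, int $count): string {
    switch ($driver) {
        case 'mysql':
            return $sql . " LIMIT $from, $count";
        case 'pgsql':
            return $sql . " LIMIT $count OFFSET $from";
        default:
            throw new InvalidArgumentException("Unsupported driver: $driver");
    }
}

$sql = "SELECT nid, title, created FROM node";
echo add_range($sql, 'mysql', 0, 3), "\n"; // ... LIMIT 0, 3
echo add_range($sql, 'pgsql', 0, 3), "\n"; // ... LIMIT 3 OFFSET 0
```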

You can test the settings page by changing the number of links displayed. Navigate to the settings page: admin/system/modules/onthisdate or administer » configuration » modules » onthisdate. Adjust the number of links, save the configuration, and notice that the number of links in the block adjusts accordingly.

Note: We don't validate this input. If you enter "c" as the maximum number of links, you'll break the block.
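One way to guard against that, sketched in plain PHP: sanitize_maxdisp() is a hypothetical helper you could apply to the value returned by variable_get() before passing it to db_query_range(). The 1 to 99 range is an assumption matching the two-character text field.

```php
<?php
// Hypothetical helper: turn the raw "onthisdate_maxdisp" setting into a safe
// positive integer, falling back to a default when it is not a valid number.
function sanitize_maxdisp($raw, int $default = 3): int {
    $value = filter_var($raw, FILTER_VALIDATE_INT,
                        array('options' => array('min_range' => 1, 'max_range' => 99)));
    return $value === false ? $default : $value;
}

var_dump(sanitize_maxdisp("5"));  // int(5)
var_dump(sanitize_maxdisp("c"));  // int(3)
var_dump(sanitize_maxdisp("0"));  // int(3)
```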

Adding menu links and creating page content

So far we have our working block and a settings page. The block displays a maximum number of links. However, there may be more links than the maximum we show. So, let's create a page that lists all the content that was created a week ago.

function onthisdate_all() {
  
}

We're going to use much of the code from the block function. We'll write this Extreme Programming style and duplicate the code. If we need to use it in a third place, we'll refactor it into a separate function. For now, copy the code to the new function onthisdate_all(). Unlike our other functions, onthisdate_all() is not a Drupal hook; we'll discuss this below.

function onthisdate_all() {

  // content variable that will be returned for display
  $page_content = '';

  // Get today's date
  $today = getdate();

  // calculate midnight one week ago
  $start_time = mktime(0, 0, 0, 
                       $today['mon'], ($today['mday'] - 7), $today['year']);

  // we want items that occur only on the day in question, so calculate 1 day
  $end_time = $start_time + 86400;  // 60 * 60 * 24 = 86400 seconds in a day

  // NOTE! No range limit here: we want to show all the content
  $query = "SELECT nid, title, created FROM {node} WHERE created >= %d AND created < %d";

  // get the links
  $queryResult = db_query($query, $start_time, $end_time);

  while ($links = db_fetch_object($queryResult)) {
    $page_content .= '<a href="'.url('node/view/'.$links->nid).'">'. 
                      $links->title . '</a><br />';
  }
  
  ...
}

We have the page content at this point, but we want to do a little more with it than just return it. When creating pages, we need to send the page content to the theme for proper rendering. We do this with the theme() function. Themes control the look of a site. As noted above, we're including layout in the code. This is bad and should be avoided, but it is the topic of another tutorial, so for now we'll include the formatting in our content:

    print theme("page", $page_content);

The rest of our function checks to see if there is content and lets the user know. This is preferable to showing an empty or blank page, which may confuse the user.

Note that we are responsible for outputting the page content with the 'print theme()' syntax. This is a change from previous 4.3.x themes.

function onthisdate_all() {

  ...

  // check to see if there was any content before displaying the page
  if ($page_content == '') {
    // no content from a week ago, let the user know
    print theme("page",
                t("No events occurred on this site on this date in history."));
    return;
  }

  print theme("page", $page_content);
}

Letting Drupal know about the new function

As mentioned above, the function we just wrote isn't a 'hook': it's not a Drupal recognized name. We need to tell Drupal how to access the function when displaying a page. We do this with the _link hook and the menu() function:

function onthisdate_link($type, $node=0) {

}

There are many different link types, but we're going to use only 'system' in this tutorial.

function onthisdate_link($type, $node=0) {
  if ($type == "system") {
    // URL, page title, func called for page content, arg, 1 = don't disp menu
    menu("onthisdate", t("On This Date"), "onthisdate_all", 1, 1);
  }
}

Basically, we're saying if the user goes to "onthisdate" (either via ?q=onthisdate or http://.../onthisdate), the content generated by onthisdate_all will be displayed. The title of the page will be "On This Date". The final "1" in the arguments tells Drupal to not display the link in the user's menu. Make this "0" if you want the user to see the link in the side navigation block.

Navigate to /onthisdate (or ?q=onthisdate) and see what you get.

Adding a more link and showing all entries

Because we have our function that creates a page with all the content created a week ago, we can link to it from the block with a "more" link.

Add these lines just before the $block['subject'] line, appending the link to the $block_content variable before it is saved to the $block['content'] variable:

  // add a more link to our page that displays all the links
  $block_content .= "<div class=\"more-link\">". l(t("more"), "onthisdate", array("title" => t("More events on this day."))) ."</div>";

This will add the more link.

And we're done!

We now have a working module. It creates a block and a page. You should now have enough to get started writing your own modules. We recommend you start with a block module of your own and move on to a node module. Alternatively, you can write a filter or theme.

Please see the Drupal Handbook for more information.

Further Notes

As is, this tutorial's module isn't very useful. However, with a few enhancements, it can be entertaining. Try modifying the select query statement to select only nodes of type 'blog' and see what you get. Alternately, you could get only a particular user's content for a specific week. Instead of using the block function, consider expanding the menu and page functions, adding menus to specific entries or dates, or using the menu callback arguments to adjust what year you look at the content from.

If you start writing modules for others to use, you'll want to provide more details in your code. Comments in the code are incredibly valuable for other developers and users in understanding what's going on in your module. You'll also want to expand the help function, providing better help for the user. Follow the Drupal Coding standards, especially if you're going to add your module to the project.

Two topics very important in module development are writing themeable pages and writing translatable content. We touched briefly on both of these topics with the theme() and t() calls in various parts of the module. Please check the Drupal Handbook for more details on these two subjects.

Published my first tutorial.

Blog

I made my first contribution to the Drupal project today. I wrote a tutorial (also published on my site) and published it just now. It took me two days to do. I was pretty nervous about publishing such a lengthy article as my first contribution, but I received some good feedback, so I think it'll be fine.

Publishing the tutorial on my site meant I needed to sort out some details of my site's configuration. I had the system stripping HTML tags. Unfortunately, I was stripping too many tags, so the formatting was lost. I've since added those tags to the allowed list, so it looks good here. I'm pretty excited.

There are a lot of configuration details on the site. I'm trying to keep the base code unmodified so that I can update at any point. We'll see how long I can keep from fixing things.

Pages