Version 5 of HOW TO - Remove C style comments

Updated 2009-06-25 15:19:04 by tbtietc

The following procedure removes C-style comments (i.e. /* ... */) from text, optionally putting a replacement string in place of each one.

 proc removeComments { text {replacement ""} } {
    # The non-greedy .*? stops at the first */ and, in Tcl, also matches newlines.
    regsub -all {[/][*].*?[*][/]} $text ${replacement} text
    return $text
 }

If you need to remove nested (embedded) C-style comments (i.e. /* ... /* ... */ ... */), use the following procedure.

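 proc removeImbeddedComments { text {replacement ""} } {
     # Map each delimiter to a single placeholder character.
     # Assumes the text does not already contain \x80 or \x81.
     set text [string map  {"/*" \x80 "*/" \x81} $text]
     # Remove the innermost comments repeatedly until none are left.
     while {[regsub -all {\x80[^\x80\x81]*?\x81} $text ${replacement} text]} {continue}
     # Restore any unmatched (dangling) delimiters.
     set text [string map  {\x80 "/*" \x81 "*/"} $text]
     return $text
 }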
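The same effect can be had without placeholder characters by counting nesting depth. The following is a minimal sketch, not part of the original procedure: the name removeNestedComments is hypothetical, it assumes Tcl 8.5 or later (for lassign), and unlike the procedure above it leaves an unterminated outer comment completely untouched, including any complete comments nested inside it.

```tcl
# Hypothetical alternative: track nesting depth instead of using \x80/\x81.
proc removeNestedComments { text {replacement ""} } {
    set out   ""
    set depth 0   ;# current nesting level
    set pos   0   ;# index of the first character not yet copied to out
    # Visit every "/*" and "*/" delimiter from left to right.
    foreach range [regexp -all -inline -indices {/\*|\*/} $text] {
        lassign $range first last
        if {[string index $text $first] eq "/"} {
            # Opening delimiter: copy the text before an outermost opener.
            if {$depth == 0} {
                append out [string range $text $pos [expr {$first - 1}]]
                set pos $first
            }
            incr depth
        } elseif {$depth > 0} {
            # Closing delimiter inside a comment.
            incr depth -1
            if {$depth == 0} {
                append out $replacement
                set pos [expr {$last + 1}]
            }
        }
        # A "*/" seen at depth 0 is dangling and stays in the output.
    }
    # Copy the tail; an unterminated comment is copied unchanged.
    append out [string range $text $pos end]
    return $out
}

puts [removeNestedComments "text1 /*/*/**/*/*/ text2" "#comment-removed#"]
# prints: text1 #comment-removed# text2
```

Because the replacement is appended directly rather than passed to regsub, characters such as & in it are inserted literally.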

Usage examples:

 removeComments ${data} "#comment-removed#"

 removeImbeddedComments ${data} "#comment-removed#"
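As a concrete illustration of the calls above, here is a small self-contained run of removeComments. The sample text in data is made up for this sketch; the procedure is repeated so the snippet runs on its own.

```tcl
proc removeComments { text {replacement ""} } {
    regsub -all {[/][*].*?[*][/]} $text $replacement text
    return $text
}

# Hypothetical sample input; the second comment spans two lines.
set data "int a; /* first */ int b; /* second,\n   two lines */ int c;"

puts [removeComments $data]
# prints: int a;  int b;  int c;

puts [removeComments $data "#comment-removed#"]
# prints: int a; #comment-removed# int b; #comment-removed# int c;
```

Note that the replacement is handed to regsub as its substitution spec, so an & or a backslash sequence such as \1 in it is interpreted by regsub rather than inserted literally.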

Test Cases:

 ##### Simple Comments #####
 # test-1
 /**/
 /* */
 /* text1 */
 # test-2
 text1 /**/ text2 /* */ text3 /* comment */
 # test-3
 /*
 */
 text1
 /*
     */
 text2
     /*
      */
 # test-4
 text1 /*
 */ text2

 text1 /*
     */ text2
 # test-5
 /* comment
 */
 /*
 comment
 */
 /*
 comment */
 ##### Imbedded Comments #####
 # test-1
 text1 /*/*/**/*/*/ text2
 # test-2
 text1 /*/**//**//*/**//**//**/*/*/ text2
 # test-3
 text1 /* comment /* comment /* comment */ comment */ comment */ text2
 # test-4
 text1
 /*
 text2
 text3 /* comment */
 text4 /*
         comment
         comment /* comment */
         comment
       */
 text5
 */
 text5
 # test-5
 text1
 /*
  comment ///
     /*
      comment ///
         /*
          comment ///
          comment ***
          */
      comment ***
      */
  comment ***
  */
 text2
 # test-6
 text1 * / / *
 /*
  comment ///
     /*
      comment ///
         /*
          comment ///
          comment ***
          */
      comment ***
      */
  comment ***
  comment ///
     /*
      comment ///
         /*
          comment ///
          comment ***
          */
      comment ***
      comment ///
         /*
          comment ///
          comment ***
          */
      comment ***
      */
  comment ***
  */
 text2
 # test-7 (dangling comments)
 */ /*

Test results from the removeImbeddedComments procedure were as follows.

 ##### Simple Comments #####
 # test-1
 #comment-removed#
 #comment-removed#
 #comment-removed#
 # test-2
 text1 #comment-removed# text2 #comment-removed# text3 #comment-removed#
 # test-3
 #comment-removed#
 text1
 #comment-removed#
 text2
     #comment-removed#
 # test-4
 text1 #comment-removed# text2

 text1 #comment-removed# text2
 # test-5
 #comment-removed#
 #comment-removed#
 #comment-removed#
 ##### Imbedded Comments #####
 # test-1
 text1 #comment-removed# text2
 # test-2
 text1 #comment-removed# text2
 # test-3
 text1 #comment-removed# text2
 # test-4
 text1
 #comment-removed#
 text5
 # test-5
 text1
 #comment-removed#
 text2
 # test-6
 text1 * / / *
 #comment-removed#
 text2
 # test-7 (dangling comments)
 */ /*
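A few of the results above can be checked mechanically. The following is only a sketch: the cases list holds a handful of input/expected pairs copied from the test cases, and the variable names are chosen for this example.

```tcl
proc removeImbeddedComments { text {replacement ""} } {
    set text [string map {"/*" \x80 "*/" \x81} $text]
    while {[regsub -all {\x80[^\x80\x81]*?\x81} $text $replacement text]} {}
    set text [string map {\x80 "/*" \x81 "*/"} $text]
    return $text
}

# Alternating input and expected output, taken from the test cases.
set cases [list \
    "text1 /**/ text2 /* */ text3 /* comment */" \
    "text1 #comment-removed# text2 #comment-removed# text3 #comment-removed#" \
    "text1 /*\n    */ text2" \
    "text1 #comment-removed# text2" \
    "text1 /*/*/**/*/*/ text2" \
    "text1 #comment-removed# text2" \
    "text1 /* comment /* comment /* comment */ comment */ comment */ text2" \
    "text1 #comment-removed# text2" \
    "*/ /*" \
    "*/ /*" \
]

set failures 0
foreach {input expected} $cases {
    set actual [removeImbeddedComments $input "#comment-removed#"]
    if {$actual eq $expected} {
        puts "ok:   $input"
    } else {
        puts "FAIL: $input -> $actual"
        incr failures
    }
}
puts "$failures failure(s)"
```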

Tom Krehbiel


Pierre Coueffin (03 Sept. 2005): You do have to be careful if you try to use this on actual comments in C code.

if 0 {

 removeComments {printf ("/* %s */\n", "Comment to print"); /* Prints a comment to stdout */}

returns:

 printf ("\n", "Comment to print");

where you might expect to see:

 printf ("/* %s */\n", "Comment to print");

}


tbtietc - 2009-06-25 11:19:05

 regsub -all {('([^\']|\\.)')|("([^\"]|\\.)*")|(//[^\n]*)|(/\*([^*]|[*][^/])*\*/)} $text "\\1\\3" text;

This detects: (A) a character in single quotes, (B) a string in double quotes, and (C) C-style comments, both // and /* ... */.

And replaces: A and B with themselves (quotes intact), and C with the empty string (comments deleted).
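The same idea can be wrapped in a procedure. This is a hedged sketch rather than the expression above: the name stripCComments is hypothetical, and the patterns are a variation that excludes the backslash from the literal character classes and uses a block-comment pattern intended to also accept comments ending in **/. Literals are captured in group 1 and put back; comments match outside any capturing group and are dropped.

```tcl
# Hypothetical helper; the name and the exact patterns are assumptions.
proc stripCComments { text } {
    # Group 1: string or character literal, kept via \1.
    # Remaining alternatives: a // comment or a /* ... */ comment, both dropped.
    set re {("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')|//[^\n]*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/}
    regsub -all $re $text {\1} text
    return $text
}

set code {printf ("/* %s */\n", "Comment to print"); /* Prints a comment to stdout */}
puts [stripCComments $code]
# prints: printf ("/* %s */\n", "Comment to print");
# (the space that preceded the removed comment remains at the end of the line)
```

The regular expression engine always takes the leftmost match, so whichever construct starts first (literal or comment) claims its text; that is what keeps a /* inside a string from being treated as a comment, and a quote inside a comment from being treated as a string.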

For example, given the following C code as text:

 // I hope this is going to /* be detected as a */ comment.
 /* Similarly, this too should be detected // as a comment. */
 /* This is a /* comment.*/
 /* A quote uses " and a single quote uses ' */
 /* This one is a single-line comment */
 /* This
  * is a
  * multiple-line
  * comment
  */
 /* This one has 2 comments */int ttt; //arranged like this.
 int main () {
     int /* Comment */ a = 10; ///*comment*/10;
     char *s1 = "http://www.google.com";
     char *s2 = "/* This is a comment. */";
     char *s3 = "A quote is this \"";
     char *s4 = "A single-quote is this '";
     char ch  = '"';
     char ch2 = '"';
 }

Output:

 <8 blank lines>
 int ttt;
 int main () {
     int  a = 10;
     char *s1 = "http://www.google.com";
     char *s2 = "/* This is a comment. */";
     char *s3 = "A quote is this \"";
     char *s4 = "A single-quote is this '";
     char ch  = '"';
     char ch2 = '"';
 }