HOW TO - Remove C style comments

tomk

The following procedure removes C-style comments (i.e. /* ... */) from text. Each comment is replaced by the optional replacement string, which defaults to the empty string.

 proc removeComments { text {replacement ""} } {
    regsub -all {[/][*].*?[*][/]} $text ${replacement} text
    return $text 
 }

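If you need to remove nested C-style comments (i.e. /* ... /* ... */ ... */), use the following procedure. It maps the two delimiters to the single characters \x80 and \x81, repeatedly removes the innermost comment until none is left, and then maps any unmatched delimiters back. It assumes the text does not already contain those two characters.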

 proc removeImbeddedComments { text {replacement ""} } {
     set text [string map  {"/*" \x80 "*/" \x81} $text]
     while {[regsub -all {\x80[^\x80\x81]*?\x81} $text ${replacement} text]} {continue}
     set text [string map  {\x80 "/*" \x81 "*/"} $text]
     return $text
 }
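To make the innermost-first order visible, here is a minimal sketch of the same loop that records the text after every regsub pass. The proc name traceNestedRemoval and the variables open and close are made up for this illustration; they are not part of the procedure above.

```tcl
# Hypothetical helper: same algorithm as removeImbeddedComments, but it
# returns a list holding the text as it looks after each regsub pass.
proc traceNestedRemoval { text {replacement ""} } {
    set open  \x80   ;# stand-in character for "/*"
    set close \x81   ;# stand-in character for "*/"
    set text [string map [list "/*" $open "*/" $close] $text]
    set passes {}
    # Each pass removes every comment that contains no other delimiter,
    # i.e. the innermost ones; the loop stops when a pass removes nothing.
    while {[regsub -all "$open\[^$open$close\]*?$close" $text $replacement text]} {
        # Map the stand-ins back only for display.
        lappend passes [string map [list $open "/*" $close "*/"] $text]
    }
    return $passes
}

foreach pass [traceNestedRemoval {a /* x /* y */ z */ b} "#"] {
    puts $pass
}
# prints:
#   a /* x # z */ b
#   a # b
```

Mapping each two-character delimiter to a single character is what allows the bracket expression to say "anything except a delimiter", which is awkward to express for multi-character sequences.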

Usage examples:

 removeComments ${data} "#comment-removed#"

 removeImbeddedComments ${data} "#comment-removed#"

Test Cases:

 ##### Simple Comments #####
 # test-1
 /**/
 /* */
 /* text1 */
 # test-2
 text1 /**/ text2 /* */ text3 /* comment */
 # test-3
 /*
 */
 text1
 /*
     */
 text2
     /*
      */
 # test-4
 text1 /*
 */ text2

 text1 /*
     */ text2
 # test-5
 /* comment
 */
 /*
 comment
 */
 /*
 comment */
 ##### Imbedded Comments #####
 # test-1
 text1 /*/*/**/*/*/ text2
 # test-2
 text1 /*/**//**//*/**//**//**/*/*/ text2
 # test-3
 text1 /* comment /* comment /* comment */ comment */ comment */ text2
 # test-4
 text1
 /*
 text2
 text3 /* comment */
 text4 /*
         comment
         comment /* comment */
         comment
       */
 text5
 */
 text5
 # test-5
 text1
 /*
  comment ///
     /*
      comment ///
         /*
          comment ///
          comment ***
          */
      comment ***
      */
  comment ***
  */
 text2
 # test-6
 text1 * / / *
 /*
  comment ///
     /*
      comment ///
         /*
          comment ///
          comment ***
          */
      comment ***
      */
  comment ***
  comment ///
     /*
      comment ///
         /*
          comment ///
          comment ***
          */
      comment ***
      comment ///
         /*
          comment ///
          comment ***
          */
      comment ***
      */
  comment ***
  */
 text2
 # test-7 (dangling comments)
 */ /*

Running removeImbeddedComments on the test cases above, with "#comment-removed#" as the replacement, gave the following results.

 ##### Simple Comments #####
 # test-1
 #comment-removed#
 #comment-removed#
 #comment-removed#
 # test-2
 text1 #comment-removed# text2 #comment-removed# text3 #comment-removed#
 # test-3
 #comment-removed#
 text1
 #comment-removed#
 text2
     #comment-removed#
 # test-4
 text1 #comment-removed# text2

 text1 #comment-removed# text2
 # test-5
 #comment-removed#
 #comment-removed#
 #comment-removed#
 ##### Imbedded Comments #####
 # test-1
 text1 #comment-removed# text2
 # test-2
 text1 #comment-removed# text2
 # test-3
 text1 #comment-removed# text2
 # test-4
 text1
 #comment-removed#
 text5
 # test-5
 text1
 #comment-removed#
 text2
 # test-6
 text1 * / / *
 #comment-removed#
 text2
 # test-7 (dangling comments)
 */ /*

Pierre Coueffin (03 Sept. 2005): You do have to be careful if you try to use this on actual C code, because comment delimiters that appear inside string literals are removed as well.

if 0 {

 removeComments {printf ("/* %s */\n", "Comment to print"); /* Prints a comment to stdout */}

returns:

 printf ("\n", "Comment to print");

where you might expect to see:

 printf ("/* %s */\n", "Comment to print");

}


tbtietc - 2009-06-25 11:19:05

The following regsub deletes comments while leaving character and string literals intact:

 regsub -all {('([^'\\]|\\.)')|("([^"\\]|\\.)*")|(//[^\n]*)|(/\*[^*]*\*+([^/*][^*]*\*+)*/)} $text "\\1\\3" text

This detects:

 A. Character literals in single quotes
 B. String literals in double quotes
 C. Comments, both /* ... */ and // ... to the end of the line

And replaces:

 A, B with themselves (quotes intact).
 C with the empty string (comments deleted).
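As a sketch of how this approach handles the printf example from Pierre Coueffin's note, the regsub can be wrapped in a proc. The name removeCommentsKeepLiterals is made up for this illustration. The character classes here exclude the backslash so that an escaped quote cannot end a literal early, and the comment branch uses the common pattern that also accepts comments ending in **/; treat both as assumptions about the intended behaviour rather than the original author's exact expression.

```tcl
# Hypothetical wrapper; the proc name is not from the original page.
proc removeCommentsKeepLiterals { text } {
    # Group 1: character literal, group 3: string literal (both kept).
    # Group 5: // comment, group 6: /* ... */ comment (both dropped).
    set re {('([^'\\]|\\.)')|("([^"\\]|\\.)*")|(//[^\n]*)|(/\*[^*]*\*+([^/*][^*]*\*+)*/)}
    regsub -all $re $text {\1\3} text
    return $text
}

set code {printf ("/* %s */\n", "Comment to print"); /* Prints a comment to stdout */}
puts [removeCommentsKeepLiterals $code]
# prints (followed by one trailing space):
#   printf ("/* %s */\n", "Comment to print");
```

Because regsub picks the leftmost match, the string literal is matched as a whole before the /* inside it can start a comment match. Nested comments are not handled by this expression.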