Regular expression for a range of Unicode code points in PHP

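I am trying to strip every character from a string except the following:

  • Alphanumeric characters
  • The dollar sign ($)
  • The underscore (_)
  • Unicode characters between code points U+0080 and U+FFFF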

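I got the first three conditions working with this:

preg_replace('/[^a-zA-Z\d$_]+/', '', $foo);

How do I match the fourth condition? I thought about using \x, but there has to be a better way than listing out 65,000+ characters.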

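You can use:

$foo = preg_replace('/[^\w$\x{0080}-\x{FFFF}]+/u', '', $foo);
  • \w is equivalent to [a-zA-Z0-9_] for ASCII input (note that PHP's /u modifier also turns on PCRE's Unicode property mode, so \w can additionally match non-ASCII letters and digits)
  • \x{0080}-\x{FFFF} matches characters between code points U+0080 and U+FFFF
  • /u enables Unicode (UTF-8) support in the regular expression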
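
To see the pattern above in action, here is a minimal sketch. The helper name stripDisallowed and the sample input are assumptions made for illustration; only the pattern itself comes from the answer.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function stripDisallowed(string $text): string
{
    // Single-quoted PHP string: the backslashes reach PCRE unchanged.
    // /u makes PCRE treat pattern and subject as UTF-8, so \x{...} is a code point.
    $result = preg_replace('/[^\w$\x{0080}-\x{FFFF}]+/u', '', $text);

    // preg_replace() returns null on failure (for example, malformed UTF-8 input).
    return $result ?? '';
}

// "\u{...}" escapes (PHP 7+) keep this source file pure ASCII.
$input = 'price: $5_' . "\u{00E9}" . ' ' . "\u{1F600}" . '!';

// The colon, spaces, "!" and the emoji (U+1F600, above U+FFFF) are removed;
// the letters, "$", "5", "_" and U+00E9 are kept.
echo stripDisallowed($input), "\n";
```

The character class is negated, so everything that is not explicitly allowed is deleted in a single pass.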

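A modernized answer.

Excluding only code points U+80 to U+FFFF would be unwise, given that the Unicode range extends to U+10FFFF.

Nowadays it contains many characters beyond the 16-bit BMP range.

I'll show you how to do this for the range you asked for, in UTF-16 or in UTF-8/32, whichever encoding you do or do not control.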

UTF-16

 # UTF-16 regex ;   equivalent UTF-8/32 regex   (?![\x{80}-\x{FFFF}])[$\w]
 (?!
      (?:
           [\x{80}-\x{D7FF}\x{E000}-\x{FFFF}] 
        |  
           [\x{D800}-\x{DBFF}] 
           (?! [\x{DC00}-\x{DFFF}] )
        |  
           [\x{DC00}-\x{DFFF}] 
           (?<! [\x{D800}-\x{DBFF}] [\S\s] )
      )
 )
 [$\w] 
 # Output --------------------------------
 # 77,905 Unicode characters
 # UTF-16 regex  equivalent (using codepoints)
 (?:
      [\x{24}\x{30}-\x{39}\x{41}-\x{5A}\x{5F}\x{61}-\x{7A}] 
   |  
      (?:
           \x{D800} [\x{DC00}-\x{DC0B}\x{DC0D}-\x{DC26}\x{DC28}-\x{DC3A}\x{DC3C}-\x{DC3D}\x{DC3F}-\x{DC4D}\x{DC50}-\x{DC5D}\x{DC80}-\x{DCFA}\x{DDFD}\x{DE80}-\x{DE9C}\x{DEA0}-\x{DED0}\x{DEE0}\x{DF00}-\x{DF1F}\x{DF2D}-\x{DF40}\x{DF42}-\x{DF49}\x{DF50}-\x{DF7A}\x{DF80}-\x{DF9D}\x{DFA0}-\x{DFC3}\x{DFC8}-\x{DFCF}] 
        |  \x{D801} [\x{DC00}-\x{DC9D}\x{DCA0}-\x{DCA9}\x{DCB0}-\x{DCD3}\x{DCD8}-\x{DCFB}\x{DD00}-\x{DD27}\x{DD30}-\x{DD63}\x{DE00}-\x{DF36}\x{DF40}-\x{DF55}\x{DF60}-\x{DF67}] 
        |  \x{D802} [\x{DC00}-\x{DC05}\x{DC08}\x{DC0A}-\x{DC35}\x{DC37}-\x{DC38}\x{DC3C}\x{DC3F}-\x{DC55}\x{DC60}-\x{DC76}\x{DC80}-\x{DC9E}\x{DCE0}-\x{DCF2}\x{DCF4}-\x{DCF5}\x{DD00}-\x{DD15}\x{DD20}-\x{DD39}\x{DD80}-\x{DDB7}\x{DDBE}-\x{DDBF}\x{DE00}-\x{DE03}\x{DE05}-\x{DE06}\x{DE0C}-\x{DE13}\x{DE15}-\x{DE17}\x{DE19}-\x{DE35}\x{DE38}-\x{DE3A}\x{DE3F}\x{DE60}-\x{DE7C}\x{DE80}-\x{DE9C}\x{DEC0}-\x{DEC7}\x{DEC9}-\x{DEE6}\x{DF00}-\x{DF35}\x{DF40}-\x{DF55}\x{DF60}-\x{DF72}\x{DF80}-\x{DF91}] 
        |  \x{D803} [\x{DC00}-\x{DC48}\x{DC80}-\x{DCB2}\x{DCC0}-\x{DCF2}\x{DD00}-\x{DD27}\x{DD30}-\x{DD39}\x{DF00}-\x{DF1C}\x{DF27}\x{DF30}-\x{DF50}\x{DFE0}-\x{DFF6}] 
        |  \x{D804} [\x{DC01}\x{DC03}-\x{DC46}\x{DC66}-\x{DC6F}\x{DC7F}-\x{DC81}\x{DC83}-\x{DCAF}\x{DCB3}-\x{DCB6}\x{DCB9}-\x{DCBA}\x{DCD0}-\x{DCE8}\x{DCF0}-\x{DCF9}\x{DD00}-\x{DD2B}\x{DD2D}-\x{DD34}\x{DD36}-\x{DD3F}\x{DD44}\x{DD50}-\x{DD73}\x{DD76}\x{DD80}-\x{DD81}\x{DD83}-\x{DDB2}\x{DDB6}-\x{DDBE}\x{DDC1}-\x{DDC4}\x{DDC9}-\x{DDCC}\x{DDD0}-\x{DDDA}\x{DDDC}\x{DE00}-\x{DE11}\x{DE13}-\x{DE2B}\x{DE2F}-\x{DE31}\x{DE34}\x{DE36}-\x{DE37}\x{DE3E}\x{DE80}-\x{DE86}\x{DE88}\x{DE8A}-\x{DE8D}\x{DE8F}-\x{DE9D}\x{DE9F}-\x{DEA8}\x{DEB0}-\x{DEDF}\x{DEE3}-\x{DEEA}\x{DEF0}-\x{DEF9}\x{DF00}-\x{DF01}\x{DF05}-\x{DF0C}\x{DF0F}-\x{DF10}\x{DF13}-\x{DF28}\x{DF2A}-\x{DF30}\x{DF32}-\x{DF33}\x{DF35}-\x{DF39}\x{DF3B}-\x{DF3D}\x{DF40}\x{DF50}\x{DF5D}-\x{DF61}\x{DF66}-\x{DF6C}\x{DF70}-\x{DF74}] 
        |  \x{D805} [\x{DC00}-\x{DC34}\x{DC38}-\x{DC3F}\x{DC42}-\x{DC44}\x{DC46}-\x{DC4A}\x{DC50}-\x{DC59}\x{DC5E}-\x{DC5F}\x{DC80}-\x{DCAF}\x{DCB3}-\x{DCB8}\x{DCBA}\x{DCBF}-\x{DCC0}\x{DCC2}-\x{DCC5}\x{DCC7}\x{DCD0}-\x{DCD9}\x{DD80}-\x{DDAE}\x{DDB2}-\x{DDB5}\x{DDBC}-\x{DDBD}\x{DDBF}-\x{DDC0}\x{DDD8}-\x{DDDD}\x{DE00}-\x{DE2F}\x{DE33}-\x{DE3A}\x{DE3D}\x{DE3F}-\x{DE40}\x{DE44}\x{DE50}-\x{DE59}\x{DE80}-\x{DEAB}\x{DEAD}\x{DEB0}-\x{DEB5}\x{DEB7}-\x{DEB8}\x{DEC0}-\x{DEC9}\x{DF00}-\x{DF1A}\x{DF1D}-\x{DF1F}\x{DF22}-\x{DF25}\x{DF27}-\x{DF2B}\x{DF30}-\x{DF39}] 
        |  \x{D806} [\x{DC00}-\x{DC2B}\x{DC2F}-\x{DC37}\x{DC39}-\x{DC3A}\x{DCA0}-\x{DCE9}\x{DCFF}\x{DDA0}-\x{DDA7}\x{DDAA}-\x{DDD0}\x{DDD4}-\x{DDD7}\x{DDDA}-\x{DDDB}\x{DDE0}-\x{DDE1}\x{DDE3}\x{DE00}-\x{DE38}\x{DE3A}-\x{DE3E}\x{DE47}\x{DE50}-\x{DE56}\x{DE59}-\x{DE96}\x{DE98}-\x{DE99}\x{DE9D}\x{DEC0}-\x{DEF8}] 
        |  \x{D807} [\x{DC00}-\x{DC08}\x{DC0A}-\x{DC2E}\x{DC30}-\x{DC36}\x{DC38}-\x{DC3D}\x{DC3F}-\x{DC40}\x{DC50}-\x{DC59}\x{DC72}-\x{DC8F}\x{DC92}-\x{DCA7}\x{DCAA}-\x{DCB0}\x{DCB2}-\x{DCB3}\x{DCB5}-\x{DCB6}\x{DD00}-\x{DD06}\x{DD08}-\x{DD09}\x{DD0B}-\x{DD36}\x{DD3A}\x{DD3C}-\x{DD3D}\x{DD3F}-\x{DD47}\x{DD50}-\x{DD59}\x{DD60}-\x{DD65}\x{DD67}-\x{DD68}\x{DD6A}-\x{DD89}\x{DD90}-\x{DD91}\x{DD95}\x{DD97}-\x{DD98}\x{DDA0}-\x{DDA9}\x{DEE0}-\x{DEF4}] 
        |  \x{D808} [\x{DC00}-\x{DF99}] 
        |  \x{D809} [\x{DC80}-\x{DD43}] 
        |  \x{D80C} [\x{DC00}-\x{DFFF}] 
        |  \x{D80D} [\x{DC00}-\x{DC2E}] 
        |  \x{D811} [\x{DC00}-\x{DE46}] 
        |  \x{D81A} [\x{DC00}-\x{DE38}\x{DE40}-\x{DE5E}\x{DE60}-\x{DE69}\x{DED0}-\x{DEED}\x{DEF0}-\x{DEF4}\x{DF00}-\x{DF36}\x{DF40}-\x{DF43}\x{DF50}-\x{DF59}\x{DF63}-\x{DF77}\x{DF7D}-\x{DF8F}] 
        |  \x{D81B} [\x{DE40}-\x{DE7F}\x{DF00}-\x{DF4A}\x{DF4F}-\x{DF50}\x{DF8F}-\x{DF9F}\x{DFE0}-\x{DFE1}\x{DFE3}] 
        |  [\x{D81C}-\x{D820}] [\x{DC00}-\x{DFFF}] 
        |  \x{D821} [\x{DC00}-\x{DFF7}] 
        |  \x{D822} [\x{DC00}-\x{DEF2}] 
        |  \x{D82C} [\x{DC00}-\x{DD1E}\x{DD50}-\x{DD52}\x{DD64}-\x{DD67}\x{DD70}-\x{DEFB}] 
        |  \x{D82F} [\x{DC00}-\x{DC6A}\x{DC70}-\x{DC7C}\x{DC80}-\x{DC88}\x{DC90}-\x{DC99}\x{DC9D}-\x{DC9E}] 
        |  \x{D834} [\x{DD67}-\x{DD69}\x{DD7B}-\x{DD82}\x{DD85}-\x{DD8B}\x{DDAA}-\x{DDAD}\x{DE42}-\x{DE44}] 
        |  \x{D835} [\x{DC00}-\x{DC54}\x{DC56}-\x{DC9C}\x{DC9E}-\x{DC9F}\x{DCA2}\x{DCA5}-\x{DCA6}\x{DCA9}-\x{DCAC}\x{DCAE}-\x{DCB9}\x{DCBB}\x{DCBD}-\x{DCC3}\x{DCC5}-\x{DD05}\x{DD07}-\x{DD0A}\x{DD0D}-\x{DD14}\x{DD16}-\x{DD1C}\x{DD1E}-\x{DD39}\x{DD3B}-\x{DD3E}\x{DD40}-\x{DD44}\x{DD46}\x{DD4A}-\x{DD50}\x{DD52}-\x{DEA5}\x{DEA8}-\x{DEC0}\x{DEC2}-\x{DEDA}\x{DEDC}-\x{DEFA}\x{DEFC}-\x{DF14}\x{DF16}-\x{DF34}\x{DF36}-\x{DF4E}\x{DF50}-\x{DF6E}\x{DF70}-\x{DF88}\x{DF8A}-\x{DFA8}\x{DFAA}-\x{DFC2}\x{DFC4}-\x{DFCB}\x{DFCE}-\x{DFFF}] 
        |  \x{D836} [\x{DE00}-\x{DE36}\x{DE3B}-\x{DE6C}\x{DE75}\x{DE84}\x{DE9B}-\x{DE9F}\x{DEA1}-\x{DEAF}] 
        |  \x{D838} [\x{DC00}-\x{DC06}\x{DC08}-\x{DC18}\x{DC1B}-\x{DC21}\x{DC23}-\x{DC24}\x{DC26}-\x{DC2A}\x{DD00}-\x{DD2C}\x{DD30}-\x{DD3D}\x{DD40}-\x{DD49}\x{DD4E}\x{DEC0}-\x{DEF9}] 
        |  \x{D83A} [\x{DC00}-\x{DCC4}\x{DCD0}-\x{DCD6}\x{DD00}-\x{DD4B}\x{DD50}-\x{DD59}] 
        |  \x{D83B} [\x{DE00}-\x{DE03}\x{DE05}-\x{DE1F}\x{DE21}-\x{DE22}\x{DE24}\x{DE27}\x{DE29}-\x{DE32}\x{DE34}-\x{DE37}\x{DE39}\x{DE3B}\x{DE42}\x{DE47}\x{DE49}\x{DE4B}\x{DE4D}-\x{DE4F}\x{DE51}-\x{DE52}\x{DE54}\x{DE57}\x{DE59}\x{DE5B}\x{DE5D}\x{DE5F}\x{DE61}-\x{DE62}\x{DE64}\x{DE67}-\x{DE6A}\x{DE6C}-\x{DE72}\x{DE74}-\x{DE77}\x{DE79}-\x{DE7C}\x{DE7E}\x{DE80}-\x{DE89}\x{DE8B}-\x{DE9B}\x{DEA1}-\x{DEA3}\x{DEA5}-\x{DEA9}\x{DEAB}-\x{DEBB}] 
        |  [\x{D840}-\x{D868}] [\x{DC00}-\x{DFFF}] 
        |  \x{D869} [\x{DC00}-\x{DED6}\x{DF00}-\x{DFFF}] 
        |  [\x{D86A}-\x{D86C}] [\x{DC00}-\x{DFFF}] 
        |  \x{D86D} [\x{DC00}-\x{DF34}\x{DF40}-\x{DFFF}] 
        |  \x{D86E} [\x{DC00}-\x{DC1D}\x{DC20}-\x{DFFF}] 
        |  [\x{D86F}-\x{D872}] [\x{DC00}-\x{DFFF}] 
        |  \x{D873} [\x{DC00}-\x{DEA1}\x{DEB0}-\x{DFFF}] 
        |  [\x{D874}-\x{D879}] [\x{DC00}-\x{DFFF}] 
        |  \x{D87A} [\x{DC00}-\x{DFE0}] 
        |  \x{D87E} [\x{DC00}-\x{DE1D}] 
        |  \x{DB40} [\x{DD00}-\x{DDEF}] 
      )
 )

UTF-8/32

 # UTF-8/32 regex ;
 (?! [\x{80}-\x{FFFF}] )
 [$\w] 

 # Output --------------------------------
 # 77,905 Unicode characters
 # UTF-8 / 32 regex equivalent (using codepoints)
 (?:
      [\x{24}\x{30}-\x{39}\x{41}-\x{5A}\x{5F}\x{61}-\x{7A}\x{10000}-\x{1000B}\x{1000D}-\x{10026}\x{10028}-\x{1003A}\x{1003C}-\x{1003D}\x{1003F}-\x{1004D}\x{10050}-\x{1005D}\x{10080}-\x{100FA}\x{101FD}\x{10280}-\x{1029C}\x{102A0}-\x{102D0}\x{102E0}\x{10300}-\x{1031F}\x{1032D}-\x{10340}\x{10342}-\x{10349}\x{10350}-\x{1037A}\x{10380}-\x{1039D}\x{103A0}-\x{103C3}\x{103C8}-\x{103CF}\x{10400}-\x{1049D}\x{104A0}-\x{104A9}\x{104B0}-\x{104D3}\x{104D8}-\x{104FB}\x{10500}-\x{10527}\x{10530}-\x{10563}\x{10600}-\x{10736}\x{10740}-\x{10755}\x{10760}-\x{10767}\x{10800}-\x{10805}\x{10808}\x{1080A}-\x{10835}\x{10837}-\x{10838}\x{1083C}\x{1083F}-\x{10855}\x{10860}-\x{10876}\x{10880}-\x{1089E}\x{108E0}-\x{108F2}\x{108F4}-\x{108F5}\x{10900}-\x{10915}\x{10920}-\x{10939}\x{10980}-\x{109B7}\x{109BE}-\x{109BF}\x{10A00}-\x{10A03}\x{10A05}-\x{10A06}\x{10A0C}-\x{10A13}\x{10A15}-\x{10A17}\x{10A19}-\x{10A35}\x{10A38}-\x{10A3A}\x{10A3F}\x{10A60}-\x{10A7C}\x{10A80}-\x{10A9C}\x{10AC0}-\x{10AC7}\x{10AC9}-\x{10AE6}\x{10B00}-\x{10B35}\x{10B40}-\x{10B55}\x{10B60}-\x{10B72}\x{10B80}-\x{10B91}\x{10C00}-\x{10C48}\x{10C80}-\x{10CB2}\x{10CC0}-\x{10CF2}\x{10D00}-\x{10D27}\x{10D30}-\x{10D39}\x{10F00}-\x{10F1C}\x{10F27}\x{10F30}-\x{10F50}\x{10FE0}-\x{10FF6}\x{11001}\x{11003}-\x{11046}\x{11066}-\x{1106F}\x{1107F}-\x{11081}\x{11083}-\x{110AF}\x{110B3}-\x{110B6}\x{110B9}-\x{110BA}\x{110D0}-\x{110E8}\x{110F0}-\x{110F9}\x{11100}-\x{1112B}\x{1112D}-\x{11134}\x{11136}-\x{1113F}\x{11144}\x{11150}-\x{11173}\x{11176}\x{11180}-\x{11181}\x{11183}-\x{111B2}\x{111B6}-\x{111BE}\x{111C1}-\x{111C4}\x{111C9}-\x{111CC}\x{111D0}-\x{111DA}\x{111DC}\x{11200}-\x{11211}\x{11213}-\x{1122B}\x{1122F}-\x{11231}\x{11234}\x{11236}-\x{11237}\x{1123E}\x{11280}-\x{11286}\x{11288}\x{1128A}-\x{1128D}\x{1128F}-\x{1129D}\x{1129F}-\x{112A8}\x{112B0}-\x{112DF}\x{112E3}-\x{112EA}\x{112F0}-\x{112F9}\x{11300}-\x{11301}\x{11305}-\x{1130C}\x{1130F}-\x{11310}\x{11313}-\x{11328}\x{1132A}-\x{11330}\x{11332}-\x{11333}\x{11335}-\x{11339}\x{1133B}-\x{1133D}\x{11340}\x{11350}\x{1135D}-\x{11361}\x{11366}-\x{1136C}\x{11370}-\x{11374}\x{11400}-\x{11434}\x{11438}-\x{1143F}\x{11442}-\x{11444}\x{11446}-\x{1144A}\x{11450}-\x{11459}\x{1145E}-\x{1145F}\x{11480}-\x{114AF}\x{114B3}-\x{114B8}\x{114BA}\x{114BF}-\x{114C0}\x{114C2}-\x{114C5}\x{114C7}\x{114D0}-\x{114D9}\x{11580}-\x{115AE}\x{115B2}-\x{115B5}\x{115BC}-\x{115BD}\x{115BF}-\x{115C0}\x{115D8}-\x{115DD}\x{11600}-\x{1162F}\x{11633}-\x{1163A}\x{1163D}\x{1163F}-\x{11640}\x{11644}\x{11650}-\x{11659}\x{11680}-\x{116AB}\x{116AD}\x{116B0}-\x{116B5}\x{116B7}-\x{116B8}\x{116C0}-\x{116C9}\x{11700}-\x{1171A}\x{1171D}-\x{1171F}\x{11722}-\x{11725}\x{11727}-\x{1172B}\x{11730}-\x{11739}\x{11800}-\x{1182B}\x{1182F}-\x{11837}\x{11839}-\x{1183A}\x{118A0}-\x{118E9}\x{118FF}\x{119A0}-\x{119A7}\x{119AA}-\x{119D0}\x{119D4}-\x{119D7}\x{119DA}-\x{119DB}\x{119E0}-\x{119E1}\x{119E3}\x{11A00}-\x{11A38}\x{11A3A}-\x{11A3E}\x{11A47}\x{11A50}-\x{11A56}\x{11A59}-\x{11A96}\x{11A98}-\x{11A99}\x{11A9D}\x{11AC0}-\x{11AF8}\x{11C00}-\x{11C08}\x{11C0A}-\x{11C2E}\x{11C30}-\x{11C36}\x{11C38}-\x{11C3D}\x{11C3F}-\x{11C40}\x{11C50}-\x{11C59}\x{11C72}-\x{11C8F}\x{11C92}-\x{11CA7}\x{11CAA}-\x{11CB0}\x{11CB2}-\x{11CB3}\x{11CB5}-\x{11CB6}\x{11D00}-\x{11D06}\x{11D08}-\x{11D09}\x{11D0B}-\x{11D36}\x{11D3A}\x{11D3C}-\x{11D3D}\x{11D3F}-\x{11D47}\x{11D50}-\x{11D59}\x{11D60}-\x{11D65}\x{11D67}-\x{11D68}\x{11D6A}-\x{11D89}\x{11D90}-\x{11D91}\x{11D95}\x{11D97}-\x{11D98}\x{11DA0}-\x{11DA9}\x{11EE0}-\x{11EF4}\x{12000}-\x{12399}\x{12480}-\x{12543}\x{13000}-\x{1342E}\x{14400}-\x{14646}\x{16800}-\x{16A38}\x{16A40}-\x{16A5E}\x{16A60}-\x{16A69}\x{16AD0}-\x{16AED}\x{16AF0}-\x{16AF4}\x{16B00}-\x{16B36}\x{16B40}-\x{16B43}\x{16B50}-\x{16B59}\x{16B63}-\x{16B77}\x{16B7D}-\x{16B8F}\x{16E40}-\x{16E7F}\x{16F00}-\x{16F4A}\x{16F4F}-\x{16F50}\x{16F8F}-\x{16F9F}\x{16FE0}-\x{16FE1}\x{16FE3}\x{17000}-\x{187F7}\x{18800}-\x{18AF2}\x{1B000}-\x{1B11E}\x{1B150}-\x{1B152}\x{1B164}-\x{1B167}\x{1B170}-\x{1B2FB}\x{1BC00}-\x{1BC6A}\x{1BC70}-\x{1BC7C}\x{1BC80}-\x{1BC88}\x{1BC90}-\x{1BC99}\x{1BC9D}-\x{1BC9E}\x{1D167}-\x{1D169}\x{1D17B}-\x{1D182}\x{1D185}-\x{1D18B}\x{1D1AA}-\x{1D1AD}\x{1D242}-\x{1D244}\x{1D400}-\x{1D454}\x{1D456}-\x{1D49C}\x{1D49E}-\x{1D49F}\x{1D4A2}\x{1D4A5}-\x{1D4A6}\x{1D4A9}-\x{1D4AC}\x{1D4AE}-\x{1D4B9}\x{1D4BB}\x{1D4BD}-\x{1D4C3}\x{1D4C5}-\x{1D505}\x{1D507}-\x{1D50A}\x{1D50D}-\x{1D514}\x{1D516}-\x{1D51C}\x{1D51E}-\x{1D539}\x{1D53B}-\x{1D53E}\x{1D540}-\x{1D544}\x{1D546}\x{1D54A}-\x{1D550}\x{1D552}-\x{1D6A5}\x{1D6A8}-\x{1D6C0}\x{1D6C2}-\x{1D6DA}\x{1D6DC}-\x{1D6FA}\x{1D6FC}-\x{1D714}\x{1D716}-\x{1D734}\x{1D736}-\x{1D74E}\x{1D750}-\x{1D76E}\x{1D770}-\x{1D788}\x{1D78A}-\x{1D7A8}\x{1D7AA}-\x{1D7C2}\x{1D7C4}-\x{1D7CB}\x{1D7CE}-\x{1D7FF}\x{1DA00}-\x{1DA36}\x{1DA3B}-\x{1DA6C}\x{1DA75}\x{1DA84}\x{1DA9B}-\x{1DA9F}\x{1DAA1}-\x{1DAAF}\x{1E000}-\x{1E006}\x{1E008}-\x{1E018}\x{1E01B}-\x{1E021}\x{1E023}-\x{1E024}\x{1E026}-\x{1E02A}\x{1E100}-\x{1E12C}\x{1E130}-\x{1E13D}\x{1E140}-\x{1E149}\x{1E14E}\x{1E2C0}-\x{1E2F9}\x{1E800}-\x{1E8C4}\x{1E8D0}-\x{1E8D6}\x{1E900}-\x{1E94B}\x{1E950}-\x{1E959}\x{1EE00}-\x{1EE03}\x{1EE05}-\x{1EE1F}\x{1EE21}-\x{1EE22}\x{1EE24}\x{1EE27}\x{1EE29}-\x{1EE32}\x{1EE34}-\x{1EE37}\x{1EE39}\x{1EE3B}\x{1EE42}\x{1EE47}\x{1EE49}\x{1EE4B}\x{1EE4D}-\x{1EE4F}\x{1EE51}-\x{1EE52}\x{1EE54}\x{1EE57}\x{1EE59}\x{1EE5B}\x{1EE5D}\x{1EE5F}\x{1EE61}-\x{1EE62}\x{1EE64}\x{1EE67}-\x{1EE6A}\x{1EE6C}-\x{1EE72}\x{1EE74}-\x{1EE77}\x{1EE79}-\x{1EE7C}\x{1EE7E}\x{1EE80}-\x{1EE89}\x{1EE8B}-\x{1EE9B}\x{1EEA1}-\x{1EEA3}\x{1EEA5}-\x{1EEA9}\x{1EEAB}-\x{1EEBB}\x{20000}-\x{2A6D6}\x{2A700}-\x{2B734}\x{2B740}-\x{2B81D}\x{2B820}-\x{2CEA1}\x{2CEB0}-\x{2EBE0}\x{2F800}-\x{2FA1D}\x{E0100}-\x{E01EF}] 
 )

Finally, and simplest of all, extend the range to U+10FFFF:

 # UTF-8/32 regex ;
 (?! [\x{80}-\x{10FFFF}] )
 [$\w] 

 # Output --------------------------------
 # 64 Unicode characters
 # UTF-8 / 16/ 32 regex equivalent (using codepoints)
 [\x{24}\x{30}-\x{39}\x{41}-\x{5A}\x{5F}\x{61}-\x{7A}] 
 # Codepoint -> character substitution
 [$0-9A-Z_a-z]  
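
The 64-character figure can be checked from PHP with a short sketch. It only probes the ASCII range, because the lookahead already rejects everything from U+0080 upward; the variable names are illustrative assumptions.

```php
<?php
// Pattern from the answer, written without free-spacing whitespace.
// Single-quoted PHP string: the backslashes reach PCRE unchanged.
$pattern = '/(?![\x{80}-\x{10FFFF}])[$\w]/u';

// Collect every ASCII character (U+0000 to U+007F) the pattern accepts.
$matched = '';
for ($cp = 0; $cp < 0x80; $cp++) {
    $ch = chr($cp);
    if (preg_match($pattern, $ch) === 1) {
        $matched .= $ch;
    }
}

echo strlen($matched), "\n"; // 64
echo $matched, "\n";         // $0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
```

For stripping rather than matching, the negated class [^$0-9A-Z_a-z] passed to preg_replace expresses the same set.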

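If you really want to clean a string for MySQL's default collation (utf8_general_ci), removing emoji alone is not enough. utf8_general_ci belongs to the character set utf8/utf8mb3, which only supports the range 0x0000 to 0xFFFF (the Basic Multilingual Plane). I therefore suggest removing every character whose code point is above 0xFFFF, up to 0x10FFFF (plane 16, SPUA-B), which I believe is the highest code point known so far according to https://en.wikipedia.org/wiki/Plane_(Unicode)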

function removeNonBasicMultilingualPlane(string $text): string
{
   return \preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $text);
}
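
One detail worth guarding against: preg_replace() returns null when the subject is not valid UTF-8, which would clash with the declared string return type. A possible null-safe variant is sketched below; the name removeNonBmp and the choice to return an empty string on failure are assumptions, and throwing an exception may suit some applications better.

```php
<?php
// Hypothetical null-safe variant of the function above.
function removeNonBmp(string $text): string
{
    // Remove every code point outside the Basic Multilingual Plane.
    $clean = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $text);

    // null signals a PCRE error such as malformed UTF-8; fall back to ''.
    return $clean ?? '';
}

// U+00E9 and U+4E2D are inside the BMP and survive; U+1F600 is removed.
$title = 'Caf' . "\u{00E9}" . ' ' . "\u{1F600}" . ' ' . "\u{4E2D}";
echo removeNonBmp($title), "\n";
```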