[CUDA Parallel Programming, Part 8] Implementing the K-means Algorithm in CUDA

Abstract:
This article describes how to use the CUDA parallel computing framework to implement the K-means algorithm from machine learning. A detailed introduction to the K-means algorithm itself can be found here; the focus of this article is the parallel implementation.

First, a quick review of the serial K-means procedure:

Pseudocode:

  1. Create k points as the initial centroids (often chosen at random)
  2. While the cluster assignment of any point changes
  3.   For each data point in the data set
  4.     For each centroid
  5.       Compute the distance between the centroid and the data point
  6.     Assign the data point to the cluster with the nearest centroid
  7.   For each cluster, compute the mean of all its points and use that mean as the new centroid
We can see two parts that lend themselves to parallelization:

① Lines 3-4: the distance computations from each data point to the centroids can be carried out in parallel.

② Line 5: the computation of the distance from a data point to a single centroid can itself be parallelized.
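
To make ① concrete before walking through the real code, here is a minimal, self-contained sketch (not the kernel used in this post; that one, find_nearest_cluster, appears below and uses a transposed memory layout): one CUDA thread handles one data point and scans all centroids.

#include <cfloat>

// Conceptual sketch only: one thread per data point, scanning all centroids.
// Arrays here use the plain row-major layouts [numObjs][numCoords] and
// [numClusters][numCoords] for simplicity.
__global__ void assign_points(int numCoords, int numObjs, int numClusters,
                              const float *points,     // [numObjs][numCoords]
                              const float *centroids,  // [numClusters][numCoords]
                              int *membership)         // [numObjs]
{
    int objectId = blockDim.x * blockIdx.x + threadIdx.x;
    if (objectId >= numObjs) return;

    int best = 0;
    float bestDist = FLT_MAX;
    for (int c = 0; c < numClusters; c++)
    {
        float d = 0;
        for (int j = 0; j < numCoords; j++)
        {
            float diff = points[objectId * numCoords + j] - centroids[c * numCoords + j];
            d += diff * diff;          // squared distance is enough for comparison
        }
        if (d < bestDist) { bestDist = d; best = c; }
    }
    membership[objectId] = best;
}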


The KMEANS class:

class KMEANS
{
private:
    int numClusters;
    int numCoords;
    int numObjs;
    int *membership;      // [numObjs]
    char *filename;
    float **objects;      // [numObjs][numCoords] data objects
    float **clusters;     // [numClusters][numCoords] cluster centers
    float threshold;
    int loop_iterations;
public:
    KMEANS(int k);
    void file_read(char *fn);
    void file_write();
    void cuda_kmeans();
    inline int nextPowerOfTwo(int n);
    void free_memory();
    virtual ~KMEANS();
};  // KMEANS

Member variables:

numClusters: the number of cluster centers

numCoords: the dimensionality of each data point

numObjs: the number of data points

membership: the cluster assignment of each data point; length numObjs

filename: the name of the input file

objects: all data points; dimensions [numObjs][numCoords]

clusters: the cluster centers; dimensions [numClusters][numCoords]

threshold: a threshold controlling when the iteration stops

loop_iterations: the number of iterations performed

Member functions:

KMEANS(int k): parameterized constructor; initializes the member variables

file_read(char *fn): reads the input file and initializes the objects and membership variables

file_write(): writes the clustering results to the output files

cuda_kmeans(): the entry point of the kmeans computation

nextPowerOfTwo(int n): returns the first power of two that is greater than or equal to n (a small worked example follows this list)

free_memory(): frees the allocated memory

~KMEANS(): destructor
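
To make nextPowerOfTwo concrete, here is a small standalone check (not part of the original code) using the same bit-smearing implementation; it rounds n up to the nearest power of two:

#include <cassert>

// Same trick as KMEANS::nextPowerOfTwo: OR all lower bits into n-1,
// then add 1 to round up to the nearest power of two.
static inline int nextPowerOfTwo(int n)
{
    n--;
    n = n >> 1 | n;
    n = n >> 2 | n;
    n = n >> 4 | n;
    n = n >> 8 | n;
    n = n >> 16 | n;
    return ++n;
}

int main()
{
    assert(nextPowerOfTwo(1)   == 1);
    assert(nextPowerOfTwo(5)   == 8);
    assert(nextPowerOfTwo(32)  == 32);
    assert(nextPowerOfTwo(100) == 128);  // e.g. 100 thread blocks -> 128 reduction threads
    return 0;
}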


The parallel code consists mainly of three functions:

find_nearest_cluster(...)

compute_delta(...)

euclid_dist_2(...)


Let's first look at the function euclid_dist_2(...):

__host__ __device__ inline static
float euclid_dist_2(int numCoords, int numObjs, int numClusters, float *objects, float *clusters, int objectId, int clusterId)
{
    int i;
    float ans = 0;
    for (i = 0; i < numCoords; i++)
    {
        ans += (objects[numObjs * i + objectId] - clusters[numClusters * i + clusterId]) *
               (objects[numObjs * i + objectId] - clusters[numClusters * i + clusterId]);
    }
    return ans;
}
This code computes the (squared) distance between the vector objects[objectId] and clusters[clusterId], i.e. the distance from the objectId-th data point to the clusterId-th cluster center; each thread calls it for its own data point, which is where the parallelism comes from.
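
Note the indexing: the device copies of objects and clusters are stored transposed, as [numCoords][numObjs] and [numCoords][numClusters] respectively, so coordinate i of data point objectId lives at objects[numObjs*i + objectId]. Threads with consecutive objectId then read consecutive addresses, which coalesces memory accesses. The hypothetical helper below (the full code later does the same thing with dimObjects and malloc2D) shows how the host builds that layout:

#include <stdlib.h>

// Sketch: build the transposed [numCoords][numObjs] buffer that euclid_dist_2 expects,
// so that objects_t[numObjs * i + objectId] is coordinate i of point objectId and
// neighbouring threads (consecutive objectId) touch neighbouring addresses.
float *transpose_objects(float **objects, int numObjs, int numCoords)
{
    float *objects_t = (float *)malloc((size_t)numCoords * numObjs * sizeof(float));
    for (int i = 0; i < numCoords; i++)        // coordinate index
        for (int j = 0; j < numObjs; j++)      // object index
            objects_t[(size_t)numObjs * i + j] = objects[j][i];
    return objects_t;                          // ready for cudaMemcpy to the device
}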


Next, the function compute_delta(...):

/*
 * numIntermediates:  The actual number of intermediates
 * numIntermediates2: The next power of two
 */
__global__ static void compute_delta(int *deviceIntermediates, int numIntermediates, int numIntermediates2)
{
    extern __shared__ unsigned int intermediates[];
    intermediates[threadIdx.x] = (threadIdx.x < numIntermediates) ? deviceIntermediates[threadIdx.x] : 0;
    __syncthreads();
    // numIntermediates2 *must* be a power of two!
    for (unsigned int s = numIntermediates2 / 2; s > 0; s >>= 1)
    {
        if (threadIdx.x < s)
        {
            intermediates[threadIdx.x] += intermediates[threadIdx.x + s];
        }
        __syncthreads();
    }
    if (threadIdx.x == 0)
    {
        deviceIntermediates[0] = intermediates[0];
    }
}
What this kernel does is sum, within a single thread block, each thread's entry of intermediates and leave the result in deviceIntermediates[0], which is then copied back to host memory. A better explanation of this problem can be found here; it is essentially an array-summation (parallel reduction) problem, and what is summed here is the number of entries of membership that changed, i.e. the number of points that switched clusters.
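
To see the reduction at work, here is a small host-side sketch (hypothetical values, same folding loop as compute_delta): six blocks report {3, 0, 5, 2, 1, 4} changed points, the array is zero-padded to eight entries, and the halving loop folds it down to 15.

#include <cstdio>

// Host-side sketch of the tree reduction compute_delta performs:
// pad to the next power of two, then repeatedly fold the upper half
// into the lower half. Assumes n2 is a power of two >= n and n2 <= 64.
static int tree_reduce(const int *vals, int n, int n2)
{
    int buf[64];
    for (int i = 0; i < n2; i++)
        buf[i] = (i < n) ? vals[i] : 0;       // zero-pad, as the kernel does
    for (int s = n2 / 2; s > 0; s >>= 1)      // same loop as compute_delta
        for (int i = 0; i < s; i++)
            buf[i] += buf[i + s];
    return buf[0];
}

int main()
{
    int perBlockChanges[6] = {3, 0, 5, 2, 1, 4};          // hypothetical per-block counts
    printf("%d\n", tree_reduce(perBlockChanges, 6, 8));   // prints 15
    return 0;
}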


Finally, the function find_nearest_cluster(...):

/*
 * objects:        [numCoords][numObjs]
 * deviceClusters: [numCoords][numClusters]
 * membership:     [numObjs]
 */
__global__ static void find_nearest_cluster(int numCoords, int numObjs, int numClusters, float *objects, float *deviceClusters, int *membership, int *intermediates)
{
    extern __shared__ char sharedMemory[];
    unsigned char *membershipChanged = (unsigned char *)sharedMemory;
    float *clusters = deviceClusters;

    membershipChanged[threadIdx.x] = 0;
    int objectId = blockDim.x * blockIdx.x + threadIdx.x;
    if (objectId < numObjs)
    {
        int index;
        float dist, min_dist;

        /* find the cluster id that has the min distance to this object */
        index = 0;
        min_dist = euclid_dist_2(numCoords, numObjs, numClusters, objects, clusters, objectId, 0);
        for (int i = 0; i < numClusters; i++)
        {
            dist = euclid_dist_2(numCoords, numObjs, numClusters, objects, clusters, objectId, i);
            /* no need for square root */
            if (dist < min_dist)
            {
                min_dist = dist;
                index = i;
            }
        }
        if (membership[objectId] != index)
        {
            membershipChanged[threadIdx.x] = 1;
        }
        // assign the membership to object objectId
        membership[objectId] = index;
        __syncthreads();  // for membershipChanged[]
#if 1
        // blockDim.x *must* be a power of two!
        for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1)
        {
            if (threadIdx.x < s)
            {
                membershipChanged[threadIdx.x] += membershipChanged[threadIdx.x + s];  // sum all changed flags into membershipChanged[0]
            }
            __syncthreads();
        }
        if (threadIdx.x == 0)
        {
            intermediates[blockIdx.x] = membershipChanged[0];
        }
#endif
    }
}  // find_nearest_cluster
This kernel computes the distance from the objectId-th data point to each of the numClusters centers, then compares them and updates membership accordingly, setting a per-thread flag when the assignment changes.
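
How these two kernels are launched matters: find_nearest_cluster needs one byte of dynamic shared memory per thread for membershipChanged[], its block size must be a power of two (for the in-block reduction) and small enough that a block's count of changed points fits in an unsigned char, and compute_delta runs as a single block with one thread per cluster block, rounded up to a power of two. The sketch below mirrors the launch code from cuda_kmeans further down, with extra comments; nextPowerOfTwo here stands for the free-function version shown earlier (in the class it is a member function).

// Annotated sketch of the launch configuration used in cuda_kmeans (below).
// The device pointers are assumed to have been allocated with cudaMalloc already.
void launch_kernels(int numCoords, int numObjs, int numClusters,
                    float *deviceObjects, float *deviceClusters,
                    int *deviceMembership, int *deviceIntermediates)
{
    // Block size: a power of two, and small enough that the per-block
    // count of changed memberships fits in an unsigned char.
    const unsigned int numThreadsPerClusterBlock = 32;
    // Enough blocks to cover all data points.
    const unsigned int numClusterBlocks =
        (numObjs + numThreadsPerClusterBlock - 1) / numThreadsPerClusterBlock;
    // compute_delta runs as one block, one thread per cluster block, padded to a power of two.
    const unsigned int numReductionThreads = nextPowerOfTwo(numClusterBlocks);

    // Dynamic shared memory: one unsigned char per thread (membershipChanged[]),
    // and one unsigned int per reduction thread (intermediates[]).
    const unsigned int clusterBlockSharedDataSize =
        numThreadsPerClusterBlock * sizeof(unsigned char);
    const unsigned int reductionBlockSharedDataSize =
        numReductionThreads * sizeof(unsigned int);

    find_nearest_cluster<<<numClusterBlocks, numThreadsPerClusterBlock,
                           clusterBlockSharedDataSize>>>(
        numCoords, numObjs, numClusters,
        deviceObjects, deviceClusters, deviceMembership, deviceIntermediates);
    cudaDeviceSynchronize();

    compute_delta<<<1, numReductionThreads, reductionBlockSharedDataSize>>>(
        deviceIntermediates, numClusterBlocks, numReductionThreads);
    cudaDeviceSynchronize();
}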


Together, these three functions parallelize everything that can be parallelized, yielding a parallel version of the overall algorithm.


The complete code follows:

kmeans.h:

#ifndef _H_KMEANS
#define _H_KMEANS

#include <assert.h>

#define malloc2D(name, xDim, yDim, type) do {                 \
    name = (type **)malloc(xDim * sizeof(type *));            \
    assert(name != NULL);                                      \
    name[0] = (type *)malloc(xDim * yDim * sizeof(type));      \
    assert(name[0] != NULL);                                    \
    for (size_t i = 1; i < xDim; i++)                           \
        name[i] = name[i - 1] + yDim;                           \
} while (0)

double wtime(void);

#endif
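
The malloc2D macro allocates the row-pointer array plus one contiguous block for all elements and points each row into that block, so name[0] is a flat xDim*yDim buffer that can be handed directly to memset or cudaMemcpy. A small usage sketch (not part of the original code):

#include <string.h>
#include "kmeans.h"

int main()
{
    float **m;
    malloc2D(m, 3, 4, float);                // a 3 x 4 float matrix, rows share one buffer
    memset(m[0], 0, 3 * 4 * sizeof(float));  // m[0] is the flat 12-element buffer
    m[2][1] = 7.0f;                          // ordinary 2D indexing still works
    // Because the storage is contiguous, m[0] can be passed straight to cudaMemcpy.
    free(m[0]);                              // free the element block first...
    free(m);                                 // ...then the row-pointer array
    return 0;
}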

wtime.cu:

#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>

double wtime(void)
{
    double now_time;
    struct timeval etstart;
    struct timezone tzp;
    if (gettimeofday(&etstart, &tzp) == -1)
        perror("Error: calling gettimeofday() not successful.\n");
    now_time = ((double)etstart.tv_sec) +              /* in seconds */
               ((double)etstart.tv_usec) / 1000000.0;  /* in microseconds */
    return now_time;
}
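
wtime() simply returns the current wall-clock time in seconds as a double. None of the listings below actually call it, but a typical (hypothetical) use would be to bracket the clustering step inside main:

// Hypothetical timing of the clustering step with wtime (not in the original main):
double timing = wtime();
kmeans.cuda_kmeans();
printf("cuda_kmeans took %.3f seconds\n", wtime() - timing);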


cuda_kmeans.cu:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <iostream>
#include <cassert>

#include "kmeans.h"

using namespace std;

const int MAX_CHAR_PER_LINE = 1024;

class KMEANS
{
private:
    int numClusters;
    int numCoords;
    int numObjs;
    int *membership;      // [numObjs]
    char *filename;
    float **objects;      // [numObjs][numCoords] data objects
    float **clusters;     // [numClusters][numCoords] cluster centers
    float threshold;
    int loop_iterations;
public:
    KMEANS(int k);
    void file_read(char *fn);
    void file_write();
    void cuda_kmeans();
    inline int nextPowerOfTwo(int n);
    void free_memory();
    virtual ~KMEANS();
};
KMEANS::~KMEANS()
{
    delete[] membership;               // allocated with new[] in file_read
    free(clusters[0]);                 // allocated with malloc2D (malloc) in cuda_kmeans
    free(clusters);
    for (int i = 0; i < numObjs; i++)  // each row was allocated with new[] in file_read
        delete[] objects[i];
    delete[] objects;
}

KMEANS::KMEANS(int k)
{
    threshold = 0.001;
    numObjs = 0;
    numCoords = 0;
    numClusters = k;
    filename = NULL;
    loop_iterations = 0;
}
void KMEANS::file_write()
{
    FILE *fptr;
    char outFileName[1024];

    // output: the coordinates of the cluster centres
    sprintf(outFileName, "%s.cluster_centres", filename);
    printf("Writing coordinates of K=%d cluster centers to file \"%s\"\n", numClusters, outFileName);
    fptr = fopen(outFileName, "w");
    for (int i = 0; i < numClusters; i++)
    {
        fprintf(fptr, "%d ", i);
        for (int j = 0; j < numCoords; j++)
            fprintf(fptr, "%f ", clusters[i][j]);
        fprintf(fptr, "\n");
    }
    fclose(fptr);

    // output: the closest cluster centre to each of the data points
    sprintf(outFileName, "%s.membership", filename);
    printf("Writing membership of N=%d data objects to file \"%s\"\n", numObjs, outFileName);
    fptr = fopen(outFileName, "w");
    for (int i = 0; i < numObjs; i++)
    {
        fprintf(fptr, "%d %d\n", i, membership[i]);
    }
    fclose(fptr);
}
inline int KMEANS::nextPowerOfTwo(int n)
{
    n--;
    n = n >> 1 | n;
    n = n >> 2 | n;
    n = n >> 4 | n;
    n = n >> 8 | n;
    n = n >> 16 | n;
    // n = n >> 32 | n;  // for 64-bit ints
    return ++n;
}
__host__ __device__ inline static
float euclid_dist_2(int numCoords, int numObjs, int numClusters, float *objects, float *clusters, int objectId, int clusterId)
{
    int i;
    float ans = 0;
    for (i = 0; i < numCoords; i++)
    {
        ans += (objects[numObjs * i + objectId] - clusters[numClusters * i + clusterId]) *
               (objects[numObjs * i + objectId] - clusters[numClusters * i + clusterId]);
    }
    return ans;
}
/*
 * numIntermediates:  The actual number of intermediates
 * numIntermediates2: The next power of two
 */
__global__ static void compute_delta(int *deviceIntermediates, int numIntermediates, int numIntermediates2)
{
    extern __shared__ unsigned int intermediates[];
    intermediates[threadIdx.x] = (threadIdx.x < numIntermediates) ? deviceIntermediates[threadIdx.x] : 0;
    __syncthreads();
    // numIntermediates2 *must* be a power of two!
    for (unsigned int s = numIntermediates2 / 2; s > 0; s >>= 1)
    {
        if (threadIdx.x < s)
        {
            intermediates[threadIdx.x] += intermediates[threadIdx.x + s];
        }
        __syncthreads();
    }
    if (threadIdx.x == 0)
    {
        deviceIntermediates[0] = intermediates[0];
    }
}
/*
 * objects:        [numCoords][numObjs]
 * deviceClusters: [numCoords][numClusters]
 * membership:     [numObjs]
 */
__global__ static void find_nearest_cluster(int numCoords, int numObjs, int numClusters, float *objects, float *deviceClusters, int *membership, int *intermediates)
{
    extern __shared__ char sharedMemory[];
    unsigned char *membershipChanged = (unsigned char *)sharedMemory;
    float *clusters = deviceClusters;

    membershipChanged[threadIdx.x] = 0;
    int objectId = blockDim.x * blockIdx.x + threadIdx.x;
    if (objectId < numObjs)
    {
        int index;
        float dist, min_dist;

        /* find the cluster id that has the min distance to this object */
        index = 0;
        min_dist = euclid_dist_2(numCoords, numObjs, numClusters, objects, clusters, objectId, 0);
        for (int i = 0; i < numClusters; i++)
        {
            dist = euclid_dist_2(numCoords, numObjs, numClusters, objects, clusters, objectId, i);
            /* no need for square root */
            if (dist < min_dist)
            {
                min_dist = dist;
                index = i;
            }
        }
        if (membership[objectId] != index)
        {
            membershipChanged[threadIdx.x] = 1;
        }
        // assign the membership to object objectId
        membership[objectId] = index;
        __syncthreads();  // for membershipChanged[]
#if 1
        // blockDim.x *must* be a power of two!
        for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1)
        {
            if (threadIdx.x < s)
            {
                membershipChanged[threadIdx.x] += membershipChanged[threadIdx.x + s];  // sum all changed flags into membershipChanged[0]
            }
            __syncthreads();
        }
        if (threadIdx.x == 0)
        {
            intermediates[blockIdx.x] = membershipChanged[0];
        }
#endif
    }
}  // find_nearest_cluster
void KMEANS::cuda_kmeans()
{
    int index, loop = 0;
    int *newClusterSize;        // [numClusters]: no. of objects assigned to each new cluster
    float delta;                // % of objects that change their cluster
    float **dimObjects;         // [numCoords][numObjs]
    float **dimClusters;        // [numCoords][numClusters]
    float **newClusters;        // [numCoords][numClusters]
    float *deviceObjects;       // [numCoords][numObjs]
    float *deviceClusters;      // [numCoords][numClusters]
    int *deviceMembership;
    int *deviceIntermediates;

    // Copy objects given in [numObjs][numCoords] layout to the new [numCoords][numObjs] layout
    malloc2D(dimObjects, numCoords, numObjs, float);
    for (int i = 0; i < numCoords; i++)
    {
        for (int j = 0; j < numObjs; j++)
        {
            dimObjects[i][j] = objects[j][i];
        }
    }

    // pick the first numClusters elements of objects[] as the initial cluster centers
    malloc2D(dimClusters, numCoords, numClusters, float);
    for (int i = 0; i < numCoords; i++)
    {
        for (int j = 0; j < numClusters; j++)
        {
            dimClusters[i][j] = dimObjects[i][j];
        }
    }

    newClusterSize = new int[numClusters];
    assert(newClusterSize != NULL);
    memset(newClusterSize, 0, numClusters * sizeof(int));  // counts must start at zero; they are incremented below
    malloc2D(newClusters, numCoords, numClusters, float);
    memset(newClusters[0], 0, numCoords * numClusters * sizeof(float));

    // To support reduction, numThreadsPerClusterBlock *must* be a power of two, and it
    // *must* be small enough that a block's count of membership changes fits into an
    // unsigned char, the type used to keep track of membership changes in the kernel.
    const unsigned int numThreadsPerClusterBlock = 32;
    const unsigned int numClusterBlocks = (numObjs + numThreadsPerClusterBlock - 1) / numThreadsPerClusterBlock;
    const unsigned int numReductionThreads = nextPowerOfTwo(numClusterBlocks);
    const unsigned int clusterBlockSharedDataSize = numThreadsPerClusterBlock * sizeof(unsigned char);
    const unsigned int reductionBlockSharedDataSize = numReductionThreads * sizeof(unsigned int);

    cudaMalloc(&deviceObjects, numObjs * numCoords * sizeof(float));
    cudaMalloc(&deviceClusters, numClusters * numCoords * sizeof(float));
    cudaMalloc(&deviceMembership, numObjs * sizeof(int));
    cudaMalloc(&deviceIntermediates, numReductionThreads * sizeof(unsigned int));

    cudaMemcpy(deviceObjects, dimObjects[0], numObjs * numCoords * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(deviceMembership, membership, numObjs * sizeof(int), cudaMemcpyHostToDevice);

    do
    {
        cudaMemcpy(deviceClusters, dimClusters[0], numClusters * numCoords * sizeof(float), cudaMemcpyHostToDevice);

        find_nearest_cluster<<<numClusterBlocks, numThreadsPerClusterBlock, clusterBlockSharedDataSize>>>(numCoords, numObjs, numClusters, deviceObjects, deviceClusters, deviceMembership, deviceIntermediates);
        cudaDeviceSynchronize();

        compute_delta<<<1, numReductionThreads, reductionBlockSharedDataSize>>>(deviceIntermediates, numClusterBlocks, numReductionThreads);
        cudaDeviceSynchronize();

        int d;
        cudaMemcpy(&d, deviceIntermediates, sizeof(int), cudaMemcpyDeviceToHost);
        delta = (float)d;

        cudaMemcpy(membership, deviceMembership, numObjs * sizeof(int), cudaMemcpyDeviceToHost);
        for (int i = 0; i < numObjs; i++)
        {
            // find the array index of the nearest cluster center
            index = membership[i];
            // update new cluster centers: sum of objects located within
            newClusterSize[index]++;
            for (int j = 0; j < numCoords; j++)
            {
                newClusters[j][index] += objects[i][j];
            }
        }

        // average the sums and replace old cluster centers with newClusters
        for (int i = 0; i < numClusters; i++)
        {
            for (int j = 0; j < numCoords; j++)
            {
                if (newClusterSize[i] > 0)
                    dimClusters[j][i] = newClusters[j][i] / newClusterSize[i];
                newClusters[j][i] = 0.0;  // set back to 0
            }
            newClusterSize[i] = 0;  // set back to 0
        }
        delta /= numObjs;
    } while (delta > threshold && loop++ < 500);

    loop_iterations = loop + 1;

    malloc2D(clusters, numClusters, numCoords, float);
    for (int i = 0; i < numClusters; i++)
    {
        for (int j = 0; j < numCoords; j++)
        {
            clusters[i][j] = dimClusters[j][i];
        }
    }

    cudaFree(deviceObjects);
    cudaFree(deviceClusters);
    cudaFree(deviceMembership);
    cudaFree(deviceIntermediates);

    free(dimObjects[0]);
    free(dimObjects);
    free(dimClusters[0]);
    free(dimClusters);
    free(newClusters[0]);
    free(newClusters);
    delete[] newClusterSize;  // allocated with new[], so use delete[]
}
void KMEANS::file_read(char *fn)
{
    FILE *infile;
    char *line = new char[MAX_CHAR_PER_LINE];
    int lineLen = MAX_CHAR_PER_LINE;

    filename = fn;
    infile = fopen(filename, "r");
    assert(infile != NULL);

    /* find the number of objects */
    while (fgets(line, lineLen, infile))
    {
        numObjs++;
    }

    /* find the dimension of each object */
    rewind(infile);
    while (fgets(line, lineLen, infile) != NULL)
    {
        if (strtok(line, " ") != 0)
        {
            while (strtok(NULL, " "))
                numCoords++;
            break;
        }
    }

    /* allocate space for objects[][] and read all objects */
    rewind(infile);
    objects = new float *[numObjs];
    for (int i = 0; i < numObjs; i++)
    {
        objects[i] = new float[numCoords];
    }
    int i = 0;
    /* read all objects */
    while (fgets(line, lineLen, infile) != NULL)
    {
        if (strtok(line, " ") == NULL) continue;
        for (int j = 0; j < numCoords; j++)
        {
            objects[i][j] = atof(strtok(NULL, ", "));
        }
        i++;
    }
    fclose(infile);
    delete[] line;

    /* membership: the cluster id for each data object */
    membership = new int[numObjs];
    assert(membership != NULL);
    for (int i = 0; i < numObjs; i++)
        membership[i] = -1;
}
int main(int argc, char *argv[])
{
    if (argc != 3)
    {
        fprintf(stderr, "usage: %s <numClusters> <inputFile>\n", argv[0]);
        return 1;
    }
    KMEANS kmeans(atoi(argv[1]));
    kmeans.file_read(argv[2]);
    kmeans.cuda_kmeans();
    kmeans.file_write();
    return 0;
}


makefile:

target:
	nvcc cuda_kmeans.cu
	./a.out 4 ./Image_data/color100.txt

All of the code and the data files are available here: http://yunpan.cn/cKBZMPAJ8tcAs (access code: 9476)


Running the code:

(Screenshot of the program's run output from the original post.)


The CUDA implementation of kmeans is relatively involved and may be hard to follow in places; if you have questions, please leave a comment.


Author: 忆之独秀

Email: leaguenew@qq.com

Source: http://blog.csdn.net/lavorange/article/details/41942323

